Posts Tagged ‘RNA Biology’

The Rfam 10.1 release is out!

June 16, 2011

The crowd of people behind Rfam are proud to announce a new release of the Rfam database. This is version 10.1, and it mostly brings an increase in the number, size and quality of families.

Rfam now has 1973 families, 528 more families than the 10.0 release. We are just one prokaryotic RNA-seq project away from hitting 2000 families! In fact, we have passed 2000 in terms of Rfam accession (RF02031 is the Escherichia coli sRNA, tpke11). My selfish attempt to claim the coveted RF02000 accession was thwarted by Chris Boursnell, who added RF02000, which now corresponds to the rice microRNA MIR1846 from the miRBase database. I did claim RF01999 and RF02001 with two sub-types of Group II catalytic intron domains 1-4 that Zasha Weinberg kindly provided to Rfam.

The new families include nearly 100 novel elements inferred by Zasha Weinberg and colleagues in their recent Genome Biology article [1]. Zasha kindly provides Rfam with the alignments and writes Wikipedia articles for each notable element, greatly easing the burden on Rfam of incorporating these into the database.

Our new recruit, Ruth Eberhardt, originally from the UniProt group at EBI, has also made a significant mark on the new release. Ruth has been busily incorporating “domains” derived from long messenger-like non-coding RNAs (a.k.a. lncRNAs). These are regions within each transcript that are unusually well conserved, and there is some evidence that secondary structure within the regions is evolutionarily constrained. The new families include: MEG3, MALAT1, MIAT, PRINS, Xist, TUG1, HSR-omega, Evf1, HOTAIR, KCNQ1OT1, SOX2OT, NEAT1, EGOT, H19 and HOTAIRM1.

This summer we had the pleasure of hosting another talented summer student, Ben Moore. Ben is a prolific Wikipedian who rapidly made his mark on the RNA Wikipedia entries and continues to do so while working on an MRes in Computational Biology at the University of York. One stunning feat Ben accomplished was passing the article for “Toxin-antitoxin system” through Wikipedia’s peer-review process for “good articles”. This process appears to be at least as rigorous as scientific peer-review, so this is quite an achievement. He also built a number of families for RNA antitoxins including Sok, RNAII, IstR, RdlD, FlmB, Sib, RatA, SymR and PtaRNA1. PtaRNA1 is a newly discovered RNA antitoxin that was first published by Sven Findeiss and colleagues in the RNA families track at RNA Biology [2]. This track has provided very useful updates and expansions for Rfam directly from the RNA community.

A guest Rfam rogue, Chris Boursnell, who has been visiting from Andrew Firth’s group, has also been busy building new families. With permission from the good people at the Recode-2 database [3], Chris has added a number of new frameshift elements and has also updated a number of microRNA families based on the latest release of miRBase [4].

An enormous achievement for the database is the inclusion of full-length small subunit ribosomal RNA families. Previously Rfam had just one truncated model that covered all three kingdoms of life. The change is thanks to the hard work of Eric Nawrocki and colleagues in Sean Eddy’s lab on the Infernal software and the related package ssu-align, which can now deal with much larger datasets than were previously possible. The three new alignments cover bacteria, archaea and eukaryotes. They are all derived from the highly accurate alignments produced by Robin Gutell and colleagues, who run the Comparative RNA Website.

Thanks to the exciting work by Stefan Washietl and colleagues on the RNAcode software package, we now have good evidence that the “RNA” family C0343 (RF00120) is in fact protein-coding and most likely does not function as an RNA other than in an mRNA sense [5]. Therefore C0343 has been removed for the 10.1 release.

Our SRP families have all been rebuilt and supplemented with additional families thanks to the work of Magnus Rosenblad and colleagues [6]. This is another excellent contribution to the RNA families track at RNA Biology. Based on this work the existing two SRP families were replaced and supplemented by 7 new families: Metazoa_SRP (RF00017), Bacteria_small_SRP (RF00169), Fungi_SRP (RF01502), Bacteria_large_SRP (RF01854), Plant_SRP (RF01855), Protozoa_SRP (RF01856) and Archaea_SRP (RF01857). These new models should improve the specificity of Rfam annotations and reduce the number of pseudogenes incorporated.

We have continued to work on the Rfam clans and have added 3 new clans. These are U3 (CL00100), Cobalamin (CL00101) and group-II-D1D4 (CL00102). Also, the membership of the clans tRNA (CL00001), RNaseP (CL00002) and SNORA62 (CL00040) has been updated.

Finally, several problematic microRNA families, mir-544 (RF01045), mir-1302 (RF00951), mir-1255 (RF00994), mir-548 (RF01061), mir-649 (RF01029) and mir-562 (RF00998), together with spliceosomal U13 (RF01210), were rethresholded to remove the excessive number of pseudogene annotations in the full alignments. This rethresholding, along with the rebuilding of our SSU models, has removed approximately 600,000 annotations from Rfam.

There are countless other changes that have been made. If I’ve forgotten to include any that are significant to you, or to mention your name, then I apologise profusely.

This release could not have happened without the invaluable help of Jen Daub and John Tate, who have worked tirelessly and enthusiastically on it. Their job was made particularly challenging by the fact that I have recently relocated to my homeland of New Zealand to take up a position as a Rutherford Discovery Fellow and senior lecturer at the University of Canterbury in Christchurch. I hope to continue to contribute to Rfam and the wider RNA community from here. This is also a good moment to welcome the new Czars of Rfam, Sarah Burge and Eric Nawrocki, who will now face the exciting and challenging task of managing the day-to-day work of maintaining Rfam. I wish them the best of luck in their new roles. I hope they enjoy it as much as I have.

Paul Gardner.

References

[1] Weinberg Z, Wang JX, Bogue J, Yang J, Corbino K, Moy RH, Breaker RR. (2010) Comparative genomics reveals 104 candidate structured RNAs from bacteria, archaea, and their metagenomes. Genome Biology. 11(3):R31.

[2] Findeiss S, Schmidtke C, Stadler PF, Bonas U. (2010) A novel family of plasmid-transferred anti-sense ncRNAs. RNA Biology. 7(2):120-4.

[3] Bekaert M, Firth AE, Zhang Y, Gladyshev VN, Atkins JF, Baranov PV. (2010) Recode-2: new design, new search tools, and many more genes. Nucleic Acids Research. 38(Database issue):D69-74.

[4] Kozomara A, Griffiths-Jones S. (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Research. 39(Database issue):D152-7.

[5] Washietl S, Findeiss S, Müller SA, Kalkhof S, von Bergen M, Hofacker IL, Stadler PF, Goldman N. (2011) RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data. RNA. 17(4):578-94.

[6] Rosenblad MA, Larsen N, Samuelsson T, Zwieb C. (2009) Kinship in the SRP RNA family. RNA Biology. 6(5):508-16.

Plans for Rfam 2010-2011

June 30, 2010

Besides running RNA informatics courses, the Rfam people will be working on the usual summer family-building exercise together with a bright summer student. Priority number one will be building the families from published RNA Biology articles that we ran out of time to do for the Rfam 10.0 release:

SmY, MRP, Yfr2, tmRNA, Trypanosomal H/ACA ncRNAs, GIR1, U3, SRP, influenza pseudoknot, ptaRNA1, RsaOG, rsmX

An exciting trend we’re starting to see is groups appending machine-parsable alignments to their papers and writing Wikipedia articles for their families outside of the RNA families track. Of particular note are the 81 families from Zasha Weinberg’s latest papers [1-3] (listed in the table at the end of this post). Also, Daniel Gautheret’s and Wade Winkler’s groups are supporting this effort with their respective CsfG RNA [4] and EAR motif [5] articles. Rightly or wrongly, we are crediting some of this to the increased exposure of Rfam’s requirements from the RNA families track at the journal RNA Biology. The journal, by the way, now has an impact factor for the first time, and it is a fantastic 5.559. RNA Biology is punching well above its weight for now; long may this continue. Once our super summer student has finished with these (much easier) families he’ll be moving on to our terrifyingly long list of potential Rfam families that are waiting to be built. If you see anything on this list that you might be interested in writing an RNA Biology article for, then please let us know as soon as possible.

In other news, we’re still hunting for good people to join a revamped Xfam group. In a few days we’ll be advertising Curator positions and we’re still looking for a Senior Computational Biologist. Check the Sanger Jobs site over the next few days.

[1] Weinberg Z, Wang JX, Bogue J, Yang J, Corbino K, Moy RH, Breaker RR. (2010) Comparative genomics reveals 104 candidate structured RNAs from bacteria, archaea, and their metagenomes. Genome Biology.

[2] Weinberg Z, Perreault J, Meyer MM, Breaker RR (2009) Exceptional structured noncoding RNAs revealed by bacterial metagenome analysis. Nature.

[3] Meyer MM, Ames TD, Smith DP, Weinberg Z, Schwalbach MS, Giovannoni SJ, Breaker RR. (2009) Identification of candidate structured RNAs in the marine organism ‘Candidatus Pelagibacter ubique’. BMC Genomics.

[4] Marchais A, Duperrier S, Gautheret D, Stragier P. (2010). A sporulation-specific, small noncoding RNA highly conserved in endospore formers. In preparation.

[5] Irnov I, Winkler WC. (2010) A regulatory RNA required for antitermination of biofilm and capsular polysaccharide operons in Bacillales. Mol Microbiol.

6S-Flavo Acido-1 Acido-Lenti-1 Actino-pnp
AdoCbl-variant Bacillaceae-1 Bacillus-plasmid Bacteroid-trp
Bacteroidales-1 Bacteroides-1 C4-a1b1 C4
Chlorobi-1 Chlorobi-RRM Chloroflexi-1 Clostridiales-1
Collinsella-1 Cyano-1 Cyano-2 Dictyoglomi-1
Downstream-peptide Flavo-1 Gut-1 JUMPstart
L17DE Lacto-rpoB Lacto-usp Lnt
Methylobacterium-1 Moco-II Ocean-V Pedo-repair
PhotoRC-I PhotoRC-II Polynucleobacter-1 Pseudomon-1
Pseudomon-Rho Pseudomon-groES Pyrobac-1 Rhizobiales-2
SAM-Chlorobi SAM-I-IV-variant SAM-II_long_loops SAM-SAH
STAXI Termite-flg Termite-leu TwoGGAY
asd atoC crcB EAR
flg-Rhizobiales flpD gabT glnA
gyrA hopC lactis-plasmid leu-phe_leader
livK manA mraW msiK
nuoG pan pfl potC
psaA psbNH radC rmf
rne-II sbcD sucA-II sucC
traJ-II wcaG whalefall-1 yjdF
ykkC-III

Alex wins the Benjamin Franklin award!

April 1, 2010

Our very own Alex Bateman has been awarded the prestigious Benjamin Franklin award! This is an annual award presented to someone in the community who has made significant contributions to promoting open access in the life sciences.

Nominations are made by at least two members of the community and then votes are collected by the good people at bioinformatics.org. Alex faced some stiff competition from many greats in the field yet still managed to win. He is the third Xfam associate who has won the award, joining Ewan Birney and Sean Eddy.

Naturally, all of the Xfam members are very happy with this result and are currently glowing in the reflected glory (or is that the result of the celebratory bubbly or the unseasonal weather?).

Posted by: Rob and Paul.

Rfam, RNA Biology and Wikipedia in the news

February 20, 2009

Some of you may have noticed the attention that the unholy alliance between Rfam, RNA Biology and Wikipedia has been receiving recently. I thought it might be worthwhile posting a more detailed overview of how this happened and what we’re planning, and addressing the major criticisms.