Rfam 11.0 is out!

August 14, 2012

The team behind Rfam is pleased to announce the release of Rfam 11.0. This release represents a major update from 10.1, primarily due to the upgrade of our underlying sequence database, Rfamseq.

Rfam 11.0 is based on the January 2012 release of EMBL-Bank, whilst Rfam 10.1 was based on the June 2009 release. The updated Rfamseq contains around 88 million sequences from the STD and WGS data classes, compared with 55 million in Rfamseq 10. As part of the sequence update we have searched all our families against the new sequence database. Consequently, many of our families have increased significantly in size; for example, the cspA thermoregulator has grown from 434 to 6179 sequences, mainly because this family now identifies new homologs from actinobacteria, alpha- and beta-proteobacteria. 77 families have shrunk in size, primarily due to rethresholding. We’ve also killed 11 families (see the README for more details).

Some important changes

The large size of RF00005, RF00177, RF01959, RF01960 and RF02271 has meant that we have had to change our annotation strategy in order to produce alignments of manageable size. For these families, the FULL alignment on the website contains only sequences from the SEED alignment and sequences from whole genomes. This means that the release files Rfam.fasta, Rfam.full and Rfam_full.tree are also based on these smaller alignments. The FULL alignments for these families (based on matches to all sequences in Rfamseq) are available from the FTP site; be warned that they are very large! As our alignments continue to grow, we are proposing to adopt this genome-based alignment strategy for all Rfam families in future releases.
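If you want to check how big each family's alignment is in a release file, something along these lines may help. It is only a minimal sketch: it assumes the alignments are in Stockholm format, with one `#=GF AC` accession line per family and `//` closing each alignment, and the function name and toy data are made up for illustration.

```python
def count_sequences_per_family(lines):
    """Count distinct sequence names per family in a multi-alignment Stockholm stream.

    Assumes each alignment carries a '#=GF AC <accession>' line and ends with '//'.
    """
    counts = {}
    accession = None
    names = set()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#=GF AC"):
            # Third whitespace-separated field is the family accession.
            accession = line.split()[2]
        elif line.startswith("//"):
            # End of one alignment: record the count and reset the state.
            if accession is not None:
                counts[accession] = len(names)
            accession, names = None, set()
        elif line and not line.startswith("#"):
            # Sequence line: first field is the sequence name. A set is used
            # so that interleaved alignments do not count a name twice.
            names.add(line.split()[0])
    return counts


# Toy input, not real Rfam data.
example = """# STOCKHOLM 1.0
#=GF AC   RF00001
seqA/1-10     ACGUACGUAC
seqB/5-14     ACGU-CGUAC
#=GC SS_cons  <<<....>>>
//
# STOCKHOLM 1.0
#=GF AC   RF00002
seqC/1-8      ACGUACGU
//
"""

counts = count_sequences_per_family(example.splitlines())
print(counts)
```

For a real release file you would pass an open file handle instead of the toy string; because the function reads line by line, it should cope with the very large alignments without loading them into memory.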

New Families

Rfam 11.0 contains 246 new families. Ruth has added an additional 144 long non-coding RNA families, and this release also sees the introduction of the “Gene;lncRNA;” type for this category of Rfams. We’ve also introduced the type “Gene;antitoxin” and now have 5 entries classified as antitoxins. We’re continuing to see new families from the RNA Biology New Families track, such as rsmX, and we have some nice new hammerhead ribozyme families thanks to Marcos de la Pena, amongst others.

Annotation

As part of Rfam 11.0 we have provided annotations for the ncRNA category of the non-redundant RefSeq database. These matches may be found in the Genomes tab, as well as in the Sequences tab for each family. In this release we have provided 21283 annotations to RefSeq 53.

Our summer student Eleanor has been expanding and improving our annotation of microRNA families, in conjunction with miRBase, who are now also using Wikipedia to annotate their microRNA families. In addition to creating stub articles for many miRNA families, Eleanor has also been fleshing out many existing articles, such as the miR-8 precursor family. We’ve also updated our GO annotation and now provide 2,750 GO terms associated with our families.

Access improvements

This release sees a number of improvements in the Rfam website. You can now query Rfam 11.0 through our Biomart. The Biomart interface allows much more sophisticated querying of Rfam data; for instance, you can now download all families containing rat cis-regulatory elements, as well as obtain the actual sequences. John has also been hard at work integrating the Sunburst visualisation into the Rfam family species tab (there’s a good example here); what’s more, you can now construct custom alignments for a given family using the Sunburst. Each taxonomic group in the Sunburst is selectable, and once you’ve made your selection, a cmalign job is submitted and returns the alignment of your chosen sequences to the covariance model (CM) of that family.
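For anyone who would rather build a similar custom alignment locally, here is a rough sketch of calling Infernal's cmalign from Python. It assumes Infernal is installed and on your PATH, and the file names are hypothetical; the options used by the website's own cmalign job are not described in this post, so this is not a reproduction of it.

```python
import subprocess


def build_cmalign_command(cm_path, fasta_path, out_path):
    """Build the argument list for: cmalign -o <out> <cmfile> <seqfile>."""
    return ["cmalign", "-o", out_path, cm_path, fasta_path]


def run_cmalign(cm_path, fasta_path, out_path):
    """Align the sequences in fasta_path to the CM and write the alignment to out_path.

    Raises CalledProcessError if cmalign exits with a non-zero status.
    """
    subprocess.run(build_cmalign_command(cm_path, fasta_path, out_path), check=True)


# Hypothetical file names: a single-family CM and a FASTA file of chosen sequences.
command = build_cmalign_command("RF00005.cm", "selected.fa", "selected.sto")
print(" ".join(command))
```

Calling `run_cmalign("RF00005.cm", "selected.fa", "selected.sto")` would then run the job, provided both input files exist.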

Rfam is now released under the Creative Commons Zero licence, in keeping with Pfam. This pretty much means you are free to do as you please with Rfam data.

This post is just a brief run-down of all the new goodies in Rfam 11.0; we’ll be publishing a more detailed look at what’s in Rfam 11.0 in the forthcoming Nucleic Acids Research Database issue. We’ll also be talking about the upcoming move to Infernal 1.1. Bear in mind that whilst we have provided Infernal 1.1-compatible CMs as part of this release, the models have not yet been rethresholded, so use them AT YOUR OWN RISK! If you’re interested in what goes into making an Rfam release, we’ll soon be writing a series of blog posts describing some of the new features and how they were built.

As ever, Rfam is only possible due to the hard work of many people. Special thanks go to Guy Coates and Pete Clapham for their invaluable assistance with the compute involved in getting a new Rfam release done, as well as to David Harper and the Sanger DBA team. And last, but by no means least, our very own Jen has worked tirelessly over many months to bring you a shiny new Rfam release.

I hope you enjoy using Rfam – please submit any comments, complaints or compliments to rfam-help@sanger.ac.uk or as comments to this blog post. As always, we appreciate user feedback and are especially interested in new families and updates to our existing alignments.

Posted by Sarah

4 Responses to “Rfam 11.0 is out!”

  1. John Burke Says:

    Congratulations and gold medals to you all.

  2. Evert-Jan Blom Says:

    Very nice. I was wondering if this release is compatible with new Infernal 1.1rc1 tool? Cryptogenomicon mentions that the new RFAM db will be computed entirely natively with Infernal RNA structure comparison.

    • Sarah Says:

      Hi Evert-Jan,
      Future versions of Rfam will indeed use v1.1 of Infernal. We’ve provided 1.1-compatible CMs with this release but they have not yet been rethresholded so use them very much at your own risk, as we detailed in the blog post.

