Two new releases
Rfam 9.0 was primarily an update to the underlying sequence database. The previous version of RFAMSEQ (8.1) was based upon EMBL 84, which was three years out of date. The updated sequences also included, for the first time, the whole genome shotgun (WGS) and environmental sequence (ENV) divisions of EMBL. This resulted in an approximately 10-fold increase in the number of sequences that Rfam now searches.
There were also many improvements to existing Rfam models. These included cleaning up the most glaring problems in the consensus secondary structure annotations and alignments. However, the biggest changes came from Jennifer Daub iterating more than 370 of the smaller Rfam families. In an iteration, good sequences from the full alignment are pushed into the seed alignment, and the new model built from that seed is used to re-search RFAMSEQ.
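For readers who think in code, one round of iteration can be sketched roughly as follows. This is only a toy illustration: the post does not say how "good" sequences were chosen, so the bit-score cutoff `min_bits`, the function names and the `toy_search` stand-in are all assumptions of mine, not the Rfam pipeline.

```python
def iterate_once(seed_ids, full_hits, min_bits, search):
    """Promote good full-alignment hits into the seed, then re-search.

    seed_ids  -- sequence identifiers currently in the seed alignment
    full_hits -- {sequence id: bit score} for the full alignment
    min_bits  -- hypothetical cutoff that defines a "good" sequence
    search    -- callable that takes a seed and returns new hits
    """
    in_seed = set(seed_ids)
    promoted = sorted(
        seq_id
        for seq_id, bits in full_hits.items()
        if bits >= min_bits and seq_id not in in_seed
    )
    new_seed = list(seed_ids) + promoted
    return new_seed, search(new_seed)


def toy_search(seed):
    # Stand-in for rebuilding the model and searching RFAMSEQ (assumption).
    return {seq_id: 50.0 for seq_id in seed}


seed = ["seqA", "seqB"]
hits = {"seqA": 61.0, "seqC": 47.5, "seqD": 18.2}
new_seed, new_hits = iterate_once(seed, hits, min_bits=40.0, search=toy_search)
print(new_seed)
```

Here `seqC` clears the made-up cutoff and joins the seed, while `seqD` does not. A real iteration would rebuild the model and search the whole sequence database at the `search` step.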
Rfam 9.1 was primarily an update to the number of models in Rfam. Our summer student, Adam Wilkinson, built a phenomenal number of new families. The new families included 408 miRNAs, 144 C/D box snoRNAs, 50 H/ACA box snoRNAs, 65 CRISPRs, 57 cis-regulatory elements, 30 sRNAs, 5 riboswitches, 1 partial LSU rRNA and 9 miscellaneous RNA genes.
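To put those numbers in perspective, here is a quick tally of the counts quoted above. The category labels are just my shorthand for the list in the previous paragraph.

```python
# Counts of new families in Rfam 9.1, as quoted in the text above.
new_families = {
    "miRNA": 408,
    "C/D box snoRNA": 144,
    "H/ACA box snoRNA": 50,
    "CRISPR": 65,
    "cis-regulatory element": 57,
    "sRNA": 30,
    "riboswitch": 5,
    "partial LSU rRNA": 1,
    "miscellaneous RNA gene": 9,
}

total = sum(new_families.values())

# Print each category with its share of the new families, largest first.
for name, count in sorted(new_families.items(), key=lambda kv: -kv[1]):
    print(f"{name:<24} {count:>4}  {100 * count / total:5.1f}%")
print(f"{'total':<24} {total:>4}")
```

The listed counts sum to 769 new families, with miRNAs making up a little over half of them.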
A new website
John Tate spent a great deal of time creating the snazzy new Rfam website. The old site was not going to scale well with the masses of new sequences and families that we planned to add. Also, by updating the Rfam website we can now use a common core of website code for both Rfam and its sister site Pfam.
Rfam and Wikipedia
There have been a lot of interesting developments with Rfam and Wikipedia. I’ll write a separate blog entry containing more details later. Rfam is one of the few bioinformatics databases to draw the textual annotations of its entries directly from Wikipedia. In the last release we moved away from writing individual entries for each family, which had resulted in hundreds of very short and repetitive entries. Instead we chose to use generic entries covering several families. For example, most of the new miRNA and snoRNA families point only to the miRNA and snoRNA Wikipedia entries respectively. Individual families that become notable will eventually get their own entries. This decision meant that many more new Rfam families could be built, as writing these short articles takes a significant amount of our time.
Mappings to PDB
In Rfam 9.1 we now provide mappings between Rfam and PDB. These mappings are still experimental and will take a little while to mature. Rob Finn, as part of his iPfam work, kindly pulled out all the RNA sequences from PDB for us. Since many of these sequences were truncated relative to the Rfam models, I had to use a combination of BLAT and Infernal to map from these sequences to the Rfam families. So far this only affects 20 Rfam families.
Genome annotations
We now have more than 1,140 genome annotations in GFF3 format. To produce these we used the mappings between EMBL and genomes provided on the EMBL website. However, these mappings have not kept up with ENSEMBL genomes (probably because genome assemblers are not submitting their assemblies to EMBL). This caused a significant headache for us. Jen Daub, with a lot of help from the ENSEMBL people, managed to get mappings for four ENSEMBL species (human, mouse, cow and C. elegans). Otherwise, I’m afraid your favourite large eukaryotic genome may be missing from the list. Until these problems get sorted out, Rfam simply doesn’t have the human resources to hunt down assemblies and sequences of genomes not in EMBL.
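If you have not met GFF3 before, the sketch below shows roughly what one annotation line looks like: nine tab-separated columns, with start no greater than end and the strand given separately. The function name, the choice of `ncRNA` as the feature type, the attribute tags and the example hit are all my own assumptions for illustration and are not necessarily what the Rfam files contain.

```python
def escape(value):
    """Percent-encode the characters that GFF3 reserves in attribute values."""
    for char, code in (("%", "%25"), (";", "%3B"), ("=", "%3D"),
                       ("&", "%26"), (",", "%2C"), ("\t", "%09")):
        value = value.replace(char, code)
    return value


def rfam_hit_to_gff3(seqid, start, end, bit_score, rfam_acc, rfam_id):
    """Format one hit as a GFF3 line (hypothetical layout, not Rfam's own).

    Reverse-strand hits are often reported with start > end, so the
    coordinates are reordered and the strand column carries the direction.
    """
    strand = "+" if start <= end else "-"
    low, high = min(start, end), max(start, end)
    attributes = ";".join([f"Name={escape(rfam_id)}", f"Alias={escape(rfam_acc)}"])
    columns = [seqid, "Rfam", "ncRNA", str(low), str(high),
               f"{bit_score:.1f}", strand, ".", attributes]
    return "\t".join(columns)


print("##gff-version 3")
# Made-up hit on a made-up sequence, reported on the reverse strand.
line = rfam_hit_to_gff3("chrExample", 171, 100, 55.23, "RF00005", "tRNA")
print(line)
```

The advantage of a format this simple is that genome browsers and most annotation tools can load it without any Rfam-specific code.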
Rfam does DAS
Prasad Gunasekaran has made a DAS source from Rfam annotations of EMBL sequences. You can see an example annotation here.
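For anyone who wants to query a DAS source from a script, a DAS 1.x features request is just a URL of the form `PREFIX/das/SOURCE/features?segment=ID:start,stop`. The sketch below builds such a URL. The server address and the source name `rfam` are placeholders I made up, so substitute the real ones for the Rfam DAS source.

```python
from urllib.parse import quote


def das_features_url(base_url, source, segment_id, start=None, stop=None):
    """Build a DAS 1.x 'features' request URL for one segment.

    base_url and source are placeholders here; start and stop must be
    given together, or both omitted to request the whole segment.
    """
    if (start is None) != (stop is None):
        raise ValueError("give both start and stop, or neither")
    segment = segment_id if start is None else f"{segment_id}:{start},{stop}"
    query = quote(segment, safe=":,")
    return f"{base_url.rstrip('/')}/das/{source}/features?segment={query}"


# Hypothetical server, source name and sequence identifier.
url = das_features_url("http://das.example.org", "rfam", "AB000001", 1, 500)
print(url)
```

Fetching that URL from a real DAS server returns an XML document listing the features that overlap the requested segment.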
P. P. Gardner, J. Daub, J. G. Tate, E. P. Nawrocki, D. L. Kolbe, S. Lindgreen, A. C. Wilkinson, R. D. Finn, S. Griffiths-Jones, S. R. Eddy & A. Bateman (2008): Rfam: updates to the RNA families database. Nucl. Acids Res.
J. Daub, P. P. Gardner, J. Tate, D. Ramskold, M. Manske, W. G. Scott, Z. Weinberg, S. Griffiths-Jones & A. Bateman (2008): The RNA WikiProject: Community annotation of RNA families. RNA.