We recently held our annual Rfam Next Big Things (NBT®) meeting, where we decide on the big changes for our various projects. Since Sean asked for it (and I think it’s a good idea), I thought I would discuss the biggest NBT® here: Rfam adopting Infernal 1.0 for release 10.0.
The current release, Rfam 9.1, was built using Infernal 0.72, so Rfam has fallen some way behind the latest Infernal version. At the moment, our only support for Infernal 1.0 is a set of calibrated covariance models (CMs) on the FTP site. In 2009 we plan to update the entire Rfam codebase to support Infernal 1.0 and to re-threshold every Rfam family, so that each family still strikes the right balance between sensitivity and specificity. We’re then going to update RFAMSEQ to include the latest and greatest sequences from the EMBL nucleotide sequence database. Next, we will have to spend a considerable amount of time resolving overlaps: wherever hits from two different families overlap, the families have to be fixed, either by adjusting their score thresholds or by trimming their seed alignments.
Another new feature we plan to introduce in Rfam 10.0 is the Rfam clan: a group of related families whose hits are allowed to overlap. For example, Nuclear RNase P, Bacterial RNase P class A, Bacterial RNase P class B and RNase MRP are evolutionarily related, so they will all live in the same clan and will be allowed to overlap. This should make resolving overlaps significantly easier for Rfammers, though it may also cause some confusion, since annotators will occasionally see two different families annotating the same region of sequence. Still, we believe the benefits of this strategy will outweigh the disadvantages.
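To make the clan rule concrete, here is a minimal Python sketch of how an overlap check might treat clans. It is not Rfam's actual code: the Hit type, the overlaps and find_conflicts functions, the clan label "RNaseP_clan", the sequence names and the coordinates are all hypothetical, and the family labels are only illustrative. For simplicity it ignores strand and counts any shared nucleotide as an overlap. Overlapping hits from different families are reported as conflicts unless both families belong to the same clan.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hit:
    """A hypothetical hit of one family's model to a sequence region."""
    family: str  # family label (illustrative)
    seq_id: str  # name of the sequence the hit lies on (hypothetical)
    start: int   # hit coordinates; start > end is allowed (reverse strand)
    end: int


def overlaps(a: Hit, b: Hit) -> bool:
    """True if both hits lie on the same sequence and share at least one position.

    Strand is ignored: coordinates are normalised so that lo <= hi.
    """
    if a.seq_id != b.seq_id:
        return False
    a_lo, a_hi = min(a.start, a.end), max(a.start, a.end)
    b_lo, b_hi = min(b.start, b.end), max(b.start, b.end)
    return a_lo <= b_hi and b_lo <= a_hi


def find_conflicts(hits: List[Hit], clan_of: Dict[str, str]) -> List[Tuple[Hit, Hit]]:
    """Return overlapping hit pairs from different families that do not share a clan."""
    conflicts = []
    for a, b in combinations(hits, 2):
        if a.family == b.family or not overlaps(a, b):
            continue
        clan_a = clan_of.get(a.family)
        if clan_a is not None and clan_a == clan_of.get(b.family):
            continue  # same clan: the overlap is permitted
        conflicts.append((a, b))
    return conflicts


# Hypothetical example: the four RNase P/MRP families share one clan.
clans = {
    "RNaseP_nuc": "RNaseP_clan",
    "RNaseP_bact_a": "RNaseP_clan",
    "RNaseP_bact_b": "RNaseP_clan",
    "RNase_MRP": "RNaseP_clan",
}
hits = [
    Hit("RNaseP_nuc", "seq1", 100, 400),
    Hit("RNase_MRP", "seq1", 150, 420),
    Hit("tRNA", "seq1", 380, 450),
    Hit("tRNA", "seq2", 1, 70),
]
for a, b in find_conflicts(hits, clans):
    print(a.family, "overlaps", b.family, "on", a.seq_id)
# prints:
# RNaseP_nuc overlaps tRNA on seq1
# RNase_MRP overlaps tRNA on seq1
```

With the clan mapping in place, the RNase P/MRP overlap drops out of the conflict list, leaving only the overlaps with an unrelated family for curators to resolve by adjusting thresholds or trimming seeds.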
Since all of this is going to create a lot of extra work for the Rfam team, we’re only planning one release of Rfam this year, instead of the “usual” two (a point release corresponding to a sequence update, and a point-one release corresponding to a major family-building drive).
We discussed several other ideas at NBT®, but I’m not brave enough to share them yet, as there’s a good chance many of them won’t get finished.
Posted by Paul.