We are pleased to announce that we’ve released Dfam 1.2. This version represents a few important changes from 1.1, including increased sensitivity for many families, a new plot on the model page, and an improved Relationships tab.
Archive for the 'Releases' Category
We are happy to announce that TreeFam 9 is online and you can find it under http://www.treefam.org.
TreeFam 9 now has 109 species (vs. 79 in TreeFam 8) and is based on data from Ensembl v69, Ensembl Genomes v16, Wormbase and JGI.
This release marks an important step for TreeFam as it is the first release build since TreeFam has been resurrected.
Here is a list of the most important changes in TreeFam 9:
- New website layout (adopting the Pfam/Rfam/Dfam layout)
- Infrastructure move of web servers and databases to the EBI
- Sequence search against the library of TreeFam family profiles
- Pairwise homology download
We hope you find all the information you are looking for. If you don’t, please let us know so that we can include the information you want. The old website will remain online here.
If you have questions, suggestions or find bugs, don’t hesitate to contact us through our new forum here.
the TreeFam team
In a blog post published just over a year ago, I proposed a number of changes to the content of Pfam to improve scalability and usability of the database. These changes came into effect a few days ago, when we released Pfam 27.0. This release of Pfam contains a total of 14831 families, with 1182 new families and 22 families killed since release 26.0. 80% of all proteins in UniProt contain a match to at least one Pfam domain, and 58% of all residues in the sequence database fall within a Pfam domain. Read the rest of this entry »
We are pleased to announce that the Dfam paper (“Dfam: a database of repetitive DNA based on profile hidden Markov models“) is now available in the 2013 NAR Database issue, and has been selected as a “featured article“ (meaning the NAR editorial board thinks it is among “the top 5% of papers in terms of originality, significance and scientific excellence”).
In other exciting news, two members of the Dfam consortium, Arian Smit and Robert Hubley (Institute for Systems Biology, Seattle), just released RepeatMasker 4.0. This is a major update that, among other important improvements, adds support for searching with Dfam and nhmmer. Go get yourself a copy at http://www.repeatmasker.org/
Posted by Travis
We have recently produced a new release of AntiFam, release 3.0. AntiFam has grown in size, and release 3.0 contains 54 entries – compared to just 23 when we last blogged about AntiFam (release 1.1). Over 80 % of these new entries arise from translations of non-coding RNAs, including several families from translations of rRNA, tmRNA and RNaseP.
We are pleased to introduce Dfam 1.0, a database of profile HMMs for repetitive DNA elements. Repetitive DNA, especially the remnants of transposable elements, makes up a large fraction of many genomes, especially eukaryotic. Accurate annotation of these TEs both simplifies downstream genomic analysis and enables research into their fascinating biology and impact on the genome.
The team behind Rfam is pleased to announce the release of Rfam 11.0. This release represents a major update from 10.1, primarily due to the upgrade of our underlying sequence database, Rfamseq.
The current Pfam release, version 26.0, took approximately 4 months to nurse through the various stages of updating the sequence database, resolving overlaps between families, rebuilding the MySQL database and performing all of the post-processing that constitutes the ‘release’. The production team strives to make two releases a year, but I really do not fancy spend two thirds of a year on Pfam releases. Thus, with my colleagues, I have been reviewing what we do and why we do it and, probably more importantly, assessing how much different sections of the Web site are used. Below is a list of changes that are going to happen in the next release, release 27.0.
The crowd of people behind Rfam are proud to announce a new release of the Rfam database. This is version 10.1 and is mostly an increase in the number, size and quality of families.
Rfam now has 1973 families, 528 more families than the 10.0 release. We are just one prokaryotic RNA-seq project away from hitting 2000 families! In fact, we have passed 2000 in terms of Rfam accession (RF02031 is the Escherichia coli sRNA, tpke11). My selfish attempt to claim the coveted RF02000 accession was snatched from my grasp by Chris Boursnell who added RF02000 which now corresponds to the rice microRNA MIR1846 from the miRBase database. I did claim RF01999 and RF02001 with two
sub-types of Group II catalytic intron domains 1-4 that Zasha Weinberg kindly provided to Rfam.
The new families included nearly 100 novel elements inferred by Zasha Weinberg and colleagues in his recent Genome Research article . Zasha kindly provides Rfam with the alignments and writes Wikipedia articles for each notable element, greatly easing the burden on Rfam for incorporating these into the database.
Our new recruit, Ruth Eberhardt, originally from the UniProt group at EBI, has also made a significant mark on the new release. Ruth has been busily incorporating “domains” derived from long messenger-like non-coding RNAs (a.k.a. lncRNAs). These are regions within each transcript that are unusually well conserved and there is some evidence that secondary structure within the regions is evolutionarily constrained. The new families include: MEG3, MALAT1, MIAT, PRINS, Xist, TUG1, HSR-omega, Evf1, HOTAIR, KCNQ1OT1, SOX2OT, NEAT1, EGOT, H19 and HOTAIRM1.
This summer we had the pleasure of hosting another talented summer student, Ben Moore. Ben is a prolific Wikipedian and rapidly made his mark on the RNA Wikipedia entries and continues to do so while working on a MRes in Computational Biology at the University of York. One stunning feat Ben accomplished was passing the article for “Toxin-antitoxin system” through Wikipedia’s peer-review process for “good articles”. This process appears to be at least as rigorous as scientific peer-review and is quite an achievement. He also built a number of families for RNA anti-toxins including Sok, RNAII, IstR, RdlD, FlmB, Sib, RatA, SymR and PtaRNA1. PtaRNA1 is a newly discovered RNA antitoxin that was first published by Sven Findeiss and colleagues in the RNA families track at RNA Biology. This track has provided very useful updates and expansions for Rfam directly from the RNA community.
A guest Rfam rogue, Chris Boursnell, who has been visiting from Andrew Firth’s group has also been busy building new families. With permission from the good people at the Recode-2 database  Chris has added a number of new frame-shift elements and has also updated a number of microRNA families based on the latest release of miRBase .
An enormous achievement for the database is the inclusion of full-length small subunit ribosomal RNA families. Previously Rfam had just one truncated model that covered all three kingdoms of life. Thanks to the hard work of Eric Nawrocki and colleagues in Sean Eddy’s lab on the Infernal software and related package ssu-align which can now deal with much larger datasets than were previously possible. The three new alignments now cover bacteria, archaea and eukaryotes. These are all derived from the highly accurate and excellent alignments from the work of Robin Gutell and colleagues who run the Comparative RNA Website.
Thanks to the exciting work by Stefan Washietl and colleagues on the RNAcode software package we now have good evidence that the “RNA” family, C0343 (RF00120), is in fact protein-coding and most-likely is not functioning as a RNA other than in a mRNA-sense . Therefore C0343 has been removed for the 10.1 release.
Our SRP families have all been rebuilt and supplemented with additional families thanks to the work of Magnus Rosenblad and colleagues . This is another excellent contribution to the RNA families track at RNA Biology. Based on this work the existing two SRP families were replaced and supplemented by 7 new families: Metazoa_SRP (RF00017), Bacteria_small_SRP (RF00169), Fungi_SRP (RF01502), Bacteria_large_SRP (RF01854), Plant_SRP (RF01855), Protozoa_SRP (RF01856) and Archaea_SRP (RF01857). These new models should improve the specificity of Rfam annotations and reduce the number of pseudogenes incorporated.
We have continued to work on the Rfam clans and have added 3 new clans. These are U3 (CL00100), Cobalamin (CL00101) and group-II-D1D4 (CL00102). Also, the membership of the clans tRNA (CL00001), RNaseP (CL00002) and SNORA62 (CL00040) have been updated.
Finally, several problematic microRNA families mir-544 (RF01045), mir-1302 (RF00951), mir-1255 (RF00994), mir-548 (RF01061), mir-649 (RF01029), mir-562 (RF00998) and spliceosomal U13 (RF01210) were rethresholded to remove the excessive number of pseudogene annotations in the full alignments. This rethresholding along with the rebuilding of our SSU models have removed approximately 600,000 annotations from Rfam.
There are countless other changes that have made, if I’ve forgotten to include any that are significant to you or to mention your name then I apologise profusely.
This release could not have happened without the invaluable help of Jen Daub and John Tate who have worked tirelessly and enthusiastically on this release. This was made particularly challenging by the fact that I have recently relocated to my homeland in New Zealand to take up a position as a Rutherford discovery fellow and senior lecturer at the University of Canterbury in Christchurch. I hope to continue to contribute to Rfam and the wider RNA community from here. This is also a good moment to welcome the new Czars of Rfam, Sarah Burge and Eric Nawrocki who will now face the exciting and challenging task of managing the day-to-day work of maintaining Rfam. I wish them the best of luck in their new roles. I hope they enjoy it as much as I have.
 Weinberg Z, Wang JX, Bogue J, Yang J, Corbino K, Moy RH, Breaker RR. (2010) Comparative genomics reveals 104 candidate structured RNAs from bacteria, archaea, and their metagenomes. Genome Biology. 11(3):R31.
 Findeiss S, Schmidtke C, Stadler PF, Bonas U (2010). A novel family of plasmid-transferred anti-sense ncRNAs. RNA Biology. 7 (2): 120–4.
 Bekaert M, Firth AE, Zhang Y, Gladyshev VN, Atkins JF, Baranov PV. (2010) Recode-2: new design, new search tools, and many more genes. Nucleic Acids Res. 38(Database issue):D69-74.
 Kozomara A, Griffiths-Jones S. (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Research. Jan;39(Database issue):D152-7.
 Washietl S, Findeiss S, Müller SA, Kalkhof S, von Bergen M, Hofacker IL, Stadler PF, Goldman N. (2011) RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data. RNA. 17(4):578-94.
 Rosenblad MA, Larsen N, Samuelsson T, Zwieb C. (2009) Kinship in the SRP RNA family. RNA Biology. 2009 Nov-Dec;6(5):508-16.