Pfam 31.0 contains a total of 16712 families and 604 clans. Since the last release, we have built 415 new families, killed 9 families and created 11 new clans. We have also been working on expanding our clan classification; in Pfam 31.0, over 36% of Pfam entries are placed within a clan.
We are happy to announce a new release of Rfam (version 12.2) which includes 115 new families, introduces R-scape secondary structure visualisations, and restores missing families to multiple Rfam clans.
This release adds 115 new Rfam families, bringing the total number of families to 2,588. Notable additions include Pistol, Hatchet, Twister-sister and several other riboswitches contributed by Zasha Weinberg. We are always looking for new RNA families, so please feel free to get in touch with your suggestions.
Testing covariation with R-scape
R-scape is a new method for testing whether covariation analysis supports the presence of a conserved RNA secondary structure. In order to check the quality of Rfam structures, we ran R-scape on all Rfam seed alignments and added R-scape visualisations to the secondary structure galleries. For example, here is R-scape analysis of the SAM riboswitch:
According to R-scape, the secondary structure from the Rfam seed alignment, shown on the left, has 19 statistically significant basepairs (highlighted in green). R-scape can also use statistically significant basepairs as constraints to predict a new secondary structure that is consistent with the seed alignment. Using this approach, R-scape increased the number of statistically significant basepairs from 19 to 27 while also adding 9 new basepairs that are consistent with the seed alignment (structure on the right). This visualisation gives an idea about the quality of the Rfam structure and indicates that in this case it may need to be updated. To find out more about R-scape have a look at a recent paper by Rivas et al.
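To make the idea of covariation concrete, here is a minimal sketch that scores two alignment columns by their mutual information. This is a simplified illustration only: R-scape uses its own statistics and significance testing, which are not reproduced here, and the function name and toy columns are assumptions made for this example.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    Each column is a string with one character per sequence.
    This is a toy covariation score, not R-scape's statistic.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must contain the same number of sequences")
    # Ignore sequences that have a gap in either column.
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    pair_counts = Counter(pairs)
    i_counts = Counter(a for a, _ in pairs)
    j_counts = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Made-up columns: compensatory changes keep the pair complementary.
covarying = mutual_information("GGCCAAUU", "CCGGUUAA")
# Made-up columns: a perfectly conserved G-C pair.
conserved = mutual_information("GGGGGGGG", "CCCCCCCC")
print(covarying, conserved)
```

In this toy score, a perfectly conserved pair carries no covariation signal at all, which is one reason an alignment with little sequence variation may not provide statistical support for a structure.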
Tip: R-scape visualisations are interactive, so you can pan and zoom the structures and get additional information by hovering over nucleotides and basepairs.
R-scape analysis suggests that many existing Rfam secondary structures can be improved (for example, FMN riboswitch or 5S rRNA). In other families, secondary structures are not supported by the R-scape covariation analysis (for example, oxyS RNA), which indicates that either their seed alignments need to be expanded or that these RNA families do not have a conserved secondary structure. Lastly, there are also cases where the R-scape structures do not show significant improvement compared to the current secondary structure (for instance, Metazoa SRP).
In future releases we will begin to improve existing Rfam seed alignments by using R-scape in the family building pipeline. In the meantime, Rfam users can get an indication of the quality of the structure using R-scape visualisations.
Recovering lost clan members
Since Rfam 10.0, related Rfam families have been organised into clans. The clans are manually curated and clan membership is checked using automated quality control steps (for example, to make sure that a family cannot belong to more than one clan). However, under certain circumstances these quality control procedures silently removed families from the clans. This bug was introduced in Rfam 11.0, and over time, more than 30 families were dropped from 20 clans, so that some clans did not have any families at all. The problem has now been fixed and proper clan membership has been restored using Rfam releases from the FTP archive. You can explore Rfam clans and let us know if you have any feedback.
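As an illustration of this kind of check, the sketch below reports families that appear in more than one clan instead of removing them, and lists memberships that were present in an earlier release but are missing from the current one. It is not the Rfam pipeline: the function names, the (clan, family) pair layout and the placeholder accessions are all assumptions made for this example.

```python
from collections import defaultdict


def find_multi_clan_families(memberships):
    """Return {family: sorted list of clans} for families in more than one clan.

    memberships is an iterable of (clan, family) pairs. Conflicts are
    reported to the caller; nothing is removed.
    """
    clans_by_family = defaultdict(set)
    for clan, family in memberships:
        clans_by_family[family].add(clan)
    return {
        family: sorted(clans)
        for family, clans in clans_by_family.items()
        if len(clans) > 1
    }


def find_lost_members(previous, current):
    """Return (clan, family) pairs present in `previous` but not in `current`."""
    return sorted(set(previous) - set(current))


# Placeholder accessions, not real Rfam entries.
previous_release = [("CL_A", "RF_1"), ("CL_A", "RF_2"), ("CL_B", "RF_3")]
current_release = [("CL_A", "RF_1"), ("CL_B", "RF_3"), ("CL_B", "RF_1")]

conflicts = find_multi_clan_families(current_release)
lost = find_lost_members(previous_release, current_release)
print(conflicts)
print(lost)
```

Reporting conflicts and comparing against an archived release makes any dropped membership visible, rather than letting it disappear silently.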
- Rfam sequence search and the cmscan web service were updated to the latest version of Infernal (version 1.1.2).
- We now provide track hubs for both hg38 and hg19 human genome assemblies.
- The source code for the Rfam website is now available on GitHub and can be run locally using Docker.
How to access the data
As well as revisiting Rfam seed alignments, work is underway on the next major Rfam release (13.0) which will be based on a new sequence database built from complete genomes. We plan to make the new data available in late 2017.
Get in touch
We now have an online Quick Tour that provides a brief introduction to the Pfam protein families database. It provides a basic description of Pfam, as well as advice on how to search the database and discover protein-related information. The tour also showcases various tools that allow users to visualize data in Pfam, and explains where to find out more about the resource. We recommend taking the tour to learn how to use Pfam effectively.
Pfam 30.0, our second release based on UniProt reference proteomes, is now available. The new release contains a total of 16,306 families, with 22 new families and 11 families killed since the last release. The UniProt reference proteome set has expanded and now includes 17.7 million sequences, compared with 11.9 million when we made Pfam 29.0. In this release, we have updated the annotations on hundreds of Pfam entries, and renamed some of our Domains of Unknown Function (DUF) families.
DUFs are protein domains whose function is uncharacterised. Over time, as scientific knowledge increases and new data about proteins comes to light, more information about the function of a domain may become available. As a result, DUFs can be renamed and re-annotated with more meaningful descriptions. As part of Pfam 30.0, we have re-annotated 116 DUFs based on updated information in the UniProtKB database, the scientific literature, and feedback from Pfam and InterPro users. Examples of some of our DUF updates in Pfam 30.0 are given below:
- PF10265, created in release 23.0 and originally named DUF2217, has been renamed to Miga, a family of proteins that promote mitochondrial fusion.
- PF10229, created in release 23.0 and originally named DUF2246, has been renamed as MMADHC, as it represents methylmalonic aciduria and homocystinuria type D proteins and their homologues. The structure of this domain is shown below.
- PF12822, created in release 25.0 and originally named DUF3816, has been renamed to ECF_trnsprt, since it contains proteins identified as the substrate-specific component of energy-coupling factor (ECF) transporters.
Please note that we may change the identifier for a family (e.g. DUF2217), but we never change the accession for a family (e.g. PF10265).
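For anyone storing Pfam annotations locally, the practical consequence is to key records by accession and treat the identifier as a mutable label. The sketch below shows one way to do that, using the PF10265 rename from the list above; the `Family` class and its fields are hypothetical and not part of any Pfam software.

```python
from dataclasses import dataclass, field


@dataclass
class Family:
    accession: str   # stable key, treated as fixed by convention
    identifier: str  # human-readable name, may change between releases
    previous_identifiers: list = field(default_factory=list)

    def rename(self, new_identifier):
        """Record the old identifier, then switch to the new one."""
        self.previous_identifiers.append(self.identifier)
        self.identifier = new_identifier


# Index families by accession so that a rename never breaks a lookup.
families = {}
miga = Family(accession="PF10265", identifier="DUF2217")
families[miga.accession] = miga

miga.rename("Miga")
print(families["PF10265"])
```

A lookup by PF10265 returns the same record before and after the rename, while the old name is kept for reference.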
If you find any more DUFs that can be assigned a name based on function, or any other annotation updates, please get in touch with us (email@example.com).
We are happy to announce a new release of Rfam. Version 12.1, based on the same sequence dataset as Rfam 12.0, features over 20 new families, a new clan competing algorithm, a publicly accessible MySQL database, and many website fixes.
Pfam 29.0, our second release of 2015, contains 16295 entries and 559 clans. We have made some major changes to our underlying sequence database and the data that are displayed on the website, which we’ve outlined below. Full details can be found in our Nucleic Acids Research paper, which is available here.
Dfam is growing up. This is the first major expansion of the database since its inception. We’ve added repeat families from four new organisms: mouse, zebrafish, fruit fly, and nematode. In total, this release includes 2,844 new families (4,150 total).
We are pleased to announce the return of the Rfam Track Hub for the UCSC Genome Browser. This hub is available on our FTP site. The hub currently provides annotation for the most recent assemblies of eight species: Human (hg38), Mouse (mm10), C. elegans (ce10), Chicken (galGal4), C. intestinalis (ci2), Zebrafish (danRer7), Drosophila (dm6) and S. cerevisiae (sacCer3).
With Dfam, we are striving to build models of repeat families that yield high sensitivity without undue false annotation. In this release of Dfam, we have improved our model building strategy to reduce the potential for false annotation, especially in the context of overextending alignments around true interspersed repeat instances.