We have recently produced a new release of AntiFam, release 3.0. AntiFam has grown in size, and release 3.0 contains 54 entries – compared to just 23 when we last blogged about AntiFam (release 1.1). Over 80% of these new entries arise from translations of non-coding RNAs, including several families from translations of rRNA, tmRNA and RNaseP.
Posts Tagged ‘pfam’
After 15 great years at the Sanger Institute we are on the move. On the 1st November, the Cambridge Xfam group will be taking up residence at the European Bioinformatics Institute on the other side of the Wellcome Trust Genome Campus. We’ll keep running the websites at Sanger for a bit longer, but eventually we’ll get them migrated over to EBI webspace. We’re hoping that the move will not cause any disruption to our users, but we might be a little bit slower at responding to your questions and bug reports.
We’ll keep you posted on updates to the website and database locations using the blog and our Twitter account.
We are pleased to introduce Dfam 1.0, a database of profile HMMs for repetitive DNA elements. Repetitive DNA, especially the remnants of transposable elements (TEs), makes up a large fraction of many genomes, particularly eukaryotic ones. Accurate annotation of these TEs both simplifies downstream genomic analysis and enables research into their fascinating biology and impact on the genome.
We’ve had a few helpdesk tickets in the last few months asking how to download all of the Pfam-A domains for a particular species. This information can be quite difficult to obtain: getting it requires either downloading and installing a sub-set of the tables in our MySQL database, or else searching all of the sequences from the species of interest against Pfam, probably using our batch search.
Two related questions that we are often asked via the Pfam helpdesk are ‘Which families have a known three-dimensional structure?’ and ‘Why is a particular PDB structure not found in Pfam?’. You may think that there are obvious answers to these questions – but, as with many things in life, the answers are not necessarily as straightforward as you would have thought. In this joint posting between Andreas Prlic (senior scientist at RCSB Protein Data Bank) and myself (Rob Finn, Pfam Production Lead), we will elaborate on the way the PDB and Pfam cross-referencing occurs, explain why discrepancies occurred in the past, and describe the pipeline that the RCSB PDB has implemented using the HMMER web services API, which should provide the most current answer to these questions.
As some of you will already be aware, the Xfam family has recently gained a new member: the TreeFam database.
TreeFam aims to provide phylogenetic trees and orthology predictions for all animal genes.
AntiFam is the newest addition to the Xfam brand. It is a database of hidden Markov models (HMMs) designed to identify spurious open reading frames (ORFs). It is available now on our ftp site:
The current Pfam release, version 26.0, took approximately 4 months to nurse through the various stages of updating the sequence database, resolving overlaps between families, rebuilding the MySQL database and performing all of the post-processing that constitutes the ‘release’. The production team strives to make two releases a year, but I really do not fancy spending two thirds of a year on Pfam releases. Thus, with my colleagues, I have been reviewing what we do and why we do it and, probably more importantly, assessing how much the different sections of the website are used. Below is a list of changes that are going to happen in the next release, release 27.0.
Since releasing the new Pfam website four years ago, we’ve had a steady trickle of mails from users who would like to install and run the site within their own local environment. It used to be possible to do just that, given a following wind, if you were ready to install the site from its source code. Unfortunately, after some internal changes and as the list of Perl module dependencies grew and grew, the process got harder and more complex and eventually we stopped supporting it entirely. We’ve been actively discouraging people from trying this for far too long, all the while promising to make the process easier. Finally we’ve managed to get around to building a virtual machine (VM) that should make the whole thing possible again.
Some users have been contacting us about the new families that have appeared in Pfam release 26.0.
As pointed out by one of our users:
Pfam v26 includes, in addition to DDE_Tnp_1, the following new families:
These extra new families, with names ending in _2, _3, _4 and so on, have been constructed to increase the coverage of Pfam. Many of our existing large, diverse families are not well modelled by a single HMM, and there are many true members that are not matched. By building multiple models we can match more sequences. Each of these models will be in the same Pfam clan, the RNaseH clan in this case. For the most part these models do not represent any particular subfamily or classification group. Essentially, you should think of a match to any of the above seven DDE_Tnp_1 families as being the same thing. Because of the way Pfam is built, any particular region of a protein may only belong to one of these families. We have a step in building clans called competition: if a region of a protein matches both DDE_Tnp_1 and DDE_Tnp_1_2, for example, then the region will be assigned to the family with the highest score. This means that a match to DDE_Tnp_1 in release 25.0 may now end up in a different family, such as DDE_Tnp_1_2. You shouldn’t read too much into these changes.
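To make the competition step a little more concrete, here is a minimal Python sketch of the idea. It is not the code Pfam actually runs: the `Match` record, the greedy highest-score-first strategy, and the coordinates and scores in the example are all assumptions made purely for illustration.

```python
from collections import namedtuple

# Hypothetical record for one family match on a protein (not a Pfam data structure).
Match = namedtuple("Match", "family clan start end score")


def overlaps(a, b):
    """Return True if the two matches share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def compete(matches):
    """Within each clan, keep only the highest-scoring of any overlapping matches."""
    kept = []
    # Visit matches from highest to lowest score, so a match is only dropped
    # when a better-scoring match from the same clan already covers the region.
    for m in sorted(matches, key=lambda m: m.score, reverse=True):
        if not any(k.clan == m.clan and overlaps(k, m) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)


# Made-up coordinates and scores for a single protein.
hits = [
    Match("DDE_Tnp_1", "RNaseH", 10, 200, 85.2),
    Match("DDE_Tnp_1_2", "RNaseH", 15, 210, 120.7),
]
for m in compete(hits):
    print(m.family, m.start, m.end, m.score)
```

In this made-up example only the DDE_Tnp_1_2 match survives, because it overlaps the DDE_Tnp_1 match, belongs to the same clan and has the higher score.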
Many of these new families are appearing in Pfam release 26.0 because of a change in our strategy for building new Pfam families. The new strategy consists of taking complete genomes and using each protein that does not match Pfam as the starting point for a Jackhmmer search. Jackhmmer is an iterative search tool like PSI-BLAST. If the Jackhmmer search finds lots of homologues but has some overlaps with an existing family, then we may build one of these new additional families to increase coverage of known sequences. Rather than give these families completely new names, we simply give them the same name as the existing family and append a number to show that they are closely related to each other.
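As a rough illustration of the first part of that strategy, the Python sketch below picks out the proteins in a proteome that have no Pfam match and builds a Jackhmmer command line for one of them. The function names, file names and accessions are hypothetical, and this is not our production pipeline; the only parts taken from HMMER itself are the `jackhmmer [options] <seqfile> <seqdb>` calling convention and the `-N` (maximum iterations) and `--tblout` (tabular output file) options.

```python
def unmatched_proteins(proteome_ids, pfam_matched_ids):
    """Return the proteins, in their original order, that have no Pfam match."""
    matched = set(pfam_matched_ids)
    return [p for p in proteome_ids if p not in matched]


def jackhmmer_command(query_fasta, seq_db, iterations=5, tblout=None):
    """Build (but do not run) a jackhmmer command line as an argument list."""
    cmd = ["jackhmmer", "-N", str(iterations)]
    if tblout is not None:
        cmd += ["--tblout", tblout]
    # Options come first, then the query sequence file and the target database.
    cmd += [query_fasta, seq_db]
    return cmd


# Hypothetical accessions and file names, for illustration only.
seeds = unmatched_proteins(["prot1", "prot2", "prot3"], ["prot2"])
for seed in seeds:
    print(" ".join(jackhmmer_command(seed + ".fa", "seqdb.fa", tblout=seed + ".tbl")))
```

The resulting argument list could be handed to `subprocess.run` on a machine where HMMER is installed; deciding whether the hits justify a new family is still a manual curation step and is not modelled here.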
Posted by Alex