No, seriously, we’ve made a release

April 1, 2011

Well, it should have been out about 6 months ago, but finally the long-awaited Pfam release 25.0 is here! Release 25.0 contains a total of 12273 families, with 384 new families and 21 families killed since the last release. Pfam 25.0 is based on UniProt release 2010_05. Those of you who follow Pfam closely will know that the sequence coverage (the percentage of sequences in Pfamseq containing at least one Pfam match) has hovered at or just below 75%. Despite the addition of only a modest number of new families in this release, the sequence coverage is now 76.69%, meaning that 76.69% of all proteins in Pfamseq contain a match to at least one Pfam domain. In addition, 53.86% of all residues in the sequence database fall within Pfam domains.
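For anyone who wants to see how the two coverage figures differ, here is a minimal sketch in Python. It is not the code used to build the release: the function name, the toy sequence lengths and the match coordinates are all made up for illustration, and it assumes matches are given as 1-based inclusive start/end positions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: sequence lengths and domain match regions
# (1-based, inclusive). None of this comes from Pfamseq.
seq_lengths: Dict[str, int] = {"A": 100, "B": 200, "C": 50, "D": 50}
matches: Dict[str, List[Tuple[int, int]]] = {
    "A": [(1, 50)],
    "B": [(11, 60), (101, 150)],
    "C": [(1, 25)],
}


def coverage(
    lengths: Dict[str, int],
    regions_by_seq: Dict[str, List[Tuple[int, int]]],
) -> Tuple[float, float]:
    """Return (sequence coverage, residue coverage) as percentages."""
    covered_seqs = 0
    covered_residues = 0
    for seq_id in lengths:
        regions = regions_by_seq.get(seq_id, [])
        if regions:
            # Sequence coverage: at least one match is enough.
            covered_seqs += 1
        # Residue coverage: count each matched position once,
        # even if two regions overlap.
        positions = set()
        for start, end in regions:
            positions.update(range(start, end + 1))
        covered_residues += len(positions)
    total_residues = sum(lengths.values())
    seq_cov = 100.0 * covered_seqs / len(lengths)
    res_cov = 100.0 * covered_residues / total_residues
    return seq_cov, res_cov


seq_cov, res_cov = coverage(seq_lengths, matches)
print(f"sequence coverage: {seq_cov:.2f}%")
print(f"residue coverage:  {res_cov:.2f}%")
```

In the toy data, three of the four sequences have a match but only 175 of the 400 residues fall inside one, which is why residue coverage is always the lower of the two numbers.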

The increase in coverage has in part come from the use of jackhmmer, which is now used routinely when curating new Pfam entries. Jackhmmer, a program that is new to HMMER3, allows the iterative searching of a sequence database, starting from a single sequence, similar to PSI-BLAST.  Using this approach, the curators have been systematically going through proteomes, trying to plug the gaps not covered by Pfam.  Many of the new families are being added to clans, as the existing clan members just do not cover the entire protein space for that ‘superfamily’.
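If you would like to try an iterative search yourself, the sketch below builds a jackhmmer command line from Python. It is only an illustration, not our curation pipeline: the helper name and all file names are hypothetical, and it assumes the HMMER3 `jackhmmer` binary is on your path. The `-N` option caps the number of iterations and `--tblout` writes a tabular summary of the hits.

```python
import subprocess
from typing import List


def jackhmmer_command(
    query_fasta: str,
    seq_db: str,
    max_iterations: int = 5,
    tblout: str = "hits.tbl",
) -> List[str]:
    """Build the argument list for an iterative jackhmmer search.

    query_fasta holds the single starting sequence; seq_db is the
    sequence database to search. Both paths are placeholders.
    """
    return [
        "jackhmmer",
        "-N", str(max_iterations),  # maximum number of iterations
        "--tblout", tblout,         # per-target tabular output file
        query_fasta,
        seq_db,
    ]


def run_search(cmd: List[str]) -> None:
    """Run the search; raises CalledProcessError if jackhmmer fails."""
    subprocess.run(cmd, check=True)


# Hypothetical file names, for illustration only.
cmd = jackhmmer_command("query.fasta", "seqdb.fasta", max_iterations=3)
print(" ".join(cmd))
```

Calling `run_search(cmd)` would start the search; each round builds a profile from the hits of the previous round and searches again, until the iteration limit is reached or no new sequences are found.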

We have changed the license from GNU GPL to the Creative Commons Zero (CC0) license. Without getting into the details, this basically removes any restrictions on the use of Pfam and does away with the viral GPL license. Pfam is for the public domain and we waive any rights!

So what’s new?

Well, the answer is not that much, but a major change is that Pfam annotation is now beginning to be co-ordinated via Wikipedia. Unlike Rfam, where every entry has a Wikipedia entry, we expect this to be a more gradual transition for Pfam, so not all entries currently have a corresponding Wikipedia article. For a more detailed discussion, check the help page.  We actively encourage the addition of new/updated annotations via Wikipedia as they will appear far quicker than waiting for a Pfam release.  If there are articles in Wikipedia that you think correspond to a family, then please mail us!

As well as updating the help documentation and bringing back tools such as the domain graphic generator, we have also developed a new way of visualizing the species distribution for a family. Rather than using a tree-like visualization, we have used a radial layout, termed a sunburst. The concentric rings of the representation indicate different taxonomic levels, with species being represented in the outer ring. Using this approach means that all families, regardless of size, are displayed in the same amount of space. This representation is also much quicker to render than the previous tree view, even for large trees, which we hope will make the species tree information more usable.
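To give a feel for how a sunburst fits any tree into the same circle, here is a rough Python sketch of the layout arithmetic. It is not the code behind the Pfam pages: the tree, the names `Arc`, `leaf_count` and `layout`, and the choice to size each arc by the number of sequences beneath it are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Arc:
    name: str
    depth: int    # ring index: 0 is the centre, larger is further out
    start: float  # start angle in radians
    end: float    # end angle in radians


def leaf_count(node: Dict[str, Any]) -> int:
    """Number of sequences under a node (leaves carry a 'count')."""
    children = node.get("children", [])
    if not children:
        return node.get("count", 1)
    return sum(leaf_count(child) for child in children)


def layout(
    node: Dict[str, Any],
    start: float = 0.0,
    end: float = 2 * math.pi,
    depth: int = 0,
) -> List[Arc]:
    """Give each node an arc; children split their parent's arc."""
    arcs = [Arc(node["name"], depth, start, end)]
    total = leaf_count(node)
    angle = start
    for child in node.get("children", []):
        span = (end - start) * leaf_count(child) / total
        arcs.extend(layout(child, angle, angle + span, depth + 1))
        angle += span
    return arcs


# Hypothetical species tree with made-up sequence counts.
tree = {
    "name": "root",
    "children": [
        {"name": "Eukaryota", "children": [
            {"name": "Homo sapiens", "count": 3},
            {"name": "Mus musculus", "count": 1},
        ]},
        {"name": "Bacteria", "children": [
            {"name": "Escherichia coli", "count": 4},
        ]},
    ],
}

arcs = layout(tree)
for arc in arcs:
    print(f"{arc.name:18s} ring {arc.depth} "
          f"{math.degrees(arc.start):6.1f} to {math.degrees(arc.end):6.1f} degrees")
```

However many species a family has, the root always spans the full 360 degrees and each taxonomic level adds one ring, so the drawing never grows beyond a fixed radius.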

Why has it taken so long to get this release out?

Both Jaina Mistry and I have moved on to pastures new. We were both working on the release up until we departed in July 2010, and if it had not been for two power outages that wiped out the Sanger compute farm, we might have completed the release before we left. Since leaving Sanger, I have been setting up the HMMER web servers at Janelia Farm, which allow sequence searches (phmmer, hmmscan and hmmsearch) via a web interface… but more about that in another post. The move across the pond, settling into life in the US with my family and starting the new job took precedence – hence the delay. In the last month John Tate and I have had a big push to get the release out! We think that we have got everything in place, but if you notice that anything is broken, then please mail us.

What next?

Well, we will pause to catch our breath, but we will endeavour to get another release out in the not too distant future – the sequence database that Pfam 25.0 is running on is already 10 months old and our curators, Penny Coggill and Ruth Eberhardt, are screaming out for more sequences. For this release, the HMM data files are already on the FTP site but the database dump files are not yet available there. We’ll be adding them as soon as possible. Marco Punta has now joined the group at Sanger as Pfam team leader, and he and Alex Bateman are going to lead a more focused curation effort from Pfam-UK, while I will continue to be responsible for making the releases from Pfam-US.

Posted by Rob (and John)

8 Responses to “No, seriously, we’ve made a release”


  1. […] Not actually a joke, from Pfam: No, seriously, we’ve made a release. […]


  2. […] (hopefully) improve the annotation of Pfam families, which has in many cases been rather poor. The Xfam blog post related to Pfam release 25 says the change will be happening gradually, which might actually be […]

  3. alexbateman Says:

    Thanks for blogging this Johan. A user already pointed out that the Profilin article in Wikipedia was not linked by Pfam. We have added that in:

    http://pfam.sanger.ac.uk/family/Profilin

    I also wrote a new article for the S1 domain and added it:

    http://pfam.sanger.ac.uk/family/S1

    We encourage our users to find articles we missed linking to as well as creating new protein family articles in Wikipedia!

    Alex

  4. Royden Clark Says:

    Is there a scheduled date for when the Pfam25 Database dump will be available? Thank you.
    Royden


  5. I just discovered the beauty of the sunburst plots. What software do you use to render them? I can’t find any software package that does it as well as the Pfam implementation…

    • johntate Says:

      Hi Johan,

      I’m glad you like the sunbursts. They’re drawn using a javascript class that we wrote, rather than a specific software package. We haven’t really made it available for download as a nice neat bundle, but if you’d be interested in that you can drop us a mail at our help-desk account (pfam-help@sanger.ac.uk) and we can certainly give you some more information about how the sunbursts are drawn.

      John.

