Pfam 27.0 is now available!

March 22, 2013

In a blog post published just over a year ago, I proposed a number of changes to the content of Pfam to improve the scalability and usability of the database. These changes came into effect a few days ago, when we released Pfam 27.0. This release contains a total of 14831 families, with 1182 new families and 22 families killed since release 26.0. 80% of all proteins in UniProt contain a match to at least one Pfam domain, and 58% of all residues in the sequence database fall within a Pfam domain.
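The two coverage figures measure different things: the first counts sequences with at least one match, the second counts residues that sit inside a match. As a rough illustration only (this is not the Pfam production code, and the function name, input layout and toy data are all made up for the example), the two numbers could be computed like this, merging overlapping matches so that no residue is counted twice:

```python
from collections import defaultdict


def coverage(lengths, hits):
    """Return (sequence_coverage, residue_coverage) as fractions.

    lengths: dict mapping protein ID -> sequence length (hypothetical input)
    hits:    iterable of (protein ID, start, end), 1-based inclusive
    """
    # Group the matched regions by protein.
    regions = defaultdict(list)
    for prot, start, end in hits:
        regions[prot].append((start, end))

    covered_residues = 0
    for spans in regions.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:
                # Overlapping or adjacent region: extend the current one.
                cur_end = max(cur_end, end)
            else:
                covered_residues += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered_residues += cur_end - cur_start + 1

    seq_cov = len(regions) / len(lengths)
    res_cov = covered_residues / sum(lengths.values())
    return seq_cov, res_cov


# Toy data, invented for illustration: three proteins, two with matches.
lengths = {"P1": 100, "P2": 200, "P3": 100}
hits = [("P1", 1, 50), ("P1", 41, 60), ("P2", 101, 140)]
seq_cov, res_cov = coverage(lengths, hits)
print(f"{seq_cov:.0%} of sequences, {res_cov:.0%} of residues")
```

In the toy data two of three proteins have a match, but only 100 of 400 residues are covered, which is why residue coverage is always the lower of the two figures.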

So what has changed? To the user, hopefully, not a great deal! Nevertheless, there has been a considerable amount of reorganization of the database production pipeline. The most notable loss of information is that we are no longer providing neighbour-joining trees for the Pfam full alignments. If you want this type of data, it will now be up to you to calculate it. However, if this data is important to you, it should be recalculated using a more precise method anyway. On the flip side, many new features have been integrated into Pfam 27.0. Below is a brief list of the new developments that are now available:

  • Real-time searches of DNA sequences for matches to Pfam models
  • Use of Representative Proteomes (1) sequence sets to provide less redundant views of the Pfam-A full alignments
  • Addition of disorder predictions to the repertoire of sequence feature annotations
  • AntiFam (2) has been applied to the underlying sequence database to remove sequences believed to be spurious translations
  • Selectable sunbursts in the Pfam-A ‘species’ distribution tab, allowing the generation of alignments or visualisation of sequences from a user-defined taxonomic range
  • New, faster keyword search using Apache Lucy

Blog posts describing these new developments in more detail will follow in the coming weeks. In addition to the changes listed above, there have been many improvements to existing families, be it improving domain boundaries, expanding membership or generating Wikipedia entries. Many of the new entries in Pfam have been built with the purpose of improving human coverage.

Enjoy the release!

Posted by Rob

References

1: Chen C, Natale DA, Finn RD, Huang H, Zhang J, Wu CH, Mazumder R. Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation. PLoS One. 2011;6(4):e18910.

2: Eberhardt RY, Haft DH, Punta M, Martin M, O’Donovan C, Bateman A. AntiFam: a tool to help identify spurious ORFs in protein annotation. Database. 2012:bas003.
