Posts Tagged ‘hmmer3’

No, seriously, we’ve made a release

April 1, 2011

Well, it should have been out about 6 months ago, but finally the long-awaited Pfam release 25.0 is here! Release 25.0 contains a total of 12273 families, with 384 new families and 21 families killed since the last release. Pfam 25.0 is based on UniProt release 2010_05. Those of you who follow Pfam closely will be familiar with the fact that the sequence coverage (the proportion of sequences in Pfamseq containing at least one Pfam match) has hovered at or just below 75%. Despite the addition of only a modest number of new families in this release, the sequence coverage has risen: 76.69% of all proteins in Pfamseq now contain a match to at least one Pfam domain, and 53.86% of all residues in the sequence database fall within Pfam domains.
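For anyone wondering how the two coverage figures differ, here is a minimal sketch in Python. It is not the code the Pfam team uses; the function name `coverage`, the input layout and the toy data are all assumptions made for illustration. Sequence coverage counts sequences with at least one match, while residue coverage counts the residues that fall inside any match, counting each residue once.

```python
def coverage(seq_lengths, matches):
    """Return (sequence coverage, residue coverage) as fractions.

    seq_lengths: dict mapping sequence id -> sequence length (hypothetical layout)
    matches: list of (seq_id, start, end), 1-based inclusive coordinates;
             every seq_id is assumed to be a key of seq_lengths
    """
    covered = {}
    for seq_id, start, end in matches:
        # A set means residues hit by several matches are counted only once.
        covered.setdefault(seq_id, set()).update(range(start, end + 1))
    seq_cov = len(covered) / len(seq_lengths)
    res_cov = sum(len(r) for r in covered.values()) / sum(seq_lengths.values())
    return seq_cov, res_cov


# Toy data: three sequences, two of which have matches.
toy_lengths = {"A": 100, "B": 50, "C": 50}
toy_matches = [("A", 1, 50), ("A", 41, 60), ("B", 1, 10)]
seq_cov, res_cov = coverage(toy_lengths, toy_matches)
print(f"sequence coverage: {seq_cov:.2%}, residue coverage: {res_cov:.2%}")
```

Storing residues in a set is simple but memory-hungry; for a database the size of Pfamseq, merging intervals would be the more realistic choice.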

Read the rest of this entry »

Xfam consortium meeting

June 4, 2010

The annual Xfam consortium meeting was held on 10-11 May 2010 and we have the photographic evidence to prove it.

We spent the two days listening to talks from everyone about the latest developments. We were particularly interested to hear about new developments in HMMER3 and INFERNAL – fundamental pieces of software that Xfam rely on. Nucleotide-enabled HMMER3 is in development and will be great for Rfam, hopefully replacing the current BLAST pre-filters. We also had updates on how the HMMER software scales using multithreading and/or MPI.

We also had a number of wide-ranging discussions. Erik Sonnhammer unfortunately wasn’t present this time so the usual discussion on Stockholm alignment format was avoided. However, we had a full discussion of Pfam family nomenclature. It was generally agreed that although there were rules followed for Pfam short names, no one else in the world understood them. So we will endeavour to add a new section to our documentation about it. We also discussed how much information is actually required before a DUF (domain of unknown function) is renamed to something more meaningful.

We were blessed because the Icelandic ash cloud didn’t intervene. But one of our number did leave their passport in a car bound for Oxford, which delayed their journey home. We would like to thank all the members of the Pfam and Rfam consortia for coming, and also our other EBI attendees.

Janelia Farm Research Campus: Sean Eddy, Eric Nawrocki, Travis Wheeler, Tom Jones, Diana Kolbe, Michael Farrar

Stockholm Bioinformatics Center: Kristoffer Forslund, Dave Messina

Wellcome Trust Sanger Institute: Alex Bateman, Paul Gardner, Lars Barquist, Jaina Mistry, John Tate, Prasad Gunasekaran, Penny Coggill, Rob Finn

University of Manchester: Sam Griffiths-Jones

University of Oxford: Andreas Heger

University of Helsinki: Liisa Holm

Other friends from EBI: Sarah Hunter, Phil Jones, Craig McAnulla and Javier Herrero.

Pfam, HMMER3 and the next release

March 23, 2010

The Xfam blog has been fairly quiet since the release of Pfam 24.0, so I thought I would give you a quick update on what we have been up to in the Pfam team. Read the rest of this entry »

Update Pfam searches to HMMER3.0 beta 3

December 16, 2009

As most of you are probably aware, Sean released HMMER3.0b3 last month.  The beta 3 version of HMMER3.0 contains a few bug fixes and the four HMMER3 search programs now allow multi-core parallelisation. We’ve just updated all of the Pfam sequence search tools to use the new HMMER3.0 beta 3 release, so we thought we’d update you on what these changes mean for Pfam. Read the rest of this entry »

Imminent Release of Pfam 24.0

October 2, 2009

We are now on the brink of releasing Pfam 24.0. This will be a landmark release as it will be the first to be built using the new version of the HMMER package, HMMER3. We are well aware that we have been claiming this release as imminent for some time, but we are now at the point of flicking the big switch. There are numerous changes that users need to know about and we will briefly summarise them here. Read the rest of this entry »

pfam_scan.pl – part II

September 11, 2009

Back in May we wrote a blog post about the new version of pfam_scan.pl. We asked if there was anyone out there who was willing to help us test our new script, and we were pleasantly surprised at the number of people who got in contact with us – so a big thank you to all those who have helped. Since releasing the alpha version of pfam_scan.pl to our testers we have made some internal changes to the script that are worth mentioning: Read the rest of this entry »

pfam_scan.pl

May 21, 2009

We’re currently working on a new version of one of our core scripts, ‘pfam_scan.pl’. This script searches a set of protein sequences (in FASTA format) against Pfam’s library of HMMs. The original code was written nearly a decade ago but, since then, features have been added, bugs have been fixed and the code has evolved into something that is far from elegant. The re-write is something that we’ve been planning to do for a while and, as the code needs updating to use the new HMMER3 software, now seems like the perfect time to do it. Read the rest of this entry »

HMMER3 migration: resolving overlaps

March 19, 2009

It has been a little quiet on the Pfam blog recently, but behind the scenes we’ve been working hard on the migration to HMMER3.

We have built HMMER3 models for all of the Pfam alignments, and searched them against the sequence database. This part was super quick, as HMMER3 is ~100 times faster than HMMER2. Due to the increased sensitivity of HMMER3, many of our Pfam families have grown in size, and we have found that ~80,000 sequences in the sequence database now have overlapping matches to more than one Pfam family.

Within Pfam we have a rule that states that our families should not overlap; this means that any one amino acid can belong to only a single Pfam family.  The exception to this rule applies to families within a clan – clans are Pfam’s collections of related families – where overlaps between clan members are allowed. Over the last few weeks we’ve been working through and resolving the list of 80,000 overlaps. Read the rest of this entry »
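To make the overlap rule concrete, here is a minimal sketch in Python. It is not Pfam's actual overlap-checking code; the function `find_overlaps`, the tuple layout, and the family and clan names are all hypothetical. It flags pairs of matches on the same sequence that share at least one residue, skipping pairs whose families belong to the same clan.

```python
from itertools import combinations


def find_overlaps(matches, clan_of):
    """Report overlapping matches between different, non-clan-mate families.

    matches: list of (seq_id, family, start, end), 1-based inclusive (hypothetical layout)
    clan_of: dict mapping family -> clan id; families without a clan are absent
    Returns a list of (seq_id, family_a, family_b, overlap_start, overlap_end).
    """
    by_seq = {}
    for seq_id, family, start, end in matches:
        by_seq.setdefault(seq_id, []).append((family, start, end))

    overlaps = []
    for seq_id, hits in by_seq.items():
        for (fam_a, s_a, e_a), (fam_b, s_b, e_b) in combinations(hits, 2):
            if fam_a == fam_b:
                continue  # repeated matches to one family are not treated as an overlap here
            clan_a = clan_of.get(fam_a)
            if clan_a is not None and clan_a == clan_of.get(fam_b):
                continue  # overlaps between members of the same clan are allowed
            if s_a <= e_b and s_b <= e_a:  # the two intervals share at least one residue
                overlaps.append((seq_id, fam_a, fam_b, max(s_a, s_b), min(e_a, e_b)))
    return overlaps


# Toy data: FamB and FamC are in the same (made-up) clan CL1.
toy_matches = [
    ("seq1", "FamA", 1, 100),
    ("seq1", "FamB", 90, 150),
    ("seq1", "FamC", 95, 120),
    ("seq2", "FamA", 1, 50),
    ("seq2", "FamB", 60, 100),
]
toy_clans = {"FamB": "CL1", "FamC": "CL1"}
found = find_overlaps(toy_matches, toy_clans)
for overlap in found:
    print(overlap)
```

The pairwise comparison is quadratic in the number of matches per sequence, which is fine for a sketch; sorting matches by start position would scale better.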

Early adoption of HMMER3

January 21, 2009

As the first post suggested, this blog will partly describe the progress and issues faced with the migration of Pfam to HMMER3. We’ve been waiting for the mercurial HMMER3 for well over a year now, watching all the while its ever-receding release date. However, it has finally been released, albeit in alpha phase! Given Sean Eddy’s past record on HMMER2, particularly his attention to detail and his hatred of bugs in his software, we (Pfam) are already confident enough to be looking at migrating to HMMER3. This post will set out the rationale for moving Pfam to HMMER3 quickly and look at some of the issues that will inevitably follow such a move. Read the rest of this entry »

Welcome to the Xfam blog

January 19, 2009

Welcome to the new blog for the Xfam databases! Xfam is our shorthand for the combination of the Pfam and Rfam databases, a name that should also future-proof us in case we add any further databases to the brand.

We hope that this blog will become a useful point of reference, where our users can learn about what is going on behind the scenes at Xfam central. We will be announcing some important changes that are coming with the eagerly awaited release of HMMER 3. As well as announcing new releases of the data and website, we’ll also try to discuss our philosophy on protein/RNA domains and sequence classification. If there are other topics that you would like to hear more about, why not leave us a comment?