Archive for the 'HMMER3 migration' Category

Imminent Release of Pfam 24.0

October 2, 2009

We are now on the brink of releasing Pfam 24.0. This will be a landmark release, as it will be the first to be built using the new version of the HMMER package, HMMER3. We are well aware that we have been describing this release as imminent for some time, but we are now at the point of flicking the big switch. There are numerous changes that users need to know about, and we will briefly summarise them here. Read the rest of this entry »

pfam_scan.pl – part II

September 11, 2009

Back in May we wrote a blog post about the new version of pfam_scan.pl. We asked if there was anyone out there who was willing to help us test our new script, and we were pleasantly surprised at the number of people who got in contact with us – so a big thank you to all those who have helped. Since releasing the alpha version of pfam_scan.pl to our testers we have made some internal changes to the script that are worth mentioning: Read the rest of this entry »

pfam_scan.pl

May 21, 2009

We’re currently working on a new version of one of our core scripts, ‘pfam_scan.pl’. This script searches a set of protein sequences (in FASTA format) against Pfam’s library of HMMs. The original code was written nearly a decade ago but, since then, features have been added, bugs have been fixed and the code has evolved into something that is far from elegant. The re-write is something that we’ve been planning to do for a while and, as the code needs updating to use the new HMMER3 software, now seems like the perfect time to do it. Read the rest of this entry »

HMMER3 migration: resolving overlaps

March 19, 2009

It has been a little quiet on the Pfam blog recently, but behind the scenes we’ve been working hard on the migration to HMMER3.

We have built HMMER3 models for all of the Pfam alignments, and searched them against the sequence database. This part was super quick, as HMMER3 is ~100 times faster than HMMER2. Due to the increased sensitivity of HMMER3, many of our Pfam families have grown in size, and we have found that ~80,000 sequences in the sequence database now have overlapping matches to more than one Pfam family.

Within Pfam we have a rule that states that our families should not overlap; this means that any one amino acid can belong to only a single Pfam family.  The exception to this rule applies to families within a clan – clans are Pfam’s collections of related families – where overlaps between clan members are allowed. Over the last few weeks we’ve been working through and resolving the list of 80,000 overlaps. Read the rest of this entry »
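To make the no-overlap rule concrete, here is a minimal sketch of how such overlaps could be detected. It is not the code Pfam uses: the `Match` record, the `find_overlaps` function, the `clan_of` mapping and the family and clan names are all hypothetical, and we assume 1-based inclusive coordinates and that matches to the same family are not counted as overlaps.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Match(NamedTuple):
    """One family match on a sequence (hypothetical record layout)."""
    seq_id: str
    family: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def find_overlaps(matches, clan_of):
    """Return pairs of matches that break the no-overlap rule.

    clan_of maps family -> clan; families missing from it belong to no clan.
    """
    by_seq = defaultdict(list)
    for m in matches:
        by_seq[m.seq_id].append(m)

    overlaps = []
    for seq_matches in by_seq.values():
        for a, b in combinations(seq_matches, 2):
            if a.family == b.family:
                continue  # assumption: only between-family overlaps matter
            shares_residue = a.start <= b.end and b.start <= a.end
            if not shares_residue:
                continue
            clan_a = clan_of.get(a.family)
            if clan_a is not None and clan_a == clan_of.get(b.family):
                continue  # overlaps between members of one clan are allowed
            overlaps.append((a, b))
    return overlaps


# Made-up example data; none of these families or clans are real.
matches = [
    Match("seq1", "FamA", 10, 120),
    Match("seq1", "FamB", 100, 200),  # shares residues 100-120 with FamA
    Match("seq1", "FamC", 150, 260),  # overlaps FamB, but both are in ClanX
    Match("seq2", "FamA", 5, 50),
    Match("seq2", "FamC", 51, 90),    # adjacent to FamA, no shared residue
]
clan_of = {"FamB": "ClanX", "FamC": "ClanX"}

overlaps = find_overlaps(matches, clan_of)
for a, b in overlaps:
    print(f"{a.seq_id}: {a.family} {a.start}-{a.end} overlaps {b.family} {b.start}-{b.end}")
```

The pairwise comparison is quadratic in the number of matches per sequence, which is fine for a sketch; sorting matches by start position would be the obvious next step for large inputs.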

Early adoption of HMMER3

January 21, 2009

As the first post suggested, this blog will partly describe the progress and issues faced with the migration of Pfam to HMMER3. We’ve been waiting for the mercurial HMMER3 for well over a year now, all the while watching its ever-receding release date. However, it has finally been released, albeit in alpha phase! Given Sean Eddy’s past record on HMMER2, particularly his attention to detail and his hatred of bugs in his software, we (Pfam) are already confident enough to be looking at migrating to HMMER3. This post will set out the rationale for moving Pfam to HMMER3 quickly and look at some of the issues that will inevitably follow such a move. Read the rest of this entry »