Posts Tagged ‘pfam’
Website update
October 29, 2009
We’ve just updated the Pfam website again. This update comes fairly soon after the major Pfam 24.0 release, and it’s intended to fix some of the more annoying bugs and omissions that we’ve found in the last week or so. Read the rest of this entry »
Pfam release 24.0
October 13, 2009
We have just released the latest update to Pfam. Release 24.0 contains a total of 11,912 families, with 1,808 new families and 236 families killed since the last release. 75.15% of all proteins in Pfamseq contain a match to at least one Pfam domain. 53.18% of all residues in the sequence database fall within Pfam domains. Read the rest of this entry »
Imminent Release of Pfam 24.0
October 2, 2009
We are now on the brink of releasing Pfam 24.0. This will be a landmark release, as it will be the first to be built using the new version of the HMMER package, HMMER3. We are well aware that we have been claiming this release as imminent for some time, but we are now at the point of flicking the big switch. There are numerous changes that users need to know about and we will briefly summarise them here. Read the rest of this entry »
pfam_scan.pl – part II
September 11, 2009
Back in May we wrote a blog post about the new version of pfam_scan.pl. We asked if there was anyone out there who was willing to help us test our new script, and we were pleasantly surprised at the number of people who got in contact with us – so a big thank you to all those who have helped. Since releasing the alpha version of pfam_scan.pl to our testers we have made some internal changes to the script that are worth mentioning: Read the rest of this entry »
pfam_scan.pl
May 21, 2009
We’re currently working on a new version of one of our core scripts, ‘pfam_scan.pl’. This script searches a set of protein sequences (in FASTA format) against Pfam’s library of HMMs. The original code was written nearly a decade ago but, since then, features have been added, bugs have been fixed and the code has evolved into something that is far from elegant. The re-write is something that we’ve been planning to do for a while and, as the code needs updating to use the new HMMER3 software, now seems like the perfect time to do it. Read the rest of this entry »
DUFs: families in need of function
April 20, 2009
Domains of Unknown Function, or DUFs, are a large set of families found in the Pfam database. Examples would be “DUF26” or “DUF282”. The DUF naming scheme was introduced by Chris Ponting, through the addition of DUF1 and DUF2 to the SMART database. These two domains were found to be widely distributed in bacterial signalling proteins. Subsequently, the functions of these domains were identified and they have since been renamed as the GGDEF and EAL domains respectively (structures shown in Figures 1 and 2). These families were added to Pfam in 1997, and little did Chris know that he was starting a trend that would see thousands of uncharacterised families being added to the domain databases. Read the rest of this entry »
HMMER3 migration: resolving overlaps
March 19, 2009
It has been a little quiet on the Pfam blog recently, but behind the scenes we’ve been working hard on the migration to HMMER3.
We have built HMMER3 models for all of the Pfam alignments, and searched them against the sequence database. This part was super quick, as HMMER3 is ~100 times faster than HMMER2. Due to the increased sensitivity of HMMER3, many of our Pfam families have grown in size, and we have found that ~80,000 sequences in the sequence database now have overlapping matches to more than one Pfam family.
Within Pfam we have a rule that states that our families should not overlap; this means that any one amino acid can belong to only a single Pfam family. The exception to this rule applies to families within a clan – clans are Pfam’s collections of related families – where overlaps between clan members are allowed. Over the last few weeks we’ve been working through and resolving the list of 80,000 overlaps. Read the rest of this entry »
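The no-overlap rule described above can be illustrated with a minimal sketch. This is not Pfam's actual overlap-resolution code; the `Match` type, the `find_overlaps` function, the family names (`FamA`, `FamB`, `FamC`) and the clan name (`CL_X`) are all hypothetical, chosen only to show the rule: two matches on the same sequence conflict if they come from different families, share at least one residue, and the families are not members of the same clan.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Match(NamedTuple):
    """A family match on a sequence (hypothetical structure, 1-based inclusive coordinates)."""
    seq_id: str
    family: str
    start: int
    end: int


def find_overlaps(matches, clan_of):
    """Return pairs of matches that break the no-overlap rule.

    clan_of maps a family name to its clan; families without a clan are absent.
    """
    by_seq = defaultdict(list)
    for m in matches:
        by_seq[m.seq_id].append(m)

    overlaps = []
    for seq_matches in by_seq.values():
        for a, b in combinations(seq_matches, 2):
            if a.family == b.family:
                continue  # only overlaps between different families are of interest
            if a.start <= b.end and b.start <= a.end:  # the two ranges share a residue
                clan_a = clan_of.get(a.family)
                if clan_a is not None and clan_a == clan_of.get(b.family):
                    continue  # overlaps between members of one clan are allowed
                overlaps.append((a, b))
    return overlaps


# Illustrative data only.
matches = [
    Match("seq1", "FamA", 10, 50),
    Match("seq1", "FamB", 40, 90),   # overlaps FamA, no shared clan: reported
    Match("seq2", "FamA", 5, 60),
    Match("seq2", "FamC", 55, 120),  # overlaps FamA, but both are in CL_X: allowed
    Match("seq3", "FamB", 1, 30),
    Match("seq3", "FamC", 31, 80),   # adjacent, no shared residue: fine
]
clan_of = {"FamA": "CL_X", "FamC": "CL_X"}

result = find_overlaps(matches, clan_of)
for a, b in result:
    print(f"{a.seq_id}: {a.family} {a.start}-{a.end} overlaps {b.family} {b.start}-{b.end}")
```

The pairwise comparison is quadratic in the number of matches per sequence, which is fine for illustration; sorting the matches by start position would scale better for long, multi-domain sequences.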
What Pfam did in 2008
January 27, 2009
I thought it would be useful to give a quick overview of some of the major things that have been going on behind the scenes at Pfam during 2008. Overall it may have seemed like a quiet year for our users as we only made one public release of data in July, release 23.0. However, like a paddling duck, the calmness viewed from above belies some furious paddling below. Read the rest of this entry »
Early adoption of HMMER3
January 21, 2009
As the first post suggested, this blog will partly describe the progress and issues faced with the migration of Pfam to HMMER3. We’ve been waiting for the mercurial HMMER3 for well over a year now, watching all the while its ever receding release date. However, it has finally been released, albeit in alpha phase! Given Sean Eddy’s past record on HMMER2, particularly his attention to detail and his hatred of bugs in his software, we (Pfam) are already confident enough to be looking at migrating to HMMER3. This post will set out the rationale for moving Pfam to HMMER3 quickly and look at some of the issues that will inevitably follow such a move. Read the rest of this entry »
Welcome to the Xfam blog
January 19, 2009
Welcome to the new blog for the Xfam databases! Xfam is our shorthand for the combination of the Pfam and Rfam databases, a name that should also future-proof us in case we add any further databases to the brand.
We hope that this blog will become a useful point of reference, where our users can learn about what is going on behind the scenes at Xfam central. We will be announcing some important changes that are coming with the eagerly awaited release of HMMER3. As well as announcing new releases of the data and website, we’ll also try to discuss our philosophy on protein/RNA domains and sequence classification. If there are other topics that you would like to hear more about, why not leave us a comment?