Archive for January, 2012

The Pfam website in a virtual machine

January 26, 2012

Since releasing the new Pfam website four years ago, we’ve had a steady trickle of emails from users who would like to install and run the site within their own local environment. It used to be possible to do just that, given a following wind, if you were ready to install the site from its source code. Unfortunately, after some internal changes, and as the list of Perl module dependencies grew and grew, the process became harder and more complex, and eventually we stopped supporting it entirely. We’ve been actively discouraging people from trying this for far too long, all the while promising to make the process easier. Finally, we’ve got around to building a virtual machine (VM) that should make the whole thing possible again.

What are these new families with _2, _3, _4 endings?

January 19, 2012

Some users have been contacting us about the new families that have appeared in Pfam release 26.0.

As pointed out by one of our users:

Pfam v26 includes, in addition to DDE_Tnp_1, the following new families:

DDE_Tnp_1_2
DDE_Tnp_1_3
DDE_Tnp_1_4
DDE_Tnp_1_5
DDE_Tnp_1_6
DDE_Tnp_1_7

These extra new families, named name_2, name_3, name_4 and so on, have been constructed to increase the coverage of Pfam. Many of our existing large, diverse families are not well modelled by a single HMM, and there are many true members that are not matched. By building multiple models we can match more sequences. Each of these models will be in the same Pfam clan, the RNaseH clan in this case. For the most part these models do not represent any particular subfamily or classification group. Essentially, you should think of a match to any of the above seven DDE_Tnp_1 families as being the same thing.

Because of the way Pfam is built, any particular region of a protein may belong to only one of these families. We have a step in building clans called competition: if a region of a protein matches both DDE_Tnp_1 and DDE_Tnp_1_2, for example, the region is assigned to the family with the highest score. This means that a region matched by DDE_Tnp_1 in release 25.0 may now end up in a different family, such as DDE_Tnp_1_2. You shouldn’t read too much into these changes.
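To make the competition step concrete, here is a minimal sketch in Python. It is not the code Pfam actually runs: the `Match` record, the `compete` function and the example coordinates and scores are all invented for illustration, and it assumes every match passed in belongs to the same clan and that "score" means the bit score.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Hypothetical record for one family match on a protein."""
    protein: str
    family: str
    start: int    # 1-based, inclusive
    end: int      # inclusive
    score: float  # bit score (assumed to be what competition compares)


def overlaps(a: Match, b: Match) -> bool:
    """True if two matches are on the same protein and share at least one residue."""
    return a.protein == b.protein and a.start <= b.end and b.start <= a.end


def compete(matches: List[Match]) -> List[Match]:
    """Keep the highest-scoring match wherever matches from one clan overlap.

    Greedy sketch: visit matches from best to worst score and keep a match
    only if it does not overlap one that has already been kept.
    """
    kept: List[Match] = []
    for m in sorted(matches, key=lambda m: m.score, reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: (m.protein, m.start))


# Made-up example: two families compete for the same region of protein P1.
example = [
    Match("P1", "DDE_Tnp_1", 10, 200, 85.0),
    Match("P1", "DDE_Tnp_1_2", 15, 210, 120.5),
    Match("P1", "DDE_Tnp_1_3", 300, 450, 60.0),
]
winners = compete(example)
```

In this made-up example the region around residues 10 to 210 is assigned to DDE_Tnp_1_2 because it scores higher than DDE_Tnp_1, while the non-overlapping DDE_Tnp_1_3 match further along the protein is untouched.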

Many of these new families are appearing in Pfam release 26.0 because of a change in how we build new Pfam families. The new strategy takes complete genomes and uses each protein that does not match Pfam as the starting point for a Jackhmmer search. Jackhmmer is an iterative search tool, like PSI-BLAST. If the Jackhmmer search finds lots of homologues but has some overlaps with an existing family, then we may build one of these new additional families to increase coverage of known sequences. Rather than give these families completely new names, we simply give them the same name as the existing family and append a number to show that they are closely related to each other.
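The outline of that strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production pipeline: the function names, file names and identifiers are hypothetical, and the command is built but never run. The only parts taken from the HMMER tool itself are the jackhmmer argument order (query file, then target database) and its `-N` (maximum iterations) and `--tblout` (tabular output) options.

```python
from typing import Iterable, List, Set


def unmatched_proteins(proteome_ids: Iterable[str], pfam_matched_ids: Iterable[str]) -> List[str]:
    """Return proteins from a complete genome that have no Pfam match (hypothetical helper)."""
    matched = set(pfam_matched_ids)
    return [p for p in proteome_ids if p not in matched]


def jackhmmer_command(query_fasta: str, seq_db: str, max_iterations: int = 5) -> List[str]:
    """Build (but do not run) a jackhmmer command line for one query protein."""
    return [
        "jackhmmer",
        "-N", str(max_iterations),          # cap on the number of search iterations
        "--tblout", f"{query_fasta}.tbl",   # per-sequence tabular summary of hits
        query_fasta,                        # starting protein
        seq_db,                             # sequence database to search
    ]


def next_family_name(base: str, existing_names: Set[str]) -> str:
    """Append the next free number to an existing family name (assumed naming rule)."""
    n = 2
    while f"{base}_{n}" in existing_names:
        n += 1
    return f"{base}_{n}"


# Made-up identifiers and file names, for illustration only.
queries = unmatched_proteins(["protA", "protB", "protC"], ["protB"])
command = jackhmmer_command("protA.fa", "seqdb.fa")
new_name = next_family_name("DDE_Tnp_1", {"DDE_Tnp_1", "DDE_Tnp_1_2"})
```

Whether a search result actually becomes a new numbered family is a curation decision; the sketch only shows the mechanical parts of the process.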

Posted by Alex

In Support of Wikipedia

January 17, 2012

Many of you will be aware of the proposed web blackout in response to the Stop Online Piracy Act, which is currently going through the U.S. House of Representatives (you can read the BBC’s explanation of the Act here). If this Act is passed, it will have far-reaching consequences for the overall freedom of the internet. Editors of the English Wikipedia have taken the decision to close the English Wikipedia for 24 hours, starting at 0500 hrs on Wednesday 18th January. To respect this protest, we will also be making our Wikipedia content unavailable during this time.

You’ll still be able to access all the non-Wikipedia content – that is, all the covariance models and HMMs describing families, domain graphics, full and seed alignments, as well as our species trees.

Posted by Sarah and Alex

The new NAR paper is out!

January 15, 2012

Dear Pfam-mers,

As you have surely noticed, the highly anticipated new Pfam paper is out as part of the 2012 NAR database issue! We were delighted to be listed as a featured article. The paper covers the new release 26.0 (more on this from Rob soon) and presents some novel analysis that may be of interest to Pfam addicts like you. We discuss quite extensively our use of family-specific bit score gathering thresholds (GAs), hoping to bring clarity to an issue that seems to have been a source of confusion in the past (a.k.a. stop sending us tickets asking what GAs are and how to use them! :-)). We also extend and update the analysis of DUF families that was presented in a previous publication, hoping to push more people into the “de-DUF a DUF” game. So, enjoy reading the paper and send us comments and suggestions; your support and advice are, as always, invaluable to us!

Posted by Marco
