Getting all Pfam-A domains for a proteome

June 21, 2012

We’ve had a few helpdesk tickets in the last few months asking how to download all of the Pfam-A domains for a particular species. Until now this information has been quite difficult to obtain: you either had to download and install a subset of the tables from our MySQL database, or search all of the sequences from the species of interest against Pfam, probably using our batch search.

We thought it would be useful to simplify the process and add the domain information directly to our proteome pages, so we’ve just done exactly that.

If you go to the proteome page for a particular species, for example Plasmodium falciparum, and click on the ‘Domain Composition’ tab, you’ll now find a link above the table that lets you download a text file listing all of the regions for that proteome. So far we’ve only added these links to the Pfam website at Sanger, but they’ll appear on the other Pfam sites soon. The data files are also available directly from our FTP site, indexed by NCBI taxonomy ID.
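Once you have one of these files, a few lines of Python are enough to summarise it. The sketch below counts how many regions belong to each family. It rests on assumptions that the post does not confirm: that the file is tab-delimited with one region per line, that comment lines start with ‘#’, and that the family identifier sits in the sixth column. Please check the header of the file you download and adjust `family_column` accordingly. The sample lines and the identifiers in them are made up purely for illustration.

```python
from collections import Counter
from typing import Iterable


def count_regions_per_family(lines: Iterable[str], family_column: int = 5) -> Counter:
    """Count regions per family in a tab-delimited region listing.

    ASSUMPTION: one region per line, '#' marks comment/header lines, and
    the family identifier is in column `family_column` (0-based). None of
    this is stated in the post; verify against the real file.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip rows that are too short to hold the family column.
        if len(fields) <= family_column:
            continue
        counts[fields[family_column]] += 1
    return counts


# Made-up sample rows (hypothetical sequence and family identifiers).
sample = [
    "# hypothetical header line",
    "SEQ_1\t10\t120\t8\t125\tFAM_A",
    "SEQ_1\t200\t310\t198\t315\tFAM_B",
    "SEQ_2\t5\t115\t3\t118\tFAM_A",
]

family_counts = count_regions_per_family(sample)
for family, n in family_counts.most_common():
    print(family, n)
```

To run this on a real download, pass an open file handle instead of the sample list, for example `open(path)` for a plain text file or `gzip.open(path, "rt")` if the file turns out to be compressed, where `path` is wherever you saved it.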

We hope you’ll find this feature useful.

Posted by Jaina and John.

6 Responses to “Getting all Pfam-A domains for a proteome”

  1. Paul Says:

    Don’t you really want to use a BioMart for this sort of thing?


  2. This is a great feature that was missing from Pfam. Thanks a lot for adding this. Definitely useful.

  3. Niall Says:

    Is there a version using the complete proteomes from UniProt, i.e. leaving out the TrEMBL stuff and just including the reviewed proteins?

  4. jainamistry Says:

    Dear Niall,

    The file contains all proteins in a proteome, including those from TrEMBL. As John said above, our longer-term plan is to have a BioMart to serve this sort of data. When we have this in place, you will be able to select the subset of proteins in a proteome that are from the Swiss-Prot section of UniProt.

    Jaina

