Following on from Jaina and Marco’s blog post last week about conserved human regions not in Pfam, I would like to give you some examples of how we have used the regions identified to improve existing Pfam families and to create new ones. Where available, we use three-dimensional structures to guide the boundary definitions of our families. Where no structure is available, either for the protein in question or for any other protein in the same Pfam family, we base boundary decisions on sequence conservation. The following paragraphs give three examples of cases I have looked at recently.
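To make "boundary decisions based on sequence conservation" a little more concrete, here is a minimal sketch of one way to score conservation along a multiple sequence alignment and find the first and last well-conserved columns. This is an illustration only, not the procedure Pfam curators follow (those calls are made by hand). The entropy-based score, the `threshold` and `min_occupancy` values and the toy alignment are all assumptions chosen for the example.

```python
import math
from collections import Counter

GAPS = "-."  # gap characters commonly used in alignment formats


def column_conservation(column: str) -> float:
    """Return 1 - normalised Shannon entropy of one alignment column.

    Gaps are ignored. 1.0 means every non-gap residue is identical;
    values near 0 mean the column is close to uniformly variable
    across the 20 standard amino acids.
    """
    residues = [c for c in column.upper() if c not in GAPS]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return 1.0 - entropy / math.log2(20)


def conserved_boundaries(alignment, threshold=0.5, min_occupancy=0.5):
    """Return 0-based (start, end) of the first and last columns that are
    both well occupied and conserved, or None if no column qualifies.

    threshold and min_occupancy are arbitrary illustrative cut-offs.
    """
    n_seqs = len(alignment)
    good = []
    for i in range(len(alignment[0])):
        column = "".join(seq[i] for seq in alignment)
        occupancy = sum(c not in GAPS for c in column) / n_seqs
        if occupancy >= min_occupancy and column_conservation(column) >= threshold:
            good.append(i)
    return (good[0], good[-1]) if good else None


# Toy alignment (made up): ragged, gappy ends around a conserved core.
ALIGNMENT = [
    "--MKLVVGAG--",
    "-AMKLIVGAGS-",
    "--MRLVVGAG-Q",
    "P-MKLVVGSG--",
]
```

On the toy alignment, `conserved_boundaries(ALIGNMENT)` returns `(2, 9)`: the sparsely populated terminal columns are trimmed and the conserved core is kept. Real boundary decisions also weigh secondary-structure predictions, family-level alignments and, when available, structures.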
Recently, we have been looking at how much of the human proteome is covered by Pfam (release 27.0), and at ways in which we can improve this coverage. We have even written an open access paper about it, which is part of the proceedings of the 2013 Biocuration conference. We used the human proteins in UniProtKB/Swiss-Prot (~20,000 sequences) as our human proteome set, and found that while most of the sequences in this set have some Pfam annotation (90% have at least one Pfam domain), there is still much ground to cover before we have a complete map of all conserved human regions (HRs). Here, rather than repeating what we presented in the paper (did we mention it is open access?), we would like to tell you more about the impact this study is having on how we select target regions to add to Pfam.
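The "90% have at least one Pfam domain" figure is a per-sequence measure of coverage. A finer-grained measure counts the fraction of residues that fall inside any domain. The sketch below computes both from a table of domain hits, merging overlapping hits so no residue is counted twice. The data structures and the toy accessions and lengths are hypothetical, chosen only to illustrate the arithmetic; they are not figures from the study.

```python
from typing import Dict, List, Optional, Tuple

Region = Tuple[int, int]  # (start, end), 1-based and inclusive


def covered_residues(regions: List[Region]) -> int:
    """Count residues covered by at least one region, merging overlaps."""
    total = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(regions):
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def proteome_coverage(
    lengths: Dict[str, int], hits: Dict[str, List[Region]]
) -> Tuple[float, float]:
    """Return (sequence coverage, residue coverage) for a proteome.

    lengths maps accession -> sequence length; hits maps accession ->
    list of domain regions. Sequences with no entry in hits count as
    having no domains.
    """
    with_domain = sum(1 for acc in lengths if hits.get(acc))
    residues = sum(covered_residues(hits.get(acc, [])) for acc in lengths)
    return with_domain / len(lengths), residues / sum(lengths.values())


# Hypothetical toy proteome of three sequences.
LENGTHS = {"P1": 100, "P2": 200, "P3": 50}
HITS = {"P1": [(10, 40), (30, 60)], "P2": [(1, 100)]}
```

Here `proteome_coverage(LENGTHS, HITS)` gives a sequence coverage of 2/3 but a residue coverage of only 151/350 (about 43%), which shows how a proteome can look well annotated per sequence while many residues remain outside any family.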
We’ve had a few helpdesk tickets in the last few months asking how to download all of the Pfam-A domains for a particular species. This information can be quite difficult to obtain: getting it requires either downloading and installing a subset of the tables from our MySQL database, or searching all of the sequences from the species of interest against Pfam, probably using our batch search.
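Once domain assignments for many species are available as a flat table (for instance, exported from the database tables mentioned above), pulling out one species is a simple filter on the taxonomy column. The sketch below assumes a hypothetical tab-separated layout with a header row and the columns `seq_acc`, `taxid`, `pfam_acc`, `start` and `end`; the real Pfam tables and files use their own names and layouts, so the column names would need adjusting. The sample rows and placeholder accessions are invented for illustration. Only the NCBI Taxonomy ID for human, 9606, is a real identifier.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

HUMAN_TAXID = "9606"  # NCBI Taxonomy ID for Homo sapiens

Hit = Tuple[str, int, int]  # (Pfam accession, start, end)


def domains_for_species(handle: TextIO, taxid: str) -> Dict[str, List[Hit]]:
    """Collect Pfam-A hits per sequence for a single species.

    Assumes a hypothetical tab-separated table with a header row and the
    columns seq_acc, taxid, pfam_acc, start, end. Rows from other species
    are skipped; hits are kept in file order.
    """
    result: Dict[str, List[Hit]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["taxid"] != taxid:
            continue
        result.setdefault(row["seq_acc"], []).append(
            (row["pfam_acc"], int(row["start"]), int(row["end"]))
        )
    return result


# Invented sample data: SEQ1 is human (9606), SEQ2 is mouse (10090).
SAMPLE = (
    "seq_acc\ttaxid\tpfam_acc\tstart\tend\n"
    "SEQ1\t9606\tPF_A\t5\t80\n"
    "SEQ2\t10090\tPF_A\t3\t70\n"
    "SEQ1\t9606\tPF_B\t100\t150\n"
)

human_hits = domains_for_species(io.StringIO(SAMPLE), HUMAN_TAXID)
```

For a real, compressed dump the same function can be fed a text-mode handle, for example one opened with `gzip.open(path, "rt")`, so the whole file never needs to be loaded into memory at once.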