We now have an online Quick Tour that provides a brief introduction to the Pfam protein families database. It provides a basic description of Pfam, as well as advice on how to search the database and discover protein-related information. The tour also showcases various tools that allow users to visualize data in Pfam, and explains where to find out more about the resource. We recommend taking the tour to learn how to use Pfam effectively.
Archive for the 'Pfam' Category
Pfam 30.0, our second release based on UniProt reference proteomes, is now available. The new release contains a total of 16,306 families, with 22 new families and 11 families killed since the last release. The UniProt reference proteome set has expanded and now includes 17.7 million sequences, compared with 11.9 million when we made Pfam 29.0. In this release, we have updated the annotations on hundreds of Pfam entries, and renamed some of our Domains of Unknown Function (DUF) families.
DUFs are protein domains whose function is uncharacterised. Over time, as scientific knowledge increases and new data about proteins comes to light, more information about the function of a domain may become available. As a result, DUFs can be renamed and re-annotated with more meaningful descriptions. As part of Pfam 30.0, we have re-annotated 116 DUFs based on updated information in the UniProtKB database, the scientific literature, and feedback from Pfam and InterPro users. Examples of some of our DUF updates in Pfam 30.0 are given below:
- PF10265, created in release 23.0 and originally named DUF2217, has been renamed to Miga, a family of proteins that promote mitochondrial fusion.
- PF10229, created in release 23.0 and originally named DUF2246, has been renamed as MMADHC, as it represents methylmalonic aciduria and homocystinuria type D proteins and their homologues. The structure of this domain is shown below.
- PF12822, created in release 25.0 and originally named DUF3816, has been renamed to ECF_trnsprt, since it contains proteins identified as the substrate-specific component of energy-coupling factor (ECF) transporters.
Please note that we may change the identifier for a family (e.g. DUF2217), but we never change the accession for a family (e.g. PF10265).
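To illustrate why the accession is the safer key to store, here is a minimal sketch in Python that uses only the three families listed above. The `PfamFamily` class, the `FAMILIES` table and the `find_by_any_identifier` helper are hypothetical names made up for this example; they are not part of any Pfam software or API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PfamFamily:
    """Hypothetical record type, for illustration only."""
    accession: str                                # stable: never changes
    identifier: str                               # current name: may change between releases
    previous_identifiers: Tuple[str, ...] = ()    # older names, e.g. the original DUF name


# Keyed by accession, so a rename never invalidates a stored key.
FAMILIES: Dict[str, PfamFamily] = {
    "PF10265": PfamFamily("PF10265", "Miga", ("DUF2217",)),
    "PF10229": PfamFamily("PF10229", "MMADHC", ("DUF2246",)),
    "PF12822": PfamFamily("PF12822", "ECF_trnsprt", ("DUF3816",)),
}


def find_by_any_identifier(name: str) -> Optional[PfamFamily]:
    """Return the family whose current or previous identifier matches `name`."""
    for family in FAMILIES.values():
        if name == family.identifier or name in family.previous_identifiers:
            return family
    return None


# A lookup by accession is unaffected by the rename.
print(FAMILIES["PF10265"].identifier)
# A lookup by the old identifier only works because we kept a record of it.
print(find_by_any_identifier("DUF2217").accession)
```

In this sketch, anything that stored `PF10265` keeps working after the rename, whereas anything that stored `DUF2217` needs the extra table of previous names to be resolved.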
If you find any more DUFs that can be assigned a name based on function, or any other annotation updates, please get in touch with us (firstname.lastname@example.org).
Pfam 29.0, our second release of 2015, contains 16295 entries and 559 clans. We have made some major changes to our underlying sequence database and the data that are displayed on the website, which we’ve outlined below. Full details can be found in our Nucleic Acids Research paper, which is available here.
Back in November 2012 we announced that the Xfam team in the UK was moving from the Wellcome Trust Sanger Institute to the European Bioinformatics Institute (EMBL-EBI), just next door on the Wellcome Trust Genome Campus. On Tuesday we completed that move by switching off the Pfam and Rfam websites inside Sanger and redirecting all traffic to our shiny new home at xfam.org. You can now find the Pfam and Rfam websites at pfam.xfam.org and rfam.xfam.org respectively.
We have just advertised a 9-month maternity cover position in Pfam. We are looking for a skilled Bioinformatician to help us take Pfam into its next phase of development as we become more integrated into the European Bioinformatics Institute (EMBL-EBI).
Essential knowledge, skills and experience:
- Degree in Science with relevant experience
- Computer literacy (Unix experience)
- Programming skills in Perl, including OO Perl
- Familiarity with writing production software
- MySQL, or similar, expertise
- Experience working with biological sequence data
- Good communication skills
See all the details on the EBI jobs page.
We’ve had a lot of questions from users recently, wondering why our pfam_scan.pl script doesn’t work with the latest release of the HMMER package, version 3.1b. This is a quick post to explain why that is, and what we’ve done about it.
Following on from Jaina and Marco’s blog post last week about conserved human regions not in Pfam, I would like to give you some examples of how we have used the regions identified to improve existing Pfam families, and to create new ones. When available, we use three-dimensional structures to guide the boundary definitions of our families. In cases where there is no available structure, either for the protein in question or for other proteins in the same Pfam family, we base boundary decisions on sequence conservation. The following paragraphs give three examples of cases I have looked at recently.
Recently, we have been looking at how much of the human proteome is covered by Pfam (release 27.0), and at ways in which we can improve this coverage. We have even written an open access paper about it, which you can read here and which is part of the proceedings of the 2013 Biocuration conference. We used the human proteins in UniProtKB/Swiss-Prot (~20,000 sequences) as our human proteome set, and found that while most of the sequences in this set have some Pfam annotation (90% have at least one Pfam domain), there is still much ground to cover before we have a complete map of all (conserved) human regions (HRs). Here, rather than repeating what we presented in the paper (did we mention it is open access? :-)), we would like to tell you more about the impact this study is having on our strategies for selecting target regions to be added to Pfam.
In a blog post published just over a year ago, I proposed a number of changes to the content of Pfam to improve scalability and usability of the database. These changes came into effect a few days ago, when we released Pfam 27.0. This release of Pfam contains a total of 14831 families, with 1182 new families and 22 families killed since release 26.0. 80% of all proteins in UniProt contain a match to at least one Pfam domain, and 58% of all residues in the sequence database fall within a Pfam domain.