Pfam 34.0 is released

March 24, 2021

Pfam 34.0 contains a total of 19,179 families and 645 clans. Since the last release, we have built 935 new families, killed 15 families and created 11 new clans. The UniProt Reference Proteomes set has grown by 21% since Pfam 33.1 and now contains 47 million sequences. Of the sequences in reference proteomes, 74.5% have at least one Pfam match, and 48.8% of all residues fall within a Pfam family.

Structural models

In our previous blog post, we announced the release of ~6,000 structural models in Pfam and InterPro. Many of the new families that we have created since the last release are large enough to be suitable for structure prediction. We have sent the alignments for new and modified Pfam families to the Baker group, who are currently generating structural models for them using their pipeline. We will release the next set of structural models when Pfam 34.0 is integrated into InterPro.

Collaboration with Google Research

We have been working with Dr Lucy Colwell’s research team at Google Research to expand Pfam coverage using deep learning methods. The deep learning approach, trained on Pfam HMMER matches, has found many additional matches, which are provided in a new file called Pfam-N. A separate Pfam blog post describes the work in more detail.
