Naming by numbers

July 21, 2010

A user recently asked us why two highly similar sequences that contain a PAS domain are in different Pfam families within the PAS clan. The PAS domain clan (CL0183) currently contains seven different families: PAS, PAS_2, PAS_3 and so on up to PAS_6, as well as the MEKHLA family. We thought we would take the opportunity to explain some of the rationale behind the way in which we construct and name our families and clans.

At Pfam we always attempt to maximise the coverage of known sequences by our families and domains. For large, divergent families, we often find that a single model is unable to capture all of the true members. In such cases we create a series of related families whose matches overlap somewhat, but which together find more members than any single model could. The families in a clan are often built at widely different times, and they do not always reflect the functional groupings within these large families.

A question we’re often asked is: when I see a naming scheme such as family_1, family_2, family_3, can I assume that these are all related families? Mostly, the answer is yes. Examples of clans whose families use this numbering system include AAA, AB_hydrolase, EF_hand, Glyco_hydro_tim and LRR. However, Pfam has also used this sort of naming scheme for proteins that have the same function but are not related to each other, such as Acetyltransf_1 and Acetyltransf_2. Further examples occur among the Lipoprotein_X families, of which there are more than thirteen, only two of which appear in the same clan.
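To make the point concrete, here is a minimal Python sketch of how one might check whether families that share a numbered name also share a clan. It is an illustration only, not a Pfam tool: the function names `split_numbered_name` and `stems_sharing_one_clan` and the hand-written mapping are our own assumptions. CL0183 is the PAS clan mentioned above; `clan_A` and `clan_B` are placeholders, not real clan accessions.

```python
import re
from collections import defaultdict

# Matches names that end in an underscore followed by digits, e.g. "PAS_2".
_NUMBERED = re.compile(r"^(?P<stem>.+)_(?P<index>\d+)$")


def split_numbered_name(name):
    """Split 'PAS_2' into ('PAS', 2); names without a numeric suffix keep index None."""
    match = _NUMBERED.match(name)
    if match is None:
        return name, None
    return match.group("stem"), int(match.group("index"))


def stems_sharing_one_clan(family_to_clan):
    """For each name stem, report whether all of its families sit in one clan.

    family_to_clan maps a family name to a clan label, or to None when the
    family is not assigned to any clan.
    """
    clans_by_stem = defaultdict(set)
    for family, clan in family_to_clan.items():
        stem, _ = split_numbered_name(family)
        clans_by_stem[stem].add(clan)
    return {
        stem: len(clans) == 1 and None not in clans
        for stem, clans in clans_by_stem.items()
    }


# Hypothetical, hand-written input. "clan_A" and "clan_B" are placeholders
# standing in for "two different clans"; they are not real accessions.
EXAMPLE = {
    "PAS": "CL0183",
    "PAS_2": "CL0183",
    "PAS_3": "CL0183",
    "MEKHLA": "CL0183",
    "Acetyltransf_1": "clan_A",
    "Acetyltransf_2": "clan_B",
}

if __name__ == "__main__":
    for stem, same_clan in sorted(stems_sharing_one_clan(EXAMPLE).items()):
        print(stem, "same clan" if same_clan else "NOT in one clan")
```

With a real family-to-clan mapping in place of `EXAMPLE`, any stem flagged as not sitting in one clan is a case where the shared name should not be read as evidence of relatedness. Note that the check only works in one direction: MEKHLA is in the PAS clan even though its name gives no hint of it.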

We realise that there are further inconsistencies within the database. For example, there are families called eRF1_1, eRF1_2 and eRF1_3, which represent distinct structural domains within the eRF1 protein rather than related families. We will be addressing this instance by renaming them eRF1_d1, eRF1_d2 and eRF1_d3.

Other inconsistencies have occurred with the Transposases; however, see the last blog post for our reasons for altering these names to reflect the nature of the Insertion Elements – transposons – that make them up.

If you find two families in the same clan, you can be pretty certain they are related.  If you think there are two (or more) families that are related and that are not currently in the same clan, do write to us (and include any supporting information) so that we can update our classification.

In summary, we try to name our families in a logical and systematic way, but this is not always easy! Note, however, that we are happy to change our family names when there is good reason to.

Penny and Alex


