A few weeks ago the Pfam team was visited by the curators of the ISfinder resource, a specialist database that classifies eubacterial and archaeal transposases and insertion sequences (ISs). The ISfinder database has a very different role to Pfam: it focuses on this specific set of biological sequences and is the naming authority in the field for ISs. Pfam’s role is not to name transposases, but to identify the domains contained within these sequences. Below we describe the outcomes of this meeting.
Despite the latest closures of UK airspace due to ash, Mick Chandler, Patricia Siguier, Edith Gourbeyre and Alessandro Varani travelled from the Laboratory of Microbiology and Molecular Genetics, CNRS Toulouse, France to Hinxton for an intensive two-day meeting. The purpose of the visit was to address some of the places where Pfam's nomenclature and domain assignments for transposases and insertion elements conflict with those in ISfinder. Our main aim was to alter the Pfam names and annotations so that they are less likely to be used to name insertion sequences incorrectly. The classification of transposases relies on more than simply protein sequence conservation. We cannot stress this strongly enough! The ISfinder database is carefully curated, with families defined not only by domain architecture, but also by genetic organisation, target DNA sequence and the chemistry of the transposition process itself. ISfinder should be used to name the transposases and insertion sequences, while Pfam can be used to identify and classify the domains contained within the sequences.
Historically, Pfam has arbitrarily labelled families that are composed largely of transposase sequences Transposase_X, where X is an incrementing number. This system was used regardless of whether the family was a catalytic transposase domain (e.g. a classical DDE-motif containing domain), a DNA binding domain or even an associated ORF. However, such naming issues were only the tip of the iceberg. Some of the Pfam transposase families contained multiple domains, such as Ins_element1, which actually includes two domains, a Zn-finger and a Helix-Turn-Helix. In other cases, entries in Pfam have been classed as “Transposase” even though the family is not restricted to transposase sequences and is also found elsewhere, such as in viruses. In those cases the Pfam name is misleading.
So what have we done? To address the naming issue, we have systematically gone through all bacterial transposase families in Pfam and tried to give them more meaningful names. When a family is broad in terms of the sequences it matches, the name has been changed so that it no longer contains the term “transposase”; for example, Mu_transposase has been renamed DDE_2. Where a Pfam entry specifically matches an ISfinder family, this information has been included in the new Pfam name. For example, Transposase_27, the catalytic endonuclease domain essential for transposition of the IS1 family, has been renamed DDE_Tnp_IS1. Overall, there are more than 50 name changes, additions to existing clans and a new clan. We have also identified a number of new families that are missing from Pfam. We hope to have added all of these changes and new families to Pfam in time for release 25.0. So, if you are interested in this area of biology, expect some major changes. Thanks to Mick and the members of his group.
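If you have scripts or annotation files that refer to the old family names, the renaming described above can be handled with a simple lookup table. The following is only a minimal sketch: the table contains just the two renames mentioned in this post (the full list is not reproduced here), and the function names and the sequence identifiers in the usage example are hypothetical, chosen purely for illustration.

```python
# Minimal sketch: translate old Pfam family names to their new names.
# Assumption: PFAM_RENAMES holds only the two renames quoted in this post;
# the remaining name changes would need to be added from the release notes.
PFAM_RENAMES = {
    "Mu_transposase": "DDE_2",
    "Transposase_27": "DDE_Tnp_IS1",
}


def update_family_name(name, renames=PFAM_RENAMES):
    """Return the new name if the family was renamed, else the name unchanged."""
    return renames.get(name, name)


def update_annotations(annotations, renames=PFAM_RENAMES):
    """Apply the renames to a list of (sequence_id, family_name) pairs."""
    return [(seq_id, update_family_name(family, renames))
            for seq_id, family in annotations]


if __name__ == "__main__":
    # Hypothetical sequence identifiers, for illustration only.
    old = [("seq1", "Transposase_27"), ("seq2", "Mu_transposase")]
    for seq_id, family in update_annotations(old):
        print(seq_id, family)
```

Names that are not in the table pass through unchanged, so the same function can be run safely over annotations that mix renamed and untouched families.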
We plan to continue collaborating with the ISfinder database to maintain the consistency and accuracy of annotations in Pfam.
Posted by Rob and Mick