Introducing AntiFam

March 21, 2012

AntiFam [1] is the newest addition to the Xfam brand. It is a database of hidden Markov models (HMMs) designed to identify spurious open reading frames (ORFs). It is available now on our ftp site:
ftp://ftp.sanger.ac.uk/pub/databases/Pfam/AntiFam/

The sequence databases now contain over 20 million protein sequences. But not all of these proteins are what they seem. Errors in proteins predicted from annotated genomes may arise from sequencing errors, which can cause frameshifts or introduce incorrect stop codons, and also from annotation errors. Annotation errors include errors in gene identification, leading to the prediction of completely spurious proteins, and errors in functional annotation [2]. Once a spuriously predicted protein appears in the sequence databases, its misannotation may be transferred to proteins from other genomes.

We have occasionally been alerted to Pfam families that consisted solely of such spuriously predicted proteins. One such case was PF10695, which was annotated as a cell wall hydrolase. James Tripp and colleagues found that this family was composed of translations of rRNA sequences [3]. These families have now been deleted from Pfam.

AntiFam contains 23 families. These include several families that have been deleted from Pfam, plus families arising from the translation of non-coding RNAs and a family of proteins that is frequently misannotated, resulting in translation in the wrong reading frame. We hope that AntiFam will become a useful quality control tool for genomic and metagenomic studies. Over time we hope to reduce the number of spurious proteins that make their way into the protein sequence databases.
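As a rough illustration of the quality-control use, the sketch below collects the protein IDs that hit AntiFam models in HMMER3 `--tblout` output. It assumes the AntiFam HMMs were searched against a protein FASTA file with something like `hmmsearch --cut_ga --tblout hits.tbl AntiFam.hmm proteins.fa` (the file names and the use of gathering thresholds are assumptions, not instructions from this post). The function name `spurious_hits` and the sample line are hypothetical and made up for illustration.

```python
# Hypothetical sketch: list proteins that match AntiFam models, parsed from
# HMMER3 `hmmsearch --tblout` output. File names and the sample data below
# are illustrative assumptions, not real AntiFam results.
from typing import Dict, Iterable, List


def spurious_hits(tblout_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each target (protein) ID to the AntiFam models it matched."""
    hits: Dict[str, List[str]] = {}
    for line in tblout_lines:
        line = line.strip()
        # Skip blank lines and the '#' header/comment lines HMMER writes.
        if not line or line.startswith("#"):
            continue
        # The first 18 columns are whitespace-delimited; the free-text
        # description (column 19) may itself contain spaces.
        fields = line.split(maxsplit=18)
        if len(fields) < 18:
            continue  # malformed line: ignore it
        target, query_name, query_acc = fields[0], fields[2], fields[3]
        # Prefer the model accession; fall back to its name if it is "-".
        label = query_acc if query_acc != "-" else query_name
        hits.setdefault(target, []).append(label)
    return hits


# Made-up example in tblout layout (not real output).
SAMPLE = """\
# target name  accession  query name  accession  E-value ...
protA  -  example_family  -  1e-20  70.1  0.5  2e-20  69.8  0.5  1.0  1  1  0  1  1  1  1  hypothetical protein
"""

if __name__ == "__main__":
    print(spurious_hits(SAMPLE.splitlines()))
    # To read a real (hypothetical) results file instead:
    # with open("hits.tbl") as fh:
    #     print(spurious_hits(fh))
```

Any protein reported here would be a candidate spurious ORF to review before deposition, rather than a confirmed error.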

Over the coming months we aim to expand the number of families in AntiFam. We welcome any feedback and suggestions of new spurious ORFs which we may be able to include in AntiFam.

Posted by Ruth and Alex

References:

[1] Eberhardt RY, Haft DH, Punta M, Martin M, O’Donovan C, Bateman A (2012) AntiFam: a tool to help identify spurious ORFs in protein annotation. Database 2012:bas003.

[2] Poptsova MS, Gogarten JP (2010) Using comparative genome analysis to identify problems in annotated microbial genomes. Microbiology 156(7):1909-1917.

[3] Tripp HJ, Hewson I, Boyarsky S, Stuart JM, Zehr JP (2011) Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies. Nucleic Acids Research 39(20):8792-8802.

This entry was posted on March 21, 2012 at 11:25 am and is filed under AntiFam, News, Pfam.

Tags: antifam, pfam, rfam

3 Responses to “Introducing AntiFam”

Pfam team aims at cleaning erroneous protein families - Microbiology, Metagenomics and Bioinformatics Says:

April 18, 2012 at 10:27 am
[…] guys at Pfam recently introduced a new database, called AntiFam, which will provide HMM profiles for some groups of sequences that seemingly formed […]

Run AntiFam over FASTA files | BioTools Says:

June 1, 2012 at 12:29 pm
[…] information: https://xfam.wordpress.com/2012/03/21/introducing-antifam/ « Previous Entries antifam, bioinformatics, perl, […]

Moving to xfam.org | Xfam Blog Says:

May 1, 2014 at 12:43 pm
[…] a showcase for the now fairly large number of resources that are run by the Xfam consortium, from AntiFam to Dfam, iPfam to […]
