Dfam: A database of repetitive DNA elements

September 6, 2012

We are pleased to introduce Dfam 1.0, a database of profile HMMs for repetitive DNA elements. Repetitive DNA, chiefly the remnants of transposable elements (TEs), makes up a large fraction of many genomes, particularly eukaryotic ones. Accurate annotation of these TEs both simplifies downstream genomic analysis and enables research into their fascinating biology and impact on the genome.

Dfam upgrades TE annotation by moving from searching with a single sequence to searching with a profile HMM. This is possible now that the HMMER3 project has made profile HMM homology search of DNA fast enough for regular application to whole genomes, with the new tool nhmmer (a part of the future HMMER3.1, with a snapshot release available here). Dfam is a collaboration between Jerzy Jurka and his Repbase resources (Genetic Information Research Institute), Arian Smit and his RepeatMasker software (Institute for Systems Biology, Seattle), the HMMER3 development team at Janelia Farm (particularly Travis Wheeler, leading nhmmer development), and the Xfam database consortium (particularly Rob Finn, here at Janelia).

Repbase is the result of many years of expert biological TE curation, and lies at the heart of the state-of-the-art tool for genome annotation of TEs, RepeatMasker. In essence, the Dfam collaboration has resulted in a procedure for upgrading a Repbase consensus sequence into a multiple alignment, which is used in turn to produce a profile HMM. RepeatMasker has been overhauled to incorporate the combination of Dfam and nhmmer, with a corresponding release coming soon.
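For readers who want to try the alignment-to-profile-to-search workflow by hand, here is a minimal sketch of how the two HMMER command lines might be assembled from Python. It is only an illustration, not the Dfam or RepeatMasker pipeline: the file names are hypothetical placeholders, and it assumes the `hmmbuild` and `nhmmer` programs from a HMMER release that includes nhmmer are on your PATH.

```python
import shlex


def hmmbuild_command(hmm_out, alignment):
    """Command that builds a profile HMM from a multiple alignment.

    HMMER usage: hmmbuild [options] <hmmfile_out> <msafile>
    """
    return ["hmmbuild", hmm_out, alignment]


def nhmmer_command(hmm_file, target_fasta, tblout, evalue=None):
    """Command that searches DNA sequences with a profile HMM.

    HMMER usage: nhmmer [options] <query> <target seqfile>
    --tblout writes a tabular summary of hits; -E sets the E-value
    reporting threshold (left at nhmmer's default when evalue is None).
    """
    cmd = ["nhmmer", "--tblout", tblout]
    if evalue is not None:
        cmd += ["-E", str(evalue)]
    cmd += [hmm_file, target_fasta]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, used purely for illustration.
    steps = [
        hmmbuild_command("family.hmm", "family_seed.sto"),
        nhmmer_command("family.hmm", "genome.fa", "hits.tbl", evalue=1e-5),
    ]
    for step in steps:
        # Print the commands; pass a list to subprocess.run(step, check=True)
        # to execute it once HMMER is installed.
        print(shlex.join(step))
```

The commands are built as argument lists rather than shell strings so that file names containing spaces need no manual quoting.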

This first release of Dfam consists of profile HMMs for TEs known to exist in the human genome. Going forward, we expect to expand Dfam to cover many more species, continuing the collaboration between the groups involved in this release. With Dfam and nhmmer, TE search sensitivity increases enough that the fraction of the human genome annotated as TE is more than 3.5% higher than with current state-of-the-art methods.

The database itself is available for use at dfam.janelia.org.

Posted by Travis and Rob

One Response to “Dfam: A database of repetitive DNA elements”

  1. yuan xu Says:

    great work! it is very useful in the annotation of TEs in the genome

