Archive for the 'Dfam' Category

Dfam 3.2 Release

July 9, 2020

Dfam is proud to announce the release of Dfam 3.2.  This release represents a significant step in the expansion of Dfam by providing early access to uncurated, de novo generated families.  As a demonstration of this new capability, we imported a set of 336 RepeatModeler-generated libraries produced by Fergal Martin and Denye Ogeh at the European Bioinformatics Institute (EBI).  Also in this release, Dfam now provides family alignments to the RepeatMasker TE protein database, aiding in the discovery of related families and in the classification of uncurated TEs.

Uncurated Family Support

In addition to the fully curated libraries for the model organisms human, mouse, zebrafish, worm and fly, Dfam also includes curated libraries for seven other species.  While a fully curated library is the ultimate goal, support for uncurated families has become an essential aspect of a TE resource due to the increasing rate at which new species are being sequenced and the need to have at least a simple TE masking library available.

By standardizing the storage and tracking of uncurated families, it becomes possible to use these datasets to crudely mask an assembly, provide a first approximation of the TE content, and create a starting point for community curation efforts.  Due to the redundancy and fragmentation inherent in these datasets, we do not compute genome-specific thresholds or generate genome coverage plots for these families.  The latest update to the web portal includes new interfaces for uncurated families and some existing interfaces now include an option to include/omit uncurated families.

In this release, Dfam contains RepeatModeler de novo-produced libraries for an additional 336 species as the result of the collaboration with EBI researchers (denoted with the new uncurated accession prefix “DR”).  Notable taxa expansions include Sauropsida (lizards and birds) and fishes (bony and cartilaginous) (Table 1).  Also included are Amphibia, Viridiplantae and additional species in Mammalia.

Table 1. De novo-identified TE families from additional species

Species | Number (species) | Retrotransposons | DNA transposons | Other
Actinopterygii (bony fishes) | 116275205136177006
Chondrichthyes (cartilaginous fishes) | 516711982273
Viridiplantae (green plants) | 28964121687
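
Because uncurated families carry the “DR” accession prefix, a downstream script can separate them from curated entries before masking or reporting. The following is a minimal sketch, not part of Dfam itself: the function names are hypothetical, the example accessions are invented for illustration, and the assumption that every non-“DR” accession is curated is ours.

```python
def is_uncurated(accession: str) -> bool:
    """Return True if the accession carries the uncurated "DR" prefix."""
    return accession.upper().startswith("DR")


def split_by_curation(accessions):
    """Split accessions into (curated, uncurated) lists, preserving order.

    Assumption: anything without the "DR" prefix is treated as curated.
    """
    curated, uncurated = [], []
    for accession in accessions:
        if is_uncurated(accession):
            uncurated.append(accession)
        else:
            curated.append(accession)
    return curated, uncurated


# Hypothetical accessions, for illustration only.
example = ["DF0000001", "DR0000042", "DF0000123"]
curated, uncurated = split_by_curation(example)
print("curated:", curated)
print("uncurated:", uncurated)
```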

Aligned Protein Features

In previous versions of Dfam, hand-curated coding regions were provided for a select set of families.  The protein products of these curated sequences were placed in the RepeatMasker TE protein database for use with the RepeatProteinMask tool.  In this release we have used this database with BLASTX to produce alignments to all Dfam families including the uncurated entries.  The resulting alignments are displayed alongside the curated coding regions as the new “aligned” feature track (Figure 1).

Figure 1. Feature track and details for BLASTX alignments to TE protein database.

Website improvements

Several minor improvements have been made to the interface since the previous release.  The browse page now provides links to download the families selected by the query/filter options as HMM, EMBL or FASTA records.  The Seed tab of the Families page now displays the average Kimura divergence of the seed alignment instances to the consensus.
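
For readers unfamiliar with the divergence statistic, the sketch below shows how an average Kimura divergence could be computed from seed instances aligned to a consensus. It uses the standard Kimura two-parameter distance, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the fractions of transitions and transversions. Which Kimura variant the website uses is not stated here, so treat this as an illustration rather than the Dfam implementation; the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(instance: str, consensus: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing a gap or an ambiguity code in either sequence
    are skipped.
    """
    if len(instance) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(instance.upper(), consensus.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * q <= 0 or 1 - 2 * p - q <= 0:
        raise ValueError("sequences too divergent for the Kimura model")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def average_divergence(instances, consensus: str) -> float:
    """Mean Kimura distance of aligned instances to the consensus."""
    distances = [kimura_2p(instance, consensus) for instance in instances]
    return sum(distances) / len(distances)
```

A single transition in ten compared positions gives P = 0.1 and Q = 0, so the distance is -1/2 ln(0.8), slightly above the raw 10% mismatch rate because the model corrects for multiple substitutions at the same site.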

Curation with Dfam: new data and platform updates

March 17, 2020

DNA transposon termini signatures

The Dfam consortium is excited to announce the generation and release of terminal repeat sequence signatures for class II DNA transposable elements. The termini of class II elements are crucial for movement, and as such, can be used to classify de novo DNA transposable element families in new genomic sequences (Figure 1).

Figure 1. Major subgroups of class II DNA transposons.

The LOGOs of the termini can be viewed on the “Classifications” tab on the Dfam website and are organized by class II subclasses (e.g., Crypton, Helitron, TIR, etc.) (Figure 2). This allows for easy visualization of the base conservation at each position in the terminal sequences and comparisons between the 5’ and 3’ termini (Figure 2). In addition, the termini profiles are available for download as a .HMM file.

Figure 2. Termini signature visualization on the Dfam website (sample). Base conservation can be seen via the LOGOs of the 5’, 3’ and combined edge (termini) HMMs. The movement type precedes DNA transposons that move via a common mechanism (e.g. “Circular dsDNA intermediate”). The number of families used to generate the LOGOs is indicated, as well as the subclass name (e.g. “Crypton_A”). Additional notes on the termini, when relevant, are also available.

Community data submissions

We have taken the first small step towards a community-driven data curation platform by developing a new data submission system.  At the start this will facilitate the process of uploading data to the site for processing by the curators. As we move forward, further aspects of the curation process will be made available to the community.  Upon creating an account and logging in, users can submit files to Dfam using our web-based upload page. Here you will also find information about submission requirements and how different levels of library quality are handled in Dfam.

Dfam 3.0 is out

March 6, 2019


The Dfam consortium is excited to announce the release of Dfam 3.0.  This release represents a major transition for Dfam from a proof-of-concept database into a funded open community resource. Central to this transition is a major infrastructure and technology update, enabling Dfam to handle the increasing pace of genome sequencing and TE library generation. Equally important, we merged Dfam_consensus with Dfam to produce a single resource for transposable element family modeling and annotation. In doing so, Dfam serves the needs of a broader research community while maintaining a high standard for family characterization (seed alignments) and TE annotation sensitivity. Finally, and most importantly, we are working on making Dfam a community-driven resource through the development of online curation tools and direct user engagement.

Infrastructure updates

Dfam has undergone a major infrastructure upgrade since the last release including faster servers and storage systems, a new software stack and improved website features. Together these updates will allow Dfam to greatly expand the number of families and the species represented. The new software stack includes a publicly accessible REST API, which provides the core functionality used by the redesigned website and is available for use in community developed applications and workflows. The new website is based on the Angular framework, supporting both a traditional web portal to the Dfam database as well as the use of interactive tools for data management and curation.
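
As a rough idea of how a community workflow might call the REST API, here is a minimal sketch using only the Python standard library. The base URL, the `/families/<accession>` path and the JSON response format are assumptions made for illustration and should be checked against the API documentation on the Dfam website; the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL; verify against the Dfam API documentation.
API_BASE = "https://dfam.org/api"


def family_url(accession: str) -> str:
    """Build the (assumed) URL for a single family record."""
    return f"{API_BASE}/families/{quote(accession)}"


def fetch_family(accession: str, timeout: float = 30.0) -> dict:
    """Fetch one family record and decode it as JSON.

    Assumes the endpoint returns a JSON object; performs a network call.
    """
    with urlopen(family_url(accession), timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access:
print(family_url("DF0000001"))
```

Keeping URL construction separate from the network call makes the request logic easy to test offline and to adapt if the actual paths differ.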

Dfam_consensus merger

The merger of Dfam_consensus with Dfam created a combined database of 6,235 TE families in 9 organisms, each characterized by a seed alignment of representative family members. Seed alignments constitute a rich dataset for generating sequence models such as consensus sequences, or profile Hidden Markov Models (HMMs).

Consensus sequence databases have traditionally not preserved the sequence alignment from which the consensus was generated. This omission has made it difficult to evaluate the strength of the consensus, to make incremental improvements by adding/removing members, or to regenerate models using improved methodologies. By adding support for consensus sequences to Dfam, the provenance is preserved in the seed alignment. In addition, the positions within the consensus can be directly related to the corresponding match states within the profile HMM.
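
To illustrate why keeping the seed alignment matters, the sketch below derives a consensus from a small alignment and records which alignment column each consensus position came from. It uses a simple majority rule, which is a simplification chosen for illustration and not the consensus-calling method used by Dfam; the function name and the toy alignment are hypothetical.

```python
from collections import Counter


def majority_consensus(seed_alignment):
    """Return (consensus, columns) for a list of equal-length aligned rows.

    A column contributes its most common symbol to the consensus unless
    that symbol is a gap ("-" or "."). `columns` holds the 0-based
    alignment column behind each consensus position.
    """
    if not seed_alignment:
        raise ValueError("seed alignment is empty")
    width = len(seed_alignment[0])
    if any(len(row) != width for row in seed_alignment):
        raise ValueError("all rows must have the same length")
    consensus, columns = [], []
    for index, column in enumerate(zip(*seed_alignment)):
        counts = Counter(symbol.upper() for symbol in column)
        symbol, _ = counts.most_common(1)[0]
        if symbol not in "-.":
            consensus.append(symbol)
            columns.append(index)
    return "".join(consensus), columns


# Toy alignment, for illustration only.
seed = ["ACGT-A", "ACGTTA", "ACCT-A", "A-GT-A"]
print(majority_consensus(seed))  # → ('ACGTA', [0, 1, 2, 3, 5])
```

Because the alignment is retained, the consensus can be regenerated after adding or removing members, and the column list provides the kind of position mapping that lets consensus positions be related to profile HMM match states.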

Improved interfaces and metadata

The new Dfam website contains several features borrowed from Dfam_consensus including: the seed alignment visualization, the TE classification system and visualization, and per-family and full-database EMBL exports for consensus sequences.

TE classification tree visualization with search facility:


In addition, we have improved the family browsing interface, and added the ability to store/visualize family features such as coding sequences, target site preferences, binding sites, as well as ad-hoc sequence annotation.

Coding regions and target site duplication details for Kolobok-1_DR:


Dfam has adopted the classification system for repetitive sequences recently developed for Dfam_consensus and applied it to all of the Dfam-2.x families. This system combines concepts from established systems (Wicker et al., Piegu et al., Curcio et al., Smit et al., and Jurka et al.) with phylogenies based on reverse transcriptase and transposases. Classification names were chosen to be as descriptive as possible while still honoring the most widely used acronyms for well-defined classes.

Dfam families may be queried using the new browse form:



Community engagement

We are embarking on an effort to greatly expand the database using de novo repeat identification pipelines, data sharing with other open databases, and, most importantly, direct community submissions. If you have existing TE libraries or plan to develop one for a newly sequenced organism, consider making it a part of the Dfam database. We can offer assistance with importing legacy datasets and are working on tools to facilitate direct community curation of the database. Please contact us at

Dfam project seeks postdoctoral fellow

August 20, 2018

We are excited to announce the opening of a postdoctoral fellowship within the Dfam project, located at the Institute for Systems Biology (ISB) in Seattle.  At ISB, the Smit lab is focused on the study of Transposable Element (TE) biology and evolution using the latest developments in sequence modeling, phylogenetic reconstruction, and homology detection.  We have developed some of the most widely used tools and databases for the study of TEs including RepeatMasker, RepeatModeler, the Repeat Protein Database, and Dfam.

The position offers an opportunity to help shape the future of the new Dfam community resource in collaboration with the Wheeler lab at the University of Montana, and through a partnership with the NIH.  The project will involve investigating and advancing de novo methods for the generation of TE libraries, developing improved methods for classifying new TE families, designing quality metrics and standards for TE modeling, providing TE family curation assistance to the research community and building/studying TE families in a unique set of newly sequenced species.

Applicants should hold (or will shortly be awarded) a PhD degree and have experience in TE biology and genomics.  Prior experience with TE library generation/curation, genome biology, and genome evolution is considered a major advantage.  Candidates should have strong communication and data analysis skills, and an established record of principal authorship in peer-reviewed publication(s).  The successful candidate is passionate about science, motivated, proactive and able to work in a team.

To apply please visit the career website at:

Introducing Dfam_consensus – Dfam’s consensus sequence twin

May 18, 2017

Since its inception in 2012, Dfam has demonstrated the promise of using profile hidden Markov Models (HMMs) to improve the detection sensitivity and annotation quality of Transposable Element (TE) families in human[1] and subsequently for four additional reference organisms[2].  Despite these advances, the tools used to discover new families (de novo repeat finders), improve families (extend, defragment, subfamily clustering), and classify TE families continue to depend on consensus sequence models.  This discordance between methodologies is a direct impediment to Dfam’s expansion.


Meet Dfam2.0

October 30, 2015

Dfam is growing up. This is the first major expansion of the database since its inception. We’ve added repeat families from four new organisms: mouse, zebrafish, fruit fly, and nematode. In total, this release includes 2,844 new families (4,150 total).


Say hello to Dfam1.4

May 13, 2015

With Dfam, we are striving to build models of repeat families that yield high sensitivity without undue false annotation.  In this release of Dfam, we have improved our model building strategy to reduce the potential for false annotation, especially in the context of overextending alignments around true interspersed repeat instances.


Dfam 1.3 released

January 7, 2015

We are pleased to announce the release of Dfam 1.3. This release includes almost 200 new repeat families and updates the underlying human genome to hg38.


We’ve moved, now the websites

January 30, 2014

In November 2012, we announced that the Xfam groups were moving the few tens of metres from the Wellcome Trust Sanger Institute to the European Bioinformatics Institute. We warned you then that the websites would also eventually move.

Dfam 1.2 released

May 31, 2013

We are pleased to announce that we’ve released Dfam 1.2. This version represents a few important changes from 1.1, including increased sensitivity for many families, a new plot on the model page, and an improved Relationships tab.
