We are pleased to announce that we’ve released Dfam 1.1. This version introduces a few important changes from 1.0: updated hit results, a new tab on each entry page showing relationships to other entries, and improved handling of redundant profile hits.
New Hit Results
The underlying database and set of entries have not changed from Dfam 1.0. However, the seed alignments have been tweaked to add some additional hits from chromosome Y, and the sequence matches (and match boundaries) have changed slightly because we now use a new version of nhmmer (snapshot here) that better handles regions of low complexity. The new version allowed us to mask fewer low-complexity regions in models and to lower gathering thresholds (increasing sensitivity) while still slightly reducing false positive rates.
Relationships tab
In many cases, a single locus of DNA sequence might be matched by multiple Dfam models, due to similarity between models. For example, there are 37 Alu subfamily models, all representing minor variants of the prototypical Alu, so any Alu instance will likely be found by nearly all Alu models. Many models have a more complicated relationship, as is the case with Ricksha (which long ago picked up the 3′ end of an ERVL, including its LTR, MLTB2), and SVA (which carries copies of both a portion of a HERVK LTR and two Alus in reverse orientation). To indicate these sorts of relationships between models, we have created a Relationships tab for each model, where we graphically represent related models, and provide a dot plot for each relationship that depicts the alignment between the related models.
Redundant profile hit resolution
We call these cases, in which a single locus matches multiple profile HMMs, “Redundant Profile Hits” (RPHs). When presenting hit counts and hit distributions, both over the model (Coverage Plot on the Model tab) and over the genome (Hit tab), Dfam 1.0 showed only the total number of hits, so a single RPH counted toward the tallies of multiple models. We now resolve RPHs so that each locus is associated with only the model that best explains it (though in some cases the best choice is unclear, and a couple of models might be assigned to the locus).
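To make the idea concrete, here is a minimal sketch of one way overlapping hits could be resolved. It is not the procedure Dfam itself uses, which this post does not describe. It assumes that "best explains" can be approximated by the highest score, that two hits compete when they overlap by at least half of the shorter hit, and that near-ties (within a chosen margin) are both kept. The `Hit` record, the function names, the thresholds, and the example scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One model match on a sequence (hypothetical record, not a Dfam format)."""
    model: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    score: float  # assumed: higher means a better match


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Fraction of the shorter hit that is covered by the overlap."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.end - a.start, b.end - b.start) + 1
    return shared / shorter


def resolve_rphs(hits, min_overlap=0.5, tie_margin=0.0):
    """Greedy resolution: visit hits best-first and drop any hit that
    overlaps an already kept hit scoring more than tie_margin higher."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        dominated = any(
            overlap_fraction(hit, k) >= min_overlap
            and k.score - hit.score > tie_margin
            for k in kept
        )
        if not dominated:
            kept.append(hit)
    return sorted(kept, key=lambda h: (h.start, h.model))


# Illustrative data only: three Alu models hitting one locus, plus an
# unrelated hit elsewhere. The coordinates and scores are made up.
example_hits = [
    Hit("AluY", 100, 400, 250.0),
    Hit("AluSx", 105, 398, 210.0),
    Hit("AluJo", 110, 390, 180.0),
    Hit("L1", 1000, 1500, 300.0),
]

if __name__ == "__main__":
    for hit in resolve_rphs(example_hits):
        print(hit.model, hit.start, hit.end, hit.score)
```

With the default `tie_margin` of 0, only the top-scoring model is kept at each locus unless scores are exactly equal. Raising the margin mimics the unclear cases mentioned above, where more than one model stays assigned to the same locus.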
We have already started to receive positive feedback from some early adopters of the database. We really appreciate this feedback, as well as suggestions for new features, so if you have any suggestions or would like to contribute models to the database, please get in contact. In the next release of Dfam, we aim to continue improving the existing set of entries before we pause and determine the next major milestone.
Posted by Travis and Rob