Dfam 3.6 release

April 21, 2022

We are pleased to announce the latest data release of the Dfam database! This release approximately doubles the number of species from the Dfam 3.5 release (595 to 1,109) and increases the number of transposable element (TE) families by ~2.5x (285,542 to 732,993). A more detailed summary of the species included can be seen in Table 1 and in the Dfam 3.6 release notes.

Community-submitted libraries

A huge thank you to the TE community for submitting your data to us! In this release, we have: 1) 3,360 curated rice weevil TE models, submitted by Clément Goubert and Rita Rebollo [1]; 2) 22 SINE families obtained from 15 moth species (lepidopteran insects), submitted by Guangjie Han et al. [2]; 3) 120 Penelope-classified families from a wide variety of eukaryotes, submitted by Rory Craig et al. [3]; and 4) 41 repeat families generated as part of the T2T human assembly project [4]. The T2T set does not include the 22 “composite” repetitive families, which will be available as part of a later Dfam release. To read more about the studies associated with these submissions, please see the references below.

Rice weevil: an agricultural pest

From the paper's background: The rice weevil Sitophilus oryzae is one of the most important agricultural pests, causing extensive damage to cereal in fields and to stored grains. S. oryzae has an intracellular symbiotic relationship (endosymbiosis) with the Gram-negative bacterium Sodalis pierantonius and is a valuable model to decipher host-symbiont molecular interactions. In the paper (see below), the authors show that many TE families are transcriptionally active, and that changes in their expression are associated with the insect's endosymbiotic state.

Moth SINEs: high diversity

From the paper's conclusions: Lepidopteran insect genomes harbor a diversity of SINEs. The retrotransposition activity and copy number of these SINEs vary considerably between host lineages and SINE lineages. Host-parasite interactions facilitate the horizontal transfer of SINEs between baculovirus and its lepidopteran hosts.

Penelope elements: far-reaching impacts

The authors investigate the Penelope-like element (PLE) content of a wide variety of eukaryotes. From the paper: This paper uncovers hitherto unknown PLE diversity, which spans all eukaryotic kingdoms, testifying to the ancient origins of PLEs.

T2T entries: previously hidden genomic content

A new human genome assembly has been released! For the new assembly (T2T, or CHM13), the remaining 10% of the human genome that was previously unattainable has been sequenced and assembled. The entries described in the manuscript are part of this newly analyzed sequence.

EBI libraries

In collaboration with the European Bioinformatics Institute (EBI), we processed and imported RepeatModeler runs on 444 additional species, resulting in the addition of 440,543 families. Additional extension and re-classification steps were run on each model, and the final consensus sequences and HMMs were produced. Please note that relationship data is not available for these uncurated imports at this time.

References associated with community submissions

