Posts Tagged ‘xfam’

Dfam 3.7 : ~3.4 million TE models across 2346 taxa

January 12, 2023

We at Dfam are pleased to announce the latest data release! The Dfam 3.7 release includes additional raw and curated datasets across a wide range of taxa, resulting in a ~4.5x increase in the number of families compared to the previous Dfam 3.6 data release. Please note the large size of the newest release and plan accordingly. It may be beneficial to use the API to filter and download only the data relevant to your project.
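
As a rough illustration of filtering through the API rather than downloading the full release, the sketch below builds a query URL for the families of a single clade. The endpoint path and the parameter names (format, clade, limit, include_raw) are assumptions based on our reading of the public API and should be checked against the current API documentation; build_family_query is a hypothetical helper, and nothing here contacts the server.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current Dfam API documentation.
DFAM_API_BASE = "https://dfam.org/api/families"


def build_family_query(clade, limit=20, include_raw=False):
    """Build a query URL for the families of one clade (hypothetical helper)."""
    # Parameter names below are assumptions, not confirmed by this post.
    params = {
        "format": "summary",
        "clade": clade,
        "limit": limit,
    }
    if include_raw:
        # Ask for raw (uncurated) families as well as curated ones.
        params["include_raw"] = "true"
    return f"{DFAM_API_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Prints the URL only; fetch it with any HTTP client you prefer.
    print(build_family_query("Chiroptera", limit=5))
```

Requesting a small page first, for example with a low limit, is a cheap way to confirm that a clade filter returns what you expect before pulling a larger result set.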

EBI dataset contributes to the quadrupling of the Dfam database 

Our continued collaboration with Fergal Martin and Denye Ogeh from the European Bioinformatics Institute (EBI) has provided an additional 771 assemblies and their associated TE models, which are now part of the DR records in Dfam. This brings the total contribution of genomic data from EBI to 1551 species. The new data expands taxa such as Viridiplantae (green plants) and Actinopterygii (ray-finned fishes), and broadens Dfam coverage with the addition of Echinodermata (starfishes, sea urchins, sea cucumbers) and Petromyzontiformes (lampreys).

Community submissions – adding diversity to Dfam

Taro (Colocasia esculenta) – a threatened food staple

One of the most ancient cultivated crops, taro is a food staple in the Pacific Islands and the Caribbean and is currently threatened by taro leaf blight (TLB). Some populations of taro are resistant to TLB, but the genetic basis for this resistance is unknown. As part of an effort to understand it, a de novo taro assembly was generated and its repetitive content was analyzed [1]. The high repetitive content (~82%) of this genome was positively correlated with genome size and could potentially be linked to TLB resistance. Contributed by M. Renee Bellinger.

Gesneriaceae – understanding angiosperm morphological variation

A member of the plant family Gesneriaceae, the Cape Primrose Streptocarpus rexii has long been studied by evolutionary biologists for its unique morphology. Genetic resources are critical for studying the unusual meristem evolution of this plant family. To that end, a genome annotation pipeline was developed to address current technical challenges in genome annotation. Part of this effort included generating repeat libraries not only for the Cape Primrose, but also for Dorcoceras hygrometricum and Primulina huaijiensis [2]. Providing these libraries to Dfam will enhance the resources available for future genomic characterization of this plant family. Contributed by Kanae Nishii.

Mosquito (Anopheles coluzzii) – a human malaria vector

The adaptive flexibility of Anopheles coluzzii, a primary vector of human malaria, allows it to escape efforts to control the mosquito population with insecticides. As TEs are integral to adaptive processes in other species, it was hypothesized that TEs could underlie the rapid resistance of A. coluzzii to classic methods of intervention. Analyzing six individuals from two African localities allowed the authors to provide a comprehensive TE library [3]. This effort enhances the resources available to study the genomic architecture and gene regulation underpinning the success of this malaria vector. Contributed by Carlos Vargas and Josefa Gonzalez.

Water flea (Daphnia pulicaria) – a model organism to study climate change

Due to their short lifespans and reproductive capabilities, water fleas are used as bioindicators to study the effects of toxins on an ecosystem, and are thus useful in studying climate change. A study of two ecological sister taxa, Daphnia pulicaria and Daphnia pulex, analyzed the roles of recombination and gene density in driving the differentiation and divergence of the two species [4]. TE content was analyzed as part of generating the new Daphnia pulicaria genome assembly. Contributed by Mathew Wersebe.

601 insects – transposable element influence on species diversity 

TEs are drivers of evolution in eukaryotes. However, in some underrepresented taxa, TE dynamics are less well understood. To this end, 601 insect genomes across 20 orders were analyzed for TE content to assess how it varies among insect orders [5]. This work highlights the need for community-submitted high-quality libraries. Contributed by John Sproul and Jacqueline Heckenhauer.

Analysis of six bat genomes – evolution of bat adaptations

Bats are an excellent example of complex adaptations, such as flight, echolocation, longevity and immunity. To enhance the genomic resources for studying the development of complex traits, six high-quality genome assemblies were generated using long- and short-read technologies (Rhinolophus ferrumequinum, Rousettus aegyptiacus, Phyllostomus discolor, Myotis myotis, Pipistrellus kuhlii and Molossus molossus) [6]. As part of the effort to annotate these new genome assemblies, the TE content was analyzed. These six genomes displayed a wide range of diversity in TE content, perhaps contributing to their complex traits. Contributed by Kevin Sullivan and David Ray.

LTR7/ERVH – transcriptional regulation in the human embryo

The mechanism by which human endogenous retrovirus type-H (HERVH) exerts regulatory activities fostering self-renewal and pluripotency in the pre-implantation embryo is unknown. To elucidate this mechanism, the transcription dynamics and sequence signature evolution of HERVH were analyzed [7]. This study not only revealed previously undefined LTR7 subfamilies, but also provided a comprehensive phyloregulatory analysis of all the identified subfamilies against locus-specific regulatory data available in genome-wide assays of embryonic stem cells (ESCs), providing evidence for subfamily-specific promoter activity. The complex evolutionary history of LTR7 is mirrored in the transcriptional partitioning that takes place during early embryonic development. Contributed by Thomas Carter, Cédric Feschotte, and Arian Smit.

References

1. Bellinger, M. R., Paudel, R., Starnes, S., Kambic, L., Kantar, M. B., Wolfgruber, T., Lamour, K., Geib, S., Sim, S., Miyasaka, S. C., Helmkampf, M., & Shintaku, M. (2020). Taro Genome Assembly and Linkage Map Reveal QTLs for Resistance to Taro Leaf Blight. G3 (Bethesda, Md.), 10(8), 2763–2775. https://doi.org/10.1534/g3.120.401367

2. Nishii, K., Hart, M., Kelso, N., Barber, S., Chen, Y. Y., Thomson, M., Trivedi, U., Twyford, A. D., & Möller, M. (2022). The first genome for the Cape Primrose Streptocarpus rexii (Gesneriaceae), a model plant for studying meristem-driven shoot diversity. Plant Direct, 6(4), e388. https://doi.org/10.1002/pld3.388

3. Vargas-Chavez, C., Longo Pendy, N. M., Nsango, S. E., Aguilera, L., Ayala, D., & González, J. (2022). Transposable element variants and their potential adaptive impact in urban populations of the malaria vector Anopheles coluzzii. Genome Research, 32(1), 189–202. https://doi.org/10.1101/gr.275761.121

4. Wersebe, M. J., Sherman, R. E., Jeyasingh, P. D., & Weider, L. J. (2022). The roles of recombination and selection in shaping genomic divergence in an incipient ecological species complex. Molecular Ecology, 10.1111/mec.16383. Advance online publication. https://doi.org/10.1111/mec.16383

5. Sproul, J. S., Hotaling, S., Heckenhauer, J., Powell, A., Larracuente, A. M., Kelley, J. L., Pauls, S. U., & Frandsen, P. B. (2022). Repetitive elements in the era of biodiversity genomics: insights from 600+ insect genomes. bioRxiv, 2022.06.02.494618. https://doi.org/10.1101/2022.06.02.494618

6. Jebb, D., Huang, Z., Pippel, M., Hughes, G. M., Lavrichenko, K., Devanna, P., Winkler, S., Jermiin, L. S., Skirmuntt, E. C., Katzourakis, A., Burkitt-Gray, L., Ray, D. A., Sullivan, K. A. M., Roscito, J. G., Kirilenko, B. M., Dávalos, L. M., Corthals, A. P., Power, M. L., Jones, G., Ransome, R. D., … Teeling, E. C. (2020). Six reference-quality genomes reveal evolution of bat adaptations. Nature, 583(7817), 578–584. https://doi.org/10.1038/s41586-020-2486-3

7. Carter, T. A., Singh, M., Dumbović, G., Chobirko, J. D., Rinn, J. L., & Feschotte, C. (2022). Mosaic cis-regulatory evolution drives transcriptional partitioning of HERVH endogenous retrovirus in the human embryo. eLife, 11, e76257. https://doi.org/10.7554/eLife.76257

    Moving to xfam.org

    May 1, 2014

    Back in November 2012 we announced that the Xfam team in the UK was moving from the Wellcome Trust Sanger Institute to the European Bioinformatics Institute (EMBL-EBI), just next door on the Wellcome Trust Genome Campus. On Tuesday we completed that move by switching off the Pfam and Rfam websites inside Sanger and redirecting all traffic to our shiny new home at xfam.org. You can now find the Pfam and Rfam websites at pfam.xfam.org and rfam.xfam.org respectively.

    We’ve moved, now the websites

    January 30, 2014

    In November 2012, we announced that the Xfam groups were moving the few tens of metres from the Wellcome Trust Sanger Institute to the European Bioinformatics Institute. We warned you then that the websites would also eventually move.

    Who’s who?

    March 22, 2011

    It has been some time since we posted a blog, so, to keep you all on your toes, we are going behind the scenes to reveal something about the minds that run Pfam… From the longest-serving member to the newest recruit we have elicited a few key facts in the form of answers to some ‘trivial’ questions. Here are two profiles as they were given. Can you work out who is who?

    Job opportunities and staff changes at Xfam

    September 1, 2010

    We have been very sad to see a few people leave the group recently. Rob Finn has been the dedicated and hard-working project leader of Pfam for many years. In fact, as a summer student, he is credited with preparing most of the families for Pfam 2.0 [1]! We’re expecting to see great things from him at his new post at HHMI’s Janelia Farm. We’ve also seen Jaina Mistry get married and move to another city; fortunately for us, she’s still working part-time for Pfam remotely. Jen Daub, after her whirlwind trip around the world, will also be working part-time on the Rfam project from her luxurious new abode in France.

    This means we have a number of opportunities for bright and enthusiastic people. We are looking to recruit a new Project Leader to lead the Pfam group. This is an exciting opportunity for a motivated, enthusiastic and experienced computational biologist, and is an influential position working with a high-profile bioinformatics resource. We anticipate the candidate will lead the next phase of database development, which will include community annotation and the incorporation of new developments based on the HMMER3 software. We would expect the successful candidate to have their own research ideas and be able to deliver research outputs with the group.

    We are also looking for two Computational Biologists to join the group. The successful candidates will ideally have an MSc in bioinformatics or equivalent experience and a strong background in molecular biology, biochemistry, genetics or similar.

    We would also like to take the opportunity to welcome Professor John Burke from the University of Vermont. John is taking a one-year sabbatical with Rfam to learn about all things bioinformatic. He is already an expert on all things to do with ribozymes and RNA structure, so we expect some major improvements in Rfam in these areas.

    Last but not least, we have Chris Boursnell, a refugee from the banking world, who is working with us and the fine Recode database to improve our coverage of frame-shift elements.

    Xfam consortium meeting

    June 4, 2010

    The annual Xfam consortium meeting was held on the 10th-11th May 2010 and we have the photographic evidence to prove it.

    We spent the two days listening to talks from everyone about the latest developments. We were particularly interested to hear about new developments in HMMER3 and INFERNAL, the fundamental pieces of software that Xfam rely on. Nucleotide-enabled HMMER3 is in development and will be great for Rfam, hopefully replacing the current BLAST pre-filters. We also had updates on how the HMMER software scales using multiple threads and/or MPI.

    We also had a number of wide-ranging discussions. Erik Sonnhammer unfortunately wasn’t present this time, so the usual discussion on Stockholm alignment format was avoided. However, we had a full discussion of Pfam family nomenclature. It was generally agreed that although there were rules followed for Pfam short names, no one else in the world understood them. So we will endeavour to add a new section to our documentation about it. We also discussed how much information is actually required before a DUF (domain of unknown function) is renamed to something more meaningful.

    We were blessed because the Icelandic ash cloud didn’t intervene. But one of our number did leave their passport in a car bound for Oxford, which delayed their journey home. We would like to thank all the members of the Pfam and Rfam consortia for coming, and also our other EBI attendees.

    Janelia Farm Research Campus: Sean Eddy, Eric Nawrocki, Travis Wheeler, Tom Jones, Diana Kolbe, Michael Farrar

    Stockholm Bioinformatics Center: Kristoffer Forslund, Dave Messina

    Wellcome Trust Sanger Institute: Alex Bateman, Paul Gardner, Lars Barquist, Jaina Mistry, John Tate, Prasad Gunasekaran, Penny Coggill, Rob Finn

    University of Manchester: Sam Griffiths-Jones

    University of Oxford: Andreas Heger

    University of Helsinki: Liisa Holm

    Other friends from EBI: Sarah Hunter, Phil Jones, Craig McAnulla and Javier Herrero.

    A season of Xfam courses

    May 28, 2010

    It seems Xfam is involved in a lot of courses this year. Here are a few of the dates when you can see Xfam-ers in the flesh!

    Xfam consortium meeting – Have your say

    May 5, 2010

    The Xfam consortium will be having their annual meeting in Cambridge on the 10th-11th May. Members of the Rfam and Pfam consortia including the developers of HMMER and INFERNAL will be getting together along with some friends from InterPro, ADDA and Ensembl.

    This is a time of great planning, strategising, and playing kubb. It’s also an opportunity for you to influence the future direction of Rfam and Pfam. Please let us know if you have great ideas for how you would like to see Xfam develop in the future.

    Posted by Alex

    Alex wins the Benjamin Franklin award!

    April 1, 2010

    Our very own Alex Bateman has been awarded the prestigious Benjamin Franklin award! This is an annual award presented to someone in the community who has made significant contributions to promoting open access in the life sciences.

    Nominations are made by at least two members of the community and then votes are collected by the good people at bioinformatics.org. Alex faced some stiff competition from many greats in the field yet still managed to win. He is the third Xfam associate who has won the award, joining Ewan Birney and Sean Eddy.

    Naturally, all of the Xfam members are very happy with this result and are currently glowing in the reflected glory (or is that the result of the celebratory bubbly or the unseasonal weather?).

    Posted by: Rob and Paul.

    Welcome to the Xfam blog

    January 19, 2009

    Welcome to the new blog for the Xfam databases! Xfam is our shorthand for the combination of the Pfam and Rfam databases, a name that will also future-proof us in case we add any further databases to the brand.

    We hope that this blog will become a useful point of reference, where our users can learn about what is going on behind the scenes at Xfam central. We will be announcing some important changes that are coming with the eagerly awaited release of HMMER 3. As well as announcing new releases of the data and website, we’ll also try to discuss our philosophy on protein/RNA domains and sequence classification. If there are other topics that you would like to hear more about, why not leave us a comment?