Archive for the 'News' Category

Pfam SARS-CoV-2 special update

April 2, 2020

The SARS-CoV-2 pandemic has mobilised a worldwide research effort to understand the pathogen itself and the mechanism of COVID-19 disease, as well as to identify treatment options. Although Pfam already provided useful annotation for SARS-CoV-2, we decided to update our models and annotations for this virus in an effort to help the research community. This post explains what was done and how we are making the data available as quickly as possible.

What have we done?

We assessed all the protein sequences provided by UniProt via its new COVID-19 portal (https://covid-19.uniprot.org/), identified those which lacked an existing Pfam model, and set about building models as required. In some cases we built families based on recently solved structures of SARS-CoV-2 proteins. For example, we built three new families representing the three structural domains of the NSP15 protein (Figure 1) based on the structure by Youngchang Kim and colleagues (http://europepmc.org/article/PPR/PPR115432). In other cases, such as Pfam’s RNA-dependent RNA polymerase family (PF00680), we took our existing family and extended its taxonomic range to ensure it included the new SARS-CoV-2 sequences.

Figure 1. The structure of NSP15 (PDB:6VWW) from Kim et al. shows the three new Pfam domains. (1) CoV_NSP15_N (PF19219) Coronavirus replicase NSP15, N-terminal oligomerisation domain in red, (2) CoV_NSP15_M (PF19216) Coronavirus replicase NSP15, middle domain in blue and (3) CoV_NSP15_C (PF19215) Coronavirus replicase NSP15, uridylate-specific endoribonuclease in green.

We have also standardised our ID nomenclature and family descriptions to ensure they are both correct and consistent. The majority of the family identifiers now begin with either CoV, for coronavirus-specific families, or bCoV, for families specific to the betacoronavirus clade, to which SARS-CoV-2 belongs. We have also fixed inconsistencies in the naming and descriptions of the various non-structural proteins, using NSPx for those proteins encoded by the replicase polyprotein, and NSx for those encoded by other ORFs. We are grateful to Philippe Le Mercier from the Swiss Institute of Bioinformatics, who gave us valuable guidance on our nomenclature.

Where are the data?

You can access a small HMM library (Pfam-A.SARS-CoV-2.hmm) for all the Pfam families that match the SARS-CoV-2 protein sequences on the Pfam FTP site:

ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam_SARS-CoV-2_1.0/

You can also find a file (matches.scan) showing the matches of the models against the SARS-CoV-2 sequences in the same FTP location. These updates are not yet available on the Pfam website. We anticipate making them available in 6-8 weeks.  We hope you find our SARS-CoV-2 models useful for your research, and as always we welcome your feedback via email at pfam-help@ebi.ac.uk.
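If you would rather fetch the two files from a script than from an FTP client, a minimal Python sketch might look like the following. It assumes both files sit directly under the release directory given above with exactly the names mentioned in this post; the helper names file_url and download_all are ours, chosen for illustration only.

```python
import os
from typing import List
from urllib.request import urlretrieve

# Release directory quoted in this post.
BASE_URL = "ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam_SARS-CoV-2_1.0/"

# File names as given in this post. Assumption: they are stored
# uncompressed under exactly these names in the release directory.
FILES = ("Pfam-A.SARS-CoV-2.hmm", "matches.scan")


def file_url(name: str) -> str:
    """Build the full FTP URL for one file in the release directory."""
    return BASE_URL + name


def download_all(dest_dir: str = ".") -> List[str]:
    """Download every file in FILES into dest_dir and return the local paths."""
    paths = []
    for name in FILES:
        path = os.path.join(dest_dir, name)
        urlretrieve(file_url(name), path)  # urllib handles ftp:// URLs
        paths.append(path)
    return paths
```

Calling download_all() with no arguments would place both files in the current directory, ready for the HMMER commands described in the next section.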

How to use this library?

This library is not compatible with the pfam_scan software that we normally recommend to reproduce Pfam matches, as this library only contains a small subset of models.  If you wish to compare these models to your own sequences, please use the following HMMER commands:

$ hmmpress Pfam-A.SARS-CoV-2.hmm

This only needs to be performed once. Then, to compare your sequences (in a file called my.fasta) to this special Pfam profile HMM library, run:

$ hmmscan --cut_ga --domtblout matches.scan Pfam-A.SARS-CoV-2.hmm my.fasta

The --domtblout option enables you to save the matches in a more convenient tabular form, if you do not want to parse the HMMER output.
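As an illustration of how the tabular file can be consumed, here is a small Python sketch that reads per-domain hits from a file such as matches.scan. It relies on the standard HMMER3 domain table layout (22 whitespace-separated columns followed by a free-text description). The DomainHit type, the parse_domtblout function and the sample row are made up for this example and are not part of HMMER or Pfam; the scores and coordinates in the sample are not real results.

```python
from typing import Iterable, Iterator, NamedTuple


class DomainHit(NamedTuple):
    family: str      # target name: the Pfam family ID when using hmmscan
    accession: str   # target accession, e.g. PF19215 plus a version suffix
    sequence: str    # query name: your sequence ID
    i_evalue: float  # independent E-value of this domain
    score: float     # bit score of this domain
    ali_from: int    # start of the alignment on the sequence
    ali_to: int      # end of the alignment on the sequence


def parse_domtblout(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Yield one DomainHit per data row, skipping comments and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 22 fixed columns, then a description that may contain spaces.
        f = line.split(maxsplit=22)
        yield DomainHit(
            family=f[0],
            accession=f[1],
            sequence=f[3],
            i_evalue=float(f[12]),
            score=float(f[13]),
            ali_from=int(f[17]),
            ali_to=int(f[18]),
        )


# Invented row for illustration only (values are not real search results).
SAMPLE = """# comment lines start with a hash and are ignored
CoV_NSP15_C PF19215.1 155 my_seq - 346 1e-80 270.0 0.1 1 1 2e-84 3e-80 268.5 0.1 1 155 190 344 190 345 0.99 Coronavirus replicase NSP15, uridylate-specific endoribonuclease
"""

hits = list(parse_domtblout(SAMPLE.splitlines()))
for hit in hits:
    print(hit.sequence, hit.family, hit.ali_from, hit.ali_to, hit.i_evalue)
```

To read real output, pass an open file handle instead of the sample, for example parse_domtblout(open("matches.scan")).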

And finally

We will be making Pfam alignments available during the next week and will produce another blog post describing them.

Posted by The Pfam team

Rfam 12.1 has been released

April 27, 2016

We are happy to announce a new release of Rfam. Version 12.1, based on the same sequence dataset as Rfam 12.0, features over 20 new families, a new clan competing algorithm, a publicly accessible MySQL database, and many website fixes.

Rfam 12.0 is out

September 24, 2014

We are pleased to announce the release of Rfam 12.0!

Moving to xfam.org

May 1, 2014

Back in November 2012 we announced that the Xfam team in the UK was moving from the Wellcome Trust Sanger Institute to the European Bioinformatics Institute (EMBL-EBI), just next door on the Wellcome Trust Genome Campus. On Tuesday we completed that move by switching off the Pfam and Rfam websites inside Sanger and redirecting all traffic to our shiny new home at xfam.org. You can now find the Pfam and Rfam websites at pfam.xfam.org and rfam.xfam.org respectively.

Visualising & exploring TreeFam gene families

February 19, 2014

The latest TreeFam release 9 has 15,736 gene families. These families vary significantly in size (number of family members), conservation (alignment conservation) and taxonomic diversity (younger families that are only found in e.g. Vertebrates vs. older ones that were present in the last common ancestor of Metazoa).

Visualising & exploring gene families

We have always wanted to find a way to visualise our families according to the above-mentioned criteria. Wouldn’t it be nice if you could easily see all highly conserved families or all families with >= 400 genes?

We’ve moved, now the websites

January 30, 2014

In November 2012, we announced that the Xfam groups were moving the few tens of metres from the Wellcome Trust Sanger Institute to the European Bioinformatics Institute. We warned you then that the websites would also eventually move.

TreeFam: new Orthology-on-the-fly feature

September 17, 2013

The identification of orthologs in related organisms is a routine task, and many databases and tools are available to do it. Some of the databases can be installed locally, which is not ideal when the goal is to find orthologs for only one or a few genes. To fill this gap, we developed a quick orthology-on-the-fly prediction tool that is built on top of the HMMER search we introduced in release 9 and can be used here: www.treefam.org.

The Rfam NAR paper is now available!

November 23, 2012

For some light weekend reading, have a look at the latest Rfam paper, Rfam 11.0: 10 years of RNA Families.  It’s part of the 2013 Nucleic Acids Research Database issue, and you’ll find all the latest developments to Rfam mentioned, including the sunbursts, the Biomart and an update on the Wikipedia annotation effort.

Dfam 1.1 released

November 15, 2012

We are pleased to announce that we’ve released Dfam 1.1. This version represents a few important changes from 1.0, including updated hit results, a new tab for each entry page showing relationships to other entries, and improved handling of redundant profile hits.

What’s new in AntiFam?

November 13, 2012

We have recently produced a new release of AntiFam, release 3.0. AntiFam has grown in size, and release 3.0 contains 54 entries – compared to just 23 when we last blogged about AntiFam (release 1.1).  Over 80 % of these new entries arise from translations of non-coding RNAs, including several families from translations of rRNA, tmRNA and RNaseP.
