DUFs: families in need of function

April 20, 2009

Domains of Unknown Function, or DUFs, are a large set of families found in the Pfam database. Examples include “DUF26” and “DUF282”. The DUF naming scheme was introduced by Chris Ponting, through the addition of DUF1 and DUF2 to the SMART database. These two domains were found to be widely distributed in bacterial signalling proteins. Subsequently, the functions of these domains were identified and they have since been renamed as the GGDEF and EAL domains respectively (structures shown in Figures 1 and 2). These families were added to Pfam in 1997, and little did Chris know that he was starting a trend that would see thousands of uncharacterised families being added to the domain databases.

GGDEF domain

Structure of the GGDEF domain (in green), formerly known as DUF1, now known to function as a diguanylate cyclase enzyme.

EAL domain

Structure of the EAL domain (in green), formerly known as DUF2, now known to function as a cyclic diguanylate-specific phosphodiesterase enzyme.

At least in Britain, the word “duff” conjures up something substandard, with the dictionary definition stating:

duff adj. Brit. slang 1. worthless 2. useless.

However, in reality, DUFs are treated with the same loving care as all other Pfam families. The only difference is that our curators are unable to identify any functional information from the scientific literature.

In Pfam release 23, the DUF numbering scheme has reached DUF2607, and the fraction of DUF families in Pfam has increased to about 22% of all families (shown in Figure 3).

Growth of DUFs

Growth of DUFs in Pfam
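If you want to estimate this kind of fraction for your own list of family names, a minimal sketch follows. It assumes that a DUF can be recognised purely by the naming pattern described above ("DUF" followed by a number); the function name `duf_fraction` and the sample list are hypothetical and for illustration only, not real release data.

```python
import re

# Naming scheme described in the post: "DUF" followed by a number, e.g. DUF26.
DUF_PATTERN = re.compile(r"^DUF\d+$")


def duf_fraction(family_names):
    """Return the fraction of names that follow the DUF naming scheme."""
    names = list(family_names)
    if not names:
        return 0.0
    duf_count = sum(1 for name in names if DUF_PATTERN.match(name))
    return duf_count / len(names)


# Hypothetical sample of family names (illustration only).
sample = ["GGDEF", "EAL", "DUF26", "DUF282"]
print(f"{duf_fraction(sample):.0%} of the sample families are DUFs")
```

Matching on the name alone is only an approximation: a family that has been renamed after its function was identified would no longer be counted.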

It looks as though the number of DUFs is on the increase. Because DUFs require little annotation, they are often easy families to add to Pfam. We expect that DUFs will soon outnumber the families of known function being added to Pfam.

Identifying functions for Domains of Unknown Function is extremely important if we are to understand biology at a systems level. Essentially there are two ways to find out the function of an uncharacterised domain: the first involves identifying a similarity to a domain of known function, either by sequence comparison or perhaps from a newly solved structure; the second way is good old-fashioned molecular biology. Sir Rich Roberts put forward a proposal to stimulate experimentation on uncharacterised proteins [1].

Slowly, momentum is building and the functions of more DUFs are being identified. Since we started adding DUFs nearly 10 years ago, over 270 of them have been renamed, presumably once a function had been identified. Our curators have not yet had time to recheck all of the existing 2,000 or so DUFs to see if new functional information has been identified. Therefore, over the coming year, we hope to recheck all of them, and rename and re-annotate those where function is now known. If you know of any recently identified functions for these families, please do let us know.

In recent years, structural genomics initiatives have solved the structures of hundreds of proteins in uncharacterised families. In many cases, this has helped to narrow down the possible function of a family. For example, DUF442 was shown to be a non-classical phosphatase enzyme (see Figure 4) [2].

Active site of DUF442

Active site of the DUF442 phosphatase enzyme in sequence and structure (from Krishna et al. [2]).

The DUFs remain a treasure trove of novel biology waiting to be plundered. So, why not get that pioneer spirit and join the gold rush!

Posted by Alex.


[1] R.J. Roberts (2004) Identifying protein function – A call for community action. PLoS Biol. 2:e42.

[2] S.S. Krishna et al. (2007) Crystal structure of NMA1982 from Neisseria meningitidis at 1.5 Å resolution provides a structural scaffold for nonclassical, eukaryotic-like phosphatases. Proteins. 69:415-421

2 Responses to “DUFs: families in need of function”

  1. Mike Sauder Says:

    We have deposited structures for these DUFs and possibly others: DUF54, 98, 120, 127, 178, 201, 309, 327, 330, 357, 372, 375, 381, 447, 519, 586, 600, 655, 881, 910, 1028, 1105, 1246, 1273, 1281, 1297, 1460, 1528, 1694.

    We have proteins for these as well as many other DUFs. If there are groups interested in trying to functionally characterize these, we have a source of protein. In addition, the plasmids for ALL our Pfam DUF targets will be available through the PSI Material Repository. At least 60% of our Pfam clone collection has already been transferred.

    the NYSGXRC

  2. alexbateman Says:

    Hi Mike,

    Thanks for the message. The work of structural genomics is really valuable in suggesting functions. We really appreciate it! Getting experimental groups to start work on these would be fantastic. Do any of these DUF structures give functional hints that are of particular interest to follow up quickly?
