Domains of Unknown Function, or DUFs, are a large set of families found in the Pfam database. Examples include “DUF26” and “DUF282”. The DUF naming scheme was introduced by Chris Ponting, through the addition of DUF1 and DUF2 to the SMART database. These two domains were found to be widely distributed in bacterial signalling proteins. Subsequently, the functions of these domains were identified and they have since been renamed as the GGDEF and EAL domains respectively (structures shown in Figures 1 and 2). These families were added to Pfam in 1997, and little did Chris know that he was starting a trend that would see thousands of uncharacterised families being added to the domain databases.
At least in Britain, the word “duff” conjures up something substandard, with the dictionary definition stating:
duff adj. Brit. slang 1. worthless 2. useless.
However, in reality, DUFs are treated with the same loving care as all other Pfam families. The only difference is that our curators are unable to identify any functional information from the scientific literature.
In Pfam release 23, the DUF numbering scheme has reached DUF2607, and the fraction of DUF families in Pfam has increased to about 22% of all families (shown in Figure 3).
It looks as though the number of DUFs is on the increase. Because DUFs require little annotation, they are often easy families to add to Pfam. We expect that DUFs will soon outnumber the families of known function being added to Pfam.
Identifying functions for Domains of Unknown Function is extremely important if we are to understand biology at a systems level. Essentially there are two ways to find out the function of an uncharacterised domain: the first involves identifying a similarity to a domain of known function, either by sequence comparison or perhaps from a newly solved structure; the second way is good old-fashioned molecular biology. Sir Rich Roberts put forward a proposal to stimulate experimentation on uncharacterised proteins (Roberts, 2004).
Slowly, momentum is building and more functions of DUFs are being identified. Since we started adding DUFs nearly 10 years ago, over 270 of them have been renamed, presumably once a function had been identified. Our curators have not yet had time to recheck all of the existing 2,000 or so DUFs to see if new functional information has been identified. Therefore, over the coming year, we hope to recheck all of them, and rename and re-annotate those where function is now known. If you know of any recently identified functions for these families, please do let us know.
In recent years, structural genomics initiatives have solved the structures of literally hundreds of proteins in uncharacterised families. In many cases, this has helped to narrow down the possible function of a family. For example, DUF442 was shown to be a non-classical phosphatase enzyme (Krishna et al., 2007; see Figure 4).
The DUFs remain a treasure trove of novel biology waiting to be plundered. So why not get that pioneer spirit and join the gold rush?
Posted by Alex.
R.J. Roberts (2004) Identifying protein function – A call for community action. PLoS Biol. 2:e42.
S.S. Krishna et al. (2007) Crystal structure of NMA1982 from Neisseria meningitidis at 1.5 Å resolution provides a structural scaffold for nonclassical, eukaryotic-like phosphatases. Proteins 69:415-421.