Website update

October 29, 2009

We’ve just updated the Pfam website again. This update comes fairly soon after the major Pfam 24.0 release and is intended to fix some of the more annoying bugs and omissions that we’ve found in the last week or so.
There are various small changes and fixes all over the site, but there are several more significant changes that you might need to be aware of.

Documentation

We’ve started the painful process of updating our documentation for the HMMER3-derived data. You’ll find that most of the tabs in the help page are now up to date, though some do still carry a warning about their content. We’ll be working through these remaining sections and will update them as soon as possible.

Domain architectures search

We now have the domain architectures search working again. The submission form has been tidied up a little and we use the new domain graphics library to render the architecture graphics but, beyond that, the search should work as it did before.

Sequence search restrictions

The previous version of HMMER, v2, accepted “-” as a valid sequence character, but HMMER3 considers that to be an invalid character and returns an error. In the initial 24.0 website release, you could submit a search sequence with “-” and the search would fail with an unhelpful message. With this update, the validation procedure on the website now catches “-” before submission and tells you where the invalid character was found.
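If you submit sequences from a script, you may want to run a similar check on your side before sending anything. The following is a minimal sketch of that kind of validation, not the code the website uses. The `ALLOWED_RESIDUES` set and the `find_invalid_characters` function are hypothetical names, and the exact alphabet that the site accepts is an assumption, since it isn't listed here.

```python
# Hypothetical alphabet: the 20 standard residues plus some common
# ambiguity codes. This is an assumption; the set of characters that
# the Pfam website actually accepts is not given in this post.
ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZUO")


def find_invalid_characters(sequence):
    """Return a list of (position, character) pairs for invalid characters.

    Positions are 1-based and count only non-whitespace characters, so
    line breaks in a pasted sequence do not shift the reported position
    (skipping whitespace is also an assumption of this sketch).
    """
    problems = []
    position = 0
    for char in sequence:
        if char.isspace():
            continue  # ignore spaces and line breaks
        position += 1
        if char.upper() not in ALLOWED_RESIDUES:
            problems.append((position, char))
    return problems


if __name__ == "__main__":
    for position, char in find_invalid_characters("MKV-LA"):
        print(f"Invalid character {char!r} at position {position}")
```

An empty list means the sequence passed this local check; it does not guarantee that the server will accept it.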

GI numbers

Putting a GI number into a jump box now sends you to a page about that NCBI sequence entry, rather than returning errors. The NCBI pages are still “under construction”, but they’re better than the gaping hole that previously existed in the site!

RESTful services

We’ve reinstated the “RESTful” services for the major parts of the site. You should now be able to use the API to get data about Pfam-A families and individual proteins, and, probably after some changes to your code, to submit single-sequence searches again.

The switch to HMMER3 has changed quite a few aspects of the data, as well as the way we run our searches, so there are changes to most of the XML schemas, mostly fairly minor. The documentation on the RESTful interface is now up to date, so check there for information on the new XML formats.

Sequence searches

It’s probably worth giving a little more detail on the changes that we’ve had to make to the RESTful interface to the sequence search system. If you’ve previously used the search interface, you will probably need to update your scripts accordingly.

No “estimated time”

In Pfam 23.0, searches used HMMER2 and could take on the order of a minute for a Pfam-A search, so we used a “polling page” to give the user something to look at while they waited. It showed a progress bar and gave the user an idea of the estimated run time for the search. Now that we’re using HMMER3, most Pfam-A searches are so fast that loading the polling page took longer than running the search. We no longer bother calculating an estimated run time and we’ve ditched that intermediate page altogether. Results are now loaded into the results page as they appear on the search system.

When running a search using the API, you can simply add a short delay (a couple of seconds should be fine for most sequences) between submitting the search and retrieving the results. If your search is still running when you try to retrieve the results, you just won’t get results; wait a little longer and hit the same URL again.
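The wait-and-retry pattern described above could look something like the sketch below. It deliberately leaves out the HTTP details: `fetch_results` is a hypothetical callable that you would write yourself to request the results URL and return the results, or `None` (or an empty string) if the search is still running. The function name, the delays and the retry limit are all assumptions, not part of the Pfam API.

```python
import time


def wait_for_results(fetch_results, initial_delay=2.0, retry_delay=2.0,
                     max_attempts=5, sleep=time.sleep):
    """Wait briefly, then call fetch_results() until it returns something.

    fetch_results is a hypothetical, caller-supplied function that hits
    the results URL and returns a falsy value while the job is running.
    Returns the results, or None if max_attempts is exhausted.
    """
    sleep(initial_delay)  # give the search a couple of seconds to run
    for attempt in range(max_attempts):
        results = fetch_results()
        if results:
            return results
        if attempt < max_attempts - 1:
            sleep(retry_delay)  # wait a little longer, then try again
    return None
```

Capping the number of attempts means a script won't hammer the server indefinitely if a job fails or a sequence takes unusually long.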

For comparison, when running a search against Pfam 23.0, the procedure was to submit the search and check the estimated run time in the XML that came back. After waiting for that period, you would then retrieve results from a URL in the XML.

Only one job ID

In Pfam 23.0, if you chose to run both a Pfam-A and a Pfam-B search, you would get two job identifiers and you would have to retrieve results for each search separately. Because we use HMMER3 to search for both Pfam-A and Pfam-B matches now, we run the jobs in the same queue, so there’s only a single job ID. Furthermore, the Pfam-A and Pfam-B hits are returned in the same result XML document. You can distinguish them using the “type” attribute on the match element.
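As a rough illustration, here is one way a script might separate the two kinds of hit. The only detail taken from the description above is that each match element carries a “type” attribute. The sample document, the `results` wrapper, the `accession` attribute and the attribute values “Pfam-A” and “Pfam-B” are all assumptions made for this sketch, so check the RESTful interface documentation for the real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical result document. Only the "type" attribute on <match> is
# taken from the post; the element layout, the "accession" attribute and
# the attribute values are illustrative assumptions.
SAMPLE_XML = """
<results>
  <match accession="PF00001" type="Pfam-A"/>
  <match accession="PB000001" type="Pfam-B"/>
  <match accession="PF00002" type="Pfam-A"/>
</results>
""".strip()


def group_matches_by_type(xml_text):
    """Return a dict mapping each "type" value to its <match> elements."""
    root = ET.fromstring(xml_text)
    groups = {}
    for element in root.iter():
        # Drop any "{namespace}" prefix so the check works whether or
        # not the real documents declare an XML namespace.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "match":
            groups.setdefault(element.get("type"), []).append(element)
    return groups


if __name__ == "__main__":
    for match_type, matches in group_matches_by_type(SAMPLE_XML).items():
        print(match_type, [m.get("accession") for m in matches])
```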

Summary

There have been lots of small changes and a few larger ones in this update, but hopefully nothing too disruptive. If you have any problems using any of the newly added or recently repaired features, do let us know.

Posted by John.
