Visualising & exploring TreeFam gene families

February 19, 2014

The latest TreeFam release 9 has 15,736 gene families. These families vary significantly in size (number of family members), conservation (alignment conservation) and taxonomic diversity (younger families that are only found in e.g. Vertebrates vs. older ones that were present in the last common ancestor of Metazoa).

Visualising & exploring gene families

We have always wanted to find a way to visualise our families according to the above-mentioned criteria.
Wouldn’t it be nice if you could easily see all highly conserved families or all families with >= 400 genes?

How to do that technically?

Several interesting JavaScript libraries have appeared recently, namely D3, dc.js and crossfilter.
D3 is the library we use to provide interactive trees (check here for the source code). Basically, D3 allows you to bind your data to SVG elements. This could be a bar chart; for example, the following bar chart shows the distribution of alignment conservation across all TreeFam families.
[Figure: d3_alignment_conservation]
Coming back to our goal of visualising our gene families, let’s say you want a bar chart for each of the above-mentioned categories (family size, alignment conservation, taxonomic origin, etc.). Using D3 you can do that, and it would probably look nice (check here for a tutorial on how to build bar charts or click here for other tutorials). However, the resulting visualisation is rather static.
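To make the idea of turning data into SVG bars a bit more concrete, here is a minimal, library-free sketch. It is not the D3 API and not the TreeFam code: the sample values, the function names (histogram, barsToSvg) and the bin count are assumptions chosen for illustration. It bins conservation values and emits one rect per bin, which is roughly the mapping that D3 automates for you.

```javascript
// Hypothetical sample: alignment conservation (as a fraction 0..1) of a few families.
const conservation = [0.12, 0.35, 0.38, 0.61, 0.64, 0.66, 0.88, 0.91];

// Count values per bin of equal width over the range [0, 1].
function histogram(values, binCount) {
  const counts = new Array(binCount).fill(0);
  for (const v of values) {
    // Clamp the index so that v === 1 falls into the last bin.
    const i = Math.min(binCount - 1, Math.floor(v * binCount));
    counts[i] += 1;
  }
  return counts;
}

// Map each bin count to one <rect>, scaled so the tallest bar fills the height.
function barsToSvg(counts, width, height) {
  const barWidth = width / counts.length;
  const max = Math.max(...counts, 1); // avoid dividing by zero for empty input
  const rects = counts.map((c, i) => {
    const h = (c / max) * height;
    return `<rect x="${i * barWidth}" y="${height - h}" width="${barWidth - 1}" height="${h}"/>`;
  });
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">${rects.join("")}</svg>`;
}

const counts = histogram(conservation, 5);
const svg = barsToSvg(counts, 250, 100);
console.log(counts);
console.log(svg);
```

The resulting string can be saved as an .svg file or placed into a page. With D3 you would normally let the library create and update the elements instead of building the markup by hand.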

What about interactivity?

Ideally, you want to link the different charts so that you can look at a subset of families simply by using the mouse to select a range in one of the charts, with that selection then acting as a filter for the data presented in all of the other charts on the page. Fortunately, the people behind dc.js have implemented this. Best of all, it is really easy to use: you don’t even have to know how to plot bar charts yourself, because dc.js does it for you (see the dc wiki if you are interested in learning more about dc.js).
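The linking behaviour can be sketched without any library. The snippet below is only an illustration of the principle, not how dc.js or crossfilter are implemented, and the records, field names and helper functions (visibleFor, countBy) are hypothetical. Each chart keeps its own filter, and every chart draws the records that pass the filters of all the other charts.

```javascript
// Hypothetical family records; ids, field names and values are made up.
const families = [
  { id: "F1", genes: 420, conservation: 0.90, root: "Metazoa" },
  { id: "F2", genes: 35, conservation: 0.88, root: "Vertebrata" },
  { id: "F3", genes: 510, conservation: 0.40, root: "Metazoa" },
  { id: "F4", genes: 12, conservation: 0.55, root: "Vertebrata" },
  { id: "F5", genes: 60, conservation: 0.95, root: "Eukaryota" },
];

// One active filter (a predicate) per chart; null means "nothing selected".
const filters = { genes: null, conservation: null, root: null };

// Records a given chart should draw: those passing the filters of all OTHER charts.
// A chart ignores its own filter so that its unselected bars stay visible.
function visibleFor(chart) {
  return families.filter((f) =>
    Object.entries(filters).every(
      ([name, test]) => name === chart || test === null || test(f)
    )
  );
}

// Count records per key, e.g. per taxonomic root.
function countBy(records, keyOf) {
  const counts = {};
  for (const r of records) {
    const k = keyOf(r);
    counts[k] = (counts[k] || 0) + 1;
  }
  return counts;
}

// Simulate selecting "conservation >= 85%" with the mouse in the conservation chart.
filters.conservation = (f) => f.conservation >= 0.85;

// The taxonomic-root chart now only counts the highly conserved families...
const rootCounts = countBy(visibleFor("root"), (f) => f.root);
// ...while the conservation chart itself still sees every family.
const conservationChartSize = visibleFor("conservation").length;

console.log(rootCounts, conservationChartSize);
```

Letting a chart ignore its own filter mirrors how crossfilter groups behave: the selected range is highlighted in the chart you brushed, while all other charts shrink to the selection.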

D3 + dc + TreeFam gene families

So, we have used D3 together with dc.js to visualise our families, and a prototype can be seen on our dev site (see the following picture for an example).

[Figure: dc_treefam_families]

Overview of TreeFam families. The different charts show alignment conservation, number of gene members, taxonomic root, as well as presence of genes from model organisms (click on the image to get to the TreeFam website)

What you can do: The visualisation should be self-explanatory and will allow you to answer simple queries, e.g.:

  1. How many Vertebrate families are there?
  2. Show me all families with ~1 gene/species
  3. Which are the highly-conserved families (alignment conservation >= 85%)?

But also more complicated ones, e.g.:

  1. How many eukaryotic families are highly conserved, have at least one human gene and more than one annotated Pfam family?
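In terms of data, a combined query like this is just several selections applied at once. As a rough, library-free sketch (the records and the field names root, conservation, humanGenes and pfamFamilies are hypothetical and not the TreeFam schema), it could be written as a list of criteria that a family must all satisfy:

```javascript
// Hypothetical records; ids, field names and values are made up for illustration.
const sampleFamilies = [
  { id: "A", root: "Eukaryota", conservation: 0.92, humanGenes: 2, pfamFamilies: 3 },
  { id: "B", root: "Eukaryota", conservation: 0.90, humanGenes: 0, pfamFamilies: 2 },
  { id: "C", root: "Vertebrata", conservation: 0.95, humanGenes: 1, pfamFamilies: 2 },
  { id: "D", root: "Eukaryota", conservation: 0.60, humanGenes: 1, pfamFamilies: 4 },
  { id: "E", root: "Eukaryota", conservation: 0.86, humanGenes: 1, pfamFamilies: 1 },
];

// One predicate per selection you would make in the charts.
const criteria = [
  (f) => f.root === "Eukaryota",   // eukaryotic families
  (f) => f.conservation >= 0.85,   // highly conserved (threshold from the list above)
  (f) => f.humanGenes >= 1,        // at least one human gene
  (f) => f.pfamFamilies > 1,       // more than one annotated Pfam family
];

// Keep the families that satisfy every criterion.
const matches = sampleFamilies.filter((f) => criteria.every((test) => test(f)));

console.log(matches.length, matches.map((f) => f.id));
```

In the workbench each of these criteria corresponds to one mouse selection in one chart, so no code is needed to ask the question.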

We see this visualisation workbench as a proof-of-concept and plan to expand it in the future. The code is available on GitHub, so feel free to get a copy and use it with your own data. Let us know what you think and whether you would like to see additional information charted.

Posted by Fabian
