2. Architecture of the Chipster platform
[Figure: Chipster architecture: clients, brokers (message broker, file broker), authentication service, management service and computing services]
Loosely coupled, independent components
Message-oriented communication
Flexible, scalable, robust
In other words, very cloud-like
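The message-oriented design can be illustrated with a minimal sketch of loosely coupled components that talk only through a broker. This is not Chipster's implementation: the in-process `Broker` class, the `jobs`/`results` topic names and `compute_service` are hypothetical stand-ins, with Python's `queue.Queue` playing the role of a real message broker.

```python
import queue
import threading


class Broker:
    """Hypothetical in-process stand-in for a message broker:
    named topics backed by thread-safe FIFO queues."""

    def __init__(self):
        self._topics = {}
        self._lock = threading.Lock()

    def _topic(self, name):
        with self._lock:
            if name not in self._topics:
                self._topics[name] = queue.Queue()
            return self._topics[name]

    def publish(self, name, message):
        self._topic(name).put(message)

    def receive(self, name, timeout=None):
        # Blocks until a message arrives; raises queue.Empty on timeout
        return self._topic(name).get(timeout=timeout)


def compute_service(broker, worker_id):
    # A compute service knows only topic names, never the clients,
    # so more services can be started without changing anything else.
    while True:
        job = broker.receive("jobs")
        if job is None:  # shutdown signal
            break
        broker.publish("results", {"job_id": job["job_id"],
                                   "worker": worker_id,
                                   "output": sum(job["data"])})


def run_demo(num_workers=3, num_jobs=6):
    broker = Broker()
    workers = [threading.Thread(target=compute_service, args=(broker, i))
               for i in range(num_workers)]
    for w in workers:
        w.start()
    # The client only publishes jobs and collects results
    for job_id in range(num_jobs):
        broker.publish("jobs", {"job_id": job_id, "data": list(range(job_id + 1))})
    results = [broker.receive("results", timeout=5) for _ in range(num_jobs)]
    for _ in workers:
        broker.publish("jobs", None)  # one shutdown signal per worker
    for w in workers:
        w.join()
    return sorted(results, key=lambda r: r["job_id"])
```

Because the client never addresses a particular worker, moving compute services to cloud nodes only changes where the consumers run, not how jobs are sent.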
4. Chipster in the cloud
1) Deploying compute nodes in the cloud
• Easy, because the architecture is already loosely coupled and based on message passing
2) Running large parallel jobs in the cloud
• The architecture already supports this
• Cloud compatible tools can be integrated quickly
3) Using the cloud as a back end for interactive visualisations
• Perhaps less obvious
• So let's dig into this further...
5. Background: Chipster Genome Browser
Interactive Swing-based GUI
Shows reads and analysis results in genomic context
Interactive zooming from chromosome down to nucleotide level
Ensembl annotations for genes and transcripts
Integrated with the rest of Chipster
Parallel and, to some extent, distributed
9. Basic idea
Preprocess data with Hadoop / MapReduce
Generate power-of-two summaries of the data, as in Google Earth
• Doubles the data size
The current genome browser samples the data to produce summaries
Now summaries can be read directly
– Accurate results, significantly fewer disk seeks
Distribute data to scale to massive datasets
• Use messaging to query independent data providers
Aggregate results as (and if) they arrive at the visualiser
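The power-of-two summaries can be sketched as a pyramid in which each level merges pairs of bins from the level below. This is a minimal in-memory sketch, assuming per-base coverage counts as input and bins that store (min, max, sum, count); these choices and the functions `build_pyramid`, `choose_level` and `query` are hypothetical, and in the described pipeline the levels would be produced by Hadoop rather than plain Python. For a length n that is a power of two, the levels hold n + n/2 + ... + 1 = 2n - 1 bins, which is where the doubling of data size comes from.

```python
from typing import List, Tuple

# Hypothetical bin layout: (min, max, sum, count) over the bases it covers
Bin = Tuple[int, int, int, int]


def build_pyramid(coverage: List[int]) -> List[List[Bin]]:
    """Level 0 holds one bin per base; level k merges pairs from level k-1,
    so each bin at level k covers 2**k bases (the last bin may be shorter)."""
    level: List[Bin] = [(c, c, c, 1) for c in coverage]
    pyramid = [level]
    while len(level) > 1:
        merged: List[Bin] = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]  # the last bin is alone if the length is odd
            merged.append((min(b[0] for b in pair),
                           max(b[1] for b in pair),
                           sum(b[2] for b in pair),
                           sum(b[3] for b in pair)))
        level = merged
        pyramid.append(level)
    return pyramid


def choose_level(pyramid: List[List[Bin]], start: int, end: int, pixels: int) -> int:
    """Coarsest level that still gives at least one bin per pixel
    for the visible window [start, end)."""
    bases = end - start
    level = 0
    while level + 1 < len(pyramid) and bases // 2 ** (level + 1) >= pixels:
        level += 1
    return level


def query(pyramid: List[List[Bin]], start: int, end: int, pixels: int):
    """Read precomputed bins directly instead of sampling raw data."""
    level = choose_level(pyramid, start, end, pixels)
    size = 2 ** level
    return level, pyramid[level][start // size:(end + size - 1) // size]
```

Because every bin keeps exact aggregates, the viewer gets accurate values at any zoom level from a single contiguous read of one level, rather than many scattered reads of raw data.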
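Querying independent data providers and aggregating results as they arrive is essentially a scatter-gather pattern. The sketch below assumes each provider answers a region query with the item positions it holds; `make_provider` and `scatter_gather` are hypothetical names, and a thread pool stands in for the messaging layer. Providers that fail or miss the deadline are left out, so the view degrades instead of blocking.

```python
import concurrent.futures as cf
import time
from typing import Callable, List, Sequence, Tuple

Provider = Callable[[int, int], List[int]]


def make_provider(shard: Sequence[int], delay: float = 0.0, fail: bool = False) -> Provider:
    """Hypothetical data provider holding one shard of item positions."""
    def provider(start: int, end: int) -> List[int]:
        time.sleep(delay)  # simulated network / disk latency
        if fail:
            raise ConnectionError("provider unavailable")
        return [pos for pos in shard if start <= pos < end]
    return provider


def scatter_gather(providers: List[Provider], start: int, end: int,
                   on_partial: Callable[[List[int]], None],
                   timeout: float = 1.0) -> Tuple[List[int], int]:
    """Query every provider in parallel and push the merged view to the
    visualiser callback each time another answer arrives. Returns the final
    merged result and the number of providers that failed or timed out."""
    pool = cf.ThreadPoolExecutor(max_workers=len(providers))
    futures = [pool.submit(p, start, end) for p in providers]
    merged: List[int] = []
    answered = 0
    try:
        for fut in cf.as_completed(futures, timeout=timeout):
            try:
                part = fut.result()
            except Exception:
                continue  # an unreliable provider must not break the view
            answered += 1
            merged = sorted(merged + part)
            on_partial(merged)
    except cf.TimeoutError:
        pass  # show whatever has arrived; slow providers are skipped
    finally:
        pool.shutdown(wait=False)  # do not wait for stragglers
    return merged, len(providers) - answered
```

Handling missing answers in the client in this way is one possible reading of tackling server-side unreliability on the client side.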
10. Work in progress...
Genome browser up and running
Hadoop-based data processing at a very early stage
Currently trying to get it to scale well
11. What's the point?
Besides items (e.g., reads), the visualiser can receive “superitems” (e.g., summaries of reads)
• Superitems summarise coverage, quality, SNPs etc. of the original reads
All kinds of advanced information can be generated in the preprocessing step
– Such as features that combine a large number of genomes
– Generators should be pluggable
We spend resources on the server side to improve the user experience on the client side
• On the server side, CPU, memory and disk space are required
• But only for a short time (as in large batch jobs)
• Cheap commodity servers can be used
• And the experiment itself has already been expensive
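Pluggable generators could be sketched as small objects that share one `summarise` method, each contributing one field of a superitem per genomic bin. This is a hypothetical single-machine illustration: the `Read` record, the `SuperitemGenerator` protocol, the two example generators and `make_superitems` are not from Chipster, and in the described pipeline this step would run as a Hadoop/MapReduce preprocessing job.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Read:
    start: int             # hypothetical 0-based leftmost position
    seq: str
    qualities: List[int]   # per-base Phred scores


class SuperitemGenerator(Protocol):
    name: str

    def summarise(self, reads: List[Read], bin_start: int, bin_end: int) -> float: ...


class CoverageGenerator:
    name = "coverage"

    def summarise(self, reads: List[Read], bin_start: int, bin_end: int) -> float:
        # Mean per-base depth over the bin
        covered = 0
        for r in reads:
            lo = max(r.start, bin_start)
            hi = min(r.start + len(r.seq), bin_end)
            covered += max(0, hi - lo)
        return covered / (bin_end - bin_start)


class MeanQualityGenerator:
    name = "mean_quality"

    def summarise(self, reads: List[Read], bin_start: int, bin_end: int) -> float:
        # Mean base quality of all bases that fall inside the bin
        scores = [q for r in reads
                  for i, q in enumerate(r.qualities)
                  if bin_start <= r.start + i < bin_end]
        return sum(scores) / len(scores) if scores else 0.0


def make_superitems(reads: List[Read], bin_size: int, length: int,
                    generators: List[SuperitemGenerator]) -> List[Dict[str, float]]:
    # One superitem per bin; each generator adds one field, so new
    # summaries can be plugged in without touching this loop.
    items = []
    for bin_start in range(0, length, bin_size):
        bin_end = min(bin_start + bin_size, length)
        overlapping = [r for r in reads
                       if r.start < bin_end and r.start + len(r.seq) > bin_start]
        item: Dict[str, float] = {"start": bin_start, "end": bin_end}
        for g in generators:
            item[g.name] = g.summarise(overlapping, bin_start, bin_end)
        items.append(item)
    return items
```

A SNP or cross-genome feature generator would be added the same way, as another class with a `name` and a `summarise` method.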
12. Summary
Use cheap server resources to enable a better user experience
Goal: to make data analysis quicker (and more fun)
Tackle server-side unreliability on the client side
Future development
– If this works out, it could also be used in other Chipster visualisers
– Integrating HBase queries into interactive visualisations
– Optimising data summarising for visual truthfulness
For more info: aleksi.kallio@csc.fi,