This was a presentation I gave at the I-DCC (http://www.i-dcc.org) project kick-off meeting in April 2009. The gist of the talk was how we can go about making a collaborative data portal (one of the goals of the project) and showing some early prototype work that I had done.
4. WP4 objectives
Create a site to display current repository
information
Create DAS-tracks, to display this information in
its genomic context
Create a Biomart. The Biomart will serve DAS-tracks,
provide query web-services, and link to
other Biomarts (including EnsMart), greatly
enhancing the search capability and future utility
of the repository
5. The idea...
De-centralize the data, everyone who
wants in on the portal: use Biomart!
Standardized
Web services and DAS out of the box
This makes the data open to all
We promise not to take over the world
6. The idea...
2 Interfaces:
Damian
New MartView interface
(advanced search)
Us
Google-like search
(simple search - “MartSearch”)
7. The idea...
Turn the portal into a Biomart mashup!
“In web development, a mashup is a Web
application that combines data from one or more
sources into a single integrated tool. The term
Mashup implies easy, fast integration,
frequently done by access to open APIs and data
sources to produce results that were not the
original reason for producing the raw source
data” - Wikipedia
8. Implementation
100% Javascript driven user interface
The user goes to the portal and enters a search
term; this is fired against a cloud of
Biomarts and a coherent response is
returned
No complex controller logic (it shouldn’t
need any)
9. Javascript?!? Aaargh!!
The old days...
Browser incompatibilities, clunky performance
Now...
Javascript is fast!
Chrome, Firefox 3.1, Safari 4, IE 8
Libraries take care of the cross-browser issues
11. Plan A
HTTP request
MartSearch
Martservice XML query
Biomart based federation
12. Plan A
HTTP request
MartSearch
Martservice XML query
Biomart based federation
You can only federate across 2 marts
Search times can vary greatly with federation
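As a rough illustration of the "Martservice XML query" step, the sketch below builds a query document in the BioMart 0.7-style XML format. The dataset, filter, and attribute names passed in are placeholders, the `DatasetQuery` type and function names are invented for this example, and the assumption that Biomart-based federation is expressed by listing two `Dataset` elements in one query should be checked against the martservice documentation.

```typescript
// Hypothetical types: the dataset, filter, and attribute names used with them
// are placeholders, not names taken from any real mart.
interface DatasetQuery {
  dataset: string;
  filters: { name: string; value: string }[];
  attributes: string[];
}

// Escape characters that are special inside an XML attribute value.
function escapeXmlAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one <Dataset> element with its filters and attributes.
function datasetXml(query: DatasetQuery): string {
  const filters = query.filters.map(
    (f) => `<Filter name="${escapeXmlAttribute(f.name)}" value="${escapeXmlAttribute(f.value)}"/>`
  );
  const attributes = query.attributes.map(
    (name) => `<Attribute name="${escapeXmlAttribute(name)}"/>`
  );
  return (
    `<Dataset name="${escapeXmlAttribute(query.dataset)}" interface="default">` +
    filters.join("") +
    attributes.join("") +
    `</Dataset>`
  );
}

// Build the full query document. Under Plan A (assumption) two datasets
// would be passed here so that the mart itself performs the federation.
function buildMartQueryXml(datasets: DatasetQuery[]): string {
  return (
    `<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>` +
    `<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">` +
    datasets.map(datasetXml).join("") +
    `</Query>`
  );
}
```

The resulting string would typically be sent to a mart's martservice endpoint as the `query` parameter; the endpoint URL depends on the mart being queried.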
13. Plan B
HTTP request
MartSearch
Martservice XML query to
each mart, perform
federation on the fly
14. Plan B
HTTP request
MartSearch
Martservice XML query to each mart, perform federation on the fly
Searching on more than one attribute requires many XML requests per mart
No way to page results
No way of doing OR queries
No way of doing loose text queries
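"Federation on the fly" amounts to joining the rows that come back from each mart on a shared linking ID. The sketch below is one minimal way to do that on the client, assuming each mart returns headerless TSV and that every mart exposes the same linking attribute; the type names, mart names, and attribute names are hypothetical.

```typescript
// One row from a mart, keyed by attribute name.
type MartRow = { [attribute: string]: string };

// Rows returned by one mart. The mart names used with this are hypothetical.
interface MartResult {
  mart: string;
  rows: MartRow[];
}

// Rows grouped by linking ID, then by the mart they came from.
type Federated = { [linkId: string]: { [mart: string]: MartRow[] } };

// Parse headerless TSV text into rows, naming the columns after the
// attributes that were requested (in the same order).
function parseTsv(text: string, attributes: string[]): MartRow[] {
  const rows: MartRow[] = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue; // skip blank lines
    const cells = line.split("\t");
    const row: MartRow = {};
    attributes.forEach((name, i) => {
      const cell = cells[i];
      row[name] = cell === undefined ? "" : cell;
    });
    rows.push(row);
  }
  return rows;
}

// Join the per-mart results on a shared linking attribute.
function federate(results: MartResult[], linkKey: string): Federated {
  const merged: Federated = {};
  for (const result of results) {
    for (const row of result.rows) {
      const id = row[linkKey];
      if (id === undefined || id === "") continue; // row cannot be linked
      let byMart = merged[id];
      if (byMart === undefined) {
        byMart = {};
        merged[id] = byMart;
      }
      let rows = byMart[result.mart];
      if (rows === undefined) {
        rows = [];
        byMart[result.mart] = rows;
      }
      rows.push(row);
    }
  }
  return merged;
}
```

Because the join happens only after every mart has answered, paging and OR queries have to be emulated on the client, which is where the limitations listed on this slide come from.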
15. Plan C
HTTP request
MartSearch
0 Index the searchable fields from the biomarts
1 Send query to Lucene based search index and retrieve paged list of genes and linking IDs
2 Martservice XML query to each mart
16. Plan C
HTTP request
MartSearch
0 Index the searchable fields from the biomarts
1 Send query to Lucene based search index and retrieve paged list of genes and linking IDs
2 Martservice XML query to each mart
FAST search results
Can do loose text and OR queries
Pagination
Solr takes care of the federation for you
One more software stack to accommodate
Need to re-build index after mart rebuild
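Step 1 of Plan C could look something like the sketch below, which turns a user's search box input into a paged Solr request with loose (prefix) matching and OR between terms. The `q`, `start`, `rows`, and `wt` parameters and the `/select` handler are standard Solr; the base URL and function names are assumptions, and how well prefix matching works in practice depends on how the index schema analyses each field.

```typescript
// Escape characters that have special meaning in the Lucene query syntax.
function escapeLuceneTerm(term: string): string {
  return term.replace(/[+\-&|!(){}\[\]^"~*?:\\\/]/g, "\\$&");
}

// Turn free text into a query: each word becomes a prefix match and the
// words are combined with OR. Empty input falls back to "match everything".
function buildQuery(userInput: string): string {
  const terms = userInput.split(/\s+/).filter((t) => t !== "");
  if (terms.length === 0) return "*:*";
  return terms.map((t) => escapeLuceneTerm(t) + "*").join(" OR ");
}

// Build the URL for one page of results. Pages are numbered from 1.
function buildSolrSelectUrl(
  baseUrl: string, // assumed location of the Solr core, e.g. a local instance
  userInput: string,
  page: number,
  pageSize: number
): string {
  const start = (page - 1) * pageSize;
  const params = [
    "q=" + encodeURIComponent(buildQuery(userInput)),
    "start=" + start,
    "rows=" + pageSize,
    "wt=json",
  ];
  return baseUrl + "/select?" + params.join("&");
}
```

The linking IDs in the returned page would then drive step 2, so each mart only needs to be asked about the handful of genes currently on screen.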
21. Fast, flexible searching
Customizable
Add and remove data source from display
Restrict the data coming back from source
Extensible
Adding in new data sources should be easy
Custom templates for every data source
Open
Anyone can access the data and index (via services)
Anyone can get the code
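One way to picture the "customizable" and "extensible" points is a small registry in which each data source declares whether it is shown, which attributes it is allowed to return, and how its rows are rendered. This is only a sketch: the `DataSource` shape and all names are hypothetical, and plain functions stand in for whatever templating library the portal actually uses.

```typescript
type Row = { [attribute: string]: string };

// Hypothetical description of one data source shown on the portal.
interface DataSource {
  name: string;
  enabled: boolean;               // add or remove the source from the display
  attributes: string[];           // restrict the data coming back from the source
  template: (row: Row) => string; // custom rendering for this source
}

// Keep only the attributes the source is configured to show.
function restrict(row: Row, attributes: string[]): Row {
  const out: Row = {};
  for (const name of attributes) {
    const value = row[name];
    if (value !== undefined) out[name] = value;
  }
  return out;
}

// Render every row of every enabled source with that source's template.
function renderResults(
  sources: DataSource[],
  rowsBySource: { [source: string]: Row[] }
): string[] {
  const fragments: string[] = [];
  for (const source of sources) {
    if (!source.enabled) continue;
    const rows = rowsBySource[source.name];
    if (rows === undefined) continue; // no data returned for this source
    for (const row of rows) {
      fragments.push(source.template(restrict(row, source.attributes)));
    }
  }
  return fragments;
}
```

Under this arrangement, adding a new data source means adding one more entry to the list rather than touching the search code.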
22. How it works...
Apache Solr
(http://lucene.apache.org/solr)
Enterprise grade search server built
upon Lucene
Web service driven
Represents each search object as
a document
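To make "each search object is a document" concrete, the sketch below serialises a gene as a Solr document using Solr's XML update format (`<add><doc><field name="...">`). The field names and the `SolrDocument` type are hypothetical; the real field list would be whatever searchable fields are pulled from the marts and declared in the index schema.

```typescript
// Hypothetical in-memory form of one document: a list of named fields,
// each of which may carry several values.
interface SolrField {
  name: string;
  values: string[];
}

interface SolrDocument {
  fields: SolrField[];
}

// Escape characters that are special in XML text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one <doc>; a multi-valued field is written as repeated <field> elements.
function documentXml(doc: SolrDocument): string {
  let xml = "<doc>";
  for (const field of doc.fields) {
    for (const value of field.values) {
      xml += `<field name="${escapeXml(field.name)}">${escapeXml(value)}</field>`;
    }
  }
  return xml + "</doc>";
}

// Wrap the documents in the <add> message that is posted to Solr's update handler.
function buildAddXml(docs: SolrDocument[]): string {
  return "<add>" + docs.map(documentXml).join("") + "</add>";
}
```

After posting a message like this, a commit is needed before the new documents become searchable.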
24. How it works...
jQuery (http://jquery.com)
jQuery UI (http://jqueryui.com)
EJS (http://embeddedjs.com)
ActiveRecord.js
(http://activerecordjs.org)
Jamal (http://jamal-mvc.com)
25. Moving forward...
Make (and/or integrate) more marts
MGI, Komp-DCC, Eurexpress, GXD, EuroPhenome
Portal branding, design, colour, layout
How to represent the data
Dictated by the type of user...
Who are our users and what do
they want from us?!?!?
28. Typical scenario
Each group says...
I’ll take this task - will send you the
results when it’s ready
If we’re (very) lucky, we get something
sort of coherent in the end
30. What we should do...
Open source code on a public repository
Github, Google Code, Sourceforge
Or even one of our own - as long as it's public
Shared bug tracking / support and wiki
Github (wiki) + Lighthouse (bug tracking)
Google Code / Sourceforge
Host an instance of Redmine or Trac