From Big Data to Insights: Opportunities and Challenges for TEI in Genomics
This document discusses how tangible and embodied interaction (TEI) systems could help address challenges in genomics. It outlines the scale of genomic data, the heterogeneity of datasets, and the need to engage diverse audiences. Case studies show how TEI supports genome browsing, visualization, and modeling of complex systems. The document concludes that TEI may facilitate understanding complex problems, visualizing biological data, enabling large collaborations, supporting diverse audiences, and managing timescales that range from milliseconds to millennia. Going forward, TEI represents an opportunity to engage with massive genomic datasets in ways that highlight connections between evidence at different scales.
3. Genomics
“While the work is a challenge, making genetics interactive is potentially as transformative as the move from batch processing to time sharing”
- Bafna V. et al., Communications of the ACM, Jan 2013
7. Scale
Filesystem @ Broad Inst.: 13+PB
One run of an Illumina HiSeq 2500:
6 billion paired-end sequences
(600 gigabases, or 120Gb/day)
Thousand Genomes project:
692 collaborators
110 institutions
>15 groups in (bi-)weekly
conference calls
Blue Waters cluster:
>380K CPU cores
+ >3K GPUs
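The HiSeq figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes "Gb" means gigabases (not gigabits or gigabytes), that all reads have equal length, and that the 120Gb/day rate holds for the whole run. The variable names are hypothetical.

```python
# Figures taken from the slide (one Illumina HiSeq 2500 run).
READS_PER_RUN = 6_000_000_000        # 6 billion paired-end sequences
BASES_PER_RUN = 600 * 10**9          # 600 gigabases
BASES_PER_DAY = 120 * 10**9          # 120 Gb/day, assuming Gb = gigabases

# Implied average read length, assuming equal-length reads.
implied_read_length = BASES_PER_RUN // READS_PER_RUN

# Implied run duration, assuming the daily rate is sustained.
implied_run_days = BASES_PER_RUN / BASES_PER_DAY

print(f"Implied read length: {implied_read_length} bases")
print(f"Implied run duration: {implied_run_days:.1f} days")
```

Under these assumptions the numbers are mutually consistent: 6 billion reads of about 100 bases each give 600 gigabases, produced over about five days.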
10. How can TEI systems be designed to
• Empower citizens to make informed health decisions?
• Communicate scientific data to communities?
• Enhance learning of complex concepts?
• Support experts interacting with big data?
15. Human genome: understanding ca. 2012
• Mobile elements: 48.4%
• Processed pseudogenes: 1.0%
• Tandem repeats & low complexity DNA: 2.4%
• Dark matter: 46.6%
• Protein & RNA coding regions: 1.6%
Composition of other primate genomes is very similar
Tangibles-targeted computational genomics
16. Example projects: rhesus, orangutan, human, marmoset genomes
• Often multi-institution, multi-person efforts
– Above articles: ~250, 100 co-authors
• Often long duration (e.g., 4-6 years before first publication)
• Iterative fusion of computational and “wet bench” analyses
• Some analyses “big CPU” (e.g., 200 CPU cores for weeks); others, “big RAM” (200+ GB RAM)
21. Lessons learned
TEI can facilitate immediate, visible, and easily reversible manipulations
• How to design TEI for open-ended creative inquiries?
Tangible representations can facilitate multi-stage workflows
• Important for execution and tracking of complex analyses
• Need parametrized, annotatable representations of complex large datasets
TEI could facilitate collaboration for distributed and co-located teams
• Large interdisciplinary teams and distributed work are common in this area
• Users can jointly manipulate assumptions and see consequences
Tangible tools can support understanding and discovery
• Provide access to different pieces of the problem (data, reactions)
• Help users form accurate mental models through tangible/embodied manipulation
22. Opportunities for TEI Engagement
Understanding Complex Problems
Visualizing Biological Data
Enabling Large Collaborations
Supporting Diverse Audiences
Managing Varied Timescales
25. Managing Varied Timescales
Powers of 10,000:
• Milliseconds
• Minutes
• Months
• Millennia
Entangling Space, Form, Light, Time, Computational STEAM, and Cultural Artifacts
Examples
• Many genome projects: 5+ years
• Sequencing Lincoln’s DNA: under active discussion since 1991
• Most of us sequenced within a decade? Materially impacting all our descendants
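The "Powers of 10,000" framing can be checked by comparing each timescale on this slide with the next. The sketch below is a rough illustration, assuming a 30-day month and a 365.25-day year; the names are hypothetical.

```python
# Durations in milliseconds, so every value is an exact integer.
MS_PER_DAY = 24 * 3600 * 1000

TIMESCALES_MS = {
    "millisecond": 1,
    "minute": 60 * 1000,
    "month": 30 * MS_PER_DAY,           # assumed 30-day month
    "millennium": 365_250 * MS_PER_DAY, # 1000 years of 365.25 days
}

def step_ratios(scales):
    """Return the ratio between each timescale and the one before it."""
    values = list(scales.values())
    return [later / earlier for earlier, later in zip(values, values[1:])]

ratios = step_ratios(TIMESCALES_MS)
names = list(TIMESCALES_MS)
for (shorter, longer), ratio in zip(zip(names, names[1:]), ratios):
    print(f"{shorter} -> {longer}: x{ratio:,.0f}")
```

With these assumptions each step is a factor between 10,000 and 100,000, so "powers of 10,000" is an order-of-magnitude description rather than an exact one.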
26. Going forward
• Some aspects w/ broad TEI, computational science synergies
• How to visualize and engage data, activity, progress spanning
many systems, people, places, timescales?
• What representational forms, device ecologies, most
appropriate for large, abstract data?
• Facilitating engagement with big data in ways that highlight
connections between multiple forms of evidence
• Some aspects specific to genomics
• 2023: anticipate most of us in room + many thousands of
species having genomes fully or partially sequenced
• Commonalities, distinctions in engagements by scientists,
students, street people, senators, senior citizens, solicitors, …
27. THANKS!
Orit Shaer: oshaer@wellesley.edu
Ali Mazalek: mazalek@gatech.edu
Brygg Ullmer: ullmer@lsu.edu
Miriam Konkel: konkel@lsu.edu
Consuelo Valdes (Wellesley College) and Andy Wu (Georgia Tech).
This work has been partially funded by NSF IIS-1017693, DRL-097394084, and CNS-1126739.