SlideShare a Scribd company logo
1 of 42
Towards an Open Analytics Environment Ian Foster Computation Institute Argonne National Lab & University of Chicago
The Computation Institute ,[object Object],[object Object],www.ci.uchicago.edu Faculty, fellows, staff, students, computers, projects.
The Good Old Days: Astronomy ~1600 30 years ? years 10 years 6 years 2 years
Astronomy, from 1600  to 2000 Automation 10 -1    10 8  Hz data capture Community 10 0    10 4 astronomers (10 6  amateur) Computation Data 10 6    10 15  B aggregate 10 -1    10 15  Hz peak Literature 10 1    10 5 pages/year
Biomedical Research ~1600
Biomedical Research ~2000 ... atcgaattccaggcgtcacattctcaattcca... MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYT... Protein-Protein Interactions metabolism pathways receptor-ligand 4º structure Polymorphism and Variants genetic variants individual patients epidemiology Physiology Cellular biology Biochemistry Neurobiology Endocrinology etc. >10 6 ESTs  Expression patterns Large-scale screens Genetics and Maps Linkage Cytogenetic  Clone-based From John Wooley >10 6 >10 9 >10 6 >10 5 >10 9 DNA sequences alignments Proteins sequence 2º structure 3º structure
Growth of Sequences and Annotations since 1982 Folker Meyer, Genome Sequencing vs. Moore’s Law: Cyber Challenges  for the Next Decade,  CTWatch , August 2006.
The Analyst in Denial “ I just need a bigger disk (and workstation)”
An Open Analytics Environment Data in ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Results out Programs & rules in
o·pen [oh-puhn] adjective ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What Goes In (1)
What Goes In (2) Rules Workflows Dryad MapReduce Parallel programs SQL BPEL Swift SCFL R MatLab Octave
How it Cooks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What Comes Out Data Data Virtual Data Schema
Analysis as (Collaborative) Process Transform Annotate  Search Add to Tag Visualize Discover Extend Group Share
Centralized or Distributed? Both
Towards an Open Analysis Environment: (1) Applications ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Towards an Open Analysis Environment: (2) Hardware SiCortex 6K cores, 6 Top/s IBM BG/P 160K cores, 500 Top/s PADS 10-40 Gbit/s
PADS: Petascale Active Data Store 500 TB  reliable  storage  (data & metadata) 180 TB,  180 GB/s  17 Top/s analysis Data ingest Dynamic  provisioning Parallel analysis Remote access Offload to remote  data centers P A D S Diverse users Diverse data sources 1000 TB tape  backup
Towards an Open Analysis Environment : (3) Methods ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Tagging &  Social Networking GLOSS :  Generalized  Labels Over Scientific  data Sources
XDTM: XML Data Typing & Mapping ./group23 drwxr-xr-x  4 yongzh users 2048 Nov 12 14:15  AA drwxr-xr-x  4 yongzh users 2048 Nov 11 21:13  CH drwxr-xr-x  4 yongzh users 2048 Nov 11 16:32  EC ./group23/AA : drwxr-xr-x  5 yongzh users 2048 Nov  5 12:41  04nov06aa drwxr-xr-x  4 yongzh users 2048 Dec  6 12:24  11nov06aa . /group23/AA/04nov06aa : drwxr-xr-x  2 yongzh users  2048 Nov  5 12:52  ANATOMY drwxr-xr-x  2 yongzh users 49152 Dec  5 11:40  FUNCTIONAL . /group23/AA/04nov06aa/ANATOMY : -rw-r--r--  1 yongzh users  348 Nov  5 12:29  coplanar.hdr -rw-r--r--  1 yongzh users 16777216 Nov  5 12:29  coplanar.img . /group23/AA/04nov06aa/FUNCTIONAL : -rw-r--r--  1 yongzh users  348 Nov  5 12:32  bold1_0001.hdr -rw-r--r--  1 yongzh users  409600 Nov  5 12:32  bold1_0001.img -rw-r--r--  1 yongzh users  348 Nov  5 12:32  bold1_0002.hdr -rw-r--r--  1 yongzh users  409600 Nov  5 12:32  bold1_0002.img -rw-r--r--  1 yongzh users  496 Nov 15 20:44  bold1_0002.mat -rw-r--r--  1 yongzh users  348 Nov  5 12:32  bold1_0003.hdr -rw-r--r--  1 yongzh users  409600 Nov  5 12:32  bold1_0003.img Logical Physical
fMRI Type Definitions type  Study  {  Group g[ ];  } type  Group  {  Subject s[ ];  } type  Subject  {  Volume anat;  Run run[ ];  } type  Run  {  Volume v[ ];  } type  Volume  {  Image img;  Header hdr;  } type  Image  {}; type  Header  {}; type  Warp  {}; type  Air  {}; type  AirVec  {  Air a[ ];  } type  NormAnat  { Volume anat;  Warp aWarp;  Volume nHires; }
High-Performance Data Analytics Functional MRI Ben Clifford,  Mihael Hatigan,  Mike Wilde, Yong Zhao
SwiftScript for fMRI Data Analysis ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],(Run or) reorientRun (Run ir,    string direction) { foreach Volume  iv , i in ir.v { or.v[i] = reorient( iv , direction); } }
Provenance Data Model
Multi-level Scheduling SwiftScript Abstract computation Virtual Data Catalog SwiftScript Compiler Specification Execution Virtual Node(s)‏ Worker Nodes Provenance data Provenance data Provenance collector launcher launcher file1 file2 file3 App F1 App F2 Scheduling Execution Engine (Karajan w/ Swift Runtime)‏ Swift runtime callouts C C C C Status reporting Provisioning Falkon Resource Provisioner Amazon EC2
DOCK on SiCortex ,[object Object],[object Object],[object Object],[object Object],[object Object],(does not include ~800 sec to stage input data) Ioan Raicu, Zhao Zhang
LIGO Gravitational Wave Observatory Birmingham • >1 Terabyte/day to 8 sites 770 TB replicated to date: >120 million replicas MTBF = 1 month Ann Chervenak et al., ISI; Scott Koranda et al, LIGO ,[object Object],AEI/Golm
Lag Plot for Data Transfers to Caltech Credit: Kevin Flasch, LIGO
SIDGrid: B. Bertenthal et al., U.Chicago, IU, UIC
Social Informatics Data Grid (SIDgrid) TeraGrid PADS … SIDgrid Collaborative, multi-modal analysis of cognitive science data Diverse experimental data & metadata  Browse data Search Content preview Transcode Download Analyze
ELAN SIDGrid Portal
 
A  C ommunity  I ntegrated  M odel for  E conomic  a nd  R esource  T rajectories for  H umankind ( CIM-EARTH ) Dynamics, foresight, uncertainty, resolution, … Agriculture, transport, taxation, … Data  (global, local, …) (Super) computers CIM-EARTH Framework  Community process Open code, data
Alleviating  Poverty in Thailand: Modeling  Entrepreneurship Consider  only wealth, access to  capital Consider also distance to 6 major cities Rob Townsend, Victor Zhorin, et al. Match High Low
Text Mining
GeneWays Online  Journals Pathways GeneWays Andrey Rzhetsky  et al. Screening 250,000 journal articles 2.5M reasoning chains  4M statements
Evidence Integration: Genetics & Disease Susceptibility Identify Genes Phenotype 1  Phenotype 2  Phenotype 3  Phenotype 4 Predictive Disease Susceptibility Physiology Metabolism Endocrine Proteome Immune Transcriptome Biomarker Signatures Morphometrics Pharmacokinetics Ethnicity Environment Age Gender Source: Terry Magnuson
James Evans, U.Chicago Arabidopsis  articles
An Open Analytics Environment Data in ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Results out Programs & rules in
 

More Related Content

What's hot

LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013Luis Daniel Ibáñez
 
Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Mark Smith
 
Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.Mehwish Alam
 
Extreme Scripting July 2009
Extreme Scripting July 2009Extreme Scripting July 2009
Extreme Scripting July 2009Ian Foster
 
Hyperloglog Project
Hyperloglog ProjectHyperloglog Project
Hyperloglog ProjectKendrick Lo
 
Benchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on SparkBenchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on SparkXiaoqian Liu
 
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...CIARD Movement
 
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudSchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudAnsgar Scherp
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with GoJames Tan
 
Implementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxiesImplementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxiesJose Enrique Ruiz
 
Probabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitProbabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitTyler Treat
 
Scaling AncestryDNA with the Hadoop Ecosystem. Presented at the San Jose Hado...
Scaling AncestryDNA with the Hadoop Ecosystem. Presented at the San Jose Hado...Scaling AncestryDNA with the Hadoop Ecosystem. Presented at the San Jose Hado...
Scaling AncestryDNA with the Hadoop Ecosystem. Presented at the San Jose Hado...William Yetman
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabImpetus Technologies
 
Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Ted Dunning
 
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerText Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerVitor Hirota Makiyama
 
Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkDatabricks
 
R statistics with mongo db
R statistics with mongo dbR statistics with mongo db
R statistics with mongo dbMongoDB
 

What's hot (19)

LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
 
Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016
 
Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.
 
Extreme Scripting July 2009
Extreme Scripting July 2009Extreme Scripting July 2009
Extreme Scripting July 2009
 
Hyperloglog Project
Hyperloglog ProjectHyperloglog Project
Hyperloglog Project
 
Benchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on SparkBenchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on Spark
 
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
 
Velocity cubes of galaxies
Velocity cubes of galaxiesVelocity cubes of galaxies
Velocity cubes of galaxies
 
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudSchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
 
Data-driven Innovation - Wood
Data-driven Innovation - WoodData-driven Innovation - Wood
Data-driven Innovation - Wood
 
Implementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxiesImplementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxies
 
Probabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitProbabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profit
 
Scaling AncestryDNA with the Hadoop Ecosystem. Presented at the San Jose Hado...
Scaling AncestryDNA with the Hadoop Ecosystem. Presented at the San Jose Hado...Scaling AncestryDNA with the Hadoop Ecosystem. Presented at the San Jose Hado...
Scaling AncestryDNA with the Hadoop Ecosystem. Presented at the San Jose Hado...
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLab
 
Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28
 
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerText Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
 
Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache Spark
 
R statistics with mongo db
R statistics with mongo dbR statistics with mongo db
R statistics with mongo db
 

Viewers also liked

Grid And Healthcare For IOM July 2009
Grid And Healthcare For IOM July 2009Grid And Healthcare For IOM July 2009
Grid And Healthcare For IOM July 2009Ian Foster
 
Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...Ian Foster
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009Ian Foster
 
Running Hot October 2008
Running Hot October 2008Running Hot October 2008
Running Hot October 2008Ian Foster
 
Recruitment and Selection
Recruitment and SelectionRecruitment and Selection
Recruitment and Selectionr m
 
Science for the Future: Strategies for Moving and Sharing Data
Science for the Future: Strategies for Moving and Sharing DataScience for the Future: Strategies for Moving and Sharing Data
Science for the Future: Strategies for Moving and Sharing DataIan Foster
 
Recruiting in a Networked World - Workshop Series
Recruiting in a Networked World - Workshop SeriesRecruiting in a Networked World - Workshop Series
Recruiting in a Networked World - Workshop Serieshholmes75
 
Networking Materials Data
Networking Materials DataNetworking Materials Data
Networking Materials DataIan Foster
 
Recruitment and Selection
Recruitment and SelectionRecruitment and Selection
Recruitment and Selectionr m
 
Agents In An Exponential World Foster
Agents In An Exponential World FosterAgents In An Exponential World Foster
Agents In An Exponential World FosterIan Foster
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Campus Bridging with Globus Services
Campus Bridging with Globus ServicesCampus Bridging with Globus Services
Campus Bridging with Globus ServicesIan Foster
 
Globus publication demo screenshots
Globus publication demo screenshotsGlobus publication demo screenshots
Globus publication demo screenshotsIan Foster
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
Globus status and publication plans
Globus status and publication plansGlobus status and publication plans
Globus status and publication plansIan Foster
 
Flitterin For Talent Presentation Slides
Flitterin For Talent Presentation SlidesFlitterin For Talent Presentation Slides
Flitterin For Talent Presentation Slideshholmes75
 
Mexico talk foster march 2012
Mexico talk foster march 2012Mexico talk foster march 2012
Mexico talk foster march 2012Ian Foster
 

Viewers also liked (17)

Grid And Healthcare For IOM July 2009
Grid And Healthcare For IOM July 2009Grid And Healthcare For IOM July 2009
Grid And Healthcare For IOM July 2009
 
Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Running Hot October 2008
Running Hot October 2008Running Hot October 2008
Running Hot October 2008
 
Recruitment and Selection
Recruitment and SelectionRecruitment and Selection
Recruitment and Selection
 
Science for the Future: Strategies for Moving and Sharing Data
Science for the Future: Strategies for Moving and Sharing DataScience for the Future: Strategies for Moving and Sharing Data
Science for the Future: Strategies for Moving and Sharing Data
 
Recruiting in a Networked World - Workshop Series
Recruiting in a Networked World - Workshop SeriesRecruiting in a Networked World - Workshop Series
Recruiting in a Networked World - Workshop Series
 
Networking Materials Data
Networking Materials DataNetworking Materials Data
Networking Materials Data
 
Recruitment and Selection
Recruitment and SelectionRecruitment and Selection
Recruitment and Selection
 
Agents In An Exponential World Foster
Agents In An Exponential World FosterAgents In An Exponential World Foster
Agents In An Exponential World Foster
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Campus Bridging with Globus Services
Campus Bridging with Globus ServicesCampus Bridging with Globus Services
Campus Bridging with Globus Services
 
Globus publication demo screenshots
Globus publication demo screenshotsGlobus publication demo screenshots
Globus publication demo screenshots
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
Globus status and publication plans
Globus status and publication plansGlobus status and publication plans
Globus status and publication plans
 
Flitterin For Talent Presentation Slides
Flitterin For Talent Presentation SlidesFlitterin For Talent Presentation Slides
Flitterin For Talent Presentation Slides
 
Mexico talk foster march 2012
Mexico talk foster march 2012Mexico talk foster march 2012
Mexico talk foster march 2012
 

Similar to Open Analytics Environment

Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architecturesinside-BigData.com
 
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceSQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceUniversity of Washington
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesIan Foster
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloudthetfoot
 
Ph. D. Final Dissertation SLides
Ph. D. Final Dissertation SLidesPh. D. Final Dissertation SLides
Ph. D. Final Dissertation SLidesEmanuele Panigati
 
ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEuropeBigData_Europe
 
Extracting City Traffic Events from Social Streams
 Extracting City Traffic Events from Social Streams Extracting City Traffic Events from Social Streams
Extracting City Traffic Events from Social StreamsPramod Anantharam
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging EnvironmentsPaul Groth
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsEnrico Daga
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008Ian Foster
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22marpierc
 
Recommender Systems in the Linked Data era
Recommender Systems in the Linked Data eraRecommender Systems in the Linked Data era
Recommender Systems in the Linked Data eraRoku
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesVincenzo Gulisano
 
Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...
Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...
Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...Thomas Gottron
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Data Con LA
 
Stream Reasoning: Where we got so far. Oxford 2010.1.18
Stream Reasoning: Where we got so far. Oxford 2010.1.18Stream Reasoning: Where we got so far. Oxford 2010.1.18
Stream Reasoning: Where we got so far. Oxford 2010.1.18Emanuele Della Valle
 
HEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 TalkHEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 TalkEamonn Maguire
 

Similar to Open Analytics Environment (20)

Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architectures
 
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceSQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architectures
 
HEPData workshop talk
HEPData workshop talkHEPData workshop talk
HEPData workshop talk
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloud
 
Ph. D. Final Dissertation SLides
Ph. D. Final Dissertation SLidesPh. D. Final Dissertation SLides
Ph. D. Final Dissertation SLides
 
ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEurope
 
Extracting City Traffic Events from Social Streams
 Extracting City Traffic Events from Social Streams Extracting City Traffic Events from Social Streams
Extracting City Traffic Events from Social Streams
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008
 
Potter’S Wheel
Potter’S WheelPotter’S Wheel
Potter’S Wheel
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
Recommender Systems in the Linked Data era
Recommender Systems in the Linked Data eraRecommender Systems in the Linked Data era
Recommender Systems in the Linked Data era
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architectures
 
Enabling semantic integration
Enabling semantic integration Enabling semantic integration
Enabling semantic integration
 
Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...
Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...
Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
 
Stream Reasoning: Where we got so far. Oxford 2010.1.18
Stream Reasoning: Where we got so far. Oxford 2010.1.18Stream Reasoning: Where we got so far. Oxford 2010.1.18
Stream Reasoning: Where we got so far. Oxford 2010.1.18
 
HEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 TalkHEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 Talk
 

More from Ian Foster

Global Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptxGlobal Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptxIan Foster
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionIan Foster
 
Better Information Faster: Programming the Continuum
Better Information Faster: Programming the ContinuumBetter Information Faster: Programming the Continuum
Better Information Faster: Programming the ContinuumIan Foster
 
ESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsIan Foster
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationIan Foster
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryIan Foster
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptxIan Foster
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceIan Foster
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryIan Foster
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the ContinuumIan Foster
 
Data Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationData Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationIan Foster
 
Research Automation for Data-Driven Discovery
Research Automation for Data-Driven DiscoveryResearch Automation for Data-Driven Discovery
Research Automation for Data-Driven DiscoveryIan Foster
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterIan Foster
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light SourcesIan Foster
 
Team Argon Summary
Team Argon SummaryTeam Argon Summary
Team Argon SummaryIan Foster
 
Thoughts on interoperability
Thoughts on interoperabilityThoughts on interoperability
Thoughts on interoperabilityIan Foster
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasIan Foster
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFIan Foster
 

More from Ian Foster (20)

Global Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptxGlobal Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptx
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
Better Information Faster: Programming the Continuum
Better Information Faster: Programming the ContinuumBetter Information Faster: Programming the Continuum
Better Information Faster: Programming the Continuum
 
ESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsESnet6 and Smart Instruments
ESnet6 and Smart Instruments
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and Computation
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental Science
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the Continuum
 
Data Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationData Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud Automation
 
Research Automation for Data-Driven Discovery
Research Automation for Data-Driven DiscoveryResearch Automation for Data-Driven Discovery
Research Automation for Data-Driven Discovery
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and Jupyter
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
Team Argon Summary
Team Argon SummaryTeam Argon Summary
Team Argon Summary
 
Thoughts on interoperability
Thoughts on interoperabilityThoughts on interoperability
Thoughts on interoperability
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture Ideas
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCF
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Open Analytics Environment

  • 1. Towards an Open Analytics Environment Ian Foster Computation Institute Argonne National Lab & University of Chicago
  • 2.
  • 3. The Good Old Days: Astronomy ~1600 30 years ? years 10 years 6 years 2 years
  • 4. Astronomy, from 1600 to 2000 Automation 10 -1  10 8 Hz data capture Community 10 0  10 4 astronomers (10 6 amateur) Computation Data 10 6  10 15 B aggregate 10 -1  10 15 Hz peak Literature 10 1  10 5 pages/year
  • 6. Biomedical Research ~2000 ... atcgaattccaggcgtcacattctcaattcca... MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYT... Protein-Protein Interactions metabolism pathways receptor-ligand 4º structure Polymorphism and Variants genetic variants individual patients epidemiology Physiology Cellular biology Biochemistry Neurobiology Endocrinology etc. >10 6 ESTs Expression patterns Large-scale screens Genetics and Maps Linkage Cytogenetic Clone-based From John Wooley >10 6 >10 9 >10 6 >10 5 >10 9 DNA sequences alignments Proteins sequence 2º structure 3º structure
  • 7. Growth of Sequences and Annotations since 1982 Folker Meyer, Genome Sequencing vs. Moore’s Law: Cyber Challenges for the Next Decade, CTWatch , August 2006.
  • 8. The Analyst in Denial “ I just need a bigger disk (and workstation)”
  • 9.
  • 10.
  • 12. What Goes In (2) Rules Workflows Dryad MapReduce Parallel programs SQL BPEL Swift SCFL R MatLab Octave
  • 13.
  • 14. What Comes Out Data Data Virtual Data Schema
  • 15. Analysis as (Collaborative) Process Transform Annotate Search Add to Tag Visualize Discover Extend Group Share
  • 17.
  • 18. Towards an Open Analysis Environment: (2) Hardware SiCortex 6K cores, 6 Top/s IBM BG/P 160K cores, 500 Top/s PADS 10-40 Gbit/s
  • 19. PADS: Petascale Active Data Store 500 TB reliable storage (data & metadata) 180 TB, 180 GB/s 17 Top/s analysis Data ingest Dynamic provisioning Parallel analysis Remote access Offload to remote data centers P A D S Diverse users Diverse data sources 1000 TB tape backup
  • 20.
  • 21. Tagging & Social Networking GLOSS : Generalized Labels Over Scientific data Sources
  • 22. XDTM: XML Data Typing & Mapping ./group23 drwxr-xr-x 4 yongzh users 2048 Nov 12 14:15 AA drwxr-xr-x 4 yongzh users 2048 Nov 11 21:13 CH drwxr-xr-x 4 yongzh users 2048 Nov 11 16:32 EC ./group23/AA : drwxr-xr-x 5 yongzh users 2048 Nov 5 12:41 04nov06aa drwxr-xr-x 4 yongzh users 2048 Dec 6 12:24 11nov06aa . /group23/AA/04nov06aa : drwxr-xr-x 2 yongzh users 2048 Nov 5 12:52 ANATOMY drwxr-xr-x 2 yongzh users 49152 Dec 5 11:40 FUNCTIONAL . /group23/AA/04nov06aa/ANATOMY : -rw-r--r-- 1 yongzh users 348 Nov 5 12:29 coplanar.hdr -rw-r--r-- 1 yongzh users 16777216 Nov 5 12:29 coplanar.img . /group23/AA/04nov06aa/FUNCTIONAL : -rw-r--r-- 1 yongzh users 348 Nov 5 12:32 bold1_0001.hdr -rw-r--r-- 1 yongzh users 409600 Nov 5 12:32 bold1_0001.img -rw-r--r-- 1 yongzh users 348 Nov 5 12:32 bold1_0002.hdr -rw-r--r-- 1 yongzh users 409600 Nov 5 12:32 bold1_0002.img -rw-r--r-- 1 yongzh users 496 Nov 15 20:44 bold1_0002.mat -rw-r--r-- 1 yongzh users 348 Nov 5 12:32 bold1_0003.hdr -rw-r--r-- 1 yongzh users 409600 Nov 5 12:32 bold1_0003.img Logical Physical
  • 23. fMRI Type Definitions type Study { Group g[ ]; } type Group { Subject s[ ]; } type Subject { Volume anat; Run run[ ]; } type Run { Volume v[ ]; } type Volume { Image img; Header hdr; } type Image {}; type Header {}; type Warp {}; type Air {}; type AirVec { Air a[ ]; } type NormAnat { Volume anat; Warp aWarp; Volume nHires; }
  • 24. High-Performance Data Analytics Functional MRI Ben Clifford, Mihael Hatigan, Mike Wilde, Yong Zhao
  • 25.
  • 27. Multi-level Scheduling SwiftScript Abstract computation Virtual Data Catalog SwiftScript Compiler Specification Execution Virtual Node(s)‏ Worker Nodes Provenance data Provenance data Provenance collector launcher launcher file1 file2 file3 App F1 App F2 Scheduling Execution Engine (Karajan w/ Swift Runtime)‏ Swift runtime callouts C C C C Status reporting Provisioning Falkon Resource Provisioner Amazon EC2
  • 28.
  • 29.
  • 30. Lag Plot for Data Transfers to Caltech Credit: Kevin Flasch, LIGO
  • 31. SIDGrid: B. Bertenthal et al., U.Chicago, IU, UIC
  • 32. Social Informatics Data Grid (SIDgrid) TeraGrid PADS … SIDgrid Collaborative, multi-modal analysis of cognitive science data Diverse experimental data & metadata Browse data Search Content preview Transcode Download Analyze
  • 34.  
  • 35. A C ommunity I ntegrated M odel for E conomic a nd R esource T rajectories for H umankind ( CIM-EARTH ) Dynamics, foresight, uncertainty, resolution, … Agriculture, transport, taxation, … Data (global, local, …) (Super) computers CIM-EARTH Framework Community process Open code, data
  • 36. Alleviating Poverty in Thailand: Modeling Entrepreneurship Consider only wealth, access to capital Consider also distance to 6 major cities Rob Townsend, Victor Zhorin, et al. Match High Low
  • 38. GeneWays Online Journals Pathways GeneWays Andrey Rzhetsky et al. Screening 250,000 journal articles 2.5M reasoning chains 4M statements
  • 39. Evidence Integration: Genetics & Disease Susceptibility Identify Genes Phenotype 1 Phenotype 2 Phenotype 3 Phenotype 4 Predictive Disease Susceptibility Physiology Metabolism Endocrine Proteome Immune Transcriptome Biomarker Signatures Morphometrics Pharmacokinetics Ethnicity Environment Age Gender Source: Terry Magnuson
  • 40. James Evans, U.Chicago Arabidopsis articles
  • 41.
  • 42.