Bridging Big Data and Data Science using Scalable Workflows
ILKAY ALTINTAS, Ph.D.
altintas@sdsc.edu
Director, Workflows for Data Science (WorDS) Center of Excellence
San Diego Supercomputer Center, UC San Diego
WorDS.sdsc.edu
SAN DIEGO SUPERCOMPUTER CENTER at UC San Diego
Providing Cyberinfrastructure for Research and Education
• Established as a national supercomputer resource center in 1985 by NSF
• A world leader in HPC, data-intensive computing, and scientific data management
• Current strategic focus on “Big Data”
1985 to today
Scientific Workflow Automation Technologies Research
Workflows for Cloud Systems
Big Data Applications
Reproducible Science
Workforce Training and Education
Development and Consulting Services
Workflows for Data Science Center
Focus on the question, not the technology!
10+ years of data science R&D experience as a Center.
Why Data Science Workflows?
“You've got to think about big things while you're doing small things, so that all the small things go in the right direction.” – Alvin Toffler
Big things: use cases => purpose and value
So, what is a workflow?
Source: http://www.fastcodesign.com/1663557/how-a-kitchen-design-could-make-it-easier-to-bond-with-neighbors
Shop, Prepare, Cook, Store
Let’s make pasta this evening!
Shop, Prepare, Cook, Store
Durations shown: 30 minutes, 30 minutes, 15 minutes, 3 minutes, 15 minutes, 3 minutes
How to Cook Everything Fast
“How to Cook Everything Fast is a book of kitchen innovations. Time management—the essential principle of fast cooking—is woven into revolutionary recipes that do the thinking for you. You’ll learn how to take advantage of downtime to prepare vegetables while a soup simmers or toast croutons while whisking a dressing. Just cook as you read—and let the recipes guide you quickly and easily toward a delicious result.”
Image and quote source: amazon.com
What if you have more than one cook?
MAP
• Input: veggies
• User defined function (UDF): chop
• Output: chopped groups of each kind of veggie
REDUCE
• Input: chopped batches for each veggie type
• User defined function (UDF): combine based on veggie type as key
• Output: a bowl of veggies per veggie kind
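The MAP and REDUCE slides can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not code from the talk: the function names `chop`, `combine` and `map_reduce`, and the sample veggies, are hypothetical.

```python
from collections import defaultdict

def chop(veggie):
    """Map UDF (hypothetical): turn one (kind, name) veggie into a keyed chopped piece."""
    kind, name = veggie
    return kind, f"chopped {name}"

def combine(kind, pieces):
    """Reduce UDF (hypothetical): gather all chopped pieces of one kind into a bowl."""
    return kind, sorted(pieces)

def map_reduce(veggies):
    # MAP: chop() is applied to every veggie independently, so cooks could work in parallel.
    mapped = [chop(v) for v in veggies]
    # SHUFFLE: group the chopped pieces by veggie kind, which acts as the key.
    groups = defaultdict(list)
    for kind, piece in mapped:
        groups[kind].append(piece)
    # REDUCE: combine each group into one bowl per veggie kind.
    return dict(combine(kind, pieces) for kind, pieces in groups.items())

# Illustrative input only.
veggies = [("carrot", "carrot 1"), ("onion", "onion 1"), ("carrot", "carrot 2")]
bowls = map_reduce(veggies)
print(bowls)
```

Here the map step runs in a simple loop; in a real MapReduce system the same two user defined functions would be handed to the framework, which distributes the work.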
Thanksgiving dinner preparation: more planning and tasks?
Menu Item        Preparation Time  Cooking Time  Cooling Time
Turkey           30 minutes        4 hours       15 minutes
Veggies          30 minutes        45 minutes    None
Cranberry Sauce  5 minutes         30 minutes    2 hours
Soup             20 minutes        30 minutes    None
Pie              30 minutes        5 minutes     1 day
• When do you start cooking?
• What order do you cook?
• Can you cook some menu items in parallel?
• Who cooks what?
• …
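The first two questions can be explored with a small scheduling sketch. It uses the durations from the table and assumes, for illustration only, that every dish can proceed independently (enough cooks and ovens) and that dinner is at a hypothetical time; `latest_starts` is a made-up helper name.

```python
from datetime import datetime, timedelta

# Durations in minutes (preparation, cooking, cooling), taken from the table.
MENU = {
    "Turkey": (30, 240, 15),
    "Veggies": (30, 45, 0),
    "Cranberry Sauce": (5, 30, 120),
    "Soup": (20, 30, 0),
    "Pie": (30, 5, 1440),
}

def latest_starts(dinner_time, menu=MENU):
    """Return the latest start time per dish so that each one is ready at dinner_time."""
    return {
        dish: dinner_time - timedelta(minutes=sum(durations))
        for dish, durations in menu.items()
    }

# Hypothetical dinner time, chosen only for illustration.
dinner = datetime(2014, 11, 27, 18, 0)
starts = latest_starts(dinner)

# Print the dishes in the order they have to be started.
for dish, start in sorted(starts.items(), key=lambda item: item[1]):
    print(f"{start:%Y-%m-%d %H:%M}  start {dish}")
```

With limited cooks or a single oven the dishes would compete for resources, which is the kind of constraint a workflow scheduler has to take into account.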
Data Science Workflows
- Programmable and Reproducible Scalability -
• Access and query data
• Scale computational analysis
• Increase reuse
• Save time, energy and money
• Formalize and standardize
Real-Time Hazards Management: wifire.ucsd.edu
Data-Parallel Bioinformatics: bioKepler.org
Scalable Automated Molecular Dynamics and Drug Discovery: nbcr.ucsd.edu
kepler-project.org
WorDS.sdsc.edu
Why scalable and reproducible data science?
The Big Picture is Supporting the Scientist
From “Napkin Drawings” to Executable Workflows
Conceptual SWF, Executable SWF
Fasta File, Circonspect, Average Genome Size, Combine Results, PHACCS
The Big Picture is Supporting the Data Scientist
From “Napkin Drawings” to Executable Workflows…
Conceptual SWF, Executable SWF
SBNL workflow
Quality Evaluation & Data, Local Learner, Data Quality Evaluation, Local Ensemble Learning, Big Data Partitioning, Master Learner, Master Ensemble Learning, Final BN Structure
Insurance and Traffic Data Analytics using Big Data Bayesian Network Learning
Kepler is a Scientific Workflow System 
www.kepler-project.org 
• A cross-project collaboration … initiated August 2003
• 2.4 released 04/2013
• Builds upon the open-source Ptolemy II framework
Ptolemy II: A laboratory for investigating design
KEPLER: A problem-solving environment for Scientific Workflow
KEPLER = “Ptolemy II + X” for Scientific Workflows
A Toolbox with Many Tools
• Data: search, database access, IO operations, streaming data in real-time…
• Compute: data-parallel patterns, external execution, …
• Network operations
• Provenance and fault tolerance
Need expertise to identify which tool to use when and how!
Require computation models to schedule and optimize execution!
So, how does this relate to data science and big data?
Workflows integrate data science building blocks!
Toolboxes with many tools for:
• data access,
• analysis,
• execution,
• fault tolerance,
• provenance tracking,
• reporting
• ...
Business Analysis Operations Research
Adapted from: B. Tierney, 2013
Data Scientist Skill Set
http://datasciencedojo.com/what-are-the-key-skills-of-a-data-scientist/
Unicorn?
http://www.anlytcs.com/2014/01/data-science-venn-diagram-v20.html
Solution: Scale Your Data Scientists
Standardize the data science process, not the tools!
Standardized processes enable data scientists to communicate with business and programming partners.
Also, what these definitions really mean is “computational and data scientists”.
Conceptualizing a Computational Data Science Workflow
1: Start with the Workflow as a Blackbox
• Treat the whole workflow as a blackbox
– What is the use case/application?
• What is the science question this workflow is solving?
– What is the input data?
– What are the expected outcomes?
Input data → f (My workflow) → Outputs
• Give the workflow a title based on initial assessment!
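One way to write the blackbox view down before any step exists is as a titled function from input data to outputs. The sketch below is an assumption-laden illustration: the class `BlackboxWorkflow`, its fields, and the toy `mean_length` analysis are hypothetical and do not come from Kepler or the talk.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")

@dataclass
class BlackboxWorkflow(Generic[In, Out]):
    """A workflow seen from the outside: a title, a question, and f(input) -> outputs."""
    title: str                 # title based on the initial assessment
    question: str              # the science question or use case
    run: Callable[[In], Out]   # the blackbox function f

    def __call__(self, input_data: In) -> Out:
        return self.run(input_data)

def mean_length(sequences: List[str]) -> float:
    """Toy stand-in for the real analysis (hypothetical)."""
    return sum(len(s) for s in sequences) / len(sequences)

workflow = BlackboxWorkflow(
    title="Mean sequence length",
    question="How long are the input sequences on average?",
    run=mean_length,
)
outputs = workflow(["ACGT", "AC", "ACGTAC"])
print(workflow.title, outputs)
```

Fixing the title, inputs and expected outputs first keeps the later decomposition into steps tied to the question rather than to a particular tool.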
2: Conceptualization of Scientific Steps
Fasta File, Circonspect, Average Genome Size, Combine Results, PHACCS
Bake Turkey
• ...
• Cook
• Chill
• …
Bake Pie
• …
• Prepare
• Cook
• …
Make Side Dishes
• …
• Make Cranberry Sauce
• Cut Veggies
• Prepare Stuffing
• …
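Once the conceptual steps are named, their dependencies can be recorded as a small graph and an execution order derived from it. The sketch below uses the step names from the slide, but the edges between them are assumptions made for illustration; the slide lists the steps without stating how they connect. It relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it waits for.
# These edges are hypothetical; only the step names come from the slide.
DEPENDS_ON = {
    "Circonspect": {"Fasta File"},
    "Average Genome Size": {"Fasta File"},
    "Combine Results": {"Circonspect", "Average Genome Size"},
    "PHACCS": {"Combine Results"},
}

def execution_order(depends_on):
    """Return one valid order in which the conceptual steps could run."""
    return list(TopologicalSorter(depends_on).static_order())

order = execution_order(DEPENDS_ON)
print(order)
```

Steps with no dependency on each other, such as the two middle steps under these assumed edges, are candidates for parallel execution.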
3: Treat Each Step Like a Workflow
- until you reach an atomic functional step -
Find data, Access data, Acquire data, Move data
Clean data, Integrate data, Subset data, Pre-process data
Analyze data, Process data
Interpret results, Summarize results, Visualize results, Post-process results
SHOP, PREPARE, COOK, STORE
Some questions to ask:
• Where and how do I get the data?
• What is the format and frequency of the data, e.g., structured, textual, real-time, image, …?
• How do I integrate or subset datasets, e.g., knowledge representation, …?
• How do I analyze the data and what is the analysis function?
• What are the parameters to customize each step?
• What are the computing needs to schedule and run each step?
• How do I make sure the results are useful for the next step or as scientific products, e.g., standards compliance, reporting, …?
configurable automated analysis
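A "configurable automated analysis" can be sketched as a chain of atomic steps, each with its own parameters. The step functions `clean`, `subset` and `analyze`, their parameters, and the sample data are all hypothetical; the point is only that each step is a small function and the parameters live in the workflow definition.

```python
def clean(records, drop_value=None):
    """Atomic step (hypothetical): remove records equal to drop_value."""
    return [r for r in records if r != drop_value]

def subset(records, limit=10):
    """Atomic step (hypothetical): keep only the first `limit` records."""
    return records[:limit]

def analyze(records):
    """Atomic step (hypothetical): the analysis function, here a plain mean."""
    return sum(records) / len(records)

def make_pipeline(steps):
    """Compose (function, parameters) pairs into one callable workflow."""
    def run(data):
        for func, params in steps:
            data = func(data, **params)
        return data
    return run

# The parameters that customize each step are part of the workflow definition.
pipeline = make_pipeline([
    (clean, {"drop_value": None}),
    (subset, {"limit": 3}),
    (analyze, {}),
])
result = pipeline([4, None, 8, 6, 100])
print(result)
```

Changing `limit` or swapping `analyze` for another function reconfigures the analysis without touching the other steps.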
4: Start Building Each Step Including the Alternatives
• Alternative tools
• Multiple modes of scalability
• Support for each step of the development and production process
• Different reporting needs for exploration and production stages
Build, Explore, Scale, Report
Running on Heterogeneous Computing Resources
- Execution of models where they run most efficiently -
Different models have different computing architecture needs, e.g., memory-intensive, compute-intensive, I/O-intensive!
Gordon, Trestles
Local: NBCR Cluster Resources
NSF/DOE: TeraScale Resources (XSEDE) (Gordon) (Trestles) (Lonestar) (Stampede)
Private Cluster: User Owned Resources
5: Save and Share Reports and Final Products with your Team
• The data scientist is in the middle, bridging the gap between business and development → So, the data scientist defines the business value and the steps to achieve the results as a workflow
• Developers/computer scientists use their favorite tools to implement the methods in the workflow
• The process is kept sharable, standardized, scalable and accountable
WorDS – Simple and Scalable Big Data Solutions using Workflows
Focus on the use case, not the technology!
• Develop new big data science technologies and infrastructure
• Develop data science workflow applications through combination of tools, technologies and best practices
• Hands-on consulting on workflow technologies for big data and cloud systems, e.g., MapReduce, Hadoop, Yarn, Cascading
• Technology briefings and applied classes on end-to-end support for data science
Using Big Data Computing in Bioinformatics
- Improving Programmability, Scalability and Reproducibility -
biokepler.org
www.bioKepler.org
Gateways and other user environments
Kepler and Provenance Framework
bioKepler
BioLinux, Galaxy, Clovr, Hadoop, …
CLOUD and OTHER COMPUTING RESOURCES, e.g., SGE, Amazon, FutureGrid, XSEDE
A coordinated ecosystem of biological and technological packages for bioinformatics!
Status of bioActors
500+ bioActors are listed under the current bioKepler release; ~40 of them are parallelized.
Using Workflows and Cyberinfrastructure 
for Wildfire Resilience 
- A Scalable Data-Driven Monitoring and Dynamic Prediction Approach - 
wifire.ucsd.edu
Fire is Part of the Natural Ecology… but requires Monitoring, Prediction and Resilience
• Wildfires are critical for ecology, but volatile
• Fuel load is high due to fire suppression over the last century
• Changes in rainfall, wind, seasons, and thus wildfires, potentially induced by climate change
• Better prevention, prediction and maintenance of wildfires are needed
Photo of Harris Fire (2007) by former Fire Captain Bill Clayton
Disaster management of (ongoing) wildfires heavily relies on understanding their Direction and Rate of Spread (RoS).
Fire Data Today
Decision making for wildfire fighting and disaster management based on heterogeneous data:
Satellite data, Wildfire perimeter, Wind, Vegetation, Terrain.
Photograph by Mark Thiessen
What is lacking in disaster management today is… a system integration of real-time sensor networks, satellite imagery, near-real-time data management tools, wildfire simulation tools, and connectivity to emergency command centers …. before, during and after a firestorm.
A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires (WIFIRE)
Development of: “cyberinfrastructure” for “analysis of large dimensional heterogeneous real-time sensed data” for fire resilience before, during and after a wildfire
Data to Modeling in WIFIRE
Real-time remote data –> Modeling, data assimilation and dynamic wildfire behavior prediction
Sensors:
WIFIRE System Integration
System integration of sensor data, data assimilation, dynamic models and fire direction and RoS predictions (computations) is based on Scientific and Engineering Workflows (Kepler)
• Visual programming
• Scalable parallel execution
• Standardized data interfaces
• Reuse and reproducibility
Training and Consulting Services in the WorDS Center
• Ongoing programs for workflow bootcamps and hackathons
• Technology briefings for industrial partners
• Industry labs for undergraduate student researchers
• Consulting projects on workflow technologies
To Sum Up
• Workflows and provenance are well-adopted in scientific data science infrastructures today, with success
• WorDS Center applies these concepts to advanced dynamic data-driven analytics applications
• One size does not fit all!
– Many diverse environments and requirements
– Need to orchestrate at a higher level
– Higher level programming components for each domain
• Lots of future challenges on
– Optimized execution on heterogeneous platforms
– Increasing reuse within and across application domains
– Querying and integration of workflow provenance data
Questions?
WorDS Director: Ilkay Altintas, Ph.D.
Email: altintas@sdsc.edu

Más contenido relacionado

La actualidad más candente

Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
Domino Data Lab
 
Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013
Ian Foster
 
Love for science or 'Academic Prostitution' - DFD2014 version
Love for science or 'Academic Prostitution' - DFD2014 versionLove for science or 'Academic Prostitution' - DFD2014 version
Love for science or 'Academic Prostitution' - DFD2014 version
Lourdes Verdes-Montenegro
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
Edureka!
 

La actualidad más candente (19)

H2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in PythonH2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in Python
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
 
Josh Wills, MLconf 2013
Josh Wills, MLconf 2013Josh Wills, MLconf 2013
Josh Wills, MLconf 2013
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Big Data Rampage
Big Data RampageBig Data Rampage
Big Data Rampage
 
OpenML data@Sheffield
OpenML data@SheffieldOpenML data@Sheffield
OpenML data@Sheffield
 
Big Data Hadoop Tutorial by Easylearning Guru
Big Data Hadoop Tutorial by Easylearning GuruBig Data Hadoop Tutorial by Easylearning Guru
Big Data Hadoop Tutorial by Easylearning Guru
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data Pipelines
 
Hadoop
HadoopHadoop
Hadoop
 
Love for science or 'Academic Prostitution' - DFD2014 version
Love for science or 'Academic Prostitution' - DFD2014 versionLove for science or 'Academic Prostitution' - DFD2014 version
Love for science or 'Academic Prostitution' - DFD2014 version
 
Putting the Magic in Data Science
Putting the Magic in Data SciencePutting the Magic in Data Science
Putting the Magic in Data Science
 
Big Data, Baby Steps
Big Data, Baby StepsBig Data, Baby Steps
Big Data, Baby Steps
 
Agile data science
Agile data scienceAgile data science
Agile data science
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
 
Machine learning in real-time - the next frontier
Machine learning in real-time - the next frontierMachine learning in real-time - the next frontier
Machine learning in real-time - the next frontier
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 

Destacado

User Experience: Research, Design, Process, and Workflow
User Experience: Research, Design, Process, and WorkflowUser Experience: Research, Design, Process, and Workflow
User Experience: Research, Design, Process, and Workflow
sollitaire
 
User Experience (UX) Design Process
User Experience (UX) Design ProcessUser Experience (UX) Design Process
User Experience (UX) Design Process
Jonathan Lupo
 
How to Integrate UX and Agile
How to Integrate UX and AgileHow to Integrate UX and Agile
How to Integrate UX and Agile
UserZoom
 
Best Practice For UX Deliverables - Eventhandler, London, 05 March 2014
Best Practice For UX Deliverables - Eventhandler, London, 05 March 2014Best Practice For UX Deliverables - Eventhandler, London, 05 March 2014
Best Practice For UX Deliverables - Eventhandler, London, 05 March 2014
Anna Dahlström
 

Destacado (17)

User Experience: Research, Design, Process, and Workflow
User Experience: Research, Design, Process, and WorkflowUser Experience: Research, Design, Process, and Workflow
User Experience: Research, Design, Process, and Workflow
 
Fast track Incubation of skill sets for big data and game development and web...
Fast track Incubation of skill sets for big data and game development and web...Fast track Incubation of skill sets for big data and game development and web...
Fast track Incubation of skill sets for big data and game development and web...
 
Ux baby steps
Ux baby stepsUx baby steps
Ux baby steps
 
Ux design workflow
Ux design workflowUx design workflow
Ux design workflow
 
It’s Not About Big Data – It’s About Big Insights - SAP Webinar - 20 Aug 201...
 It’s Not About Big Data – It’s About Big Insights - SAP Webinar - 20 Aug 201... It’s Not About Big Data – It’s About Big Insights - SAP Webinar - 20 Aug 201...
It’s Not About Big Data – It’s About Big Insights - SAP Webinar - 20 Aug 201...
 
From research life cycle to networks: The role of the library
From research life cycle to networks: The role of the libraryFrom research life cycle to networks: The role of the library
From research life cycle to networks: The role of the library
 
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
 
UX Design Process
UX Design ProcessUX Design Process
UX Design Process
 
User Experience (UX) Design Process
User Experience (UX) Design ProcessUser Experience (UX) Design Process
User Experience (UX) Design Process
 
An introduction to UX in Scrum
An introduction to UX in ScrumAn introduction to UX in Scrum
An introduction to UX in Scrum
 
Easy UX Process Steps Must follow by every UX Designer
Easy UX Process Steps Must follow by every UX Designer Easy UX Process Steps Must follow by every UX Designer
Easy UX Process Steps Must follow by every UX Designer
 
UX Experience Design: Processes and Strategy
UX Experience Design: Processes and StrategyUX Experience Design: Processes and Strategy
UX Experience Design: Processes and Strategy
 
Design and UX in an Agile Process
Design and UX in an Agile ProcessDesign and UX in an Agile Process
Design and UX in an Agile Process
 
How to Integrate UX and Agile
How to Integrate UX and AgileHow to Integrate UX and Agile
How to Integrate UX and Agile
 
Best Practice For UX Deliverables - Eventhandler, London, 05 March 2014
Best Practice For UX Deliverables - Eventhandler, London, 05 March 2014Best Practice For UX Deliverables - Eventhandler, London, 05 March 2014
Best Practice For UX Deliverables - Eventhandler, London, 05 March 2014
 
Designing with Lean UX : Rapid Product Design [UX Lisbon 2014]
Designing with Lean UX : Rapid Product Design [UX Lisbon 2014]Designing with Lean UX : Rapid Product Design [UX Lisbon 2014]
Designing with Lean UX : Rapid Product Design [UX Lisbon 2014]
 
Simple Steps to UX/UI Web Design
Simple Steps to UX/UI Web DesignSimple Steps to UX/UI Web Design
Simple Steps to UX/UI Web Design
 

Similar a Bridging Big Data and Data Science Using Scalable Workflows

Similar a Bridging Big Data and Data Science Using Scalable Workflows (20)

Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data EraWorkflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
 
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science teamLean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
Data Science Training and Placement
Data Science Training and PlacementData Science Training and Placement
Data Science Training and Placement
 
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)
 
Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)
 
Data science training in hydpdf converted (1)
Data science training in hydpdf  converted (1)Data science training in hydpdf  converted (1)
Data science training in hydpdf converted (1)
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
data science training and placement
data science training and placementdata science training and placement
data science training and placement
 
online data science training
online data science trainingonline data science training
online data science training
 

Último

Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
HyderabadDolls
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
HyderabadDolls
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Último (20)

Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...

Bridging Big Data and Data Science Using Scalable Workflows

  • 1. Bridging Big Data and Data Science using Scalable Workflows. Ilkay Altintas, Ph.D. (altintas@sdsc.edu), Director, Workflows for Data Science (WorDS) Center of Excellence, San Diego Supercomputer Center, UC San Diego. WorDS.sdsc.edu
  • 2. San Diego Supercomputer Center at UC San Diego: Providing Cyberinfrastructure for Research and Education • Established as a national supercomputer resource center in 1985 by NSF • A world leader in HPC, data-intensive computing, and scientific data management • Current strategic focus on “Big Data” (Timeline: 1985 to today)
  • 3. Workflows for Data Science Center • Scientific Workflow Automation Technologies Research • Workflows for Cloud Systems • Big Data Applications • Reproducible Science • Workforce Training and Education • Development and Consulting Services • Focus on the question, not the technology! 10+ years of data science R&D experience as a Center.
  • 4. Why Data Science Workflows? “You've got to think about big things while you're doing small things, so that all the small things go in the right direction.” – Alvin Toffler (Slide annotation on “big things”: use cases => purpose and value)
  • 5. So, what is a workflow? Shop → Prepare → Cook → Store. Source: http://www.fastcodesign.com/1663557/how-a-kitchen-design-could-make-it-easier-to-bond-with-neighbors
  • 6. Let’s make pasta this evening! Shop Prepare Cook Store 30 minutes 30 minutes 15 minutes 3 minutes 15 minutes 3 minutes
  • 7. How to Cook Everything Fast “How to Cook Everything Fast is a book of kitchen innovations. Time management— the essential principle of fast cooking— is woven into revolutionary recipes that do the thinking for you. You’ll learn how to take advantage of downtime to prepare vegetables while a soup simmers or toast croutons while whisking a dressing. Just cook as you read—and let the recipes guide you quickly and easily toward a delicious result.” Image and quote source: amazon.com
  • 8. What if you have more than one cook?
  • 9. MAP • Input: veggies • User-defined function (UDF): chop • Output: chopped groups of each kind of veggie
  • 10. REDUCE • Input: chopped batches for each veggie type • User-defined function (UDF): combine based on veggie type as key • Output: a bowl of veggies per veggie kind
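The MAP and REDUCE slides above can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not code from the talk: the function names `chop`, `shuffle`, and `combine`, and the representation of a veggie as a `(kind, number_of_pieces)` tuple, are assumptions made for the example.

```python
from collections import defaultdict

def chop(veggie):
    """Map UDF (hypothetical): turn one whole veggie into (kind, piece) pairs."""
    kind, n_pieces = veggie
    return [(kind, f"{kind}-piece-{i}") for i in range(n_pieces)]

def map_phase(veggies):
    """Apply the map UDF to every input veggie independently."""
    pairs = []
    for veggie in veggies:
        pairs.extend(chop(veggie))
    return pairs

def shuffle(pairs):
    """Group the chopped pieces by veggie kind, which acts as the key."""
    groups = defaultdict(list)
    for kind, piece in pairs:
        groups[kind].append(piece)
    return groups

def combine(kind, pieces):
    """Reduce UDF (hypothetical): fill one bowl per veggie kind."""
    return {"kind": kind, "pieces": len(pieces)}

def reduce_phase(groups):
    """Apply the reduce UDF once per key."""
    return {kind: combine(kind, pieces) for kind, pieces in groups.items()}

# Two carrots and one onion, each cut into a given number of pieces.
veggies = [("carrot", 4), ("onion", 3), ("carrot", 2)]
bowls = reduce_phase(shuffle(map_phase(veggies)))
print(bowls)
```

Because `chop` only looks at one veggie at a time, each cook could run it on a separate batch; only the grouping step needs to see all the pieces.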
  • 11. Thanksgiving dinner preparation: more planning and tasks?
       Menu Item       | Preparation Time | Cooking Time | Cooling Time
       Turkey          | 30 minutes       | 4 hours      | 15 minutes
       Veggies         | 30 minutes       | 45 minutes   | None
       Cranberry Sauce | 5 minutes        | 30 minutes   | 2 hours
       Soup            | 20 minutes       | 30 minutes   | None
       Pie             | 30 minutes       | 5 minutes    | 1 day
       • When do you start cooking? • What order do you cook? • Can you cook some menu items in parallel? • Who cooks what? • …
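One way to approach "When do you start cooking?" is to work backwards from the serving time. The sketch below uses the durations from the table above and makes two simplifying assumptions that are not in the slides: every menu item can be handled in parallel (enough cooks and ovens), and the serving time is a made-up value.

```python
from datetime import datetime, timedelta

# (preparation, cooking, cooling) in minutes, taken from the slide's table.
MENU = {
    "Turkey": (30, 240, 15),
    "Veggies": (30, 45, 0),
    "Cranberry Sauce": (5, 30, 120),
    "Soup": (20, 30, 0),
    "Pie": (30, 5, 24 * 60),
}

def latest_starts(menu, serve_at):
    """Latest start time per item so that it is ready exactly at serve_at.

    Assumes items do not compete for cooks or ovens.
    """
    starts = {}
    for item, (prep, cook, cool) in menu.items():
        starts[item] = serve_at - timedelta(minutes=prep + cook + cool)
    return starts

# Hypothetical serving time, chosen only for the example.
serve_at = datetime(2024, 11, 28, 18, 0)
starts = latest_starts(MENU, serve_at)

# Print the plan, earliest task first.
for item, start in sorted(starts.items(), key=lambda kv: kv[1]):
    print(f"{start:%a %H:%M}  start {item}")
```

With limited cooks or a single oven, the same data would feed a real scheduler; that is the "who cooks what" and "what order" part of the slide, which this sketch leaves out.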
  • 12. Data Science Workflows - Programmable and Reproducible Scalability - • Access and query data • Scale computational analysis • Increase reuse • Save time, energy and money • Formalize and standardize • Real-Time Hazards Management (wifire.ucsd.edu) • Data-Parallel Bioinformatics (bioKepler.org) • Scalable Automated Molecular Dynamics and Drug Discovery (nbcr.ucsd.edu) • kepler-project.org • WorDS.sdsc.edu
  • 13. Why scalable and reproducible data science?
  • 14. The Big Picture is Supporting the Scientist: From “Napkin Drawings” to Executable Workflows • Conceptual SWF → Executable SWF • Diagram labels: Fasta File, Circonspect, Average Genome Size, Combine Results, PHACCS
  • 15. The Big Picture is Supporting the Data Scientist: From “Napkin Drawings” to Executable Workflows… • Conceptual SWF → Executable SWF • Insurance and Traffic Data Analytics using Big Data Bayesian Network Learning (SBNL workflow) • Diagram labels: Data Quality Evaluation, Big Data Partitioning, Local Learner, Local Ensemble Learning, Master Learner, Master Ensemble Learning, Final BN Structure
  • 16. Kepler is a Scientific Workflow System (www.kepler-project.org) • A cross-project collaboration … initiated August 2003 • 2.4 released 04/2013 • Builds upon the open-source Ptolemy II framework • Ptolemy II: A laboratory for investigating design • KEPLER: A problem-solving environment for Scientific Workflow • KEPLER = “Ptolemy II + X” for Scientific Workflows
  • 17. A Toolbox with Many Tools • Data: search, database access, IO operations, streaming data in real-time… • Compute: data-parallel patterns, external execution, … • Network operations • Provenance and fault tolerance • Need expertise to identify which tool to use when and how! • Require computation models to schedule and optimize execution!
  • 18. So, how does this relate to data science and big data?
  • 19. Workflows integrate data science building blocks! Toolboxes with many tools for: • data access, • analysis, • execution, • fault tolerance, • provenance tracking, • reporting • ... Diagram labels: Business Analysis, Operations Research. Adapted from: B. Tierney, 2013
  • 20. Data Scientist Skill Set http://datasciencedojo.com/what-are-the-key-skills-of-a-data-scientist/
  • 22. Solution: Scale Your Data Scientists. Standardize the data science process, not the tools! Standardized processes enable data scientists to communicate with business and programming partners. Also, what these definitions really mean is “computational and data scientists”.
  • 23. Conceptualizing a Computational Data Science Workflow
  • 24. 1: Start with the Workflow as a Blackbox • Treat the whole workflow as a blackbox – What is the use case/application? • What is the science question this workflow is solving? – What is the input data? – What are the expected outcomes? • Diagram: Input data → f (My workflow) → Outputs • Give the workflow a title based on initial assessment!
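The blackbox view on this slide amounts to writing down a title, the question, the input, the expected outcome, and a single function `f` before any internals exist. A minimal sketch follows; the class name `BlackboxWorkflow`, its fields, and the toy `mean_length` function are all hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class BlackboxWorkflow:
    """A workflow described only by its purpose, inputs, and outputs."""
    title: str               # given after the initial assessment
    question: str            # what the workflow is meant to answer
    input_description: str   # what data goes in
    expected_outcome: str    # what should come out
    f: Callable[[Any], Any]  # the blackbox: internals not yet broken down

    def run(self, input_data: Any) -> Any:
        """Outputs = f(input data)."""
        return self.f(input_data)

def mean_length(sequences):
    """Toy stand-in for the real analysis (an assumption for this example)."""
    return sum(len(s) for s in sequences) / len(sequences)

workflow = BlackboxWorkflow(
    title="Mean sequence length",
    question="How long are the sequences in this sample on average?",
    input_description="A list of sequence strings",
    expected_outcome="One number: the mean length",
    f=mean_length,
)
result = workflow.run(["ACGT", "AC", "ACGTAC"])
print(workflow.title, "->", result)
```

Keeping the description next to `f` means the question and expected outcome stay fixed while the implementation behind `f` is later refined into steps.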
  • 25. 2: Conceptualization of Scientific Steps • Diagram labels: Fasta File, Circonspect, Average Genome Size, Combine Results, PHACCS • Bake Turkey: …, Cook, Chill, … • Bake Pie: …, Prepare, Cook, … • Make Side Dishes: …, Make Cranberry Sauce, Cut Veggies, Prepare Stuffing, …
  • 26. 3: Treat Each Step Like a Workflow - until you reach an atomic functional step - • Steps: Find data, Access data, Acquire data, Move data, Clean data, Integrate data, Subset data, Pre-process data, Analyze data, Process data, Interpret results, Summarize results, Visualize results, Post-process results • Kitchen analogy: SHOP, PREPARE, COOK, STORE • Some questions to ask: • Where and how do I get the data? • What is the format and frequency of the data, e.g., structured, textual, real-time, image, …? • How do I integrate or subset datasets, e.g., knowledge representation, …? • How do I analyze the data and what is the analysis function? • What are the parameters to customize each step? • What are the computing needs to schedule and run each step? • How do I make sure the results are useful for the next step or as scientific products, e.g., standards compliance, reporting, …? • Goal: configurable automated analysis
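Treating each step as a workflow until an atomic step is reached is a recursive composition. The sketch below shows one possible shape, assuming a strictly sequential workflow; the `Atomic` and `Composite` classes, the assignment of data steps to PREPARE, COOK, and STORE, and the toy step functions are illustrative assumptions, not the talk's design.

```python
class Atomic:
    """An atomic functional step: a name plus one function."""
    def __init__(self, name, func):
        self.name = name
        self.func = func

    def run(self, data):
        return self.func(data)

    def leaves(self):
        return [self.name]

class Composite:
    """A step that is itself a workflow of sub-steps, run in order."""
    def __init__(self, name, steps):
        self.name = name
        self.steps = steps

    def run(self, data):
        for step in self.steps:
            data = step.run(data)
        return data

    def leaves(self):
        names = []
        for step in self.steps:
            names.extend(step.leaves())
        return names

# Hypothetical mapping of data steps onto the kitchen stages.
prepare = Composite("PREPARE", [
    Atomic("Clean data", lambda rows: [r for r in rows if r is not None]),
    Atomic("Subset data", lambda rows: [r for r in rows if r >= 0]),
])
cook = Composite("COOK", [
    Atomic("Analyze data", lambda rows: sum(rows) / len(rows)),
])
store = Composite("STORE", [
    Atomic("Summarize results", lambda mean: {"mean": mean}),
])
workflow = Composite("My workflow", [prepare, cook, store])

result = workflow.run([3, None, -1, 5])
print(workflow.leaves())
print(result)
```

Each `Atomic` function is where the slide's questions about parameters and computing needs would be answered; a `Composite` only fixes the order.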
  • 27. 4: Start Building Each Step Including the Alternatives • Alternative tools • Multiple modes of scalability • Support for each step of the development and production process • Different reporting needs for exploration and production stages • Build, Explore, Scale, Report
  • 28. Running on Heterogeneous Computing Resources - Execution of models where they run most efficiently - Different models have different computing architecture needs! e.g., memory-intensive, compute-intensive, I/O-intensive • Local: NBCR Cluster Resources • NSF/DOE: TeraScale Resources (XSEDE): Gordon, Trestles, Lonestar, Stampede • Private Cluster: User Owned Resources
  • 29. 5: Save and Share Reports and Final Products with your Team • The data scientist is in the middle, bridging the gap between business and development → So, the data scientist defines the business value and the steps to achieve the results as a workflow • Developers/computer scientists use their favorite tools to implement the methods in the workflow • The process is kept sharable, standardized, scalable and accountable
  • 30. WorDS – Simple and Scalable Big Data Solutions using Workflows. Focus on the use case, not the technology! • Develop new big data science technologies and infrastructure • Develop data science workflow applications through combination of tools, technologies and best practices • Hands-on consulting on workflow technologies for big data and cloud systems, e.g., MapReduce, Hadoop, Yarn, Cascading • Technology briefings and applied classes on end-to-end support for data science
  • 31. Using Big Data Computing in Bioinformatics - Improving Programmability, Scalability and Reproducibility - biokepler.org
  • 32. www.bioKepler.org • Gateways and other user environments • Kepler and Provenance Framework • bioKepler • BioLinux, Galaxy, Clovr, Hadoop, … • Cloud and other computing resources, e.g., SGE, Amazon, FutureGrid, XSEDE • A coordinated ecosystem of biological and technological packages for bioinformatics!
  • 33. Status of bioActors: 500+ bioActors are listed under the current bioKepler release; ~40 of them are parallelized.
  • 34. Using Workflows and Cyberinfrastructure for Wildfire Resilience - A Scalable Data-Driven Monitoring and Dynamic Prediction Approach - wifire.ucsd.edu
  • 35. Fire is Part of the Natural Ecology … but requires Monitoring, Prediction and Resilience • Wildfires are critical for ecology, but volatile • Fuel load is high due to fire suppression over the last century • Changes in rainfall, wind, seasons, and thus wildfires, potentially induced by climate change • Better prevention, prediction and maintenance of wildfires is needed • Disaster management of (ongoing) wildfires heavily relies on understanding their Direction and Rate of Spread (RoS). Photo of Harris Fire (2007) by former Fire Captain Bill Clayton
  • 36. Fire Data Today. Decision making for wildfire fighting and disaster management is based on heterogeneous data: satellite data, wildfire perimeter, wind, vegetation, terrain. Photograph by Mark Thiessen
  • 37. What is lacking in disaster management today is… a system integration of real-time sensor networks, satellite imagery, near-real time data management tools, wildfire simulation tools, and connectivity to emergency command centers … before, during and after a firestorm.
  • 38. A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires (WIFIRE). Development of: “cyberinfrastructure” for “analysis of large dimensional heterogeneous real-time sensed data” for fire resilience before, during and after a wildfire
  • 39. Data to Modeling in WIFIRE. Real-time remote data → Modeling, data assimilation and dynamic wildfire behavior prediction. Sensors:
  • 40. WIFIRE System Integration. System integration of sensor data, data assimilation, dynamic models and fire direction and RoS predictions (computations) is based on Scientific and Engineering Workflows (Kepler) • Visual programming • Scalable parallel execution • Standardized data interfaces • Reuse and reproducibility
  • 41. Training and Consulting Services in the WorDS Center • Ongoing programs for workflow bootcamps and hackathons • Technology briefings for industrial partners • Industry labs for undergraduate student researchers • Consulting projects on workflow technologies
  • 42. To Sum Up • Workflows and provenance are well-adopted in scientific data science infrastructures today, with success • WorDS Center applies these concepts to advanced dynamic data-driven analytics applications • One size does not fit all! • Many diverse environments and requirements • Need to orchestrate at a higher level • Higher level programming components for each domain • Lots of future challenges on • Optimized execution on heterogeneous platforms • Increasing reuse within and across application domains • Querying and integration of workflow provenance data
  • 43. Questions? WorDS Director: Ilkay Altintas, Ph.D. Email: altintas@sdsc.edu