Scientific Workflows for Big Data
Prof. Shiyong Lu
Big Data Research Laboratory
Department of Computer Science
Wayne State University
shiyong@wayne.edu
Today’s data-intensive science
Looking for a needle in a haystack

Looking into the haystack

Jim Gray: Turing Award laureate
Big Data Challenges

For Big Data, data management and movement is a frequent challenge
…between facilities, archives, researchers…
Many files, large data volumes
With security, reliability, performance…

Ian Foster: Father of Grid Computing
Big Data Challenges

Capture
Curation
Storage
Search
Sharing
Analysis
Visualization
Big Data Science

Large Hadron Collider (LHC)

15 PB/year
173 TB/day
500 MB/sec

Higgs discovery is “only
possible because of the
extraordinary
achievements of … grid
computing”
—Rolf Heuer, CERN DG
Data flows at Argonne National Lab

Data management challenges
[Figure: estimated data flows, in TB/day, among external data sources, the Advanced Photon Source, the Argonne Leadership Computing Facility, short-term storage, long-term storage, and data analysis. Flow labels shown, unattributed after extraction: 163, 9, 9, 143, 100, 100, 150, 10, 50. Credit: Ian Foster]
Big Data demands new CS research

For example, existing clustering algorithms are typically cubic in N, and
when N is too big, they do not work! - Jim Gray
What is Big Data?
•Definition of Big Data:
“…refers to large, diverse, complex, longitudinal, and/or
distributed data sets generated from instruments, sensors,
Internet transactions, email, video, click streams, and/or
all other digital sources available today and in the future.”
from nsf.gov website
Big Data Challenges
•Challenges of Big Data:
“national big data challenges, which include advances in core
techniques and technologies; big data infrastructure projects in
various science, biomedical research, health and engineering
communities; education and workforce development; and a
comprehensive integrative program to support collaborations of
multi-disciplinary teams and communities to make advances in the
complex grand challenge science, biomedical research, and
engineering problems of a computational- and data-intensive world.”

from nsf.gov website
Big Data demands big workflows

Reminiscent of
And thousands of parallel executions

Managing big workflows and large-scale parallel execution is a big CS challenge!
Outline

1. Introduction
2. VIEW: A Prototypical SWFMS
3. A Scientific Workflow Composition Model
4. A Collectional Data Model
5. Conclusions and Future Work
Introduction
• Data Intensive Science
  • From computation intensive to data intensive.
  • A new research cycle – from data capture and data curation to data analysis and data visualization.
  • “In the future, the rapidity with which any given discipline advances is likely to depend on how well the community acquires the necessary expertise in database, workflow management, visualization, and cloud computing technologies.” (“Beyond the Data Deluge”, Science, Vol. 323, no. 5919, pp. 1297–1298, 2009.)
Introduction
• Scientific Workflow
  • A formal specification of a scientific process.
  • Represents, streamlines, and automates the steps from dataset selection and integration, computation and analysis, to final data product presentation and visualization.
  • Applications: Bioinformatics, Oceanography, Neuroinformatics, Astronomy, etc.
Introduction
• Scientific Workflow Management System (SWFMS)
  • Supports the specification, modification, execution, failure handling, and monitoring of a scientific workflow.
  • Existing SWFMSs:
    • Taverna,
    • Kepler,
    • Pegasus,
    • VisTrails,
    • VIEW,
    • …
Our VIEW System

• Enables scientists to design workflows
• Provides a runtime system to execute workflows
  • on a dedicated VIEW server
  • in a Cloud computing environment
• Supports efficient collection, storage, querying, and visualization of workflow provenance
• Is currently used in several bioinformatics applications, including genomic recombination and gene conversion data analysis
An Example Workflow in VIEW
• Example workflows in VIEW
VIEW 1-2-3

Step 1: Drag and drop inputs and outputs, and computational…
Step 2: Link them into a scientific workflow
Step 3: Click the run button and you get the result!
Kids Play VIEW
An Example Workflow in VIEW
• FiberFlow
  • Transforms large-scale neuroimaging data into knowledge through cross-subject, cross-modality computation, ultimately leading to high clinical intelligence in neural diseases.
VIEW: A Prototypical SWFMS
• Minimum complexity for users, with sophisticated techniques behind the scenes.
  • Provides a clear and simple abstraction for manipulating and coordinating resources
• Service-oriented architecture.
• Intuitive, user-friendly GUI
A Reference Architecture for SWFMSs
Service-oriented architecture of VIEW
A Reference Architecture for SWFMSs
Other advantages of VIEW:
• VIEW workflows can be executed in other systems (specifications are not tied to a particular SWFMS).
• Use of open standards (Web Services, XML) promotes collaboration, interoperability, and extensibility of the system.
• The workflow and data models implemented in VIEW are specifically geared towards heavy scientific data.
A Reference Architecture for SWFMSs
VIEW: A Prototypical SWFMS

A typical scientific workflow execution diagram.
Workflow Engine
The Workflow Engine is the heart of the system.
• Workflow Orchestration.
• Workflow Execution.
• Coordination of other subsystems.

Workflow Engine in VIEW.
• Dataflow based.
• Pure workflow composition.
• Workflow constructs.
SWL
• Example of our proposed scientific workflow specification language (SWL).
Primitive Workflow Specification
• Example SWL specification of a primitive workflow.
Workflow Execution
• Primitive workflow
• Unary construct based workflow
• Graph based workflow
  • A workflow graph is a composition of workflows by binary constructs.
  • Optimistic scheduling.
Workflow Database Schema
Data Product Manager
• Solid data model.
• Scalable data storage.
• Convenient data access.
• Data Independence.

The Data Product Manager is based on the collectional data model.
DPM Architecture
• Architecture of the Data Product Manager.
[Figure: Data Product Manager architecture. Components shown: a Main Server (Master); three Node Databases; Data Set 1 and Data Set 2, each backed by Relational Databases and File Repositories. Layers shown: Data Access Layer, Data Mapping Layer, Data Storage Layer.]
DPL
• Example of the XML description of a collectional data product.
Data Storage
• VIEW supports two ways of storing a collection:
  • A collection can be stored in a table containing a set of its key/value pairs, whose values are references to existing collections.
  • A collection can be expanded and stored in two tables.
    • The Group By operator.
    • The Compress operator.
Data Typing
A Data Product is
• a Collection
• or a List
• or an Empty.

The List type
• Introduced in the workflow engine.
• Each element is a data product.
• Heterogeneous.
Collectional Data Querying
Operators are implemented in primitive workflows.
• Arithmetic operators.
• Boolean operators.
• Collectional operators.
• List operators.

Queries are implemented in workflow compositions.
Example
• Given a table Reference < Student, Company, GradTime >, find the total number of students who received offers from each company in each graduation year; sort the result by GradTime in descending order and then by Company in ascending order.
• SQL query:
  SELECT Company, GradTime, COUNT(DISTINCT Student) AS NumberOfJob
  FROM Reference
  GROUP BY Company, GradTime
  ORDER BY GradTime DESC, Company ASC;
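The query can be tried on a small in-memory table. The sketch below uses Python's built-in sqlite3 module; the sample rows and the column types are assumptions made for illustration and do not come from the slides.

```python
import sqlite3

# In-memory database; column types are assumed for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Reference (Student TEXT, Company TEXT, GradTime INTEGER)")

# Hypothetical sample data. Ann appears twice for the same company and year,
# so COUNT(DISTINCT Student) counts her once.
rows = [
    ("Ann", "Acme", 2012),
    ("Bob", "Acme", 2012),
    ("Ann", "Acme", 2012),
    ("Cid", "Birch", 2012),
    ("Dee", "Acme", 2011),
]
conn.executemany("INSERT INTO Reference VALUES (?, ?, ?)", rows)

query = """
SELECT Company, GradTime, COUNT(DISTINCT Student) AS NumberOfJob
FROM Reference
GROUP BY Company, GradTime
ORDER BY GradTime DESC, Company ASC
"""
result = conn.execute(query).fetchall()
print(result)  # [('Acme', 2012, 2), ('Birch', 2012, 1), ('Acme', 2011, 1)]
```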
Example of Query Workflow
Query Workflow.
Key Requirements for Workflow Modeling

R1: Programming-in-the-large.
R2: Dataflow programming model.
R3: Composable dataflow constructs.
R4: Workflow encapsulation and
hierarchical composition.
R5: Single-assignment property.
R6: Physical and logical data models.
R7: Exception handling.
A Scientific Workflow Model
Workflows are the basic and the only operands for workflow composition.
[Figure: workflows W1 and W2, each with input ports i1…ik and output port o1, composed with a construct M into workflow W3.]
Task components (e.g. Web services) are wrapped into primitive workflows (a.k.a. tasks), which are the basic building blocks of scientific workflows.
A Scientific Workflow Model
A workflow construct is a mapping from a set of workflows to a workflow.
• Unary workflow constructs
• Binary workflow constructs
• …

A construct C takes a set of workflows W1, …, Wn as input and composes them into Wc as the output workflow.
A Scientific Workflow Model
• Our proposed scientific workflow model consists of the following two layers:
  • The logical layer contains the workflow interface, which models the input ports and output ports of a workflow.
  • The physical layer contains the workflow body, which models the physical implementation of the workflow.
    • Primitive workflows.
    • Graph-based workflows.
    • Unary-construct-based workflows.
Unary Workflow Constructs

Dataflow-based Unary Workflow Constructs
The Map Construct
• The Map construct enables the parallel processing of a collection of data products based on a workflow that can only process a single data product.
• Example:
[Figure: Map (M) applied to workflow W1, producing W2. For the input [[1,2],[3,6],[4,7]], W1 runs once on each of [1,2], [3,6], and [4,7], yielding 2, 18, and 28.]
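The Map construct can be sketched in Python as a higher-order function that takes a workflow and returns a new workflow. This is only an illustration, not VIEW's implementation: `map_construct` and `multiply` are hypothetical names, and `multiply` is an assumed stand-in for W1 chosen because it reproduces the outputs 2, 18, and 28 shown on the slide.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply(pair):
    # Hypothetical stand-in for W1: processes a single data product (a pair).
    a, b = pair
    return a * b


def map_construct(workflow):
    """Lift a single-product workflow into one that processes a list of products."""
    def mapped(products):
        # Each element is handed to its own run of the workflow; the thread
        # pool lets the runs proceed concurrently and keeps the input order.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(workflow, products))
    return mapped


mapped_w1 = map_construct(multiply)
map_result = mapped_w1([[1, 2], [3, 6], [4, 7]])
print(map_result)  # [2, 18, 28]
```

Because the result of `map_construct` is itself a workflow, it can be passed to another construct, which is what makes nested compositions possible.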
The Reduce Construct
• The Reduce construct enables the aggregation of a list of data products into a single data product based on a workflow that aggregates a limited number (two or more) of input data products.
• Example:
[Figure: Reduce (R) applied to an Add workflow, producing W3. With initial value 0 and input [3,5,9], Add runs in sequence: 0 + 3 = 3, 3 + 5 = 8, 8 + 9 = 17.]
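A minimal Python sketch of the Reduce construct, assuming a two-input workflow and an explicit initial value as in the slide's example. The names `reduce_construct` and `add` are hypothetical; the standard-library `functools.reduce` and `itertools.accumulate` do the folding.

```python
from functools import reduce
from itertools import accumulate


def add(a, b):
    # Stand-in for the two-input Add workflow.
    return a + b


def reduce_construct(workflow, initial):
    """Turn a two-input workflow into one that folds a whole list, left to right."""
    def reduced(products):
        return reduce(workflow, products, initial)
    return reduced


sum_list = reduce_construct(add, 0)
reduce_result = sum_list([3, 5, 9])
print(reduce_result)  # 17

# The intermediate values of the chain, starting from the initial value.
steps = list(accumulate([3, 5, 9], add, initial=0))
print(steps)  # [0, 3, 8, 17]
```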
The Tree Construct
• The Tree construct
  • Enables parallel aggregation of a collection of data products.
  • Aggregates a collection pairwise, as a binary tree, until a single aggregated product is generated.
• The Tree construct can be applied to associative workflows.
• Example:
[Figure: Tree (T) applied to an Add workflow, producing W4. For the input [0,3,5,9], the pairs are added first (0 + 3 = 3 and 5 + 9 = 14), and the partial results are then added (3 + 14 = 17).]
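The pairwise aggregation can be sketched as follows. This is an assumption-laden illustration: `tree_construct` is a hypothetical name, the sketch runs each level sequentially (the pairs within a level are independent, so a real engine could run them in parallel), and carrying an unpaired last element up to the next level is a choice made here, not something the slides specify.

```python
def add(a, b):
    # Stand-in for the associative two-input Add workflow.
    return a + b


def tree_construct(workflow):
    """Aggregate a list pairwise, level by level, like a binary tree."""
    def tree(products):
        level = list(products)
        if not level:
            raise ValueError("Tree needs at least one data product")
        while len(level) > 1:
            # Combine neighbours (0,1), (2,3), ...; these runs do not depend
            # on each other.
            nxt = [workflow(level[i], level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
            if len(level) % 2:
                nxt.append(level[-1])  # odd element is carried up unchanged
            level = nxt
        return level[0]
    return tree


tree_result = tree_construct(add)([0, 3, 5, 9])
print(tree_result)  # 17, via the levels [3, 14] and then [17]
```

Associativity matters because the tree groups the operands differently from a left-to-right Reduce; only an associative workflow is guaranteed to give the same answer under both groupings.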
The Conditional Construct
• The Conditional construct enables the conditional execution of a workflow based on a condition on one of the inputs.
• Example:
[Figure: Conditional (C) applied to a Projection workflow, producing W4. Two runs are shown: with predicate p = (PI 1 < PI 2), p evaluates to true and Projection executes; with predicate p = (PI 1 >= PI 2), p evaluates to false and the run ends in Fail. The inputs shown include [2,3], 2, and 1.]
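One way to picture the Conditional construct is a wrapper that evaluates a predicate over the inputs before running the wrapped workflow. The slide's figure is only partly recoverable, so everything concrete below is an assumption: the names `conditional_construct`, `WorkflowFailed`, and `first_input` are hypothetical, the inputs 2 and 3 are chosen for illustration, and modeling "Fail" as a raised exception is a choice made for this sketch.

```python
class WorkflowFailed(Exception):
    """Hypothetical signal for a run whose predicate evaluated to false."""


def first_input(a, b):
    # Hypothetical stand-in for a Projection-style workflow.
    return a


def conditional_construct(workflow, predicate):
    """Run the workflow only when the predicate over its inputs holds."""
    def guarded(*inputs):
        if not predicate(*inputs):
            raise WorkflowFailed("predicate evaluated to false")
        return workflow(*inputs)
    return guarded


run_if_less = conditional_construct(first_input, lambda a, b: a < b)
cond_result = run_if_less(2, 3)  # predicate is true, so the workflow runs
print(cond_result)  # 2

run_if_not_less = conditional_construct(first_input, lambda a, b: a >= b)
try:
    run_if_not_less(2, 3)        # predicate is false
    failed = False
except WorkflowFailed:
    failed = True
print(failed)  # True
```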
The Loop Construct
• The Loop construct enables cyclic executions of a workflow.
• The output of the workflow is repeatedly fed back to a specified input port until the predicate evaluates to true.
• Example:
[Figure: Loop (L) applied to an Add workflow with predicate p = (PI 1 > 100) and initial inputs 0 and 1. The output is fed back to an input port on each iteration: 1 (p = false), 2 (p = false), …, 101 (p = true), at which point the loop stops.]
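The feedback behaviour can be sketched in Python as below. The names `loop_construct`, `feedback_port`, and `max_iterations` are hypothetical, the iteration cap is a safety measure added for the sketch, and checking the predicate after each feedback step is one reading of the slide's example.

```python
def add(a, b):
    # Stand-in for the Add workflow.
    return a + b


def loop_construct(workflow, predicate, feedback_port=0, max_iterations=10_000):
    """Re-run a workflow, feeding its output back into one input port."""
    def looped(*inputs):
        args = list(inputs)
        for _ in range(max_iterations):
            output = workflow(*args)
            args[feedback_port] = output   # feed the output back
            if predicate(*args):           # stop once the predicate holds
                return output
        raise RuntimeError("loop did not terminate within max_iterations")
    return looped


# Keep adding 1 until the value on the first input port exceeds 100.
count_up = loop_construct(add, lambda a, b: a > 100)
loop_result = count_up(0, 1)
print(loop_result)  # 101
```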
The Curry Construct
• The Curry construct allows users to fix one of the input ports with a specified argument and thus reduce the number of input ports.
• By applying multiple Curry constructs, a workflow that takes multiple arguments can be translated into a chain of workflows each with a single argument.
• Example:
[Figure: Curry (U) applied to an Add workflow, producing W8. One input port of Add is fixed, so the resulting workflow takes a single input; with the values 4 and 1, Add yields 5.]
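A minimal sketch of fixing an input port, assuming ports are identified by position. The name `curry_construct` is hypothetical, and which port the slide fixes (and with which of the two values) is not recoverable from the figure, so fixing the second port to 1 is an assumption.

```python
def add(a, b):
    # Stand-in for the two-input Add workflow.
    return a + b


def curry_construct(workflow, port, value):
    """Fix the input at position `port` to `value`, removing that port."""
    def curried(*inputs):
        args = list(inputs)
        args.insert(port, value)  # put the fixed argument back in its place
        return workflow(*args)
    return curried


add_one = curry_construct(add, 1, 1)   # second port fixed to 1
curry_result = add_one(4)
print(curry_result)  # 5

# Applying Curry again fixes the remaining port, leaving no free inputs.
four_plus_one = curry_construct(add_one, 0, 4)
print(four_plus_one())  # 5
```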
Workflow Composition
• Example of the composition of Map and Map constructs.
  • A workflow that increases all the numbers in a nested list by 1.
[Figure: (a) W9, two nested Map constructs (M M) around an Add workflow whose second input is 1. For the input [[1,2,3],[4,5,6]], Add runs once per number, yielding 2, 3, 4 and 5, 6, 7.]
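Nesting two Map constructs can be sketched by applying a map-style higher-order function twice. The helper names are hypothetical, the map here is sequential for brevity, and fixing Add's second input to 1 with a small curry helper is an assumption based on the figure.

```python
def add(a, b):
    # Stand-in for the two-input Add workflow.
    return a + b


def curry_second(workflow, value):
    # Fix the second input, leaving a single-input workflow.
    return lambda x: workflow(x, value)


def map_construct(workflow):
    # Sequential stand-in for Map: one run of the workflow per element.
    return lambda products: [workflow(p) for p in products]


# Map over rows, and within each row map over numbers.
increment_all = map_construct(map_construct(curry_second(add, 1)))
nested_result = increment_all([[1, 2, 3], [4, 5, 6]])
print(nested_result)  # [[2, 3, 4], [5, 6, 7]]
```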
Workflow Composition
• Example of the composition of Map and Reduce constructs.
  • A workflow for parallel summation of each row in a matrix.
[Figure: W11, a Map construct around a Reduce construct (M R) around an Add workflow with initial value 0. For the input [[1,2,3],[4,5,6]], each row is summed by a chain of Addition runs, yielding 6 and 15.]
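The row-sum workflow can be sketched by wrapping a Reduce-style fold in a Map-style function. The helper names are hypothetical and the map is sequential here; in a parallel engine each row's fold could run independently of the others.

```python
from functools import reduce


def add(a, b):
    # Stand-in for the two-input Add workflow.
    return a + b


def reduce_construct(workflow, initial):
    # Fold a list into one value, starting from `initial`.
    return lambda products: reduce(workflow, products, initial)


def map_construct(workflow):
    # Sequential stand-in for Map: one run of the workflow per element.
    return lambda products: [workflow(p) for p in products]


# Reduce sums one row; Map applies that reduction to every row.
row_sums = map_construct(reduce_construct(add, 0))
sums_result = row_sums([[1, 2, 3], [4, 5, 6]])
print(sums_result)  # [6, 15]
```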
Workflow Composition
• Example of complicated workflow composition.
  • A workflow to calculate the greatest common divisor.
[Figure: workflows W13 to W17 built from Split, Modulus, Merge, and Projection tasks, combined using the Loop (L) construct with predicate p = (PI(2) == 0), together with G2W, Map (M), and Curry (U).]
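The figure's exact wiring is not recoverable, but the task names (Split, Modulus, Merge, Projection) and the stopping predicate PI(2) == 0 suggest Euclid's algorithm expressed as a dataflow loop. The sketch below is that reading only; `gcd_step`, `loop_until`, and `gcd` are hypothetical names, and checking the predicate before each iteration is a simplification made here.

```python
def gcd_step(pair):
    a, b = pair            # Split: unpack the two numbers
    return [b, a % b]      # Modulus, then Merge back into a pair


def loop_until(workflow, predicate):
    """Feed the output back as the next input until the predicate holds."""
    def looped(product):
        while not predicate(product):
            product = workflow(product)
        return product
    return looped


def gcd(a, b):
    # Stop when the second element is 0, mirroring p = (PI(2) == 0).
    final = loop_until(gcd_step, lambda p: p[1] == 0)([a, b])
    return final[0]        # Projection: keep the first element


print(gcd(48, 18))  # 6, via [48,18] -> [18,12] -> [12,6] -> [6,0]
```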
A Collectional Data Model
• A collectional data model
  • Supports collection-oriented datasets.
    • Scientists often work with collection-oriented datasets, such as arrays, lists, tables, or file collections.
    • A collection-oriented data model enables data parallelism in scientific workflows.
  • Supports nested data structures.
    • Scientific data is often hierarchically organized.
    • Scientific workflow tasks often produce collections of data products, and the execution of a workflow composed from such tasks can create increasingly nested data collections.
  • Provides well-defined operators and their arbitrary compositions to manipulate and query scientific data collections.
A Collectional Data Model
• A relation is a pair < R, r >, where R is the schema of the relation and r is an instance of that schema.
• A relation schema can be defined as an unordered tuple < c1 : d1, c2 : d2, …, cn : dn >, where c1, c2, …, cn are column names and d1, d2, …, dn are domain names.
• A relation instance is a table with rows (called tuples) and named columns (called attributes).
A Collectional Data Model
• A collection schema is a pair < K, V >.
  • K, the key, is a pair k : d, where k is the key name and d is the domain name.
  • V, the value, is either a relation schema or a collection schema.
• A collection instance is a set of key-value pairs (pi, qi) (i ∈ {1,…,m}).
  • Each pi is a scalar value.
  • Each qi is either a relation instance or a collection instance.
A Collectional Data Model
An example:
• Parameters< Model : String, Experiments : Integer, < Concentration : Double, Degree : Integer >>.
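To make the nesting concrete, the Parameters schema can be written down as plain Python data and checked against an instance. The encoding is an assumption made for this sketch, not VIEW's representation: a collection schema is a tuple (key name, key type, value schema), a relation schema is a dict from column name to type, a collection instance is a dict, and a relation instance is a list of row dicts. The sample values are illustrative.

```python
# Parameters< Model : String, Experiments : Integer,
#             < Concentration : Double, Degree : Integer >>
PARAMETERS = ("Model", str,
              ("Experiments", int,
               {"Concentration": float, "Degree": int}))


def conforms(instance, schema):
    """Check that a nested instance matches a collection or relation schema."""
    if isinstance(schema, dict):  # relation schema: every row has these columns
        return isinstance(instance, list) and all(
            set(row) == set(schema)
            and all(isinstance(row[col], typ) for col, typ in schema.items())
            for row in instance
        )
    _key_name, key_type, value_schema = schema  # collection schema < K, V >
    return isinstance(instance, dict) and all(
        isinstance(key, key_type) and conforms(value, value_schema)
        for key, value in instance.items()
    )


# Model "m2" -> Experiment 1 -> a relation with one row.
sample = {"m2": {1: [{"Concentration": 7.1, "Degree": 15}]}}
print(conforms(sample, PARAMETERS))  # True

# An experiment key of the wrong type does not conform.
bad = {"m2": {"one": [{"Concentration": 7.1, "Degree": 15}]}}
print(conforms(bad, PARAMETERS))  # False
```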
The Collectional Operators
• We extend the relational operators to collectional operators, for which collections are the only operands.
  • Six primitive operators: union, set difference, selection, projection, Cartesian product, and renaming.
  • The set of collections is closed under those operators.
  • A relation can be defined as a collection whose height and cardinality are equal to 1. The collectional operators then reduce to the relational operators.
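The Collectional Operators
• The union and the set difference operators can only be applied to union-compatible collections.
[Figure: two union-compatible collections keyed by Model, each value being a relation with a single Result column. The first maps m1 to 26 and m2 to 32; the second maps m2 to 32 and m3 to 31.]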
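The Collectional Operators
• Example of the union operator and the set difference operator.
[Figure: results of the union and set difference operators on collections keyed by Model with Result relations; the values shown are 26, 32, and 31 for models m1, m2, and m3.]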
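The slides do not spell out the semantics of collectional union and set difference, so the sketch below is one plausible reading: keys are merged, values sharing a key are combined recursively, and at the bottom the ordinary set operations on relations apply. The function names and the encoding (a collection as a dict, a relation as a set of row tuples) are assumptions made for illustration.

```python
def union(a, b):
    """Recursive union: set union on relations, key-wise merge on collections."""
    if isinstance(a, (set, frozenset)):
        return a | b
    merged = {}
    for key in a.keys() | b.keys():
        if key in a and key in b:
            merged[key] = union(a[key], b[key])
        else:
            merged[key] = a[key] if key in a else b[key]
    return merged


def difference(a, b):
    """Recursive difference; keys whose value becomes empty are dropped."""
    if isinstance(a, (set, frozenset)):
        return a - b
    remaining = {}
    for key, value in a.items():
        if key not in b:
            remaining[key] = value
        else:
            rest = difference(value, b[key])
            if rest:
                remaining[key] = rest
    return remaining


# Two union-compatible collections keyed by Model, each with a Result relation.
first = {"m1": {(26,)}, "m2": {(32,)}}
second = {"m2": {(32,)}, "m3": {(31,)}}

print(union(first, second) == {"m1": {(26,)}, "m2": {(32,)}, "m3": {(31,)}})  # True
print(difference(first, second))  # {'m1': {(26,)}}
```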
The Collectional Operators
• Example of the Cartesian product operator and the renaming operator.
[Figure: the Cartesian product of two renamed collections, M1 and M2, keyed by M1.model and M2.model; each nested relation has columns M1.Result and M2.Result, with the value pairs (26, 32), (26, 31), (32, 32), and (32, 31).]
The Collectional Operators
• Example of the selection operator.
[Figure: a selection result with Model m2 and Experiment 1, whose relation has columns Concentration and Degree and a row (7.1, 15).]
The Collectional Operators
• Example of the projection operator.
[Figure: a projection example over a collection keyed by Experiment (1, 2, …) whose relations have columns Concentration and Degree; values shown include (7.0, 15), (7.1, 15), (7.0, 30), and (7.1, 30).]
Key Features of VIEW
F1: VIEW features the first uniform workflow model, in which workflows are the only building blocks. In VIEW, tasks are primitive workflows, and no workflow construct discriminates between workflows and tasks. Such a model greatly simplifies workflow design: a workflow designer only needs to compose complex workflows from simpler ones, without first encapsulating workflows into tasks or vice versa during the composition process.
F2: VIEW has powerful workflow composition capabilities, in which workflow constructs are fully composable with one another at arbitrary levels. This often results in VIEW workflows that are more concise and efficient to execute, and that can be hard to model in other workflow systems.
F3: VIEW features a pure dataflow-based workflow language, SWL, including the dataflow counterparts of controlflow-style constructs such as conditional and loop. Existing workflow languages often require both controlflow and dataflow constructs, resulting in complex or even obscure semantics and non-trivial workflow design.
F4: VIEW supports the cloud MapReduce programming model not only at the job level but also at the workflow level. Therefore, one can apply the Map and Reduce constructs to an arbitrary workflow an arbitrary number of times. As a result, VIEW can process nested lists of data products in parallel using multiple runs of a workflow.
F5: VIEW features a collectional data model that supports not only traditional primitive data types, such as integer, float, double, boolean, char, and string, but also files, relations, and hierarchical collections (hierarchical key-value pairs) to support parallel processing of data collections.
F6: VIEW supports a high-level graph-based provenance query language, OPQL. In most cases, users can formulate lineage queries easily without writing recursive queries or knowing the underlying database schema.
F7: VIEW features the first service-oriented architecture that conforms to the reference architecture for scientific workflow management systems (SWFMSs). This architecture greatly facilitates interoperability and subsystem reusability in the community. It also provides a generic infrastructure upon which a domain-specific scientific workflow application system (SWFAS) can be easily developed with custom interfaces for various platforms and devices.
Conclusions and Future Work
• A scientific workflow composition model.
• A collectional data model.
• A prototypical SWFMS.
• Future work:
  • Formalization of the scientific workflow algebra and collectional algebra.
    • Completeness.
    • Integration.
  • Collaborative scientific workflow composition.
    • Concurrent design and composition.
    • Concurrent execution.
VIEW application

Fiber tract analysis for Epilepsy.
VIEW application

Computational detection of MARS in
genome.
VIEW application

DNA analysis for the bacterium E. coli
VIEW application

Simulation of Nereis succinea mate search
behavior.
Big Data is a Pyramid

Can you contribute a piece too?
Big Data Research Laboratory
Wayne State University

viewsystem.org

Más contenido relacionado

La actualidad más candente

Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...inside-BigData.com
 
Data Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with CloudData Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with CloudIJAAS Team
 
詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systemshdhappy001
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
Workflow Provenance: From Modelling to Reporting
Workflow Provenance: From Modelling to ReportingWorkflow Provenance: From Modelling to Reporting
Workflow Provenance: From Modelling to ReportingRayhan Ferdous
 
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...CitiusTech
 
A BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE DUPLICATION
A BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE DUPLICATIONA BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE DUPLICATION
A BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE DUPLICATIONcscpconf
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...Institute of Information Systems (HES-SO)
 
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking AlgorithmPerformance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking AlgorithmIRJET Journal
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsGeoffrey Fox
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational ScienceChelle Gentemann
 
Data analysis using hive ql &amp; tableau
Data analysis using hive ql &amp; tableauData analysis using hive ql &amp; tableau
Data analysis using hive ql &amp; tableaupkale1708
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsGeoffrey Fox
 
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Mitul Tiwari
 

La actualidad más candente (19)

Fn3110961103
Fn3110961103Fn3110961103
Fn3110961103
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Hadoop
HadoopHadoop
Hadoop
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
 
Data Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with CloudData Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with Cloud
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Workflow Provenance: From Modelling to Reporting
Workflow Provenance: From Modelling to ReportingWorkflow Provenance: From Modelling to Reporting
Workflow Provenance: From Modelling to Reporting
 
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
 
A BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE DUPLICATION
A BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE DUPLICATIONA BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE DUPLICATION
A BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE DUPLICATION
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...
 
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking AlgorithmPerformance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational Science
 
Data analysis using hive ql &amp; tableau
Data analysis using hive ql &amp; tableauData analysis using hive ql &amp; tableau
Data analysis using hive ql &amp; tableau
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
 

Destacado

26 November 2012
26 November 201226 November 2012
26 November 2012renabivens
 
العقيده المسيحيه يسوع المسيح
العقيده المسيحيه   يسوع المسيحالعقيده المسيحيه   يسوع المسيح
العقيده المسيحيه يسوع المسيحIbrahimia Church Ftriends
 
Factors Contributing to the Decline of the Anchovy Fisheries in Krueng Raya B...
Factors Contributing to the Decline of the Anchovy Fisheries in Krueng Raya B...Factors Contributing to the Decline of the Anchovy Fisheries in Krueng Raya B...
Factors Contributing to the Decline of the Anchovy Fisheries in Krueng Raya B...Zulhamsyah Imran
 
Khemjira Plongsawai- My Portfolio forblog
Khemjira Plongsawai- My Portfolio forblogKhemjira Plongsawai- My Portfolio forblog
Khemjira Plongsawai- My Portfolio forblogKhemjira_P
 
Travis Weisleder "Special Finance Online"
Travis Weisleder "Special Finance Online"Travis Weisleder "Special Finance Online"
Travis Weisleder "Special Finance Online"Sean Bradley
 
ELECTRODEPOSITION OF TITANIUM AND ITS DIOXIDE FROM ILMENITE
ELECTRODEPOSITION OF TITANIUM AND ITS DIOXIDE FROM ILMENITE ELECTRODEPOSITION OF TITANIUM AND ITS DIOXIDE FROM ILMENITE
ELECTRODEPOSITION OF TITANIUM AND ITS DIOXIDE FROM ILMENITE Al Baha University
 
اللوبى الصهيونى والاستراتيجية الأمريكية فى الشرق الأوسط
اللوبى الصهيونى والاستراتيجية الأمريكية فى الشرق الأوسطاللوبى الصهيونى والاستراتيجية الأمريكية فى الشرق الأوسط
اللوبى الصهيونى والاستراتيجية الأمريكية فى الشرق الأوسطIbrahimia Church Ftriends
 
2015 JBUG KOREA MEETUP - spring4 width infinispan
2015 JBUG KOREA MEETUP - spring4 width infinispan2015 JBUG KOREA MEETUP - spring4 width infinispan
2015 JBUG KOREA MEETUP - spring4 width infinispanYongHyuk Lee
 
Los espiritus del bosque
Los espiritus del bosqueLos espiritus del bosque
Los espiritus del bosqueada48salamanca
 
Final Presentation
Final Presentation Final Presentation
Final Presentation JSchop
 
Albert helps the BYOD phenomenon
Albert helps the BYOD phenomenonAlbert helps the BYOD phenomenon
Albert helps the BYOD phenomenonnoHold, Inc.
 
Holy orders &amp; Anointing of the sick
Holy orders  &amp;  Anointing of the sickHoly orders  &amp;  Anointing of the sick
Holy orders &amp; Anointing of the sickDolores Vasquez
 
Random 101227210331-phpapp01
Random 101227210331-phpapp01Random 101227210331-phpapp01
Random 101227210331-phpapp01Khemjira_P
 
Rancangan mengajar yang betul
Rancangan mengajar yang betulRancangan mengajar yang betul
Rancangan mengajar yang betulKamariah Osman
 
Instant GMP Compliance Series -Better Compliance through Master Manufacturing...
Instant GMP Compliance Series -Better Compliance through Master Manufacturing...Instant GMP Compliance Series -Better Compliance through Master Manufacturing...
Instant GMP Compliance Series -Better Compliance through Master Manufacturing...InstantGMP™
 

Destacado (20)

26 November 2012
26 November 201226 November 2012
26 November 2012
 
العقيده المسيحيه يسوع المسيح
العقيده المسيحيه   يسوع المسيحالعقيده المسيحيه   يسوع المسيح
العقيده المسيحيه يسوع المسيح
 
Factors Contributing to the Decline of the Anchovy Fisheries in Krueng Raya B...

An Overview of VIEW

  • 1. Scientific Workflows for Big Data Prof. Shiyong Lu Big Data Research Laboratory Department of Computer Science Wayne State University shiyong@wayne.edu
  • 2. Today’s data-intensive science Looking for needle in haystack Looking into haystack Jim Gray: Turing Award laureate
  • 3. Big Data Challenges Looking for needle in haystack For Big Data, data management and movement is a frequent challenge …between facilities, archives, researchers… Many files, large data volumes With security, reliability, performance… Ian Foster: Father of Grid Computing
  • 4. Big Data Challenges Looking for needle in haystack Capture Curation Storage Search Sharing Analysis Visualization
  • 5. Big Data Science Large Hadron Collider (LHC) 15 PB/year 173 TB/day 500 MB/sec Higgs discovery is “only possible because of the extraordinary achievements of … grid computing” - Rolf Heuer, CERN DG
  • 6. Data flows at Argonne National Lab Data management challenges [Figure: estimated Argonne data flows in TB/day among external data sources, the Advanced Photon Source, the Argonne Leadership Computing Facility, short-term storage, long-term storage, and data analysis; flow values shown: 163, 9, 9, 143, 100, 100, 150, 10, 50. Credit: Ian Foster]
  • 7. Big Data demands new CS research For example, existing clustering algorithms are typically cubic in N, and when N is too big, they do not work! - Jim Gray
  • 8. What is Big Data? •Definition of Big Data: “…refers to large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.” from nsf.gov website
  • 9. Big Data Challenges •Challenges of Big Data: “national big data challenges, which include advances in core techniques and technologies; big data infrastructure projects in various science, biomedical research, health and engineering communities; education and workforce development; and a comprehensive integrative program to support collaborations of multi-disciplinary teams and communities to make advances in the complex grand challenge science, biomedical research, and engineering problems of a computational- and data-intensive world.” from nsf.gov website
  • 10. Big Data demands big workflows Reminiscent of
  • 11. And thousands of parallel executions Managing big workflows and large-scale parallel execution is a big CS challenge!
  • 12. Outline 1 Introduction 2 VIEW: A Prototypical SWFMS 3 A Scientific Workflow Composition Model 4 A Collectional Data Model 5 Conclusions and Future Work
  • 13. Introduction • Data Intensive Science • From computation intensive to data intensive. • A new research cycle: from data capture and data curation to data analysis and data visualization. • “In the future, the rapidity with which any given discipline advances is likely to depend on how well the community acquires the necessary expertise in database, workflow management, visualization, and cloud computing technologies.” (“Beyond the Data Deluge”, Science, Vol. 323, no. 5919, pp. 1297-1298, 2009.)
  • 14. Introduction • Scientific Workflow • A formal specification of a scientific process. • Represents, streamlines, and automates the steps from dataset selection and integration, computation and analysis, to final data product presentation and visualization. • Applications: Bioinformatics, Oceanography, Neuroinformatics, Astronomy, etc.
  • 15. Introduction • Scientific Workflow Management System (SWFMS) • Supports the specification, modification, execution, failure handling, and monitoring of a scientific workflow. • Existing SWFMSs: • Taverna • Kepler • Pegasus • VisTrails • VIEW • …
  • 22. Our VIEW System • Enables scientists to design workflows • Provides a runtime system to execute workflows • on a dedicated VIEW server • in a Cloud computing environment • Supports efficient collection, storage, querying, and visualization of workflow provenance • Is currently used in several bioinformatics applications, including genomic recombination and gene conversion data analysis
  • 23. An Example Workflow in VIEW • Example workflows in
  • 25. VIEW 1-2-3 Step 1: Drag and drop inputs and outputs, and computational
  • 26. VIEW 1-2-3 Step 2: Link them into a scientific workflow
  • 27. VIEW 1-2-3 Step 3: Click the run button and you get the result!
  • 29. An Example Workflow in VIEW • FiberFlow • Transforms large-scale neuroimaging data into knowledge through cross-subject, cross-modality computation, ultimately leading to high clinical intelligence in neural diseases.
  • 30. VIEW: A Prototypical SWFMS • Minimal complexity for users, with extensive techniques behind the scenes. • Provides a clear and simple abstraction for manipulating and coordinating resources. • Service-oriented architecture. • Intuitive, user-friendly GUI.
  • 31. A Reference Architecture for SWFMSs Service-oriented architecture of VIEW
  • 33. A Reference Architecture for SWFMSs Other advantages of : • VIEW workflows can be executed in other systems (specifications are not tied to a particular SWFMS) • Use of open standards (Web Services, XML) promotes collaboration, interoperability and extensibility of the system • Workflow and data models implemented in VIEW are specifically geared towards heavy scientific data
  • 35. VIEW: A Prototypical SWFMS A typical scientific workflow execution diagram.
  • 36. Workflow Engine The Workflow Engine is the heart of the system. • Workflow Orchestration. • Workflow Execution. • Coordination of other subsystems. Workflow Engine in VIEW. • Dataflow based. • Pure workflow composition. • Workflow constructs.
  • 37. SWL • Example of our proposed scientific workflow specification language (SWL).
  • 38. Primitive Workflow Specification • Example SWL specification of a primitive workflow.
  • 39. Workflow Execution • Primitive workflow • Unary construct based workflow • Graph based workflow • A workflow graph is a composition of workflows by binary constructs. • Optimistic scheduling.
  • 41. Data Product Manager • Solid data model. • Scalable data storage. • Convenient data access. • Data independence. The Data Product Manager is based on the collectional data model.
  • 42. DPM Architecture • Architecture of the Data Product Manager. [Figure: Data Product Manager main server with a master node and node databases; Data Access Layer, Data Mapping Layer, and Data Storage Layer; Data Set 1 and Data Set 2, each held in relational databases and file repositories]
  • 43. DPL • Example of the XML description of a collectional data product.
  • 44. Data Storage • VIEW supports two ways of storage: • A collection can be stored in a table containing a set of its key/value pairs, whose values are references to existing collections. • A collection can be expanded and stored in two tables. • The Group By operator. • The Compress operator.
  • 45. Data Typing A data product is • a Collection • or a List • or Empty. The List type • Introduced in the workflow engine. • Each element is a data product. • Heterogeneous.
  • 46. Collectional Data Querying Operators are implemented in primitive workflows. • Arithmetic operators. • Boolean operators. • Collectional operators. • List operators. Queries are implemented in workflow compositions.
  • 47. Example • Given a table Reference < Student, Company, GradTime >, find the total number of students with offers from each company in each graduation year; sort the result by descending GradTime and ascending Company. • SQL query: SELECT Company, GradTime, COUNT(DISTINCT Student) AS NumberOfJob FROM Reference GROUP BY Company, GradTime ORDER BY GradTime DESC, Company ASC;
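The SQL query on this slide can be tried out with Python's built-in sqlite3 module. The sketch below is only an illustration: the rows are hypothetical sample data, and VIEW itself expresses the same query as a workflow composition rather than as SQL.

```python
import sqlite3

# In-memory database holding the Reference table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Reference (Student TEXT, Company TEXT, GradTime INTEGER)")

# Hypothetical sample rows, invented for illustration only.
rows = [
    ("alice", "Acme", 2012),
    ("bob", "Acme", 2012),
    ("alice", "Acme", 2012),  # repeated offer, counted once by DISTINCT
    ("carol", "Beta", 2012),
    ("dave", "Acme", 2011),
]
conn.executemany("INSERT INTO Reference VALUES (?, ?, ?)", rows)

query = """
SELECT Company, GradTime, COUNT(DISTINCT Student) AS NumberOfJob
FROM Reference
GROUP BY Company, GradTime
ORDER BY GradTime DESC, Company ASC
"""
result = conn.execute(query).fetchall()
print(result)  # [('Acme', 2012, 2), ('Beta', 2012, 1), ('Acme', 2011, 1)]
```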
  • 48. Example of Query Workflow Query Workflow.
  • 49. Key Requirements for Workflow Modeling R1: Programming-in-the-large. R2: Dataflow programming model. R3: Composable dataflow constructs. R4: Workflow encapsulation and hierarchical composition. R5: Single-assignment property. R6: Physical and logical data models. R7: Exception handling.
  • 50. A Scientific Workflow Model Workflows are the basic and the only operands for workflow composition. [Figure: workflows W1, W2, and W3, each with input ports i1…ik and output port o1] Task components (e.g. Web services) are wrapped into primitive workflows (a.k.a. tasks), which are the basic building blocks of scientific workflows.
  • 51. A Scientific Workflow Model A workflow construct is a mapping from a set of workflows to a workflow. • Unary workflow constructs • Binary workflow constructs • … A construct C takes a set of workflows W1, …, Wn as input and composes them into an output workflow Wc.
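One way to read "a construct is a mapping from workflows to a workflow" is as a higher-order function. The sketch below assumes a workflow can be modelled as a plain Python callable; the binary construct `pipe` is a hypothetical example for illustration, not a construct named in the slides.

```python
from typing import Callable

# Assumption: a workflow is modelled as a callable from inputs to one output.
Workflow = Callable[..., object]

def pipe(w1: Workflow, w2: Workflow) -> Workflow:
    """Hypothetical binary construct: feed the output of w1 into w2."""
    def composed(*inputs):
        return w2(w1(*inputs))
    return composed

def add(a, b):
    return a + b

def double(x):
    return 2 * x

# The construct takes two workflows and returns a new workflow.
add_then_double = pipe(add, double)
print(add_then_double(3, 4))  # 14
```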
  • 52. A Scientific Workflow Model • Our proposed scientific workflow model consists of the following two layers: • The logical layer contains the workflow interface that models the input ports and output ports of a workflow. • The physical layer contains the workflow body that models the physical implementation of the workflow. • Primitive workflows. • Graph-based workflows. • Unary-construct-based workflows.
  • 53. Unary Workflow Constructs Dataflow-based Unary Workflow Constructs
  • 54. The Map Construct • The Map construct enables the parallel processing of a collection of data products based on a workflow that can only process a single data product. • Example: W2 = Map(W1) takes the collection [[1,2],[3,6],[4,7]] and runs W1 on [1,2], [3,6], and [4,7], producing 2, 18, and 28.
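A minimal sketch of the Map construct, assuming workflows are Python callables. The slide does not say what W1 computes; the outputs 2, 18, and 28 are consistent with multiplying each pair, so `multiply_pair` is an assumption. A thread pool stands in for VIEW's parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor

def map_construct(workflow):
    """Lift a single-item workflow to one that processes a whole collection."""
    def mapped(collection):
        # pool.map runs the workflow on the items concurrently and
        # returns the results in input order.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(workflow, collection))
    return mapped

def multiply_pair(pair):
    # Assumed behaviour of W1: multiply the two numbers in the pair.
    a, b = pair
    return a * b

w2 = map_construct(multiply_pair)
print(w2([[1, 2], [3, 6], [4, 7]]))  # [2, 18, 28]
```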
  • 55. The Reduce Construct • The Reduce construct enables the aggregation of a list of data products to a single data product based on a workflow that aggregates a limited (two or more) number of input data products. • Example: W3 = Reduce(Add) with initial value 0 and input list [3,5,9] computes Add(0,3)=3, Add(3,5)=8, and Add(8,9)=17.
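The Reduce example can be sketched as a left fold, again assuming workflows are plain callables; `reduce_construct` is an illustrative name, not VIEW's API.

```python
def reduce_construct(workflow, initial):
    """Lift a two-input workflow to one that aggregates a whole list."""
    def reduced(items):
        accumulator = initial
        for item in items:
            # Each step feeds the running result back in with the next item.
            accumulator = workflow(accumulator, item)
        return accumulator
    return reduced

def add(a, b):
    return a + b

w3 = reduce_construct(add, 0)
print(w3([3, 5, 9]))  # 17, via 0+3=3, 3+5=8, 8+9=17
```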
  • 56. The Tree Construct • The Tree construct • Enables parallel aggregation of a collection of data products. • Aggregates a collection pairwise as a binary tree until a single aggregated product is generated. • The Tree construct can be applied to associative workflows. • Example: W4 = Tree(Add) on [0,3,5,9] computes Add(0,3)=3 and Add(5,9)=14, then Add(3,14)=17.
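A sketch of the pairwise aggregation behind the Tree construct, assuming an associative two-input workflow. The pairs within one level are independent and could run in parallel; this illustration evaluates them sequentially for simplicity, and carrying an unpaired last element up a level is an assumption.

```python
def tree_construct(workflow):
    """Aggregate a collection pairwise, level by level, like a binary tree."""
    def aggregated(items):
        level = list(items)
        if not level:
            raise ValueError("cannot aggregate an empty collection")
        while len(level) > 1:
            # Combine neighbours (0,1), (2,3), ...; these calls are independent.
            next_level = [workflow(level[i], level[i + 1])
                          for i in range(0, len(level) - 1, 2)]
            if len(level) % 2 == 1:
                next_level.append(level[-1])  # odd element moves up unchanged
            level = next_level
        return level[0]
    return aggregated

def add(a, b):
    return a + b

w4 = tree_construct(add)
print(w4([0, 3, 5, 9]))  # 17, via (0+3)=3 and (5+9)=14, then 3+14
```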
  • 57. The Conditional Construct • The Conditional construct enables the conditional execution of a workflow based on a condition on one of the inputs. • Example: [Figure: a Projection workflow W4 wrapped in a Conditional construct C. With predicate p=(PI1 < PI2), p=true and Projection runs; with predicate p=(PI1 >= PI2), p=false and the workflow fails.]
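A sketch of the Conditional construct under stated assumptions: workflows are callables, a false predicate is reported by raising the hypothetical `WorkflowFailed` exception, and `first_input` is a stand-in for the Projection workflow whose exact behaviour the slide does not spell out.

```python
class WorkflowFailed(Exception):
    """Hypothetical signal that a guarded workflow did not run."""

def conditional_construct(workflow, predicate):
    """Run the workflow only when the predicate holds on its inputs."""
    def guarded(*inputs):
        if not predicate(*inputs):
            raise WorkflowFailed(f"predicate is false for inputs {inputs}")
        return workflow(*inputs)
    return guarded

def first_input(a, b):
    # Assumed stand-in for Projection: return the first input.
    return a

runs = conditional_construct(first_input, lambda a, b: a < b)
fails = conditional_construct(first_input, lambda a, b: a >= b)

print(runs(2, 3))  # 2, because 2 < 3 is true
try:
    fails(2, 3)
except WorkflowFailed as error:
    print("failed:", error)
```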
  • 58. The Loop Construct • The Loop construct enables cyclic executions of a workflow. • The output of the workflow is repeatedly fed back to a specified input port until the predicate evaluates to true. • Example: with predicate p=(PI1 > 100), the Loop construct L feeds the output of Add back into one of its inputs, starting from inputs 0 and 1; the outputs 1, 2, … give p=false until the output 101 gives p=true.
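The Loop example can be sketched as follows, assuming workflows are callables and that the output is fed back into the first input while the remaining inputs stay fixed (the choice of port is an assumption here).

```python
def loop_construct(workflow, predicate):
    """Re-run the workflow, feeding its output back, until the predicate is true."""
    def looped(value, *fixed_inputs):
        while True:
            value = workflow(value, *fixed_inputs)
            if predicate(value):
                return value
    return looped

def add(a, b):
    return a + b

# Keep adding 1 until the output exceeds 100.
count_past_100 = loop_construct(add, lambda output: output > 100)
print(count_past_100(0, 1))  # 101
```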
  • 59. The Curry Construct • The Curry construct allows users to fix one of the input ports with a specified argument and thus reduce the number of input ports. • By applying multiple Curry constructs, a workflow that takes multiple arguments can be translated into a chain of workflows each with a single argument. • Example: with the Curry construct U fixing one input of Add to 4, the resulting workflow W8 takes the single input 1 and outputs 5.
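In Python, fixing one input of a callable is what `functools.partial` does, so the Curry example can be sketched with it. Fixing the first positional input is an assumption; the slide does not say which port is fixed.

```python
from functools import partial

def curry_construct(workflow, fixed_argument):
    """Fix the first input of a workflow, leaving one fewer input port."""
    return partial(workflow, fixed_argument)

def add(a, b):
    return a + b

w8 = curry_construct(add, 4)
print(w8(1))  # 5

# Applying the construct repeatedly fixes one input at a time.
def add3(a, b, c):
    return a + b + c

one_input_left = curry_construct(curry_construct(add3, 1), 2)
print(one_input_left(3))  # 6
```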
  • 60. Workflow Composition • Example of the composition of Map and Map constructs. • A workflow W9 that increases every number in a nested list by 1: for input [[1,2,3],[4,5,6]], Add is applied with 1 to each element, giving 2, 3, 4, 5, 6, and 7.
  • 61. Workflow Composition • Example of the composition of Map and Reduce constructs. • A workflow W11 for parallel summation of each row in a matrix: for input [[1,2,3],[4,5,6]], Map is applied over Reduce(Add) with initial value 0, giving the row sums 6 and 15.
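Because constructs take workflows and return workflows, they nest directly. The sketch below composes simplified, sequential versions of Map and Reduce (illustrative helpers, not VIEW's implementation) to sum each row of a matrix.

```python
from functools import reduce

def map_construct(workflow):
    """Apply a single-item workflow to every element of a collection."""
    return lambda collection: [workflow(item) for item in collection]

def reduce_construct(workflow, initial):
    """Fold a list into one value with a two-input workflow."""
    return lambda items: reduce(workflow, items, initial)

def add(a, b):
    return a + b

# Map over Reduce(Add): each row is summed independently.
row_sums = map_construct(reduce_construct(add, 0))
print(row_sums([[1, 2, 3], [4, 5, 6]]))  # [6, 15]
```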
  • 62. Workflow Composition • Example of complicated workflow composition. • A workflow to calculate the greatest common divisor. [Figure: a Loop construct L with predicate p=(PI(2)==0) around a body composed of Split, Modulus, Merge, and Projection workflows, combined through Map (M), Curry (U), and G2W steps into workflows W13 to W17.]
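The loop predicate p=(PI(2)==0) suggests Euclid's algorithm on a pair of numbers. The sketch below is a plain-Python reading of that idea, not a transcription of the slide's workflow graph: the mapping of Split, Modulus, Merge, and Projection onto the steps is an assumption, and the predicate is checked before each iteration so that a zero second input is handled.

```python
def gcd_step(pair):
    """One loop body: (a, b) becomes (b, a mod b)."""
    a, b = pair            # Split the pair into its two components
    return (b, a % b)      # Modulus, then Merge back into a pair

def gcd_workflow(a, b):
    pair = (a, b)
    # Stop once the second component is 0, i.e. p = (PI(2) == 0) is true.
    while pair[1] != 0:
        pair = gcd_step(pair)
    return pair[0]         # Projection onto the first component

print(gcd_workflow(48, 18))  # 6, via (18, 12), (12, 6), (6, 0)
```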
  • 63. A Collectional Data Model  A collectional data model  Support collection oriented datasets. • Scientists often work with collection oriented datasets, such as arrays, lists, tables or file collections. • A collection-oriented data model enables data parallelism in scientific workflows.  Support nested data structures. • Scientific data is often hierarchically organized. • Scientific workflow tasks often produce collections of data products, and the execution of a workflow composed from such tasks can create increasingly nested data collections.  Provide well-defined operators and their arbitrary compositions to manipulate and query scientific data collections.
  • 64. A Collectional Data Model • A relation is a pair < R, r >, where R is the schema of the relation and r is an instance of that schema. • A relation schema can be defined as an unordered tuple < c1 : d1, c2 : d2, …, cn : dn >, where c1, c2, …, cn are column names and d1, d2, …, dn are domain names. • A relation instance is a table with rows (called tuples) and named columns (called attributes).
  • 65. A Collectional Data Model • A collection schema is a pair < K, V >. • K, the key, is a pair k : d, where k is the key name and d is the domain name. • V, the value, is either a relation schema or a collection schema. • A collection instance is a set of key-value pairs (pi, qi) (i ∈ {1,…,m}). • Each pi is a scalar value. • Each qi is either a relation instance or a collection instance.
  • 66. A Collectional Data Model • An example: Parameters< Model : String, Experiments : Integer, < Concentration : Double, Degree : Integer >>.
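The recursive schema definition can be sketched with two small Python classes. This is an illustration only: the class names, the `conforms` helper, and the mapping from domain names to Python types are assumptions, and the example reads the Parameters schema as a collection keyed by Model whose values are collections keyed by Experiments, each holding a relation.

```python
from dataclasses import dataclass
from typing import Union

# Assumed mapping from the slide's domain names to Python types.
DOMAINS = {"String": str, "Integer": int, "Double": float}


@dataclass(frozen=True)
class RelationSchema:
    columns: tuple  # (column name, domain name) pairs


@dataclass(frozen=True)
class CollectionSchema:
    key_name: str
    key_domain: str
    value: Union[RelationSchema, "CollectionSchema"]


def conforms(instance, schema):
    """Check an instance against a schema, recursing through nested collections."""
    if isinstance(schema, RelationSchema):
        names = {name for name, _ in schema.columns}
        return all(
            set(row) == names
            and all(isinstance(row[name], DOMAINS[dom]) for name, dom in schema.columns)
            for row in instance
        )
    return all(
        isinstance(key, DOMAINS[schema.key_domain]) and conforms(value, schema.value)
        for key, value in instance.items()
    )


parameters_schema = CollectionSchema(
    "Model", "String",
    CollectionSchema(
        "Experiments", "Integer",
        RelationSchema((("Concentration", "Double"), ("Degree", "Integer"))),
    ),
)

# Instance: key-value pairs whose values are nested collections or relations.
parameters = {"m2": {1: [{"Concentration": 7.1, "Degree": 15}]}}
print(conforms(parameters, parameters_schema))  # True
```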
  • 67. The Collectional Operators • We extend the relational operators to collectional operators, whose only operands are collections. • Six primitive operators: union, set difference, selection, projection, Cartesian product, and renaming. • The set of collections is closed under these operators. • A relation can be defined as a collection whose height and cardinality are both equal to 1; in that case the collectional operators reduce to the relational operators.
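  • 68. The Collectional Operators • The union and set difference operators can only be applied to union-compatible collections. [Figure: two union-compatible collections keyed by Model, each holding a Result relation; the first maps m1 to 26 and m2 to 32, the second maps m2 to 32 and m3 to 31.]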
  • 69. The Collectional Operators • Example of the union operator and the set difference operator. [Figure: the union holds Model m1 with Result 26, m2 with Result 32, and m3 with Result 31; the set difference holds only Model m1 with Result 26.]
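A rough Python sketch of union and set difference on nested collections, using the Model and Result values visible on the slides. The recursive semantics here (merge by key, drop entries that become empty) are an assumption made for illustration; the slides do not spell out the formal definitions. Collections are modeled as dicts and relation instances as sets of row tuples.

```python
def union(a, b):
    """Union of two union-compatible collections (assumed key-wise semantics)."""
    if isinstance(a, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = union(a[key], value) if key in a else value
        return out
    return a | b  # leaf level: relation instances as sets of rows


def difference(a, b):
    """Set difference a - b, dropping keys whose values become empty."""
    if isinstance(a, dict):
        out = {}
        for key, value in a.items():
            rest = difference(value, b[key]) if key in b else value
            if rest:
                out[key] = rest
        return out
    return a - b


# Each relation has the single column Result; rows are 1-tuples.
first = {"m1": {(26,)}, "m2": {(32,)}}
second = {"m2": {(32,)}, "m3": {(31,)}}

print(union(first, second))       # {'m1': {(26,)}, 'm2': {(32,)}, 'm3': {(31,)}}
print(difference(first, second))  # {'m1': {(26,)}}
```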
  • 70. The Collectional Operators • Example of the Cartesian product operator and the renaming operator. [Figure: the two collections are renamed M1 and M2, and their Cartesian product is keyed by M1.model and M2.model; the resulting (M1.Result, M2.Result) pairs are (26, 32), (26, 31), (32, 32), and (32, 31).]
  • 71. The Collectional Operators • Example of the selection operator. [Figure: the selected collection contains Model m2 and Experiment 1, with a row of Concentration 7.1 and Degree 15 among other elided rows.]
  • 72. The Collectional Operators • Example of the projection operator. [Figure: collections keyed by Experiment (1, 2, ...) with Concentration and Degree columns; the visible rows pair Concentration 7.0 and 7.1 with Degree 15 in one table and with Degree 30 in another.]
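Selection and projection on a nested collection might look like the following in Python. This is a sketch under assumed semantics: selection filters keys level by level, and projection keeps the named columns in every leaf relation. The function names are hypothetical, and the sample instance is illustrative, assembled from values that appear on the slides rather than copied from a complete table.

```python
def select(collection, key_predicates):
    """Keep entries whose key at each nesting level satisfies that level's predicate."""
    if not key_predicates:
        return collection
    first, rest = key_predicates[0], key_predicates[1:]
    out = {}
    for key, value in collection.items():
        if first(key):
            out[key] = select(value, rest) if isinstance(value, dict) else value
    return out


def project(collection, columns):
    """Keep only the named columns in every leaf relation."""
    if isinstance(collection, dict):
        return {key: project(value, columns) for key, value in collection.items()}
    return [{col: row[col] for col in columns} for row in collection]


# Illustrative instance: Model -> Experiment -> rows of (Concentration, Degree).
parameters = {
    "m1": {1: [{"Concentration": 7.0, "Degree": 15}]},
    "m2": {
        1: [{"Concentration": 7.1, "Degree": 15}],
        2: [{"Concentration": 7.1, "Degree": 30}],
    },
}

selected = select(parameters, [lambda model: model == "m2", lambda exp: exp == 1])
print(selected)  # {'m2': {1: [{'Concentration': 7.1, 'Degree': 15}]}}

projected = project(selected, ["Concentration"])
print(projected)  # {'m2': {1: [{'Concentration': 7.1}]}}
```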
  • 73. Key Features of VIEW F1: VIEW features the first uniform workflow model, in which workflows are the only building blocks. In VIEW, tasks are primitive workflows, and no workflow construct distinguishes workflows from tasks. Such a model greatly simplifies workflow design: a designer composes complex workflows from simpler ones, without first having to encapsulate workflows into tasks, or vice versa, during composition.
  • 74. F2: VIEW offers powerful workflow composition: workflow constructs are fully compositional with one another, to arbitrary levels of nesting. This often yields VIEW workflows that are more concise and more efficient to execute, and that can be hard to model in other workflow systems.
  • 75. F3: VIEW features a pure dataflow-based workflow language, SWL, which includes dataflow counterparts of control-flow-style constructs such as conditional and loop. Existing workflow languages often require both control-flow and dataflow constructs, resulting in complex or even obscure semantics and non-trivial workflow design.
  • 76. F4: VIEW supports the cloud MapReduce programming model not only at the job level, but also at the workflow level. One can therefore apply the Map and Reduce constructs to an arbitrary workflow an arbitrary number of times. As a result, VIEW can process nested lists of data products in parallel using multiple runs of a workflow.
  • 77. F5: VIEW features a collectional data model that supports not only traditional primitive data types, such as integer, float, double, boolean, char, and string, but also files, relations, and hierarchical collections (hierarchical key-value pairs), enabling parallel processing of data collections.
  • 78. F6: VIEW supports a high-level graph-based provenance query language, OPQL. In most cases, users can formulate lineage queries easily, without writing recursive queries or knowing the underlying database schema.
  • 79. F7: VIEW features the first service-oriented architecture that conforms to the reference architecture for scientific workflow management systems (SWFMSs). This architecture greatly facilitates interoperability and subsystem reusability in the community. It also provides a generic infrastructure on which a domain-specific scientific workflow application system (SWFAS) can be easily developed, with custom interfaces for various platforms and devices.
  • 80. Conclusions and Future Work • A scientific workflow composition model. • A collectional data model. • A prototypical SWFMS. • Future work: formalization of the scientific workflow algebra and the collectional algebra (completeness, integration); collaborative scientific workflow composition (concurrent design and composition, concurrent execution).
  • 81. VIEW application: fiber tract analysis for epilepsy.
  • 83. VIEW application: DNA analysis for the bacterium E. coli.
  • 84. VIEW application: simulation of Nereis succinea mate search behavior.
  • 85. Big Data is a Pyramid Can you contribute a piece too?
  • 86. Big Data Research Laboratory Wayne State University viewsystem.org