Interactive Programs
Debugging and
Development in
Apache Spark
Outline
‣ Motivating Scenario
‣ Titian Programming Interface
‣ Internals
‣ Vega
‣ Conclusions
Big Data Debugging
๏ Debugging data processing logic in Data-Intensive Scalable Computing
(DISC) systems is difficult
๏ Analysis tools are still in their “infancy”
๏ Today’s large-scale jobs are black boxes:
• Job submitted to a cluster
• Results come back minutes to hours later
• No visibility into the running algorithm
Big Data Debugging - State of the Art
Big Data Debugging - Desiderata
๏ Easy-to-use GDB-like debugger [ICSE 16] (not covered in this talk)
๏ Visibility of data in the running workflow
• E.g., what (input) data led to this (outlier) result?
๏ Selectively replaying a portion of the data processing steps on subsets
of intermediate data leading to outlier results
๏ Interactive program analysis
Big Data Debugging - Challenges
๏ Visibility of data -> Tracking the dependencies between
individual input and output records
๏ Selective replay -> Storage of intermediate results:
• Dataset shared between the running job and the analysis tool
๏ Interactivity -> Implementation constraints:
• Latency constraint - In-memory computation
• Programming interface constraint - Integration with the Spark DSL
Data Provenance (Lineage)
๏ Well-known technique in databases
๏ Two granularities of provenance
• Transformation (coarse-grained) provenance
– Records the complete workflow of the derivation of a dataset
– Spark RDD lineage is an example of this form of provenance
• Data (fine-grained) provenance
– Records data dependencies between input and output records
– The type of provenance Titian focuses on
Data Provenance - Example
Sensors
Tuple-ID  Time  Sensor-ID  Temperature
T1        11AM  1          34
T2        11AM  2          35
T3        11AM  3          35
T4        12PM  1          35
T5        12PM  2          35
T6        12PM  3          100
T7        1PM   1          35
T8        1PM   2          35
T9        1PM   3          80
SELECT AVG(temp), time
FROM sensors
GROUP BY time
Result-ID  Time  AVG(temp)
ID-1       11AM  34.6
ID-2       12PM  56.6
ID-3       1PM   50
Outliers: ID-2 (12PM, 56.6) and ID-3 (1PM, 50)
Why do ID-2 and ID-3 have those high averages?
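To make the sensor example concrete, here is a minimal sketch in plain Scala collections (no Spark, not Titian's implementation) that computes the same averages while remembering which tuple IDs fed each result. The case class names and the 50-degree threshold are assumptions made for illustration.

```scala
// One sensor reading; field names follow the Sensors table.
case class Reading(tupleId: String, time: String, sensorId: Int, temperature: Double)

val sensors = List(
  Reading("T1", "11AM", 1, 34.0), Reading("T2", "11AM", 2, 35.0), Reading("T3", "11AM", 3, 35.0),
  Reading("T4", "12PM", 1, 35.0), Reading("T5", "12PM", 2, 35.0), Reading("T6", "12PM", 3, 100.0),
  Reading("T7", "1PM", 1, 35.0), Reading("T8", "1PM", 2, 35.0), Reading("T9", "1PM", 3, 80.0)
)

// Result of the GROUP BY, plus the tuple IDs each average was derived from
// (the fine-grained provenance of that output record).
case class AvgWithLineage(time: String, avg: Double, inputIds: List[String])

val results: Map[String, AvgWithLineage] =
  sensors.groupBy(_.time).map { case (time, rs) =>
    time -> AvgWithLineage(time, rs.map(_.temperature).sum / rs.size, rs.map(_.tupleId))
  }

// Backward trace: fetch the inputs behind the 12PM average, then keep the
// suspicious ones. The threshold of 50 is a hypothetical choice.
val culprits = results("12PM").inputIds
  .flatMap(id => sensors.find(_.tupleId == id))
  .filter(_.temperature > 50)
```

Tracing the 12PM result back yields T4, T5, and T6, and the threshold singles out T6, the reading of 100 from sensor 3.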
Previous Data Provenance DISC Systems
๏ They use external storage systems (HDFS in RAMP [CIDR-11], a DBMS in
Newt [SOCC-13]) to retain lineage data
๏ Data provenance queries are supported through a separate programming
interface
๏ Result: high overhead and low interactivity
Experience with Newt and RAMP
๏ Word Count job
๏ RAMP is up to 4X slower than Spark
๏ Newt is up to 86X slower
[Figure: Time (s) against Dataset Size (GB) for Spark, Newt, and RAMP]
Example: Log Analysis
Loads error messages from a log, counts the number of occurrences of each
error, and returns a report containing the description of each error
val lc = new LineageContext(sc)
val lines = lc.textFile("hdfs://...")
val errors = lines.filter(_.startsWith("error"))
val codes = errors.map(_.split("\t")(1))
val pairs = codes.map(word => (word, 1))
val counts = pairs.reduceByKey(_ + _)
// dscr maps an error code to its description (not shown in the slides)
val reports = counts.map(kv => (dscr(kv._1), kv._2))
reports.collect.foreach(println)
Example: Backward Tracing
Given the result of the previous example, select the most frequent error
and trace back to the input lines containing it
// take(1) returns an Array, so extract its single element before comparing
val frequentPair = reports.sortBy(_._2, false).take(1)(0)
val frequent = reports.filter(_ == frequentPair)
val lineage = frequent.getLineage()
val input = lineage.goBackAll()
input.collect().foreach(println)
Example: Forward Tracing
Return the error codes generated from the network sub-system (indicated
in the log by a “NETWORK” tag)
val network = errors.filter(_.contains("NETWORK"))
val lineage = network.getLineage()
val output = lineage.goNextAll()
output.collect().foreach(println)
Example: Selective Replay
Return the error distribution without the errors caused by the Guest user
val lineage = reports.getLineage()
val inputLines = lineage.goBackAll()
// a lambda with two conditions on the same line needs a named parameter
val noGuest = inputLines.filter(l => !l.contains("Guest") && l.startsWith("error"))
val newCodes = noGuest.map(_.split("\t")(1))
val newPairs = newCodes.map(word => (word, 1))
val newCounts = newPairs.reduceByKey(_ + _)
val newRep = newCounts.map(kv => (dscr(kv._1), kv._2))
newRep.collect
Provenance Capturing
๏ LineageContext wraps SparkContext
• Provides visibility into the submitted job
๏ LineageRDDs are instrumented at stage boundaries
• They wrap native RDDs
• The specific LineageRDD implementation depends on the instrumented transformation
๏ Provenance data is buffered inside LineageRDDs
• Saved into the Spark BlockManager for querying
Spark Stage DAG
[Figure: Stage 1: lines, errors, codes, pairs; Stage 2: counts, reports]
val lines = sc.textFile("hdfs://...")
val errors = lines.filter(_.startsWith("error"))
val codes = errors.map(_.split("\t")(1))
val pairs = codes.map(word => (word, 1))
val counts = pairs.reduceByKey(_ + _)
val reports = counts.map(kv => (dscr(kv._1), kv._2))
Instrumented Spark Stage DAG
[Figure: the Stage DAG with LineageRDD agents added: HadoopLineageRDD, CombinerLineageRDD, ReducerLineageRDD, StageLineageRDD; Stage 1: lines, errors, codes, pairs; Stage 2: counts, reports]
Instrumented Workflow
[Figure: the same DAG, annotated with the (Input ID, Output ID) table that each agent records]
Lineage Capture Runtime Overheads
๏ Same Word Count job
๏ Titian is on average 1.3X slower than Spark
[Figure: Time (s) against Dataset Size (GB) for Spark, Titian, Newt, and RAMP]
Example: Captured Data Lineage
Hadoop
Input ID      Output ID
offset1       id1
offset2       id2
offset3       id3
Combiner
Input ID      Output ID
{ id1, id3 }  400
{ id2 }       4
Reducer
Input ID      Output ID
[ p1, p2 ]    400
[ p1 ]        4
Stage
Input ID      Output ID
400           id1
4             id2
Example: Trace Back
Starting from a Stage output, the captured tables are joined backward, one agent at a time:
๏ Stage.Input ID = Reducer.Output ID
๏ Reducer.Output ID = Combiner.Output ID
๏ Combiner.Input ID = Hadoop.Output ID
Now let’s do it for real!
Example: Trace Back
๏ The lineage tables are spread across workers: Worker1 and Worker2 each hold Hadoop and Combiner tables; Worker3 holds the Reducer and Stage tables
๏ On Worker3: join on Stage.Input ID = Reducer.Output ID
๏ Targeted Shuffle: the matching Reducer records, here (p1, 400), are sent to Worker1 and Worker2
๏ On Worker1 and Worker2: join on Combiner.Output ID = Reducer.Output ID
๏ On Worker1 and Worker2: join on Combiner.Input ID = Hadoop.Output ID
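The backward joins can be sketched on a single machine with plain Scala collections. This is an illustration under simplifying assumptions, not Titian's code: the table contents are copied from the slides, the value types and the `traceBack` helper are hypothetical, and the partition-aware targeted shuffle is left out.

```scala
// Lineage tables captured by each agent, as (Input ID, Output ID) pairs.
// Note: Stage output IDs and Hadoop output IDs share the names id1, id2 on
// the slides but live in different tables.
val hadoop: List[(String, String)] =
  List("offset1" -> "id1", "offset2" -> "id2", "offset3" -> "id3")
val combiner: List[(Set[String], Int)] =
  List(Set("id1", "id3") -> 400, Set("id2") -> 4)
val reducer: List[(List[String], Int)] =
  List(List("p1", "p2") -> 400, List("p1") -> 4)
val stage: List[(Int, String)] =
  List(400 -> "id1", 4 -> "id2")

// Hypothetical helper: returns the input offsets behind one Stage output ID.
def traceBack(stageOutputId: String): List[String] = {
  // Stage.Input ID = Reducer.Output ID
  val keys = stage.collect { case (in, out) if out == stageOutputId => in }
  val reducerOut = reducer.collect { case (_, out) if keys.contains(out) => out }
  // Reducer.Output ID = Combiner.Output ID
  val recordIds = combiner.collect { case (ins, out) if reducerOut.contains(out) => ins }.flatten
  // Combiner.Input ID = Hadoop.Output ID
  hadoop.collect { case (offset, out) if recordIds.contains(out) => offset }
}
```

With these tables, `traceBack("id1")` follows key 400 back to offset1 and offset3, and `traceBack("id2")` follows key 4 back to offset2.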
Tracing Performance
๏ Word Count job
๏ Tracing one record backward takes < 1 sec for datasets < 100GB
๏ 18 sec for a 500GB dataset
Vega: Optimizations
for Selective Replay
Matteo Interlandi, Sai Deep Tetali, Muhammad Ali Gulzar, Joseph Noor
Miryung Kim, Todd Millstein, Tyson Condie
Under Submission
Debugging workflow
๏ Run program
๏ Understand the cause for bugs / outliers:
• Lineage - Titian [VLDB 2016]
• Breakpoints/watchpoints - BigDebug [ICSE 2016]
• Crash culprit - BigDebug [ICSE 2016]
๏ Fix bug
• Fast selective replay
First Strategy
Convert changes in code to changes in data
Incremental Plan
input.map(x => (x, 1)).reduceByKey(_ + _)
[Figure: Stage 1: lines, pairs; shuffle; Stage 2: counts]
Input:    aa, b, c, aa, c
Map:      (aa, 1), (b, 1), (c, 1), (aa, 1), (c, 1)
Shuffle:  (aa, [1, 1]), (b, 1), (c, [1, 1])
Reduce:   (aa, 2), (b, 1), (c, 2)
Incremental Plan
Inject a filter in the workflow
input.filter(x => x != "c").map(x => (x, 1)).reduceByKey(_ + _)
[Figure: Stage 1: lines, filter, pairs; shuffle; Stage 2: counts]
Input:    aa, b, c, aa, c
Filter:   aa, b, aa
Map:      (aa, 1), (b, 1), (aa, 1)
Shuffle:  (aa, [1, 1]), (b, 1)
Reduce:   (aa, 2), (b, 1)
Incremental Plan
The change in code (the injected filter) is converted into changes in data (deltas), which are propagated through the workflow:
Input:     aa, b, c, aa, c
Filter:    aa, b, c, aa, c
Map:       (aa, 1), (b, 1), (c, 1), (aa, 1), (c, 1)
Shuffle:   (aa, [1, 1]), (b, 1), (c, [1, 1])
Reduce:    (aa, 2), (b, 1), (c, 2)
ΔFilter:   -c, -c
ΔMap:      -(c, 1), -(c, 1)
ΔShuffle:  (c, [-1, -1])
ΔReduce:   -(c, 2)
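The delta propagation can be sketched with plain Scala collections. This is an illustration only, not Vega's implementation: retractions are encoded here as pairs with a negative count, and all names are hypothetical.

```scala
// Word count on plain collections: input.map(x => (x, 1)).reduceByKey(_ + _)
def wordCount(words: Seq[String]): Map[String, Int] =
  words.groupBy(identity).map { case (w, ws) => w -> ws.size }

val input = Seq("aa", "b", "c", "aa", "c")
val oldCounts = wordCount(input) // result of the original run, kept around

// Delta of the injected filter: the records it rejects, i.e. -c, -c.
val keep: String => Boolean = _ != "c"
val deltaFilter: Seq[String] = input.filterNot(keep)

// Delta of the map: each retraction becomes a negative count, i.e. -(c, 1), -(c, 1).
val deltaMap: Seq[(String, Int)] = deltaFilter.map(w => (w, -1))

// Delta of the reduce: sum the negative counts per key, i.e. -(c, 2).
val deltaReduce: Map[String, Int] =
  deltaMap.groupBy(_._1).map { case (w, ds) => w -> ds.map(_._2).sum }

// Apply the delta to the old result; drop keys whose count reaches zero.
val newCounts = deltaReduce.foldLeft(oldCounts) { case (acc, (w, d)) =>
  val updated = acc.getOrElse(w, 0) + d
  if (updated == 0) acc - w else acc.updated(w, updated)
}
```

Only the two rejected records travel through the pipeline, yet `newCounts` equals what a full rerun with the filter in place would produce.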
Performance
[Figure: Time (s) against Input data size (GB); about 10X faster]
Performance
๏ Good up to a certain point
๏ Two factors dominate:
• Space utilization
• Time to shuffle deltas
๏ Insight:
• The more downstream the filter is placed, the better the incremental
performance
• Especially beneficial if we can place it past the shuffle
Second Strategy
Push code changes downstream
Commutative Rewrite
Input:    aa, b, c, aa, c
Map:      (aa, 1), (b, 1), (c, 1), (aa, 1), (c, 1)
Shuffle:  (aa, [1, 1]), (b, 1), (c, [1, 1])
Reduce:   (aa, 2), (b, 1), (c, 2)
๏ The new transform is filter(x => x != "c"); we want to apply it downstream of the Map
๏ But there the input to the filter is (word, 1), so we cannot use the filter as written
๏ Observe that the map is invertible: we can use the old filter by applying it to the inverse of the map
๏ Rewritten filter: filter'((x, o) => x != "c")
๏ Shuffle and Reduce operations preserve keys, so the rewritten filter can also be applied to the Reduce output
Filter':  (aa, 2), (b, 1)
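The rewrite can be checked on plain Scala collections. This is a sketch under assumptions, not Vega's API: `mapInverse` stands in for the user-supplied inverse of the map, and `reduceByKey` is a local helper that mimics the Spark operation.

```scala
val words = Seq("aa", "b", "c", "aa", "c")

val mapFn: String => (String, Int) = w => (w, 1)
// Hypothetical user-supplied inverse of the map: recovers the word from a pair.
val mapInverse: ((String, Int)) => String = _._1

val filterFn: String => Boolean = _ != "c"
// Rewritten filter: apply the old predicate to the inverse image of the pair.
// It only inspects the key, which shuffle and reduce leave unchanged.
val filterPrime: ((String, Int)) => Boolean = kv => filterFn(mapInverse(kv))

// Local stand-in for Spark's reduceByKey(_ + _).
def reduceByKey(pairs: Seq[(String, Int)]): Map[String, Int] =
  pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// Original placement: filter before the map, which forces a full rerun.
val filterFirst = reduceByKey(words.filter(filterFn).map(mapFn))

// Rewritten placement: reuse the already computed reduce output, filter last.
val materialized = reduceByKey(words.map(mapFn))
val filterLast = materialized.filter(filterPrime)
```

Both placements give (aa, 2) and (b, 1), but the rewritten one touches only the reduce output, one record per distinct word.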
Performance
[Figure: Time against Input data size (GB); about 1000X faster]
Why does it scale so well?
๏ Runtime is on the order of the output size
๏ Output size depends on the number of unique words
๏ Unique words << total words
Combining Strategies
๏ Push the changed transform past as many shuffles
as possible with rewrites
• The new transform can be placed only after materialization
points
• By default we materialize shuffle output
• Efficient because Spark already saves shuffle output for fault
tolerance
๏ Use delta computation for the remaining workflow
Vega
๏ Built on Spark and Spark SQL (only filter rewrite)
๏ Spark SQL API is unchanged
๏ Spark API includes:
• Functions with inverses (for maps)
• Inverse values (for incremental reduce)
๏ Automatically rewrites workflows using commutativity
and incremental evaluation
๏ Titian provides Spark users with the ability to trace through program execution
๏ Features:
• Intermediate results are shared in memory
• Tight integration with the Spark API (LineageRDD)
• Low job overhead
• Efficient lineage query
๏ Vega provides 1–3 orders of magnitude performance gains over rerunning the
computation from scratch
๏ Both provide results in a few seconds for many workflows, allowing interactive
usage
} Transformation provenance
Conclusions
Thank you
Outline
‣ Motivating Scenario
‣ Titian Programming Interface
‣ Internals
‣ Performance
‣ Conclusions
Configuration
๏ Two sets of experiments:
• Unstructured - grep and word count
• Structured - PigMix queries
๏ Datasets:
• Unstructured: files from 500MB to 500GB containing words generated using a
Zipf distribution from a dictionary of 8000 words
• Structured: we used the PigMix generator to create datasets of sizes ranging
from 1GB to 1TB
๏ Configuration:
• 16 machines, each with 4 cores (2 hyperthreads per core), 32GB of RAM, 1TB disk
• Spark 1.2.1
Lineage Capture Runtime Overheads
Tracing Performance
๏ Titian provides Spark users with the ability to trace through
program execution at interactive speed
๏ Features:
• Intermediate results are shared in memory
• Tight integration with the Spark API (LineageRDD)
• Low job overhead
• Efficient lineage query
๏ We believe Titian will open the door to program logic debugging,
iterative data (and program) cleaning, and exploratory analysis
}Transformation provenance
Titian: Data Provenance in Spark
Instrumented Workflow
[Figure: the instrumented Stage DAG: HadoopLineageRDD, lines, errors, codes, pairs, and CombinerLineageRDD in Stage 1; ReducerLineageRDD, counts, reports, and StageLineageRDD in Stage 2]
Capturing: HadoopLineageRDD
๏ For each input record, e.g. (offset1, “error 400 …”):
• Get input ID: the record’s offset (offset1)
• Emit the output record (“error 400 …”)
• Get output ID (id1)
• Save the (Input ID, Output ID) pair and store the output ID in the TaskContext
๏ The same steps repeat for (offset2, “error 4 …”) and (offset3, “error 400 …”)
๏ Resulting table:
Input ID  Output ID
offset1   id1
offset2   id2
offset3   id3
Combiner Build Phase
๏ Each record flows through Stage 1 (lines, errors, codes, pairs): “error 400 …” becomes 400, then (400, 1); its ID (id1) is read from the TaskContext
๏ For each key, the CombinerLineageRDD updates two tables: the aggregate value and the set of input IDs
๏ After the three records:
Key  Agg Value
400  2
4    1
Key  Input IDs
400  { id1, id3 }
4    { id2 }
Combiner Probe Phase
๏ For each output record, (400, 2) and then (4, 1):
• Get output ID: the key (400, then 4)
• Save the key’s input IDs together with the output ID
๏ Resulting table:
Input ID      Output ID
{ id1, id3 }  400
{ id2 }       4
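The build and probe phases can be sketched with two mutable maps. This is a simplified illustration, not Titian's CombinerLineageRDD: record IDs are passed alongside each line instead of being read from the TaskContext, keys are strings, and the log lines are shortened to "error", a tab, and the code.

```scala
import scala.collection.mutable

// Records as they leave the Hadoop agent: (record ID, log line).
val records = Seq("id1" -> "error\t400", "id2" -> "error\t4", "id3" -> "error\t400")

val aggValue = mutable.LinkedHashMap.empty[String, Int]          // Key -> Agg Value
val inputIds = mutable.LinkedHashMap.empty[String, Set[String]]  // Key -> Input IDs

// Build phase: fold each record into the aggregate and remember its ID.
for ((recordId, line) <- records) {
  val code = line.split("\t")(1)
  aggValue(code) = aggValue.getOrElse(code, 0) + 1
  inputIds(code) = inputIds.getOrElse(code, Set.empty[String]) + recordId
}

// Probe phase: emit one output per key and save (Input IDs, Output ID),
// where the output ID is the key itself.
val outputs: Seq[(String, Int)] = aggValue.toSeq
val lineage: Seq[(Set[String], String)] = aggValue.keys.toSeq.map(k => inputIds(k) -> k)
```

The two maps mirror the Key/Agg Value and Key/Input IDs tables on the slides, and `lineage` mirrors the combiner's final (Input ID, Output ID) table.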
Instrumented Workflow
๏ Combiner outputs cross the shuffle tagged with the partition they come from: (400, (2, p1)), (4, (1, p1)), (400, (5, p2)), …
๏ The ReducerLineageRDD records the partitions that contributed to each key and stores the key in the TaskContext:
Input ID    Output ID
[ p1, p2 ]  400
Capturing: StageLineageRDD
๏ For each stage output record, e.g. (Bad request, 7):
• Get input ID from the TaskContext (400)
• Get output ID (id1)
• Save the (Input ID, Output ID) pair
๏ The same steps repeat for (Failure, 1), whose input ID is 4
๏ Resulting table:
Input ID  Output ID
400       id1
4         id2

Más contenido relacionado

La actualidad más candente

Beyond Shuffling, Tips and Tricks for Scaling Apache Spark updated for Spark ...
Beyond Shuffling, Tips and Tricks for Scaling Apache Spark updated for Spark ...Beyond Shuffling, Tips and Tricks for Scaling Apache Spark updated for Spark ...
Beyond Shuffling, Tips and Tricks for Scaling Apache Spark updated for Spark ...Holden Karau
 
Interactive Session on Sparkling Water
Interactive Session on Sparkling WaterInteractive Session on Sparkling Water
Interactive Session on Sparkling WaterSri Ambati
 
Writing Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using ScaldingWriting Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using ScaldingToni Cebrián
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logsStefan Krawczyk
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applicationsKexin Xie
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Spark Summit
 
The Mechanics of Testing Large Data Pipelines (QCon London 2016)
The Mechanics of Testing Large Data Pipelines (QCon London 2016)The Mechanics of Testing Large Data Pipelines (QCon London 2016)
The Mechanics of Testing Large Data Pipelines (QCon London 2016)Mathieu Bastian
 
Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.Keshav Murthy
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0Petr Zapletal
 
Apache Spark - Key-Value RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Key-Value RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Key-Value RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Key-Value RDD | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
 
2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_onSri Ambati
 
Beyond Shuffling - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...
Beyond Shuffling  - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...Beyond Shuffling  - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...
Beyond Shuffling - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...Holden Karau
 
[QE 2018] Łukasz Gawron – Testing Batch and Streaming Spark Applications
[QE 2018] Łukasz Gawron – Testing Batch and Streaming Spark Applications[QE 2018] Łukasz Gawron – Testing Batch and Streaming Spark Applications
[QE 2018] Łukasz Gawron – Testing Batch and Streaming Spark ApplicationsFuture Processing
 
Scalding - the not-so-basics @ ScalaDays 2014
Scalding - the not-so-basics @ ScalaDays 2014Scalding - the not-so-basics @ ScalaDays 2014
Scalding - the not-so-basics @ ScalaDays 2014Konrad Malawski
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastHolden Karau
 
Scala introduction
Scala introductionScala introduction
Scala introductionvito jeng
 
Stratosphere Intro (Java and Scala Interface)
Stratosphere Intro (Java and Scala Interface)Stratosphere Intro (Java and Scala Interface)
Stratosphere Intro (Java and Scala Interface)Robert Metzger
 

Similar to Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in Spark, Matteo Interlandi, PostDoc, UCLA

Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in Spark, Matteo Interlandi, PostDoc, UCLA