Stipanicev Zoran, GetYourGuide
Spark Side of the Funnels
#UnifiedDataAnalytics #SparkAISummit
About me
Software engineer for the past 13 years. Started working with
data 12 years ago with Oracle, then moved to reporting and BI
over the years. For the last 4 years, enabling business users at
GetYourGuide to make better decisions with data.
Senior BI Engineer, Data Platform
Agenda
1. Intro to GetYourGuide
2. Introduction to Funnels
3. Deep Dive
4. Further Possibilities
5. End Result
Intro to GetYourGuide
We make it simple to book and enjoy
incredible experiences
Europe’s largest marketplace
for travel experiences
50k+ products in 150+ countries
25M+ tickets sold
$650M+ in VC funding
600+ strong global team
150+ traveler nationalities
Introduction to Funnels
Requirements
1. Looker as frontend and Spark as backend
2. Respect the order of events
3. Each step can consist of multiple events
4. Anything can happen between two steps of the Funnel
5. Support for funnel-wide and step-specific filters
6. Sessions based on touchpoints (for some use cases)
7. Performance: 4 weeks of data in under 60 sec, ideally under 30 sec
8. Option to ignore the order of events :)
Internal vs External solution
1. Performance
○ External is faster because it’s custom-built from the bottom up
2. Where is the data?
○ With an internal solution we are not sending data to 3rd parties
3. Flexibility
○ With an internal solution we can join it to all of our internal data
to bring more insights to our stakeholders
○ We can add it to internal dashboards
4. Cost
○ Differs for each company :)
Touchpoints explained
[Diagram: touchpoints (G, f, Direct), each followed by events, with a booking at the end; the timeline is split into Funnel Session 1, Funnel Session 2 and Funnel Session 3]
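One way to read the touchpoint diagram is that every touchpoint opens a new funnel session for the visitor. The sketch below illustrates only that reading; the `is_touchpoint` flag and the field names are hypothetical and not taken from the talk.

```python
def assign_funnel_sessions(events):
    """Number funnel sessions for one visitor's events.

    Assumption: each touchpoint event starts a new funnel session.
    Events seen before the first touchpoint get session 0.
    """
    session = 0
    labelled = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["is_touchpoint"]:
            session += 1
        labelled.append({**event, "funnel_session": session})
    return labelled


# Hypothetical event stream for a single visitor
visitor_events = [
    {"timestamp": 1, "event": "Touchpoint", "is_touchpoint": True},
    {"timestamp": 2, "event": "LandingPage", "is_touchpoint": False},
    {"timestamp": 3, "event": "Touchpoint", "is_touchpoint": True},
    {"timestamp": 4, "event": "ProductPage", "is_touchpoint": False},
    {"timestamp": 5, "event": "Touchpoint", "is_touchpoint": True},
    {"timestamp": 6, "event": "Booking", "is_touchpoint": False},
]

for row in assign_funnel_sessions(visitor_events):
    print(row["timestamp"], row["event"], row["funnel_session"])
```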
Funnel filtering explained
If you define that you want to see visitors in the funnel A - B - C - D - E,
what can the actual event sequence look like?
1) xyzDBz A CyxD B ABAB C BC DE
2) xyzlmnop A CCCC
How many steps do we match in these cases?
1) A B C D E
2) A B A B E
3) A B C E D
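The matching question can be made concrete with a small sketch. It assumes that steps must appear in order and that any other events may occur in between, which is how the requirements describe the funnel; the function name `matched_steps` is made up for this illustration.

```python
def matched_steps(events: str, funnel: str = "ABCDE") -> int:
    """Count how many funnel steps are matched, in order, by an event string.

    Characters that are not the next expected step are skipped, so anything
    can happen between two steps.
    """
    matched = 0
    for event in events:
        if matched < len(funnel) and event == funnel[matched]:
            matched += 1
    return matched


examples = [
    "xyzDBz A CyxD B ABAB C BC DE",
    "xyzlmnop A CCCC",
    "ABCDE",
    "ABABE",
    "ABCED",
]
for sequence in examples:
    print(repr(sequence), matched_steps(sequence))
```

Under this reading, a sequence such as A B A B E stops at step B, because C never occurs, while A B C E D reaches step D.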
Deep Dive
How do we build a Funnel
1) We filter for only selected Events
2) Concatenate all the events in a single session into a string
a) We use an alias for each step (A,B,C…)
i) It streamlines the rest of the query
ii) And nothing changed when we added step specific
filters
3) We compare the generated string with the Funnel specified by
filters in our BI tool
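The three steps above can be sketched in plain Python to show the idea without a Spark cluster. This is a minimal illustration, not the production query: the step definitions, the `AddToCart` step, and the field names are assumptions chosen for the example.

```python
import re
from collections import defaultdict

# Hypothetical step definitions: alias -> predicate over one event row
STEPS = {
    "A": lambda e: e["event"] in ("LandingPage", "HomePage"),
    "B": lambda e: e["event"] == "ProductPage" and e.get("product_id") == "123",
    "C": lambda e: e["event"] == "AddToCart",
}


def alias_for(event):
    """Return the step alias for an event, or None if it matches no step."""
    for alias, predicate in STEPS.items():
        if predicate(event):
            return alias
    return None


def funnel_strings(events):
    """Filter to selected events and concatenate aliases per session, by time."""
    by_session = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        alias = alias_for(event)
        if alias is not None:  # step 1: keep only selected events
            by_session[event["session"]].append(alias)
    return {session: "".join(aliases) for session, aliases in by_session.items()}


events = [
    {"session": "s1", "timestamp": 1, "event": "HomePage"},
    {"session": "s1", "timestamp": 2, "event": "ProductPage", "product_id": "123"},
    {"session": "s1", "timestamp": 3, "event": "Search"},
    {"session": "s1", "timestamp": 4, "event": "ProductPage", "product_id": "999"},
    {"session": "s1", "timestamp": 5, "event": "AddToCart"},
    {"session": "s2", "timestamp": 1, "event": "ProductPage", "product_id": "123"},
    {"session": "s2", "timestamp": 2, "event": "LandingPage"},
]

funnels = funnel_strings(events)
# Step 3: compare the generated string with the funnel definition.
# re.search mirrors a partial regex match anywhere in the string.
pattern = re.compile("A.*B.*C")
completed = {session for session, funnel in funnels.items() if pattern.search(funnel)}
print(funnels, completed)
```

Because every step is reduced to a single letter, a step-specific filter only changes the predicate behind an alias; the comparison against the funnel pattern stays the same.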
Pseudo SQL of the implementation
SELECT CASE WHEN … END FROM (
    SELECT concat_ws('', collect_list(
        CASE
            WHEN event IN ('LandingPage', 'HomePage') THEN 'A'
            WHEN event = 'ProductPage' AND product_id = '123' THEN 'B'
            ...
        END) OVER (PARTITION BY session ORDER BY timestamp ROWS …))
        AS funnel
        -- funnel => 'ABABBBCDE...'
    FROM event_log
    WHERE ((event IN ('LandingPage', 'HomePage'))
        OR (event = 'ProductPage' AND product_id = '123') …)
      AND date = ...
) t
WHERE t.funnel RLIKE '...'
Pseudo SQL - Inner Where
SELECT CASE WHEN … END FROM (
    SELECT concat_ws('', collect_list(
        CASE
            WHEN event IN ('LandingPage', 'HomePage') THEN 'A'
            WHEN event = 'ProductPage' AND product_id = '123' THEN 'B'
            ...
        END) OVER (PARTITION BY session ORDER BY timestamp ROWS …))
        AS funnel
    FROM event_log
    WHERE 1=1
      AND ((event IN ('LandingPage', 'HomePage'))
        OR (event = 'ProductPage' AND product_id = '123') …)
      AND date BETWEEN ...
) t
WHERE t.funnel RLIKE '...'
Explain plan before the fix
+- Filter (((event = LandingPage) || (event = HomePage)) || ((event = ProductPage) && (product_id = …
+- FileScan parquet … Location: PrunedInMemoryFileIndex[dbfs:…/event=AboutView ... PartitionCount: 1502,
PartitionFilters: [isnotnull(date), (date >= 18135), (date < 18142)], …
What do we see in the explain plan?
● Event partition listed is not for any of the filtered events
● Partition count is really high (date range is 7 days)
● Date partition pruning is applied
Pseudo SQL - Inner Where Fixed
SELECT CASE WHEN … END FROM (
    SELECT concat_ws('', collect_list(
        CASE WHEN event IN ('LandingPage', 'HomePage') THEN 'A'
        …
    FROM event_log
    WHERE 1=1
      AND date BETWEEN ...
      -- event-only predicate: allows the event partitions to be pruned
      AND ( (event IN ('LandingPage', 'HomePage'))
         OR (event = 'ProductPage')
         OR (event …) )
      -- full step filters, including non-partition columns
      AND ( (event IN ('LandingPage', 'HomePage'))
         OR (event = 'ProductPage' AND product_id = '123') …)
) t
WHERE t.funnel RLIKE '...'
Explain plan after the fix
+- Filter (((event = LandingPage) || (event = HomePage)) || ((event = ProductPage) && (product_id = …
+- FileScan parquet … Location: PrunedInMemoryFileIndex[dbfs:…/event=LandingPage ... PartitionCount: 21,
PartitionFilters: [isnotnull(date), (date >= 18135), (date < 18142)], (event_name = Landing…
What do we see in the explain plan?
● Event name partitions are pruned
● Partition count is a lot lower (7 days x 3 events)
● Date partition pruning is still applied
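Since the queries are generated from funnel definitions, the extra event-only predicate can be derived mechanically from the step filters. The sketch below shows one possible way to do that; the step structure and function names are hypothetical, and it only builds SQL text rather than talking to Spark.

```python
# Hypothetical step definitions: events per step plus an optional extra filter
STEPS = [
    {"events": ["LandingPage", "HomePage"], "extra": None},
    {"events": ["ProductPage"], "extra": "product_id = '123'"},
]


def distinct_events(steps):
    """All event names used by any step, in first-seen order."""
    seen = []
    for step in steps:
        for event in step["events"]:
            if event not in seen:
                seen.append(event)
    return seen


def sql_in(events):
    quoted = ", ".join(f"'{event}'" for event in events)
    return f"event IN ({quoted})"


def pruning_predicate(steps):
    """Event-only predicate on the partition column, with no other conditions."""
    return sql_in(distinct_events(steps))


def step_predicate(steps):
    """Full step filters, including conditions on non-partition columns."""
    parts = []
    for step in steps:
        condition = sql_in(step["events"])
        if step["extra"]:
            condition = f"{condition} AND {step['extra']}"
        parts.append(f"({condition})")
    return "(" + " OR ".join(parts) + ")"


def expected_partitions(days, steps):
    """Partitions read when both date and event partitions are pruned."""
    return days * len(distinct_events(steps))


print(pruning_predicate(STEPS))
print(step_predicate(STEPS))
print(expected_partitions(7, STEPS))
```

With 7 days and 3 distinct events this gives 21, which matches the partition count in the explain plan after the fix.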
Sample of rows from inner to outer query
Event Alias | Funnel      | Visitor ID
A           | BBABABABCBC | A1B1
B           | BBABABABCBC | A1B1
C           | BBABABABCBC | A1B1
A           | BAAA        | C9D9
B           | BAAA        | C9D9
Pseudo SQL - Outer Where
SELECT CASE WHEN … END FROM (
    SELECT concat_ws('', collect_list(
        CASE WHEN event IN ('LandingPage', 'HomePage') THEN 'A'
        …
    FROM event_log
    WHERE 1=1
      AND date BETWEEN ...
) t
WHERE t.funnel RLIKE 'A.*B.*C.*D.*E'
Pseudo SQL - Outer Where
SELECT CASE WHEN … END FROM (
    SELECT concat_ws('', collect_list(
        CASE WHEN event IN ('LandingPage', 'HomePage') THEN 'A'
        …
    FROM event_log
    WHERE 1=1
      AND date BETWEEN ...
) t
WHERE locate('A', funnel) > 0
Pseudo SQL - Outer Select
SELECT CASE
        WHEN alias = 'A' THEN 1
        WHEN alias = 'B' AND funnel RLIKE 'A.*B' THEN 2
        WHEN alias = 'C' AND funnel RLIKE 'A.*B.*C' THEN 3
        ELSE -1 END AS step
FROM (
    SELECT concat_ws('', collect_list(
        CASE WHEN event IN ('LandingPage', 'HomePage') THEN 'A'
        …
    FROM event_log
    WHERE 1=1
      AND date BETWEEN ...
) t
WHERE locate('A', funnel) > 0
Pseudo SQL - Outer Select
SELECT CASE
        WHEN alias = 'A' THEN 1
        WHEN alias = 'B' AND locate('B', funnel, locate('A', funnel)) > 0 THEN 2
        WHEN alias = 'C' AND locate('C', funnel, locate('B', funnel, locate('A', funnel))) > 0
            THEN 3
        ELSE -1 END AS step
FROM (
    SELECT concat_ws('', collect_list(
        CASE WHEN event IN ('LandingPage', 'HomePage') THEN 'A'
        …
    FROM event_log
    WHERE 1=1
      AND date BETWEEN ...
) t
WHERE locate('A', funnel) > 0
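The two Outer Select variants assign the same step numbers: one uses regular expressions, the other nested `locate` calls. The sketch below emulates both in Python so the equivalence can be checked on sample funnel strings. The `locate` helper is a stand-in written for this example; it assumes 1-based positions, a result of 0 when nothing is found, and a result of 0 when the start position is below 1.

```python
import re


def locate(substr: str, s: str, pos: int = 1) -> int:
    """1-based position of substr in s at or after pos, or 0 if not found.

    Assumption: a start position below 1 yields 0, so a failed inner
    lookup makes the outer lookup fail as well.
    """
    if pos < 1:
        return 0
    return s.find(substr, pos - 1) + 1


def step_with_locate(alias: str, funnel: str) -> int:
    if alias == "A":
        return 1
    if alias == "B" and locate("B", funnel, locate("A", funnel)) > 0:
        return 2
    if alias == "C" and locate("C", funnel, locate("B", funnel, locate("A", funnel))) > 0:
        return 3
    return -1


def step_with_rlike(alias: str, funnel: str) -> int:
    if alias == "A":
        return 1
    if alias == "B" and re.search("A.*B", funnel):
        return 2
    if alias == "C" and re.search("A.*B.*C", funnel):
        return 3
    return -1


for funnel in ("BBABABABCBC", "BAAA", "ACB", "CBA"):
    for alias in "ABC":
        print(funnel, alias, step_with_locate(alias, funnel), step_with_rlike(alias, funnel))
```

Plain substring search avoids running a regex engine over every row, which is presumably the motivation for the second variant.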
Sample of rows from outer to BI tool query
Event Alias | Funnel      | Visitor ID | Step
A           | BBABABABCBC | A1B1       | 1
B           | BBABABABCBC | A1B1       | 2
C           | BBABABABCBC | A1B1       | 3
A           | BAAA        | C9D9       | 1
B           | BAAA        | C9D9       | -1
Further Possibilities
Slicing the funnel
[Diagram: the funnel LandingPg, ProductPg, AddToCart, Checkout shown for different product pages (ProductPg ID = 1, ID = 2, ID = 3)]
● First we get all the values that satisfy the filters
● Then we collect them into an array with a window function to
apply them to every step of the funnel
● Last step is to explode using LATERAL VIEW to support multiple
dimensions
● And now we can expose it as a dimension to users
Slicing – Inner select
SELECT CASE WHEN … END FROM (
    SELECT …
        collect_set(CASE WHEN product_id IN (1, 2, 3) THEN product_id END)
            OVER (PARTITION BY session ORDER BY timestamp ROWS …)
            AS product_id_arr -- distinct values
        …
    FROM event_log
    WHERE ((event IN ('LandingPage', 'HomePage'))
        OR (event = 'ProductPage' AND product_id = '123') …)
      AND date = ...
) t
WHERE t.funnel RLIKE '...'
Sample of rows from inner to outer query
Event Alias | Funnel      | Visitor ID | Product Array
A           | BBABABABCBC | A1B1       | [1,2]
A           | BAAA        | C9D9       | [3]
Slicing – Outer query
SELECT visitor_id …
     , product_id
     , CASE WHEN …
     …
FROM ( /* INNER QUERY */ ) t
LATERAL VIEW OUTER explode(product_id_arr) products AS product_id
LATERAL VIEW OUTER explode(category_id_arr) categories AS category_id
…
● With LATERAL VIEW we can explode multiple arrays
● This multiplies the number of rows sent to the outer query generated
by the BI tool
● Benefit for end users -> they don’t have to run multiple funnel
analyses to get the same data
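The row multiplication caused by exploding several arrays can be illustrated with a small emulation. It is a sketch only: `explode_outer` mimics the OUTER behaviour of keeping a row with a null when the array is empty, and the category values are invented for the example.

```python
from itertools import product


def explode_outer(values):
    """Emulate an OUTER explode: an empty array still yields one row (None)."""
    return list(values) if values else [None]


def slice_rows(rows, array_columns):
    """Explode several array columns; array_columns maps array name -> output name.

    Chained explodes produce the cross product of the array values per row.
    """
    out = []
    for row in rows:
        base = {k: v for k, v in row.items() if k not in array_columns}
        exploded = [explode_outer(row[col]) for col in array_columns]
        for combo in product(*exploded):
            new_row = dict(base)
            new_row.update(zip(array_columns.values(), combo))
            out.append(new_row)
    return out


inner_rows = [
    {"visitor_id": "A1B1", "funnel": "BBABABABCBC",
     "product_id_arr": [1, 2], "category_id_arr": []},
    {"visitor_id": "C9D9", "funnel": "BAAA",
     "product_id_arr": [3], "category_id_arr": [7, 8]},  # hypothetical categories
]

sliced = slice_rows(
    inner_rows,
    {"product_id_arr": "product_id", "category_id_arr": "category_id"},
)
for row in sliced:
    print(row)
```

Two input rows become four output rows here, which is the trade-off named above: more rows for the BI tool, but every slice is available from a single funnel run.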
Sample of rows from outer to BI tool query
Event Alias | Funnel      | Visitor ID | Product ID
A           | BBABABABCBC | A1B1       | 1
A           | BBABABABCBC | A1B1       | 2
A           | BAAA        | C9D9       | 3
Further optimisations
We have optimised the query in its current form and there is one
more part that allows further optimisations
FROM event_log
● We are reading data directly from the partitioned table
● We can consider partitioned table as a union of tables where
each partition is a table
● Could we optimise the query by replacing the table with unions?
What would that look like?
SELECT …
FROM (
    -- all events of the first step
    SELECT *
    FROM event_log
    WHERE event IN ('LandingPage', 'HomePage')
      AND date BETWEEN ...
    UNION ALL
    -- second-step events, only at or after the visitor's first first-step event
    SELECT e.*
    FROM event_log e
    INNER JOIN (SELECT visitor_id, min(timestamp) AS timestamp
                FROM event_log
                WHERE event IN ('LandingPage', 'HomePage') AND date BETWEEN ...
                GROUP BY 1
    ) s1 ON e.visitor_id = s1.visitor_id AND e.timestamp >= s1.timestamp …
    WHERE e.event = 'ProductPage' AND e.product_id = '123' AND e.date BETWEEN ...
) t
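The union idea can be read as: keep every first-step event, and keep second-step events only when they happen at or after the visitor's first first-step event. The following sketch emulates that pre-filter in plain Python under this reading; the function name and sample rows are hypothetical.

```python
FIRST_STEP_EVENTS = ("LandingPage", "HomePage")


def prefilter(events):
    """Emulate the union: first-step events plus qualifying second-step events."""
    # Earliest first-step timestamp per visitor (the joined subquery)
    first_seen = {}
    for e in events:
        if e["event"] in FIRST_STEP_EVENTS:
            visitor = e["visitor_id"]
            first_seen[visitor] = min(first_seen.get(visitor, e["timestamp"]), e["timestamp"])

    step_a = [e for e in events if e["event"] in FIRST_STEP_EVENTS]
    step_b = [
        e for e in events
        if e["event"] == "ProductPage"
        and e.get("product_id") == "123"
        and e["visitor_id"] in first_seen
        and e["timestamp"] >= first_seen[e["visitor_id"]]
    ]
    return step_a + step_b


events = [
    {"visitor_id": "v1", "timestamp": 1, "event": "ProductPage", "product_id": "123"},
    {"visitor_id": "v1", "timestamp": 2, "event": "HomePage"},
    {"visitor_id": "v1", "timestamp": 3, "event": "ProductPage", "product_id": "123"},
    {"visitor_id": "v2", "timestamp": 1, "event": "ProductPage", "product_id": "123"},
]

kept = prefilter(events)
print([(e["visitor_id"], e["timestamp"], e["event"]) for e in kept])
```

Second-step events that cannot contribute to a funnel, such as those from visitors who never reached the first step, are dropped before the funnel string is built.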
End Result
[Screenshots]
How to ensure performance in the BI tool
1. We are using columnar storage for our data
2. Therefore we are using a feature to modify the generated SQL
a. To select only the needed fields
b. And to include only the joins needed for those fields
For simpler use cases, BI tools provide this out of the box. Since our
query relies on subqueries, we had to use an additional feature that
allows us to modify custom queries.
Thank you

More Related Content

What's hot

『이펙티브 디버깅』 - 디버깅 지옥에서 탈출하는 66가지 전략과 기법
『이펙티브 디버깅』 - 디버깅 지옥에서 탈출하는 66가지 전략과 기법『이펙티브 디버깅』 - 디버깅 지옥에서 탈출하는 66가지 전략과 기법
『이펙티브 디버깅』 - 디버깅 지옥에서 탈출하는 66가지 전략과 기법복연 이
 
オープンソースで提供される第二のJVM:OpenJ9 VMとIBM Javaについて
オープンソースで提供される第二のJVM:OpenJ9 VMとIBM Javaについてオープンソースで提供される第二のJVM:OpenJ9 VMとIBM Javaについて
オープンソースで提供される第二のJVM:OpenJ9 VMとIBM JavaについてTakakiyo Tanaka
 
Linux/DB Tuning (DevSumi2010, Japanese)
Linux/DB Tuning (DevSumi2010, Japanese)Linux/DB Tuning (DevSumi2010, Japanese)
Linux/DB Tuning (DevSumi2010, Japanese)Yoshinori Matsunobu
 
AngularとSpring Bootで作るSPA + RESTful Web Serviceアプリケーション
AngularとSpring Bootで作るSPA + RESTful Web ServiceアプリケーションAngularとSpring Bootで作るSPA + RESTful Web Serviceアプリケーション
AngularとSpring Bootで作るSPA + RESTful Web Serviceアプリケーションssuser070fa9
 
TIME_WAITに関する話
TIME_WAITに関する話TIME_WAITに関する話
TIME_WAITに関する話Takanori Sejima
 
これで怖くない!?コードリーディングで学ぶSpring Security #中央線Meetup
これで怖くない!?コードリーディングで学ぶSpring Security #中央線Meetupこれで怖くない!?コードリーディングで学ぶSpring Security #中央線Meetup
これで怖くない!?コードリーディングで学ぶSpring Security #中央線MeetupMasatoshi Tada
 
ZabbixによるAWS監視のコツ
ZabbixによるAWS監視のコツZabbixによるAWS監視のコツ
ZabbixによるAWS監視のコツShinsukeYokota
 
Long running processes in DDD
Long running processes in DDDLong running processes in DDD
Long running processes in DDDBernd Ruecker
 
AWSのセキュリティについて
AWSのセキュリティについてAWSのセキュリティについて
AWSのセキュリティについてYasuhiro Horiuchi
 
MySQL5.6と5.7性能比較
MySQL5.6と5.7性能比較MySQL5.6と5.7性能比較
MySQL5.6と5.7性能比較hiroi10
 
分散トレーシングAWS:X-Rayとの上手い付き合い方
分散トレーシングAWS:X-Rayとの上手い付き合い方分散トレーシングAWS:X-Rayとの上手い付き合い方
分散トレーシングAWS:X-Rayとの上手い付き合い方Recruit Lifestyle Co., Ltd.
 
Intrinsic Methods in HotSpot VM
Intrinsic Methods in HotSpot VMIntrinsic Methods in HotSpot VM
Intrinsic Methods in HotSpot VMKris Mok
 
Content Management with MongoDB by Mark Helmstetter
 Content Management with MongoDB by Mark Helmstetter Content Management with MongoDB by Mark Helmstetter
Content Management with MongoDB by Mark HelmstetterMongoDB
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Guido Schmutz
 
Apache Kuduを使った分析システムの裏側
Apache Kuduを使った分析システムの裏側Apache Kuduを使った分析システムの裏側
Apache Kuduを使った分析システムの裏側Cloudera Japan
 
MongoDB Configパラメータ解説
MongoDB Configパラメータ解説MongoDB Configパラメータ解説
MongoDB Configパラメータ解説Shoken Fujisaki
 
Azure 클라우드 학생 계정 & Ubuntu VM 셋업 (Mar 2022)
Azure 클라우드 학생 계정 & Ubuntu VM 셋업 (Mar 2022)Azure 클라우드 학생 계정 & Ubuntu VM 셋업 (Mar 2022)
Azure 클라우드 학생 계정 & Ubuntu VM 셋업 (Mar 2022)Ian Choi
 
The Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and SpringThe Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and SpringTimothy Spann
 
Spark autotuning talk final
Spark autotuning talk finalSpark autotuning talk final
Spark autotuning talk finalRachel Warren
 
業務で ISUCON することになった話.pdf
業務で ISUCON することになった話.pdf業務で ISUCON することになった話.pdf
業務で ISUCON することになった話.pdfTakuyaFukuoka2
 

What's hot (20)

『이펙티브 디버깅』 - 디버깅 지옥에서 탈출하는 66가지 전략과 기법
『이펙티브 디버깅』 - 디버깅 지옥에서 탈출하는 66가지 전략과 기법『이펙티브 디버깅』 - 디버깅 지옥에서 탈출하는 66가지 전략과 기법
『이펙티브 디버깅』 - 디버깅 지옥에서 탈출하는 66가지 전략과 기법
 
オープンソースで提供される第二のJVM:OpenJ9 VMとIBM Javaについて
オープンソースで提供される第二のJVM:OpenJ9 VMとIBM Javaについてオープンソースで提供される第二のJVM:OpenJ9 VMとIBM Javaについて
オープンソースで提供される第二のJVM:OpenJ9 VMとIBM Javaについて
 
Linux/DB Tuning (DevSumi2010, Japanese)
Linux/DB Tuning (DevSumi2010, Japanese)Linux/DB Tuning (DevSumi2010, Japanese)
Linux/DB Tuning (DevSumi2010, Japanese)
 
AngularとSpring Bootで作るSPA + RESTful Web Serviceアプリケーション
AngularとSpring Bootで作るSPA + RESTful Web ServiceアプリケーションAngularとSpring Bootで作るSPA + RESTful Web Serviceアプリケーション
AngularとSpring Bootで作るSPA + RESTful Web Serviceアプリケーション
 
TIME_WAITに関する話
TIME_WAITに関する話TIME_WAITに関する話
TIME_WAITに関する話
 
これで怖くない!?コードリーディングで学ぶSpring Security #中央線Meetup
これで怖くない!?コードリーディングで学ぶSpring Security #中央線Meetupこれで怖くない!?コードリーディングで学ぶSpring Security #中央線Meetup
これで怖くない!?コードリーディングで学ぶSpring Security #中央線Meetup
 
ZabbixによるAWS監視のコツ
ZabbixによるAWS監視のコツZabbixによるAWS監視のコツ
ZabbixによるAWS監視のコツ
 
Long running processes in DDD
Long running processes in DDDLong running processes in DDD
Long running processes in DDD
 
AWSのセキュリティについて
AWSのセキュリティについてAWSのセキュリティについて
AWSのセキュリティについて
 
MySQL5.6と5.7性能比較
MySQL5.6と5.7性能比較MySQL5.6と5.7性能比較
MySQL5.6と5.7性能比較
 
分散トレーシングAWS:X-Rayとの上手い付き合い方
分散トレーシングAWS:X-Rayとの上手い付き合い方分散トレーシングAWS:X-Rayとの上手い付き合い方
分散トレーシングAWS:X-Rayとの上手い付き合い方
 
Intrinsic Methods in HotSpot VM
Intrinsic Methods in HotSpot VMIntrinsic Methods in HotSpot VM
Intrinsic Methods in HotSpot VM
 
Content Management with MongoDB by Mark Helmstetter
 Content Management with MongoDB by Mark Helmstetter Content Management with MongoDB by Mark Helmstetter
Content Management with MongoDB by Mark Helmstetter
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
Apache Kuduを使った分析システムの裏側
Apache Kuduを使った分析システムの裏側Apache Kuduを使った分析システムの裏側
Apache Kuduを使った分析システムの裏側
 
MongoDB Configパラメータ解説
MongoDB Configパラメータ解説MongoDB Configパラメータ解説
MongoDB Configパラメータ解説
 
Azure 클라우드 학생 계정 & Ubuntu VM 셋업 (Mar 2022)
Azure 클라우드 학생 계정 & Ubuntu VM 셋업 (Mar 2022)Azure 클라우드 학생 계정 & Ubuntu VM 셋업 (Mar 2022)
Azure 클라우드 학생 계정 & Ubuntu VM 셋업 (Mar 2022)
 
The Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and SpringThe Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and Spring
 
Spark autotuning talk final
Spark autotuning talk finalSpark autotuning talk final
Spark autotuning talk final
 
業務で ISUCON することになった話.pdf
業務で ISUCON することになった話.pdf業務で ISUCON することになった話.pdf
業務で ISUCON することになった話.pdf
 

Similar to Apache Spark Side of Funnels

Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)Dan Robinson
 
Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)
Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)
Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)Dan Robinson
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubySATOSHI TAGOMORI
 
OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...
OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...
OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...Altinity Ltd
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Patterns and Practices for Event Design With Adam Bellemare | Current 2022
Patterns and Practices for Event Design With Adam Bellemare | Current 2022Patterns and Practices for Event Design With Adam Bellemare | Current 2022
Patterns and Practices for Event Design With Adam Bellemare | Current 2022HostedbyConfluent
 
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy NguyenGrokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy NguyenHuy Nguyen
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache ArrowWes McKinney
 
Database Development Replication Security Maintenance Report
Database Development Replication Security Maintenance ReportDatabase Development Replication Security Maintenance Report
Database Development Replication Security Maintenance Reportnyin27
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logsStefan Krawczyk
 
Scaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabScaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabRoman
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdfsash236
 
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0WSO2
 
Hidden Docs in Angular
Hidden Docs in AngularHidden Docs in Angular
Hidden Docs in AngularYadong Xie
 
Hw09 Analytics And Reporting
Hw09   Analytics And ReportingHw09   Analytics And Reporting
Hw09 Analytics And ReportingCloudera, Inc.
 
Couchbase@live person meetup july 22nd
Couchbase@live person meetup   july 22ndCouchbase@live person meetup   july 22nd
Couchbase@live person meetup july 22ndIdo Shilon
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...confluent
 

Similar to Apache Spark Side of Funnels (20)

Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)
 
Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)
Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)
Powering Heap With PostgreSQL And CitusDB (PGConf Silicon Valley 2015)
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
shiny_v1.pptx
shiny_v1.pptxshiny_v1.pptx
shiny_v1.pptx
 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In Ruby
 
OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...
OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...
OSA Con 2022 - Building Event Collection SDKs and Data Models - Paul Boocock ...
 
Learning with F#
Learning with F#Learning with F#
Learning with F#
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Patterns and Practices for Event Design With Adam Bellemare | Current 2022
Patterns and Practices for Event Design With Adam Bellemare | Current 2022Patterns and Practices for Event Design With Adam Bellemare | Current 2022
Patterns and Practices for Event Design With Adam Bellemare | Current 2022
 
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy NguyenGrokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache Arrow
 
Database Development Replication Security Maintenance Report
Database Development Replication Security Maintenance ReportDatabase Development Replication Security Maintenance Report
Database Development Replication Security Maintenance Report
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logs
 
Scaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabScaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at Grab
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdf
 
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
 
Hidden Docs in Angular
Hidden Docs in AngularHidden Docs in Angular
Hidden Docs in Angular
 
Hw09 Analytics And Reporting
Hw09   Analytics And ReportingHw09   Analytics And Reporting
Hw09 Analytics And Reporting
 
Couchbase@live person meetup july 22nd
Couchbase@live person meetup   july 22ndCouchbase@live person meetup   july 22nd
Couchbase@live person meetup july 22nd
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Apache Spark Side of Funnels

  • 2. Spark Side of the Funnels. Stipanicev Zoran, GetYourGuide. #UnifiedDataAnalytics #SparkAISummit
  • 3. About me: Senior BI Engineer, Data Platform. Software engineer for the past 13 years; started working with data 12 years ago with Oracle, then moved to reporting and BI over the years. For the last 4 years, enabling business users at GetYourGuide to make better decisions with data.
  • 4. Agenda
        1. Intro to GetYourGuide
        2. Introduction to Funnels
        3. Deep Dive
        4. Further Possibilities
        5. End Result
  • 6. We make it simple to book and enjoy incredible experiences
  • 7. Europe’s largest marketplace for travel experiences
        ● 50k+ products in 150+ countries
        ● 25M+ tickets sold
        ● $650M+ in VC funding
        ● 600+ strong global team
        ● 150+ traveler nationalities
  • 9. Requirements
        1. Looker as frontend and Spark as backend
        2. Respect the order of events
        3. Each step can consist of multiple events
        4. Anything can happen between two steps of the funnel
        5. Support for funnel-wide and step-specific filters
        6. Sessions based on touch points (for some use cases)
        7. Performance: 4 weeks of data in under 60 sec, ideally under 30 sec
        8. Option to ignore the order of events :)
  • 10. Internal vs External solution
        1. Performance
           ○ External is faster because it’s custom-built from the bottom up
        2. Where is the data?
           ○ With an internal solution we are not sending data to 3rd parties
        3. Flexibility
           ○ With an internal solution we can join it to all of our internal data to bring more insights to our stakeholders
           ○ Adding it to internal dashboards
        4. Cost
           ○ Differs for each company :)
  • 11. Touchpoints explained
        [Diagram labels: G, f, Events, Events, Direct, Events, Booking; TOUCH POINTS; Funnel Session 1, Funnel Session 2, Funnel Session 3]
  • 12. Funnel filtering explained
        If you define that you want to see visitors in the funnel A - B - C - D - E, what can the actual funnel look like?
        1) xyzDBz A CyxD B ABAB C BC DE
        2) xyzlmnop A CCCC
        How many steps do we match in these cases?
        1) A B C D E
        2) A B A B E
        3) A B C E D
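The questions on this slide can be explored with a small sketch. It is a plain-Python illustration, not the speaker's implementation, and it assumes the reading suggested by the requirements and by the `A.*B.*C` patterns later in the deck: anything may happen between steps, and a step only counts if all earlier steps were matched in order. The function name `steps_matched` is hypothetical.

```python
def steps_matched(funnel_steps: str, session: str) -> int:
    """Count how many leading funnel steps occur, in order, in the session string."""
    pos = 0
    matched = 0
    for step in funnel_steps:
        idx = session.find(step, pos)  # first occurrence at or after pos
        if idx == -1:
            break                      # this step is missing, so later steps do not count
        matched += 1
        pos = idx + 1                  # the next step must come after this one
    return matched


# The three cases from the slide, with spaces removed.
for session in ("ABCDE", "ABABE", "ABCED"):
    print(session, steps_matched("ABCDE", session))
```

Under these assumptions the three cases match 5, 2, and 4 steps respectively: in the second case C never occurs, and in the third case no E follows the D.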
  • 14. How do we build a Funnel
        1) We filter for only the selected events
        2) We concatenate all the events in a single session into a string
           a) We use an alias for each step (A, B, C…)
              i) It streamlines the rest of the query
              ii) And nothing changed when we added step-specific filters
        3) We compare the generated string with the funnel specified by filters in our BI tool
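The three steps above can be sketched in plain Python. This is only an emulation of the idea, not the Spark query itself: the event rows, the session ids, and the helper names `alias` and `build_funnels` are made up for illustration, and `re.search` stands in for `RLIKE`, which matches anywhere in the string.

```python
import re
from collections import defaultdict

# Hypothetical event log rows: (session, timestamp, event, product_id)
EVENTS = [
    ("s1", 3, "ProductPage", "123"),
    ("s1", 1, "LandingPage", None),
    ("s1", 2, "SearchPage", None),     # not a funnel event, filtered out
    ("s1", 4, "ProductPage", "999"),   # fails the step-specific filter, filtered out
    ("s2", 1, "ProductPage", "123"),
    ("s2", 2, "HomePage", None),
]


def alias(event, product_id):
    """Map an event to its step alias, or None if it is not part of the funnel."""
    if event in ("LandingPage", "HomePage"):
        return "A"
    if event == "ProductPage" and product_id == "123":
        return "B"
    return None


def build_funnels(rows):
    """Steps 1 and 2: filter the events, then concatenate aliases per session in time order."""
    per_session = defaultdict(list)
    for session, _ts, event, product_id in sorted(rows, key=lambda r: (r[0], r[1])):
        step = alias(event, product_id)
        if step is not None:
            per_session[session].append(step)
    return {session: "".join(steps) for session, steps in per_session.items()}


funnels = build_funnels(EVENTS)

# Step 3: compare each generated string with the requested funnel A -> B.
pattern = re.compile("A.*B")
matching = sorted(s for s, f in funnels.items() if pattern.search(f))
print(funnels)
print(matching)
```

Because each step is reduced to a single letter, adding a step-specific filter only changes the alias mapping; the comparison in step 3 stays the same.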
  • 15. Pseudo SQL of the implementation
        SELECT CASE … WHEN
        FROM (
          SELECT concat_ws('', collect_list(
                   CASE WHEN event IN ('LandingPage','HomePage') THEN 'A'
                        WHEN event = 'ProductPage' AND product_id = '123' THEN 'B'
                        ...
                   END) OVER (PARTITION BY session ORDER BY timestamp ROWS …)) AS funnel  -- funnel => "ABABBBCDE..."
          FROM event_log
          WHERE ((event IN ('LandingPage','HomePage'))
                 OR (event = 'ProductPage' AND product_id = '123') …)
            AND date = ...
        ) t
        WHERE t.funnel RLIKE '...'
  • 16. Pseudo SQL - Inner Where
        SELECT CASE … WHEN
        FROM (
          SELECT concat_ws('', collect_list(
                   CASE WHEN event IN ('LandingPage','HomePage') THEN 'A'
                        WHEN event = 'ProductPage' AND product_id = '123' THEN 'B'
                        ...
                   END) OVER (PARTITION BY session ORDER BY timestamp ROWS …)) AS funnel
          FROM event_log
          WHERE 1=1
            AND ((event IN ('LandingPage','HomePage'))
                 OR (event = 'ProductPage' AND product_id = '123') …)
            AND date BETWEEN ...
        ) t
        WHERE t.funnel RLIKE '...'
  • 17. Explain plan before the fix
        +- Filter (((event = LandingPage) || (event = HomePage)) || ((event = ProductPage) && (product_id = …
           +- FileScan parquet … Location: PrunedInMemoryFileIndex[dbfs:…/event=AboutView ...
              PartitionCount: 1502, PartitionFilters: [isnotnull(date), (date >= 18135), (date < 18142)], …
        What do we see in the explain plan?
        ● The event partition listed is not for any of the filtered events
        ● The partition count is really high (the date range is 7 days)
        ● Date partition pruning is applied
  • 18. Pseudo SQL - Inner Where Fixed
        SELECT CASE … WHEN
        FROM (
          SELECT concat_ws('', collect_list(
                   CASE WHEN event IN ('LandingPage','HomePage') THEN 'A' …
          FROM event_log
          WHERE 1=1
            AND date BETWEEN ...
            AND ( (event IN ('LandingPage','HomePage'))
                  OR (event = 'ProductPage')
                  OR (event …) )
            AND ( (event IN ('LandingPage','HomePage'))
                  OR (event = 'ProductPage' AND product_id = '123') …)
        ) t
        WHERE t.funnel RLIKE '...'
  • 19. Explain plan after the fix
        +- Filter (((event = LandingPage) || (event = HomePage)) || ((event = ProductPage) && (product_id = …
           +- FileScan parquet … Location: PrunedInMemoryFileIndex[dbfs:…/event=LandingPage ...
              PartitionCount: 21, PartitionFilters: [isnotnull(date), (date >= 18135), (date < 18142)], (event_name = Landing…
        What do we see in the explain plan?
        ● Event name partitions are pruned
        ● The partition count is a lot lower (7 days x 3 events)
        ● Date partition pruning is still applied
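The "7 days x 3 events" arithmetic can be illustrated with a toy model of partition pruning. This is not how Spark plans the scan internally; it only counts which (date, event) partitions survive when the filter can or cannot be applied to the event partition column. The partition layout, the event list, the dates, and the `prune` helper are all hypothetical.

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical layout: one partition per (date, event) pair, 28 days x 6 events.
ALL_EVENTS = ["AboutView", "AddToCart", "Checkout", "HomePage", "LandingPage", "ProductPage"]
DAYS = [date(2019, 8, 1) + timedelta(days=i) for i in range(28)]
PARTITIONS = list(product(DAYS, ALL_EVENTS))


def prune(partitions, start, end, events=None):
    """Keep partitions with start <= date < end; optionally also prune on the event column."""
    return [
        (d, e)
        for d, e in partitions
        if start <= d < end and (events is None or e in events)
    ]


start, end = date(2019, 8, 10), date(2019, 8, 17)  # a 7-day range

# Before the fix: only the date predicate prunes partitions.
date_only = prune(PARTITIONS, start, end)

# After the fix: a predicate on the event partition column alone prunes as well.
date_and_event = prune(PARTITIONS, start, end, {"LandingPage", "HomePage", "ProductPage"})

print(len(date_only), len(date_and_event))
```

In this toy layout the scan drops from 42 partitions (7 days x 6 events) to 21 (7 days x 3 events). The before and after plans suggest that a separate condition referencing only the event column lets the event filter be applied as a partition filter, while the original condition mixed with `product_id` did not.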
  • 20. Sample of rows from inner to outer query
        Event Alias | Funnel      | Visitor ID
        A           | BBABABABCBC | A1B1
        B           | BBABABABCBC | A1B1
        C           | BBABABABCBC | A1B1
        A           | BAAA        | C9D9
        B           | BAAA        | C9D9
  • 21. Pseudo SQL - Outer Where
        SELECT CASE … WHEN
        FROM (
          SELECT concat_ws('', collect_list(
                   CASE WHEN event IN ('LandingPage','HomePage') THEN 'A' …
          FROM event_log
          WHERE 1=1 AND date BETWEEN ...
        ) t
        WHERE t.funnel RLIKE 'A.*B.*C.*D.*E'
  • 22. Pseudo SQL - Outer Where
        SELECT CASE … WHEN
        FROM (
          SELECT concat_ws('', collect_list(
                   CASE WHEN event IN ('LandingPage','HomePage') THEN 'A' …
          FROM event_log
          WHERE 1=1 AND date BETWEEN ...
        ) t
        WHERE locate('A', funnel) > 0
  • 23. Pseudo SQL - Outer Select
        SELECT CASE WHEN alias = 'A' THEN 1
                    WHEN alias = 'B' AND funnel RLIKE 'A.*B' THEN 2
                    WHEN alias = 'C' AND funnel RLIKE 'A.*B.*C' THEN 3
                    ELSE -1
               END AS step
        FROM (
          SELECT concat_ws('', collect_list(
                   CASE WHEN event IN ('LandingPage','HomePage') THEN 'A' …
          FROM event_log
          WHERE 1=1 AND date BETWEEN ...
        ) t
        WHERE locate('A', funnel) > 0
  • 24. Pseudo SQL - Outer Select
        SELECT CASE WHEN alias = 'A' THEN 1
                    WHEN alias = 'B' AND locate('B', funnel, locate('A', funnel)) > 0 THEN 2
                    WHEN alias = 'C' AND locate('C', funnel, locate('B', funnel, locate('A', funnel))) > 0 THEN 3
                    ELSE -1
               END AS step
        FROM (
          SELECT concat_ws('', collect_list(
                   CASE WHEN event IN ('LandingPage','HomePage') THEN 'A' …
          FROM event_log
          WHERE 1=1 AND date BETWEEN ...
        ) t
        WHERE locate('A', funnel) > 0
  • 25. Sample of rows from outer to BI tool query
        Event Alias | Funnel      | Visitor ID | Step
        A           | BBABABABCBC | A1B1       | 1
        B           | BBABABABCBC | A1B1       | 2
        C           | BBABABABCBC | A1B1       | 3
        A           | BAAA        | C9D9       | 1
        B           | BAAA        | C9D9       | -1
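The two variants of the outer select, one using `RLIKE` and one using nested `locate` calls, should assign the same step to each row. The sketch below checks that in plain Python on the sample rows above. The `locate` helper is a stand-in written for this example (1-based positions, 0 when the substring is absent); returning 0 for a start position below 1 is an assumption of the sketch, and the function names are hypothetical.

```python
import re


def locate(substr: str, s: str, pos: int = 1) -> int:
    """1-based position of substr in s at or after pos, or 0 if there is none."""
    if pos < 1:
        return 0  # assumption of this sketch: an invalid start position finds nothing
    return s.find(substr, pos - 1) + 1


def step_rlike(alias: str, funnel: str) -> int:
    """Step number using regular expressions (re.search matches anywhere, like RLIKE)."""
    if alias == "A":
        return 1
    if alias == "B" and re.search("A.*B", funnel):
        return 2
    if alias == "C" and re.search("A.*B.*C", funnel):
        return 3
    return -1


def step_locate(alias: str, funnel: str) -> int:
    """Step number using nested locate calls instead of regular expressions."""
    if alias == "A":
        return 1
    b_pos = locate("B", funnel, locate("A", funnel))  # first B at or after the first A
    if alias == "B" and b_pos > 0:
        return 2
    if alias == "C" and locate("C", funnel, b_pos) > 0:
        return 3
    return -1


SAMPLE = [("A", "BBABABABCBC"), ("B", "BBABABABCBC"), ("C", "BBABABABCBC"),
          ("A", "BAAA"), ("B", "BAAA")]
for alias, funnel in SAMPLE:
    print(alias, funnel, step_rlike(alias, funnel), step_locate(alias, funnel))
```

Both functions give steps 1, 2, 3 for visitor A1B1 and 1, -1 for visitor C9D9, matching the sample table. The deck does not say why `locate` replaced `RLIKE`; a plausible reason is that plain substring search is cheaper than regular-expression matching.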
  • 27. Slicing the funnel
        [Diagram labels: LandingPg, ProductPg ID = 1, AddToCart, LandingPg, Checkout, Funnel; ProductPg ID = 2, LandingPg, ProductPg ID = 3, AddToCart, Checkout, Funnel]
  • 28. Slicing the funnel
        ● First we get all the values that satisfy the filters
        ● Then we collect them into an array with a window function to apply them to every step of the funnel
        ● The last step is to explode using LATERAL VIEW to support multiple dimensions
        ● And now we can expose it as a dimension to users
  • 29. Slicing – Inner select
        SELECT CASE … WHEN
        FROM (
          SELECT …
            collect_set(CASE WHEN product_id IN (1,2,3) THEN product_id END)
              OVER (PARTITION BY session ORDER BY timestamp ROWS …) AS product_id_array  -- distinct values
            …
          FROM event_log
          WHERE ((event IN ('LandingPage','HomePage'))
                 OR (event = 'ProductPage' AND product_id = '123') …)
            AND date = ...
        ) t
        WHERE t.funnel RLIKE '...'
  • 30. Sample of rows from inner to outer query
        Event Alias | Funnel      | Visitor ID | Product Array
        A           | BBABABABCBC | A1B1       | [1,2]
        A           | BAAA        | C9D9       | [3]
  • 31. Slicing – Outer query
        SELECT visitor_id
          …
          , product_id
          , CASE WHEN … …
        FROM ( /* INNER QUERY */ )
        LATERAL VIEW OUTER explode(product_id_array) products AS product_id
        LATERAL VIEW OUTER explode(category_id_array) categories AS category_id
        …
        ● With LATERAL VIEW we can explode multiple arrays
        ● This multiplies the number of rows returned to the outer query generated by the BI tool
        ● Benefit for end users: they don’t have to run multiple funnel analyses to get the same data
  • 32. Sample of rows from outer to BI tool query
        Event Alias | Funnel      | Visitor ID | Product ID
        A           | BBABABABCBC | A1B1       | 1
        A           | BBABABABCBC | A1B1       | 2
        A           | BAAA        | C9D9       | 3
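The row multiplication caused by exploding several arrays can be sketched in plain Python. The visitor ids, funnels, and product arrays come from the sample rows; the category arrays and the helper names `explode_outer` and `slice_rows` are hypothetical. The sketch relies on the `OUTER` behaviour of `LATERAL VIEW`, where an empty array still yields one row with a null value.

```python
from itertools import product

# Inner-query rows: (visitor_id, funnel, product_ids, category_ids).
# The category arrays are made up for illustration.
ROWS = [
    ("A1B1", "BBABABABCBC", [1, 2], [10, 20]),
    ("C9D9", "BAAA", [3], []),
]


def explode_outer(values):
    """Like LATERAL VIEW OUTER explode: an empty array still yields one row, holding None."""
    return values if values else [None]


def slice_rows(rows):
    """Explode both arrays; each input row becomes the cross product of its array values."""
    out = []
    for visitor_id, funnel, product_ids, category_ids in rows:
        for product_id, category_id in product(explode_outer(product_ids),
                                                explode_outer(category_ids)):
            out.append((visitor_id, funnel, product_id, category_id))
    return out


exploded = slice_rows(ROWS)
for row in exploded:
    print(row)
```

Visitor A1B1 turns into four rows (2 products x 2 categories) and visitor C9D9 into one, which is the multiplication the slide warns about. Without `OUTER`, the empty category array would drop C9D9 from the result entirely.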
  • 33. Further optimisations
        We have optimised the query in its current form, but one part still allows further optimisation:
        FROM event_log
        ● We are reading data directly from the partitioned table
        ● We can consider a partitioned table as a union of tables where each partition is a table
        ● Could we optimise the query by replacing the table with unions? What would that look like?
  • 34. Further optimisations
        SELECT …
        FROM (
          SELECT * FROM event_log
          WHERE (event IN ('LandingPage','HomePage')) AND date BETWEEN ...
        ) t
        UNION ALL (
          SELECT *
          FROM event_log e
          INNER JOIN (
            SELECT visitor_id, min(timestamp) AS timestamp
            FROM event_log
            WHERE (event IN ('LandingPage','HomePage')) AND date BETWEEN ...
            GROUP BY 1
          ) s1 ON e.visitor_id = s1.visitor_id AND e.timestamp >= s1.timestamp
          …
          WHERE event = 'ProductPage' AND product_id = '123' AND date BETWEEN ...
        ) t
  • 39. How to ensure performance in the BI tool
        1. We are using columnar storage for our data
        2. Therefore we are using a feature to modify the generated SQL
           a. To select only the needed fields
           b. And to include only the joins needed for those fields
        For simpler use cases, BI tools provide this out of the box. Since our query uses subqueries, we had to use an additional feature that allows us to modify custom queries.