Let’s introduce Amazon Kinesis
Inaugural meetup of the Amazon Kinesis - London User Group
This evening
• Introducing Amazon Kinesis, Ian Meyers, AWS
• Pizza and drinks break
• Kinesis and Snowplow, Alex Dean, Snowplow Analytics
• Drinks
• All courtesy of our hosts:

Introducing Amazon Kinesis
Snowplow and Kinesis

1. Snowplow – who we are
2. Why are we excited about Kinesis?
3. Adding Kinesis support to Snowplow
4. Live demo!
5. Questions

Snowplow – who we are
Today, Snowplow is primarily an open source web analytics platform
Snowplow: data pipeline
[Diagram: data flows from the website / webapp through the Snowplow data pipeline (Collect, then Transform and enrich) into Amazon S3 and Amazon Redshift / PostgreSQL]

• Your granular, event-level and customer-level data, in your own data warehouse
• Connect any analytics tool to your data
• Join your web analytics data with any other data set
Snowplow was born out of our frustration with traditional web analytics tools…
• Limited set of reports that don’t answer business questions
  • Traffic levels by source
  • Conversion levels
  • Bounce rates
  • Pages / visit
• Web analytics tools don’t understand the entities that matter to business
  • Customers, intentions, behaviours, articles, videos, authors, subjects, services…
  • …vs pages, conversions, goals, clicks, transactions
• Web analytics tools are siloed
  • Hard to integrate with other data sets incl. digital (marketing spend, ad server data), customer data (CRM), financial data (cost of goods, customer lifetime value)
…and out of the opportunities that new technologies presented to tame big data

These tools make it possible to capture, transform, store and analyse all your granular, event-level data, so you can perform any analysis
Snowplow is composed of a set of loosely coupled subsystems, architected to be robust and scalable
1. Trackers: generate event data
   • Examples: JavaScript tracker; Python / Lua / No-JS / Arduino trackers
2. Collectors: receive data from trackers and log it to S3
   • Examples: CloudFront collector; Clojure collector for Amazon EB
3. Enrich: clean and enrich raw data
   • Built on Scalding / Cascading / Hadoop and powered by Amazon EMR
4. Storage: store data ready for analysis
   • Examples: Amazon Redshift; PostgreSQL; Amazon S3
5. Analytics

• A to D: standardised data protocols between adjacent subsystems
• Batch-based
• Normally run overnight; sometimes every 4-6 hours
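The hand-offs between the subsystems above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Snowplow's actual code: the querystring payload, the field names `e` and `url`, the derived `page_host` field and the in-memory lists standing in for S3 and the warehouse are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import parse_qs, urlencode, urlparse


def track(event: Dict[str, str]) -> str:
    """Tracker: serialise one event as a querystring payload."""
    return urlencode(event)


def collect(payloads: Iterable[str]) -> List[str]:
    """Collector: log every payload it receives (a list stands in for S3)."""
    return list(payloads)


def enrich(raw_line: str) -> Optional[Dict[str, str]]:
    """Enrich: parse a raw line, drop it if invalid, add a derived field."""
    fields = {key: values[0] for key, values in parse_qs(raw_line).items()}
    if "e" not in fields:  # hypothetical rule: every event needs a type
        return None
    fields["page_host"] = urlparse(fields.get("url", "")).hostname or ""
    return fields


def run_batch(events: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Run trackers -> collector -> enrich -> storage once, as a batch."""
    raw_log = collect(track(event) for event in events)
    warehouse: List[Dict[str, str]] = []  # stands in for Redshift / PostgreSQL
    for line in raw_log:
        row = enrich(line)
        if row is not None:
            warehouse.append(row)
    return warehouse


rows = run_batch([
    {"e": "pv", "url": "http://example.com/a"},
    {"url": "http://example.com/b"},  # no event type, so enrich drops it
])
```

Because each stage only depends on the data format handed to it, any one stage could be swapped out without touching the others, which is the point of the loose coupling.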
Why are we excited about Kinesis?
A quick history lesson: the three eras of business data processing

1. The classic era, 1996+
2. The hybrid era, 2005+
3. The unified era, 2013+

For more see http://snowplowanalytics.com/blog/2014/01/20/the-three-eras-of-business-data-processing/
The classic era, 1996+
[Diagram: inside the company's own data center, narrow data silos (CMS, E-comm, ERP, CRM) each have a low-latency local loop and are linked by point-to-point connections. A nightly batch ETL process loads them into a data warehouse (high latency, wide data coverage, full data history) that feeds management reporting.]
The hybrid era, 2005+
[Diagram: data now spans the company's own data center, cloud vendors and three SaaS vendors. Narrow data silos (Search, CMS, E-comm, ERP, CRM, email marketing, web analytics) each keep a low-latency local loop. APIs and bulk exports feed stream processing, micro-batch processing and batch processing, which drive systems monitoring and product rec’s (low latency) and, through the data warehouse and Hadoop, management reporting and ad hoc analytics (high latency).]
The unified era, 2013+
[Diagram: narrow data silos (Search, CMS, E-comm, ERP, CRM, email marketing, SaaS vendors #1 and #2) across cloud vendor / own data center, some with low-latency local loops, publish through APIs and streaming APIs / web hooks into a unified log (low latency, wide data coverage, few days’ data history). The eventstream is archived to Hadoop (high latency, wide data coverage, full data history) for ad hoc analytics and management reporting, and feeds low-latency applications: product rec’s, systems monitoring, fraud detection, churn prevention.]
The unified log is Kinesis (or Kafka)
[Diagram: the unified era architecture repeated, with Kinesis (or Kafka) in the role of the unified log.]
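To make the unified log idea concrete, here is a minimal in-memory sketch: an append-only, ordered log that several consumers read independently, each at its own position. It does not use the Kinesis or Kafka APIs; the class `UnifiedLog`, its methods and the consumer names are hypothetical and exist only for illustration.

```python
from typing import Dict, List


class UnifiedLog:
    """Append-only, ordered log; each consumer tracks its own read position."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, record: str) -> int:
        """Add a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str, max_records: int = 10) -> List[str]:
        """Return the next unread records for this consumer and advance it."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = UnifiedLog()
for event in ["page_view", "add_to_basket", "checkout"]:
    log.append(event)

# Two applications read the same events at their own pace.
fraud_batch = log.poll("fraud-detection", max_records=2)
archive_batch = log.poll("archiving")
```

Reading never removes a record, so a new application (say, churn prevention) can be attached later without changing the producers or the existing consumers.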
Can we implement Snowplow on top of Kinesis?
[Diagram: the same unified era architecture, shown again.]
Adding Kinesis support to Snowplow
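Where we are heading with our Kinesis architecture
[Diagram: Snowplow Trackers → Scala Stream Collector → Raw event stream. The raw event stream feeds an S3 sink Kinesis app (→ S3) and the Enrich Kinesis app. The Enrich Kinesis app writes to the Enriched event stream, which feeds a Redshift sink Kinesis app (→ Redshift), and to a Bad raw events stream.]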
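The routing in this architecture can be sketched as follows. This is a toy model under stated assumptions, not the real Kinesis apps: plain Python lists stand in for the Kinesis streams, S3 and Redshift, and the JSON payload and the required `event` field are hypothetical.

```python
import json
from typing import Dict, List


def s3_sink(raw_stream: List[str], bucket: List[str]) -> None:
    """Archive every raw record unchanged (a list stands in for S3)."""
    bucket.extend(raw_stream)


def enrich_app(raw_stream: List[str], enriched_stream: List[Dict],
               bad_stream: List[Dict]) -> None:
    """Route each raw record to the enriched stream or the bad stream."""
    for raw in raw_stream:
        try:
            event = json.loads(raw)
            if not isinstance(event, dict) or "event" not in event:
                raise ValueError("missing 'event' field")
        except ValueError as err:  # json.JSONDecodeError is a ValueError
            bad_stream.append({"line": raw, "error": str(err)})
            continue
        event["enriched"] = True  # placeholder for the real enrichments
        enriched_stream.append(event)


def redshift_sink(enriched_stream: List[Dict], table: List[Dict]) -> None:
    """Load enriched events into the warehouse (a list stands in for Redshift)."""
    table.extend(enriched_stream)


raw_stream = ['{"event": "page_view", "page": "/home"}', "not json", '{"page": "/x"}']
s3_bucket, enriched_stream, bad_stream, redshift_table = [], [], [], []

s3_sink(raw_stream, s3_bucket)                       # consumer 1 of the raw stream
enrich_app(raw_stream, enriched_stream, bad_stream)  # consumer 2 of the raw stream
redshift_sink(enriched_stream, redshift_table)
```

Note that the S3 sink and the enrich app both read the same raw stream, and that records which fail enrichment are kept on their own stream rather than discarded.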
We took an important first step in our last release…
[Diagram: pre-0.8.12, enrichment lived in hadoop-etl. In 0.8.12 it is split into scala-hadoop-enrich and scala-kinesis-enrich, both built on scala-common-enrich, which holds the record-level enrichment functionality.]
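The design choice here, one record-level enrichment function shared by a batch wrapper and a stream wrapper, can be sketched like this. The real modules are written in Scala; the Python below is only an illustration, and the tab-separated line format and all function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def enrich_record(raw: str) -> Dict[str, str]:
    """Shared record-level logic: one raw line in, one enriched event out."""
    timestamp, user_id, page = raw.split("\t")
    return {"timestamp": timestamp, "user_id": user_id, "page": page.lower()}


def batch_enrich(lines: List[str]) -> List[Dict[str, str]]:
    """Batch wrapper: enrich a complete file of raw lines in one go."""
    return [enrich_record(line) for line in lines]


def stream_enrich(records: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Stream wrapper: enrich records lazily, one at a time, as they arrive."""
    for record in records:
        yield enrich_record(record)


lines = ["2014-01-01T00:00:00Z\tu1\t/Home", "2014-01-01T00:00:05Z\tu2\t/Pricing"]
batch_result = batch_enrich(lines)
stream_result = list(stream_enrich(iter(lines)))
```

Because both wrappers call the same function, the batch and streaming pipelines cannot drift apart in how they interpret an individual event.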
… and the next release should get us much closer
[Diagram: the target Kinesis architecture again: Snowplow Trackers → Scala Stream Collector → Raw event stream → S3 sink Kinesis app (→ S3) and Enrich Kinesis app → Enriched event stream → Redshift sink Kinesis app (→ Redshift), plus a Bad raw events stream.]
Live demo!
Questions?

http://snowplowanalytics.com
https://github.com/snowplow/snowplow
@snowplowdata
And finally…

Huge thanks to our hosts!

 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London User Group

  • 1. Let’s introduce Amazon Kinesis Inaugural meetup of the Amazon Kinesis - London User Group
  • 2. This evening • Introducing Amazon Kinesis, Ian Meyers, AWS • Pizza and drinks break • Kinesis and Snowplow, Alex Dean, Snowplow Analytics • Drinks • All courtesy of our hosts:
  • 4. Snowplow and Kinesis 1. Snowplow – who we are 2. Why are we excited about Kinesis? 3. Adding Kinesis support to Snowplow 4. Live demo! 5. Questions
  • 6. Today, Snowplow is primarily an open source web analytics platform Snowplow: data pipeline Website / webapp Amazon S3 Collect Transform and enrich Amazon Redshift / PostgreSQL • Your granular, event-level and customer-level data, in your own data warehouse • Connect any analytics tool to your data • Join your web analytics data with any other data set
  • 7. Snowplow was born out of our frustration with traditional web analytics tools…
      • Limited set of reports that don’t answer business questions: traffic levels by source, conversion levels, bounce rates, pages / visit
      • Web analytics tools don’t understand the entities that matter to business: customers, intentions, behaviours, articles, videos, authors, subjects, services… vs pages, conversions, goals, clicks, transactions
      • Web analytics tools are siloed: hard to integrate with other data sets incl. digital (marketing spend, ad server data), customer data (CRM), financial data (cost of goods, customer lifetime value)
  • 8. …and out of the opportunities to tame big data that new technologies presented. These tools make it possible to capture, transform, store and analyse all your granular, event-level data, so you can perform any analysis
  • 9. Snowplow is composed of a set of loosely coupled subsystems, architected to be robust and scalable
      1. Trackers: generate event data. Examples: JavaScript tracker; Python / Lua / No-JS / Arduino tracker
      2. Collectors: receive data from trackers and log it to S3. Examples: CloudFront collector; Clojure collector for Amazon EB
      3. Enrich: clean and enrich raw data. Built on Scalding / Cascading / Hadoop and powered by Amazon EMR
      4. Storage: store data ready for analysis. Examples: Amazon Redshift; PostgreSQL; Amazon S3
      5. Analytics
      • A to D: standardised data protocols between the subsystems
      • Batch-based; normally run overnight, sometimes every 4-6 hours
  • 10. Why are we excited about Kinesis?
  • 11. A quick history lesson: the three eras of business data processing 1. The classic era, 1996+ 2. The hybrid era, 2005+ 3. The unified era, 2013+ For more see http://snowplowanalytics.com/blog/2014/01/20/the-three-eras-of-business-data-processing/
  • 12. The classic era, 1996+ [Diagram labels: own data center; narrow data silos (CMS, e-comm, ERP, CRM), each with a low latency local loop; point-to-point connections; nightly batch ETL process; data warehouse; management reporting; high latency, wide data coverage, full data history]
  • 13. The hybrid era, 2005+ [Diagram labels: cloud vendor / own data center; narrow data silos (search, CMS, e-comm, ERP, CRM) with low latency local loops; SaaS vendors #1 to #3 (including email marketing and web analytics) with their own local loops; APIs and bulk exports; stream processing, micro-batch processing and batch processing; product rec’s and systems monitoring (low latency); data warehouse, Hadoop, management reporting and ad hoc analytics (high latency)]
  • 14. The unified era, 2013+ [Diagram labels: cloud vendor / own data center; narrow data silos (search, CMS, e-comm, ERP, CRM) with some low latency local loops; SaaS vendors #1 and #2 (including email marketing); APIs, streaming APIs / web hooks; eventstream; unified log (low latency, wide data coverage, few days’ data history); archiving to Hadoop (high latency, wide data coverage, full data history); ad hoc analytics; low latency uses: product rec’s, systems monitoring, management reporting, fraud detection, churn prevention]
  • 15. The unified log is Kinesis (or Kafka) [Diagram: the unified era diagram from slide 14]
  • 16. Can we implement Snowplow on top of Kinesis? [Diagram: the unified era diagram from slide 14]
  • 17. Adding Kinesis support to Snowplow
  • 18. Where we are heading with our Kinesis architecture [Diagram: Snowplow Trackers → Scala Stream Collector → raw event stream → Enrich Kinesis app → enriched event stream → Redshift sink Kinesis app → Redshift; S3 sink Kinesis app → S3; bad raw events stream]
  • 19. We took an important first step in our last release… [Diagram: pre-0.8.12: hadoop-etl. 0.8.12: scala-hadoop-enrich and scala-kinesis-enrich, with record-level enrichment functionality in scala-common-enrich]
  • 20. … and the next release should get us much closer [Diagram: Snowplow Trackers → Scala Stream Collector → raw event stream → Enrich Kinesis app → enriched event stream → Redshift sink Kinesis app → Redshift; S3 sink Kinesis app → S3; bad raw events stream]
  • 23. And finally… Huge thanks to our hosts!
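
The stream layout on slides 18 and 20 (a raw event stream feeding an Enrich app, which writes to an enriched event stream and a bad raw events stream) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Stream` class is an in-memory stand-in for a Kinesis stream, and `enrich`, `run_enrich_app` and the event fields are hypothetical names, not Snowplow's actual code or the Kinesis API.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """In-memory stand-in for a Kinesis stream (assumption): an append-only list."""
    name: str
    records: list = field(default_factory=list)

    def put(self, record: dict) -> None:
        self.records.append(record)

    def read_from(self, offset: int) -> list:
        # Each consumer tracks its own offset, so several apps can read the same stream.
        return self.records[offset:]


def enrich(raw: dict) -> dict:
    """Hypothetical record-level enrichment: validate, normalise, tag."""
    if not isinstance(raw.get("event"), str):
        raise ValueError("missing or invalid 'event' field")
    return {**raw, "event": raw["event"].lower(), "enriched": True}


def run_enrich_app(raw: Stream, enriched: Stream, bad: Stream, offset: int = 0) -> int:
    """Read raw events from `offset`; route each to the enriched or the bad stream."""
    for record in raw.read_from(offset):
        try:
            enriched.put(enrich(record))
        except ValueError as err:
            # Failed events are kept, with the reason, instead of being dropped.
            bad.put({"raw": record, "error": str(err)})
        offset += 1
    return offset  # the position to resume from on the next run


raw_events = Stream("raw-events")
enriched_events = Stream("enriched-events")
bad_events = Stream("bad-raw-events")

raw_events.put({"event": "PAGE_VIEW", "page": "/"})
raw_events.put({"page": "/broken"})  # no event type, so it should be routed to the bad stream

next_offset = run_enrich_app(raw_events, enriched_events, bad_events)
```

Because the app only appends to its output streams and returns the offset it reached, a sink (for S3 or Redshift, as on the slides) could be written the same way, as another independent reader of the enriched stream.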