SlideShare una empresa de Scribd logo
1 de 23
Let’s introduce Amazon Kinesis
Inaugural meetup of the
Amazon Kinesis - London User Group
This evening
•

Introducing Amazon Kinesis, Ian Meyers, AWS

•

Pizza and drinks break

•

Kinesis and Snowplow, Alex Dean, Snowplow Analytics

•

Drinks

•

All courtesy of our hosts:
Introducing Amazon Kinesis
Snowplow and Kinesis

1.

Snowplow – who we are

2.

Why are we excited about Kinesis?

3.

Adding Kinesis support to Snowplow

4.

Live demo!

5.

Questions
Snowplow – who we are
Today, Snowplow is primarily an open source web analytics
platform
Snowplow: data pipeline
Website / webapp
Amazon S3

Collect

Transform
and enrich

Amazon
Redshift /
PostgreSQL

• Your granular, event-level and customer-level data,
in your own data warehouse
• Connect any analytics tool to your data
• Join your web analytics data with any other data set
Snowplow was born out of our frustration with traditional web
analytics tools…
• Limited set of reports that don’t answer business questions
•
•
•
•

Traffic levels by source
Conversion levels
Bounce rates
Pages / visit

• Web analytics tools don’t understand the entities that
matter to business
• Customers, intentions, behaviours, articles, videos, authors,
subjects, services…
• …vs pages, conversions, goals, clicks, transactions

• Web analytics tools are siloed
• Hard to integrate with other data sets incl. digital (marketing
spend, ad server data), customer data (CRM), financial data
(cost of goods, customer lifetime value)
…and out of the opportunities to tame big data new
technologies presented

These tools make it possible to capture, transform, store and analyse all your
granular, event-level data, to you can perform any analysis
Snowplow is composed of a set of loosely coupled subsystems,
architected to be robust and scalable
1. Trackers

A

2. Collectors

B

3. Enrich

C

4. Storage

D

5. Analytics

Generate event
data

Receive data
from trackers
and log it to S3

Clean and
enrich raw data

Store data
ready for
analysis

Examples:
• Javascript
tracker
• Python /
Lua / No-JS
/ Arduino
tracker

Examples:
• Cloudfront
collector
• Clojure
collector for
Amazon EB

Built on
Scalding /
Cascading /
Hadoop and
powered by
Amazon EMR

Examples:
• Amazon
Redshift
• PostgreSQL
• Amazon S3

• Batch-based A D Standardised data protocols
• Normally run overnight; sometimes
every 4-6 hours
Why are we excited about
Kinesis?
A quick history lesson: the three eras of business data processing

1.

The classic era, 1996+

2.

The hybrid era, 2005+

3.

The unified era, 2013+

For more see http://snowplowanalytics.com/blog/2014/01/20/the-three-eras-of-business-data-processing/
The classic era, 1996+
OWN DATA CENTER
NARROW DATA SILOES

LOW LATENCY LOCAL LOOPS

Point-to-point
connections

CMS

E-comm

Local loop

ERP

Local loop

Silo

CRM

Local loop

Silo

Local loop

Silo

Nightly batch ETL process

HIGH LATENCY
WIDE DATA
COVERAGE

Management
reporting

Data warehouse
FULL DATA
HISTORY

Silo
The hybrid era, 2005+
CLOUD VENDOR / OWN DATA CENTER
NARROW DATA SILOES

Search

LOW LATENCY LOCAL LOOPS

CMS

Local loop

SAAS VENDOR #1

E-comm

Local loop

Silo

Local loop

Silo

APIs

ERP
Local loop

Silo

CRM
Local loop

Silo
Bulk exports
SAAS VENDOR #2

Stream
processing

Micro-batch
processing

Batch
processing

Batch
processing

Email
marketing
Local loop

Product
rec’s
Local loop
LOW LATENCY

Systems
monitoring

Data
warehouse

Hadoop
SAAS VENDOR #3

Local loop
LOW LATENCY

Management
reporting
HIGH LATENCY

Ad hoc
analytics
HIGH LATENCY

Web
analytics
Local loop
The unified era, 2013+
CLOUD VENDOR / OWN DATA CENTER
NARROW DATA SILOES

SOME LOW LATENCY LOCAL LOOPS

Search

CMS
Silo

E-comm
Silo

APIs

ERP
Silo

LOW LATENCY

Streaming APIs /
web hooks
WIDE DATA

SAAS VENDOR #2

COVERAGE

Unified log

Email
marketing

FEW DAYS’
DATA HISTORY

Hadoop

HIGH LATENCY

< WIDE DATA
COVERAGE >
< FULL DATA
HISTORY >

CRM

Silo

Eventstream

Archiving

SAAS VENDOR #1

Ad hoc
analytics

Product rec’s

Systems
monitoring

Management
reporting

Fraud
detection

Churn
prevention

LOW LATENCY
The unified log is Kinesis (or Kafka)
CLOUD VENDOR / OWN DATA CENTER
NARROW DATA SILOES

Search

SAAS VENDOR #1

SOME LOW LATENCY LOCAL LOOPS

CMS
Silo

E-comm
Silo

APIs

ERP
Silo

CRM

Silo
Streaming APIs /
web hooks

Eventstream

SAAS VENDOR #2

Unified log

Archiving

Hadoop

HIGH LATENCY

< WIDE DATA
COVERAGE >
< FULL DATA
HISTORY >

Email
marketing

Ad hoc
analytics

Product rec’s

Systems
monitoring

Management
reporting

Fraud
detection

Churn
prevention

LOW LATENCY
Can we implement Snowplow on top of Kinesis?
CLOUD VENDOR / OWN DATA CENTER
NARROW DATA SILOES

Search

SAAS VENDOR #1

SOME LOW LATENCY LOCAL LOOPS

CMS
Silo

E-comm
Silo

APIs

ERP
Silo

CRM

Silo
Streaming APIs /
web hooks

Eventstream

SAAS VENDOR #2

Unified log

Archiving

Hadoop

HIGH LATENCY

< WIDE DATA
COVERAGE >
< FULL DATA
HISTORY >

Email
marketing

Ad hoc
analytics

Product rec’s

Systems
monitoring

Management
reporting

Fraud
detection

Churn
prevention

LOW LATENCY
Adding Kinesis support to
Snowplow
Where we are heading with our Kinesis architecture
Snowplow
Trackers

Scala Stream
Collector

Raw event
stream

S3 sink
Kinesis app

S3

Enrich
Kinesis app

Enriched
event
stream

Redshift
sink Kinesis
app

Redshift

Bad raw
events
stream
We took an important first step in our last release…

0.8.12

pre-0.8.12

hadoop-etl

scala-hadoopenrich

scala-kinesis-enrich

Record-level
enrichment
functionality
scala-common-enrich
… and the next release should get us much closer
Snowplow
Trackers

Scala Stream
Collector

Raw event
stream

S3 sink Kinesis
app

S3

Enrich
Kinesis app

Enriched
event
stream

Redshift sink
Kinesis app

Redshift

Bad raw
events stream
Live demo!
Questions?

http://snowplowanalytics.com
https://github.com/snowplow/snowplow
@snowplowdata
And finally…

Huge thanks to our hosts!

Más contenido relacionado

La actualidad más candente

What Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesWhat Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesAlexander Dean
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...confluent
 
Big data meetup budapest adding data schemas to snowplow
Big data meetup budapest   adding data schemas to snowplowBig data meetup budapest   adding data schemas to snowplow
Big data meetup budapest adding data schemas to snowplowyalisassoon
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Laurent Bernaille
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...HostedbyConfluent
 
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Thoughtworks
 
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream RegistryKafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registryconfluent
 
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Amazon Web Services
 
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...HostedbyConfluent
 
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...Lucas Jellema
 
Soaring through the Clouds - World Record Oracle PaaS Cloud - Friday Cloud Up...
Soaring through the Clouds - World Record Oracle PaaS Cloud - Friday Cloud Up...Soaring through the Clouds - World Record Oracle PaaS Cloud - Friday Cloud Up...
Soaring through the Clouds - World Record Oracle PaaS Cloud - Friday Cloud Up...Lucas Jellema
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019confluent
 
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020confluent
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Lightbend
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesSwami Sundaramurthy
 
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...Amazon Web Services
 
2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modelingyalisassoon
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Kai Wähner
 
The Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to BeThe Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to Beconfluent
 

La actualidad más candente (20)

What Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesWhat Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registries
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
 
Big data meetup budapest adding data schemas to snowplow
Big data meetup budapest   adding data schemas to snowplowBig data meetup budapest   adding data schemas to snowplow
Big data meetup budapest adding data schemas to snowplow
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
 
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream RegistryKafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
 
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
 
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
 
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
 
Soaring through the Clouds - World Record Oracle PaaS Cloud - Friday Cloud Up...
Soaring through the Clouds - World Record Oracle PaaS Cloud - Friday Cloud Up...Soaring through the Clouds - World Record Oracle PaaS Cloud - Friday Cloud Up...
Soaring through the Clouds - World Record Oracle PaaS Cloud - Friday Cloud Up...
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
 
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelines
 
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
 
2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
 
The Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to BeThe Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to Be
 

Similar a Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London User Group

Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale Amazon Web Services
 
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and AnalyticsAmazon Web Services
 
Scaling up to Your First 10 Million Users
Scaling up to Your First 10 Million UsersScaling up to Your First 10 Million Users
Scaling up to Your First 10 Million UsersAmazon Web Services
 
Creating a Data Driven Culture with Amazon QuickSight - Technical 201
Creating a Data Driven Culture with Amazon QuickSight - Technical 201Creating a Data Driven Culture with Amazon QuickSight - Technical 201
Creating a Data Driven Culture with Amazon QuickSight - Technical 201Amazon Web Services
 
Drinking our own Champagne: How Woot, an Amazon subsidiary, uses AWS (ARC212)...
Drinking our own Champagne: How Woot, an Amazon subsidiary, uses AWS (ARC212)...Drinking our own Champagne: How Woot, an Amazon subsidiary, uses AWS (ARC212)...
Drinking our own Champagne: How Woot, an Amazon subsidiary, uses AWS (ARC212)...Amazon Web Services
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeDatabricks
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream ProcessingLuis Gonzalez
 
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream ProcessingJustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream ProcessingBEEVA_es
 
ARC202:real world real time analytics
ARC202:real world real time analyticsARC202:real world real time analytics
ARC202:real world real time analyticsSebastian Montini
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersAmazon Web Services
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersAmazon Web Services
 
在 Amazon Web Services 實現大數據應用-電子商務的案例分享
在 Amazon Web Services 實現大數據應用-電子商務的案例分享在 Amazon Web Services 實現大數據應用-電子商務的案例分享
在 Amazon Web Services 實現大數據應用-電子商務的案例分享Amazon Web Services
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSightAmazon Web Services
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSightAmazon Web Services
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Big problems Big data, simple AWS solution
Big problems Big data, simple AWS solutionBig problems Big data, simple AWS solution
Big problems Big data, simple AWS solutionJean-Claude Sotto
 
Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutionsClaudio Pontili
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platformmartinbpeters
 

Similar a Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London User Group (20)

Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
 
Scaling up to Your First 10 Million Users
Scaling up to Your First 10 Million UsersScaling up to Your First 10 Million Users
Scaling up to Your First 10 Million Users
 
Creating a Data Driven Culture with Amazon QuickSight - Technical 201
Creating a Data Driven Culture with Amazon QuickSight - Technical 201Creating a Data Driven Culture with Amazon QuickSight - Technical 201
Creating a Data Driven Culture with Amazon QuickSight - Technical 201
 
Drinking our own Champagne: How Woot, an Amazon subsidiary, uses AWS (ARC212)...
Drinking our own Champagne: How Woot, an Amazon subsidiary, uses AWS (ARC212)...Drinking our own Champagne: How Woot, an Amazon subsidiary, uses AWS (ARC212)...
Drinking our own Champagne: How Woot, an Amazon subsidiary, uses AWS (ARC212)...
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
 
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream ProcessingJustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
 
ARC202:real world real time analytics
ARC202:real world real time analyticsARC202:real world real time analytics
ARC202:real world real time analytics
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
在 Amazon Web Services 實現大數據應用-電子商務的案例分享
在 Amazon Web Services 實現大數據應用-電子商務的案例分享在 Amazon Web Services 實現大數據應用-電子商務的案例分享
在 Amazon Web Services 實現大數據應用-電子商務的案例分享
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
 
Big problems Big data, simple AWS solution
Big problems Big data, simple AWS solutionBig problems Big data, simple AWS solution
Big problems Big data, simple AWS solution
 
Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutions
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platform
 

Último

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 

Último (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London User Group

  • 1. Let’s introduce Amazon Kinesis Inaugural meetup of the Amazon Kinesis - London User Group
  • 2. This evening • Introducing Amazon Kinesis, Ian Meyers, AWS • Pizza and drinks break • Kinesis and Snowplow, Alex Dean, Snowplow Analytics • Drinks • All courtesy of our hosts:
  • 4. Snowplow and Kinesis 1. Snowplow – who we are 2. Why are we excited about Kinesis? 3. Adding Kinesis support to Snowplow 4. Live demo! 5. Questions
  • 6. Today, Snowplow is primarily an open source web analytics platform Snowplow: data pipeline Website / webapp Amazon S3 Collect Transform and enrich Amazon Redshift / PostgreSQL • Your granular, event-level and customer-level data, in your own data warehouse • Connect any analytics tool to your data • Join your web analytics data with any other data set
  • 7. Snowplow was born out of our frustration with traditional web analytics tools… • Limited set of reports that don’t answer business questions • • • • Traffic levels by source Conversion levels Bounce rates Pages / visit • Web analytics tools don’t understand the entities that matter to business • Customers, intentions, behaviours, articles, videos, authors, subjects, services… • …vs pages, conversions, goals, clicks, transactions • Web analytics tools are siloed • Hard to integrate with other data sets incl. digital (marketing spend, ad server data), customer data (CRM), financial data (cost of goods, customer lifetime value)
  • 8. …and out of the opportunities to tame big data new technologies presented These tools make it possible to capture, transform, store and analyse all your granular, event-level data, to you can perform any analysis
  • 9. Snowplow is composed of a set of loosely coupled subsystems, architected to be robust and scalable 1. Trackers A 2. Collectors B 3. Enrich C 4. Storage D 5. Analytics Generate event data Receive data from trackers and log it to S3 Clean and enrich raw data Store data ready for analysis Examples: • Javascript tracker • Python / Lua / No-JS / Arduino tracker Examples: • Cloudfront collector • Clojure collector for Amazon EB Built on Scalding / Cascading / Hadoop and powered by Amazon EMR Examples: • Amazon Redshift • PostgreSQL • Amazon S3 • Batch-based A D Standardised data protocols • Normally run overnight; sometimes every 4-6 hours
  • 10. Why are we excited about Kinesis?
  • 11. A quick history lesson: the three eras of business data processing 1. The classic era, 1996+ 2. The hybrid era, 2005+ 3. The unified era, 2013+ For more see http://snowplowanalytics.com/blog/2014/01/20/the-three-eras-of-business-data-processing/
  • 12. The classic era, 1996+ OWN DATA CENTER NARROW DATA SILOES LOW LATENCY LOCAL LOOPS Point-to-point connections CMS E-comm Local loop ERP Local loop Silo CRM Local loop Silo Local loop Silo Nightly batch ETL process HIGH LATENCY WIDE DATA COVERAGE Management reporting Data warehouse FULL DATA HISTORY Silo
  • 13. The hybrid era, 2005+ CLOUD VENDOR / OWN DATA CENTER NARROW DATA SILOES Search LOW LATENCY LOCAL LOOPS CMS Local loop SAAS VENDOR #1 E-comm Local loop Silo Local loop Silo APIs ERP Local loop Silo CRM Local loop Silo Bulk exports SAAS VENDOR #2 Stream processing Micro-batch processing Batch processing Batch processing Email marketing Local loop Product rec’s Local loop LOW LATENCY Systems monitoring Data warehouse Hadoop SAAS VENDOR #3 Local loop LOW LATENCY Management reporting HIGH LATENCY Ad hoc analytics HIGH LATENCY Web analytics Local loop
  • 14. The unified era, 2013+ CLOUD VENDOR / OWN DATA CENTER NARROW DATA SILOES SOME LOW LATENCY LOCAL LOOPS Search CMS Silo E-comm Silo APIs ERP Silo LOW LATENCY Streaming APIs / web hooks WIDE DATA SAAS VENDOR #2 COVERAGE Unified log Email marketing FEW DAYS’ DATA HISTORY Hadoop HIGH LATENCY < WIDE DATA COVERAGE > < FULL DATA HISTORY > CRM Silo Eventstream Archiving SAAS VENDOR #1 Ad hoc analytics Product rec’s Systems monitoring Management reporting Fraud detection Churn prevention LOW LATENCY
  • 15. The unified log is Kinesis (or Kafka) CLOUD VENDOR / OWN DATA CENTER NARROW DATA SILOES Search SAAS VENDOR #1 SOME LOW LATENCY LOCAL LOOPS CMS Silo E-comm Silo APIs ERP Silo CRM Silo Streaming APIs / web hooks Eventstream SAAS VENDOR #2 Unified log Archiving Hadoop HIGH LATENCY < WIDE DATA COVERAGE > < FULL DATA HISTORY > Email marketing Ad hoc analytics Product rec’s Systems monitoring Management reporting Fraud detection Churn prevention LOW LATENCY
  • 16. Can we implement Snowplow on top of Kinesis? CLOUD VENDOR / OWN DATA CENTER NARROW DATA SILOES Search SAAS VENDOR #1 SOME LOW LATENCY LOCAL LOOPS CMS Silo E-comm Silo APIs ERP Silo CRM Silo Streaming APIs / web hooks Eventstream SAAS VENDOR #2 Unified log Archiving Hadoop HIGH LATENCY < WIDE DATA COVERAGE > < FULL DATA HISTORY > Email marketing Ad hoc analytics Product rec’s Systems monitoring Management reporting Fraud detection Churn prevention LOW LATENCY
  • 17. Adding Kinesis support to Snowplow
  • 18. Where we are heading with our Kinesis architecture Snowplow Trackers Scala Stream Collector Raw event stream S3 sink Kinesis app S3 Enrich Kinesis app Enriched event stream Redshift sink Kinesis app Redshift Bad raw events stream
  • 19. We took an important first step in our last release… 0.8.12 pre-0.8.12 hadoop-etl scala-hadoopenrich scala-kinesis-enrich Record-level enrichment functionality scala-common-enrich
  • 20. … and the next release should get us much closer Snowplow Trackers Scala Stream Collector Raw event stream S3 sink Kinesis app S3 Enrich Kinesis app Enriched event stream Redshift sink Kinesis app Redshift Bad raw events stream
  • 23. And finally… Huge thanks to our hosts!