Extending Data Lake using the Lambda Architecture
June 2015
Dr. William Kornfeld – R&D Director, Think Big, a Teradata company
Peyman Mohajerian – UDA Architecture COE, Teradata
Agenda
• Considerations for choosing a real-time architecture
• Use cases
Introduction
• What does it mean to be a real-time architecture?
• What use cases does a real-time architecture serve?
• When would it be a mistake to use a real-time architecture?
• What are useful design patterns for implementing real-time architectures (including lambda)?
What is “Real Time”?
[Diagram: Data In → Data Store → Info Out, with information delivered by push or pull]
Generally means something is happening within a second or so, not minutes or hours.
For purposes of this talk, “real time” is measured from Data In through Info Out.
Two General Classes of Information for Storage and Retrieval
Atomic
• The significant component of each individual incoming message is stored.
• Example: individual prescription records to be retrieved.
Aggregate
• Each incoming message contributes to one or more aggregates.
• Example: number of prescriptions for penicillin on June 9, 2015.
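To make the two classes concrete, here is a minimal Python sketch, assuming a hypothetical `Prescription` record with an ID, a drug name, and a date (the slides define no schema). The same incoming message is handled both ways: atomically, by storing the record for later retrieval, and as an aggregate, by incrementing a per-drug, per-day counter.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Prescription:
    # Hypothetical message shape; not defined by the original slides.
    rx_id: str
    drug: str
    written_on: date


@dataclass
class PrescriptionStore:
    # Atomic class: every incoming record is kept so it can be retrieved later.
    records: dict = field(default_factory=dict)
    # Aggregate class: each record only bumps a (drug, day) counter.
    counts_by_drug_day: Counter = field(default_factory=Counter)

    def ingest(self, rx: Prescription) -> None:
        self.records[rx.rx_id] = rx                             # atomic storage
        self.counts_by_drug_day[(rx.drug, rx.written_on)] += 1  # aggregate update

    def get(self, rx_id: str) -> Prescription:
        return self.records[rx_id]

    def count(self, drug: str, day: date) -> int:
        # A missing key reads as 0 without being inserted.
        return self.counts_by_drug_day[(drug, day)]
```

The counter answers "how many penicillin prescriptions on June 9, 2015?" without scanning records, but it cannot return any individual prescription; that is what the atomic store is for.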
Atomic Retrieval
• Question to ask: if a new message comes in, do I need to be able to see or react to it nearly immediately?
• Case 1: A message represents a doctor ordering a prescription.
• Case 2: A message represents a student completing the SAT with a certain score.
Aggregate Retrieval
• Some aggregate types make sense in real time as an instantaneous snapshot of the present moment.
• The “real time” value of some aggregate types is really an estimate of the value of something at some indeterminate time in the past.
• Some aggregate types lose their meaning as real-time values.
• Some real-time processes can be enabled by batch aggregates.
Aggregates with Instantaneous Meaning in Real Time
• Includes sums and counts.
• Examples:
− Dollars of revenue earned so far today
− Number of prescriptions for penicillin written today
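Sums and counts keep their meaning at any instant because each new event only adds to a total that is already exact for the day so far. A minimal sketch, assuming a hypothetical `DailyRunningTotals` helper that keys totals by calendar day and holds revenue in integer cents to avoid floating-point drift:

```python
from collections import defaultdict
from datetime import datetime


class DailyRunningTotals:
    """Running sums and counts keyed by calendar day (hypothetical helper)."""

    def __init__(self) -> None:
        self.revenue_cents = defaultdict(int)  # day -> revenue so far, in cents
        self.sales = defaultdict(int)          # day -> number of sales so far

    def on_sale(self, ts: datetime, cents: int) -> None:
        # Each event only adds to that day's totals, so they stay exact at every instant.
        day = ts.date()
        self.revenue_cents[day] += cents
        self.sales[day] += 1

    def revenue_so_far(self, now: datetime) -> int:
        return self.revenue_cents.get(now.date(), 0)

    def sales_so_far(self, now: datetime) -> int:
        return self.sales.get(now.date(), 0)
```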
Aggregates Whose Current Value May Not Be an Accurate Reflection of What Is Happening NOW
• Includes aggregates which are ratios.
• Examples:
− Click-through rate on an ad
− Conversion rate on an email marketing campaign
− Percent of prescriptions filled
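One way to see why a ratio's live value lags: clicks arrive some time after the impressions they belong to, so the newest impressions sit in the denominator before their clicks can reach the numerator. The sketch below assumes a hypothetical `Impression` record and a known maximum click delay; the "matured" estimate is more accurate, but it describes the ad as it was that delay ago, not as it is now.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Impression:
    shown_at: float                     # seconds since an arbitrary epoch
    clicked_at: Optional[float] = None  # None if never (or not yet) clicked


def naive_ctr(impressions: Sequence[Impression], now: float) -> float:
    """CTR over everything shown so far. The newest impressions have not had
    time to be clicked, so this tends to understate the true rate."""
    shown = [i for i in impressions if i.shown_at <= now]
    clicked = [i for i in shown if i.clicked_at is not None and i.clicked_at <= now]
    return len(clicked) / len(shown) if shown else 0.0


def matured_ctr(impressions: Sequence[Impression], now: float,
                max_click_delay: float) -> float:
    """CTR over impressions old enough that their clicks should have arrived.
    Closer to the true rate, but it reflects the ad as of `max_click_delay`
    seconds ago (an assumed, known bound on click delay)."""
    cutoff = now - max_click_delay
    return naive_ctr([i for i in impressions if i.shown_at <= cutoff], now)
```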
Aggregates that Have No Instantaneous Meaning
• Includes unique user counts
• Well-defined meaning only on intervals
[Figure: a stream of visits by Joe, Ken, Sue, Fred, Jane, Bob, Joe, Ken, Joe, Fred, Joe]
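A small sketch of why unique counts only make sense on intervals, using the visit stream from the figure. The split into two intervals is a hypothetical choice made for illustration:

```python
def unique_users(visits):
    """Distinct visitors; only meaningful relative to a chosen interval."""
    return len(set(visits))


# Visit stream read off the figure; the split point below is an assumption.
VISITS = ["Joe", "Ken", "Sue", "Fred", "Jane", "Bob",
          "Joe", "Ken", "Joe", "Fred", "Joe"]

first_half, second_half = VISITS[:6], VISITS[6:]

per_interval = [unique_users(first_half), unique_users(second_half)]
whole = unique_users(VISITS)
```

Here `per_interval` is `[6, 3]` while `whole` is `6`: distinct counts for sub-intervals cannot simply be added, because Joe, Ken, and Fred appear in both. Combining intervals requires the underlying sets, or an approximate mergeable structure such as HyperLogLog.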
Real Time Aggregate Update Can Be Significantly More Expensive Than Batch
[Diagram: a web server feeding a lattice of aggregates: PC/Male, PC/Female, Mac/Male, Mac/Female; PC, Mac; Male, Female; Everyone]
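The extra cost comes from fan-out: in the slide's lattice, one visit touches four aggregates (its platform/gender cell, its platform, its gender, and Everyone). A toy sketch, assuming events are `(platform, gender)` pairs and counting each counter increment as one "write" (a crude proxy that ignores contention on hot keys such as Everyone):

```python
from collections import Counter
from itertools import product

PLATFORMS = ("PC", "Mac")
GENDERS = ("Male", "Female")


def rollup_keys(platform: str, gender: str) -> list:
    """Every aggregate in the slide's lattice that one visit contributes to."""
    return [f"{platform}/{gender}", platform, gender, "Everyone"]


def realtime_counts(events):
    """Real-time style: update every affected aggregate as each event arrives."""
    counts, writes = Counter(), 0
    for platform, gender in events:
        for key in rollup_keys(platform, gender):
            counts[key] += 1
            writes += 1
    return counts, writes


def batch_counts(events):
    """Batch style: count only the finest cells, then roll up once at the end."""
    leaves = Counter(f"{p}/{g}" for p, g in events)
    writes = len(events)                      # one increment per event
    counts = Counter(leaves)
    for p, g in product(PLATFORMS, GENDERS):  # fixed-size rollup pass
        n = leaves[f"{p}/{g}"]
        for key in (p, g, "Everyone"):
            counts[key] += n
            writes += 1
    return counts, writes
```

Both paths produce the same counts; for N events the real-time path performs 4N writes, while the batch path performs N plus a fixed 12 for the rollup, so the gap grows with volume.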
Real Time Processes that Use Batch Aggregates
[Diagram: Data → Model (periodically rebuilt) → Web Server]
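A minimal sketch of the pattern, assuming the "model" is a hypothetical item-popularity ranking: the expensive aggregate is rebuilt periodically in batch, and the web server answers each request in real time from the latest snapshot.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PopularityModel:
    """Hypothetical 'model': items ranked by purchase count over all history."""
    ranking: list = field(default_factory=list)

    @classmethod
    def rebuild(cls, purchase_log):
        # Batch step: scan the full history; too costly to repeat per request.
        counts = Counter(purchase_log)
        return cls(ranking=[item for item, _ in counts.most_common()])


class WebServer:
    def __init__(self, model: PopularityModel):
        self.model = model

    def swap_model(self, model: PopularityModel) -> None:
        # Called after each periodic rebuild; requests in between use the old snapshot.
        self.model = model

    def recommend(self, already_owned, k=2):
        # Real-time step: a cheap lookup against the precomputed aggregate.
        return [i for i in self.model.ranking if i not in already_owned][:k]
```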
Suppose Your Information Can Be Real Time: Should You Use a Real Time Architecture?
[Diagram: Real World, Big Data System]
Do you need to know about or react to changes in the real world within a couple of minutes of the changes?
Real Time vs. Batch
• There are use cases for both batch and real-time data processing.
• Batch tools are more stable and less subject to frequent revision.
• Real-time architectures can be significantly more expensive.
• Many systems will have some of each.
Lambda Architecture
[Diagram: the incoming stream feeds both a streaming layer and a batch layer, each with its own serving view]
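A minimal sketch of the Lambda idea, using counts as the view: a batch layer recomputes from the full log up to a cutoff, a streaming (speed) layer keeps only events after that cutoff, and the serving layer adds the two. The names `batch_view`, `SpeedLayer`, and `serve` are hypothetical; a real deployment would use distributed stores rather than in-memory objects.

```python
from collections import Counter


def batch_view(master_log, cutoff):
    """Batch layer: recompute from the complete, append-only log up to `cutoff`."""
    return Counter(key for ts, key in master_log if ts < cutoff)


class SpeedLayer:
    """Speed layer: holds only events the latest batch view does not cover."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.recent = []  # (ts, key) pairs with ts >= cutoff

    def on_event(self, ts, key):
        if ts >= self.cutoff:
            self.recent.append((ts, key))

    def view(self):
        return Counter(key for _, key in self.recent)

    def expire(self, new_cutoff):
        """Call when a new batch view covering everything before new_cutoff is published."""
        self.cutoff = new_cutoff
        self.recent = [(ts, k) for ts, k in self.recent if ts >= new_cutoff]


def serve(batch, speed, key):
    """Serving layer: complete-but-stale batch view plus the fresh speed view."""
    return batch[key] + speed.view()[key]
```

When a new batch run lands, calling `expire` with its cutoff keeps the merged answer consistent: events move from the speed view into the batch view rather than being counted twice.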
Kappa Architecture
[Diagram: Stream (Kafka) → Streaming → Serving]
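In the Kappa architecture there is one processing path; reprocessing after a logic change means replaying the retained log through the new code. A minimal sketch, assuming an in-memory `Log` class as a stand-in for a Kafka topic (no Kafka client API is used or implied):

```python
from collections import Counter


class Log:
    """In-memory stand-in for a retained, replayable log (e.g. a Kafka topic)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def read_from(self, offset):
        return list(self._records[offset:])


def run_job(log, process):
    """The single streaming code path. Reprocessing is just running it again
    from offset 0 with new logic, then switching serving to the new output."""
    state = Counter()
    for record in log.read_from(0):
        process(state, record)
    return state


def count_v1(state, drug):
    state[drug] += 1


def count_v2(state, drug):
    # Revised logic: normalize case before counting.
    state[drug.lower()] += 1
```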
Mu Architecture
[Diagram: Streaming, Batch, Serving]
Real-Time Use Cases
• Lambda Architecture
− Medical: Patient Critical Care
• Event-Driven Architecture
− Marketing: Customer Engagement
Why Big Data?
Challenges in Medical Data
• Health data tends to be “wide”, not “deep”
• New data types are becoming more important
− Unstructured
− Real-time streaming
• Moving from retrospective “BI” viewing to event-based and predictive analytics is a challenge
• Multiple layers
− Lots of events and data
• Complex
− Lots of different languages and data structures
• Difficult to maintain
− Lots of moving pieces/components/technologies
− Lots of changes in the business
Project
• Optimize an existing Natural Language Processing pipeline in support of critical Colorectal Surgery (move to tens of thousands of documents processed)
• Replace an existing free-text search facility used by Clinical Web Service for cancer (move search to milliseconds)
Overall Architecture
Operational Statistics
• Current Storm throughput of up to 1.5 million documents per hour
• Average of 140,000 HL7 messages actually processed per day, with an average latency of 60 milliseconds from ingest to persistence
• Average of 50,000 documents passed through annotators per day, versus 5,000 historically
• Actual annotation of documents up to 6 times faster than previously accomplished
• Free-text search use cases that took over 30 minutes on the old infrastructure now complete in milliseconds in Elasticsearch
Applications Deliver the Company’s Brand and Customer Experience
[Diagram: the customer at the center of Social Media, Marketing Channels, Mobile Apps, and Devices & Form-factors]
• Applications in their entirety combine to deliver the full customer experience
• Today they are mostly designed in a siloed manner
• Applications are not designed to solicit and extract customer experience data well
• Considerations for obtaining and delivering information about the customer experience should be at the core of application design
The Customer Experience Universe
[Figure: one customer's event timeline from Day 1 to Day 25, grouped into an IM Campaign Fragment, an Email Campaign Fragment, and a Customer Services Fragment, set against Social Media, Marketing Channels, Mobile Apps, and Devices & Form-factors. Events shown: PaidSearch, LandingPage, CreateAccount, TXN, AttachedCC, EmailSent, EmailOpened, EmailLinkClicked, EmailClicked, AccountLogin, BannerAd1Impression, BannerAd2Impression, AddBank, EmailSent, EmailSent, TXN, AccountLogin, HelpCenter, EnterDispute, C.S.EmailSent, EmailOpened, EmailLinkClicked, HelpCenterHP, DisputePage, VirtualAgent, CallsIntoIVR, IVR:DisputeWorkflow, TransferredtoAgent, DisputeResolved, C.S.SurveyEmailed]
A universe of customer experience data:
• Create threads
• Build graphs
• Identify patterns
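A minimal sketch of "create threads" plus one simple pattern, a funnel, assuming events arrive as hypothetical `(customer_id, day, event_name)` tuples. The event names are taken from the timeline above; the customers and days are invented for illustration.

```python
from collections import defaultdict


def build_threads(events):
    """Group (customer_id, day, event_name) tuples into per-customer,
    time-ordered threads. Sorting is stable, so same-day events keep
    their arrival order."""
    threads = defaultdict(list)
    for customer, day, name in sorted(events, key=lambda e: (e[0], e[1])):
        threads[customer].append(name)
    return dict(threads)


def funnel_counts(threads, steps):
    """How many customers reached each funnel step, in order. Events that
    don't match the next expected step are skipped, so other events may
    occur in between."""
    reached = [0] * len(steps)
    for names in threads.values():
        i = 0
        for name in names:
            if i < len(steps) and name == steps[i]:
                i += 1
        for step in range(i):
            reached[step] += 1
    return reached
```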
Event Analytics Ecosystem
[Diagram: data sources (Social Media, Email Marketing, Display Marketing, Website Activity, Customer Account, Products, Transactions, Customer Care); Event Repository; EAP Metadata Dictionary & Library (Core Event Dictionary, Library & Data Source Adapters; Custom Business Event Dictionary & Library); Machine Learning (Customer Experience, Best Offers); Digital Marketing Applications; Reporting (High Speed Query & Reporting APIs); Guided UI Driven Analytics (Funnel, Path, Graph; Guided UI; Funnel & Path Processing Functions; Graph Engine & Functions); Business Analyst users]
Event Analytics Ecosystem
[Diagram: EAP Metadata Dictionary & Library (Core Event Dictionary, Library & Data Source Adapters; Custom Business Event Dictionary & Library); Event Repository; Offers (Best Offers, Machine Learning, A/B Testing); Reporting (High Speed Query & Reporting APIs); Guided UI Driven Analytics (Funnel, Path, Graph; Guided UI; Funnel & Path Processing Functions; Graph Engine & Functions); Business Analyst users]
[Architecture diagram: sources (Product, Customer and Transaction Data; Mobile Apps; Web Site Activity; Social Media; Display & Search Marketing; Customer State; eComm; Customer Care; 3rd Party Tracking); Batch Ingest; Stream Ingest; Spark; Data Dictionary; Event Metadata Dictionary; Event Pattern Matching & Scoring; Decisioning; Buffer; Serve; R-T Events for Decisioning; Aster Analytic Engine; Guided UI; Funnel; Reporting UI; Processing Engine; Dashboard Engine; Dashboard API; Data Warehouse (Product, Customer, Transaction); Event Processing & Event Repository (Event Processing Engine; HDFS (Time); Event Repository (HBase); Event Repository (Hive))]
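Event pattern matching and decisioning on streaming events cannot afford to re-scan each customer's full thread on every event. A minimal sketch, assuming a hypothetical fixed event pattern and a hypothetical decision name, that keeps only each customer's progress through the pattern:

```python
class StreamingPatternMatcher:
    """Incremental matcher: stores only each customer's position in the
    pattern, so each incoming event is handled with one lookup."""

    def __init__(self, pattern, decision):
        self.pattern = pattern
        self.decision = decision  # hypothetical action name
        self.progress = {}        # customer_id -> index of next expected event

    def on_event(self, customer, name):
        i = self.progress.get(customer, 0)
        if name == self.pattern[i]:
            i += 1
            if i == len(self.pattern):
                self.progress[customer] = 0  # reset so the pattern can fire again
                return self.decision
        # Non-matching events leave progress unchanged (gaps are allowed).
        self.progress[customer] = i
        return None
```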
  uae views on  big data  uae views on  big data
uae views on big data
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
 
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V42016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
 
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V42016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
 
Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce CostsUsing Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce Costs
 
Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce CostsUsing Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce Costs
 
Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...
Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...
Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 

Último (20)

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 

Extending Data Lake using the Lambda Architecture June 2015

  • 1. Extending Data Lake using the Lambda Architecture, June 2015. Dr. William Kornfeld – R&D Director, Think Big, a Teradata company; Peyman Mohajerian – UDA Architecture COE, Teradata
  • 2. Agenda: Considerations for choosing a real-time architecture; Use cases
  • 3. Introduction: What does it mean to be a real-time architecture? What are the use cases that real-time architecture serves? When would it be a mistake to use a real-time architecture? What are useful design patterns for implementing real-time architectures (including lambda)?
  • 4. What is “Real Time”? Diagram: Data In → Data Store → Info Out. Generally means something is happening in seconds, not minutes or hours.
  • 5. What is “Real Time”? Diagram: Data In → Data Store → Info Out (push or pull). Generally means something is happening in a second or so, not minutes or hours.
  • 6. What is “Real Time”? Diagram: Data In → Data Store → Info Out (push or pull). Generally means something is happening in a second, give or take, not minutes or hours. For the purposes of this talk, “real time” is measured from Data In through Info Out.
  • 7. Two General Classes of Information for Storage and Retrieval. Atomic: the significant component of each individual incoming message is stored. Example: individual prescription records to be retrieved. Aggregate: each incoming message contributes to one or more aggregates. Example: the number of prescriptions for penicillin on June 9, 2015.
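A minimal sketch of the two classes, assuming each message is a prescription record with hypothetical fields `rx_id`, `drug` and `day` (not the project's actual schema). The atomic path keeps the record itself so it can be retrieved later; the aggregate path only folds the message into a running count keyed by drug and day.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class Prescription:
    # Hypothetical message layout, for illustration only.
    rx_id: str
    drug: str
    day: date


class PrescriptionStore:
    """Keeps both classes of information for each incoming prescription message."""

    def __init__(self) -> None:
        self.atomic: Dict[str, Prescription] = {}  # atomic: one entry per message
        self.daily_counts: Counter = Counter()     # aggregate: (drug, day) -> count

    def ingest(self, rx: Prescription) -> None:
        # Atomic: store the individual record so it can be retrieved by ID.
        self.atomic[rx.rx_id] = rx
        # Aggregate: the message only contributes +1 to a running total.
        self.daily_counts[(rx.drug, rx.day)] += 1

    def get(self, rx_id: str) -> Optional[Prescription]:
        return self.atomic.get(rx_id)

    def count(self, drug: str, day: date) -> int:
        # Counter returns 0 for keys it has never seen.
        return self.daily_counts[(drug, day)]
```

One design consequence worth noting: a redelivered message simply overwrites its atomic record, but it would be counted twice in the aggregate, so aggregate paths generally need deduplication or idempotent updates.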
  • 8. Atomic Retrieval. Question to ask: if a new message comes in, do I need to be able to see or react to it nearly immediately? Case 1: a message represents a doctor ordering a prescription. Case 2: a message represents a student completing the SAT with a certain score.
  • 9. Aggregate Retrieval. Some aggregate types make sense in real time as an instantaneous snapshot of the present moment. The “real time” value of some aggregate types is really an estimate of their value at some indeterminate time in the past. Some aggregate types lose their meaning as real-time values. Some real-time processes can be enabled by batch aggregates.
  • 10. Aggregates with Instantaneous Meaning in Real Time. Includes sums and counts. Examples: dollars of revenue earned so far today; number of prescriptions for penicillin written today.
  • 11. Aggregates Whose Current Value May Not Accurately Reflect What Is Happening NOW. Includes aggregates that are ratios. Examples: click-through rate on an ad; conversion rate on an email marketing campaign; percent of prescriptions filled.
  • 12. Aggregates Whose Current Value May Not Accurately Reflect What Is Happening NOW. Includes aggregates that are ratios. Examples: click-through rate on an ad; conversion rate on an email marketing campaign; percent of prescriptions filled. (Diagram label: Now.)
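A small sketch of why a ratio computed "now" can mislead: impressions enter the denominator as soon as they are shown, while their clicks arrive later. The `Impression` type, the `maturity` parameter and the sample data are all hypothetical. Restricting the ratio to impressions older than a maturity window gives a steadier number, but it then describes an earlier moment, which is the slide's point.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Impression:
    shown_at: float              # seconds since an arbitrary epoch (hypothetical)
    clicked_at: Optional[float]  # None if the ad was never clicked


def observed_ctr(impressions: List[Impression], now: float, maturity: float = 0.0) -> float:
    """Click-through rate as a real-time dashboard would compute it at `now`.

    Only impressions shown at least `maturity` seconds ago are counted, and
    only clicks that have already arrived by `now` are counted.
    """
    cutoff = now - maturity
    shown = [i for i in impressions if i.shown_at <= cutoff]
    clicked = [i for i in shown if i.clicked_at is not None and i.clicked_at <= now]
    return len(clicked) / len(shown) if shown else 0.0


# Hypothetical data: half of the ads are clicked, each 15 s after being shown.
ads = [Impression(0, 15), Impression(10, None), Impression(20, 35), Impression(30, None)]
print(observed_ctr(ads, now=30))               # 0.25, understates the eventual rate
print(observed_ctr(ads, now=100))              # 0.5 once all clicks have arrived
print(observed_ctr(ads, now=30, maturity=15))  # 0.5, but only for ads shown by t=15
```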
  • 13. Aggregates that Have No Instantaneous Meaning. Includes unique user counts, which have a well-defined meaning only on intervals. (Diagram: a stream of visits by Joe, Ken, Sue, Fred, Jane, Bob, Joe, Ken, Joe, Fred, Joe.)
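Using the visit stream from the slide, a short sketch of why unique counts only make sense over an interval. Splitting the stream into two windows of 6 and 5 visits is an assumption made for illustration. Per-window unique counts cannot be added; the sets themselves have to be merged.

```python
# Visit stream from the slide; the split into two windows is hypothetical.
visits = ["Joe", "Ken", "Sue", "Fred", "Jane", "Bob",
          "Joe", "Ken", "Joe", "Fred", "Joe"]


def unique_users(events):
    return len(set(events))


first, second = visits[:6], visits[6:]
print(unique_users(first), unique_users(second))   # 6 3
print(unique_users(first) + unique_users(second))  # 9: adding window counts is wrong
print(len(set(first) | set(second)))               # 6: union of the sets is right
```

At scale, exact sets are often replaced by mergeable sketches such as HyperLogLog, which support union across windows at the cost of an approximate count.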
  • 14-15. Real-Time Aggregate Update Can Be Significantly More Expensive Than Batch. Diagram: a Web Server feeding the aggregates PC/Male, PC/Female, Mac/Male, Mac/Female, PC, Mac, Male, Female, and Everyone.
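A sketch of the fan-out the diagram shows, assuming two hypothetical dimensions `platform` and `gender`: a single hit from a PC-using male visitor must update PC/Male, PC, Male and Everyone. With k dimensions each event touches 2**k cells, and in a real-time system each of those is a separate write, whereas a batch job can fill every cell in one grouped pass.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple

ALL = "*"  # marks a dimension rolled up to "everyone"


def rollup_keys(event: Dict[str, str], dims: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every aggregate cell one event contributes to: each dimension either
    keeps the event's value or is rolled up to ALL, giving 2**len(dims) cells."""
    return list(product(*[(event[d], ALL) for d in dims]))


def update_cube(cube: Counter, event: Dict[str, str], dims: Sequence[str]) -> None:
    # In a real-time system each of these increments is a separate write.
    for key in rollup_keys(event, dims):
        cube[key] += 1


dims = ["platform", "gender"]
cube: Counter = Counter()
update_cube(cube, {"platform": "PC", "gender": "Male"}, dims)
print(rollup_keys({"platform": "PC", "gender": "Male"}, dims))
# [('PC', 'Male'), ('PC', '*'), ('*', 'Male'), ('*', '*')]
```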
  • 16. Real-Time Processes that Use Batch Aggregates. Diagram: Data → Model (periodically rebuilt) → Web Server.
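A minimal sketch of this pattern under stated assumptions: a hypothetical batch job (`rebuild_model`) computes a per-segment conversion rate from history, and the web server scores requests in real time against whichever model was most recently published. The new model is built completely before being swapped in, so a request never sees a half-built one.

```python
import threading
from typing import Dict, Iterable, Mapping, Tuple


def rebuild_model(history: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Hypothetical batch job: conversion rate per segment from (segment, converted) pairs."""
    totals: Dict[str, int] = {}
    wins: Dict[str, int] = {}
    for segment, converted in history:
        totals[segment] = totals.get(segment, 0) + 1
        wins[segment] = wins.get(segment, 0) + int(converted)
    return {s: wins[s] / totals[s] for s in totals}


class ModelHolder:
    """Holds the latest batch-built model for the real-time request path."""

    def __init__(self, model: Mapping[str, float]) -> None:
        self._lock = threading.Lock()
        self._model = dict(model)

    def swap(self, new_model: Mapping[str, float]) -> None:
        fresh = dict(new_model)  # finish building before publishing
        with self._lock:
            self._model = fresh

    def score(self, segment: str, default: float = 0.0) -> float:
        # Real-time path: a cheap lookup, no aggregation per request.
        with self._lock:
            model = self._model
        return model.get(segment, default)
```

The expensive aggregation stays in batch; the real-time path only reads, so its cost does not grow with the number of aggregates.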
  • 17. Suppose Your Information Can Be Real Time: Should You Use a Real-Time Architecture? Diagram: Real World ↔ Big Data System. Do you need to know about or react to changes in the real world within a couple of minutes of the changes?
  • 18. Real Time vs. Batch. There are use cases for both batch and real-time data processing. Batch tools are more stable and less subject to frequent revision. Real-time architectures can be significantly more expensive. Many systems will have some of each.
  • 22. Real-Time Use Cases: Lambda Architecture (Medical: Patient Critical Care); Event-Driven Architecture (Marketing: Customer Engagement).
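The deck names the Lambda Architecture for the critical-care use case without showing its internals. The sketch below illustrates the generic idea only, not the project's Storm-based implementation: a batch layer periodically recomputes a complete view from the master dataset up to a cutoff, a speed layer keeps incremental counts for events after that cutoff, and the serving layer merges the two at query time. The event layout and all names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, key): hypothetical event layout


def batch_view(master: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute counts from scratch over every event before `cutoff`."""
    return Counter(key for ts, key in master if ts < cutoff)


class SpeedLayer:
    """Speed layer: incremental counts for events the last batch run has not covered."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.recent: Counter = Counter()  # (timestamp, key) -> count

    def on_event(self, ts: int, key: str) -> None:
        if ts >= self.cutoff:
            self.recent[(ts, key)] += 1

    def advance(self, new_cutoff: int) -> None:
        # A new batch run now covers everything before new_cutoff; drop those increments.
        self.cutoff = new_cutoff
        self.recent = Counter({k: n for k, n in self.recent.items() if k[0] >= new_cutoff})

    def view(self) -> Counter:
        out: Counter = Counter()
        for (_, key), n in self.recent.items():
            out[key] += n
        return out


def query(batch: Counter, speed: SpeedLayer, key: str) -> int:
    """Serving layer: merge the stale-but-complete batch view with recent increments."""
    return batch[key] + speed.view()[key]
```

Because the batch view is recomputed from the raw events, any error in the speed layer is discarded at the next batch run, which is the main reason to accept the cost of maintaining two paths.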
  • 23. Why Big Data? Challenges in Medical Data. Health data tends to be “wide”, not “deep”. New data types are becoming more important: unstructured data and real-time streaming. It is a challenge to move from retrospective “BI” viewing to event-based and predictive analytics. Multiple layers: lots of events and data. Complex: lots of different languages and data structures. Difficult to maintain: lots of moving pieces, components, and technologies, and lots of changes in the business.
  • 24. Project. Optimize an existing Natural Language Processing pipeline in support of critical colorectal surgery (move to tens of thousands of documents processed). Replace an existing free-text search facility used by the Clinical Web Service for cancer (move search to milliseconds).
  • 26. Operational Statistics. Current Storm throughput of up to 1.5 million documents per hour. An average of 140,000 HL7 messages actually processed per day, with an average latency of 60 milliseconds from ingest to persistence. An average of 50,000 documents passed through annotators per day, versus 5,000 historically. Actual annotation of documents up to 6 times faster than previously accomplished. Free-text search use cases that took over 30 minutes on the old infrastructure now complete in milliseconds in ElasticSearch.
  • 27. Applications Deliver the Company’s Brand and Customer Experience. Diagram: The Customer, surrounded by Social Media, Marketing Channels, Mobile Apps, and Devices & Form-factors. The entirety of applications combines to deliver the full customer experience. Today they are mostly designed in a siloed manner. Applications are not designed to solicit and extract customer experience data well. Considerations for obtaining and delivering information about the customer experience should be at the core of application design.
  • 28. The Customer Experience Universe. Diagram: a timeline (Day 1, Day 3, Day 7, Day 17, Day 21, Day 25) across an IM Campaign Fragment, an Email Campaign Fragment, and a Customer Services Fragment, with the events PaidSearch, LandingPage, CreateAccount, TXN, AttachedCC, EmailSent, EmailOpened, EmailLinkClicked, EmailClicked, AccountLogin, BannerAd1Impression, BannerAd2Impression, AddBank, EmailSent, EmailSent, TXN, AccountLogin, HelpCenter, EnterDispute, C.S.EmailSent, EmailOpened, EmailLinkClicked, HelpCenterHP, DisputePage, VirtualAgent, CallsIntoIVR, IVR:DisputeWorkflow, TransferredtoAgent, DisputeResolved, C.S.SurveyEmailed. Around The Customer: Social Media, Marketing Channels, Mobile Apps, Devices & Form-factors. A universe of customer experience data: create threads; build graphs; identify patterns.
  • 29. Event Analytics Ecosystem. Diagram labels: Social Media; Email Marketing; Display Marketing; Website Activity; Customer Account; Products; Transactions; Customer Care; Event Repository; EAP Metadata Dictionary & Library; Core Event Dictionary, Library & Data Source Adapters; Custom Business Event Dictionary & Library; Machine Learning; Customer Experience; Best Offers; Digital Marketing Applications; Reporting; High Speed Query & Reporting APIs; Guided UI Driven Analytics (Funnel, Path, Graph); Guided UI; Funnel & Path Processing Functions; Graph Engine & Functions; Business Analyst.
  • 30. Event Analytics Ecosystem (detailed). Diagram labels: EAP Metadata Dictionary & Library; Core Event Dictionary, Library & Data Source Adapters; Custom Business Event Dictionary & Library; Event Repository; Offers; Best Offers; Machine Learning; A/B Testing; Reporting; High Speed Query & Reporting APIs; Guided UI Driven Analytics (Funnel, Path, Graph); Guided UI; Funnel & Path Processing Functions; Graph Engine & Functions; Business Analyst; Product, Customer and Transaction Data; Mobile Apps; Web Site Activity; Social Media; Display & Search Marketing; Customer State; eComm; Customer Care; 3rd Party Tracking; Batch Ingest; Data Dictionary; Event Pattern Matching & Scoring; Decisioning; Buffer Server; LWI ftp; Aster Analytic Engine; Event Metadata Dictionary; Guided UI Funnel Reporting UI; Processing Engine; Dashboard Engine; Dashboard API; R-T Events for Decisioning; Data Warehouse (Product, Customer, Transaction); Event Processing & Event Repository; Event Processing Engine; HDFS (Time); Event Repository (HBase); Event Repository (Hive); Stream Ingest; Spark.
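The "create threads" idea on slide 28 and the funnel and path processing named on slides 29 and 30 can be illustrated with a small in-memory sketch. It is not the Aster or Spark functions the slides refer to. It assumes each event is a hypothetical `(customer_id, timestamp, event_name)` tuple, threads events per customer in time order, and counts how many customers reach each step of a funnel in sequence. The event names are borrowed from slide 28; `build_threads` and `funnel_counts` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (customer_id, unix_timestamp, event_name)
Event = Tuple[str, int, str]


def build_threads(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Group events by customer and sort each group by time ("create threads")."""
    threads: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        threads[event[0]].append(event)
    for thread in threads.values():
        thread.sort(key=lambda e: e[1])  # stable sort: ties keep arrival order
    return dict(threads)


def funnel_counts(threads: Dict[str, List[Event]], steps: List[str]) -> List[int]:
    """Count customers who reach each funnel step in order.

    A customer reaches step i only if steps 0..i appear in their thread in
    that order; unrelated events may occur in between.
    """
    counts = [0] * len(steps)
    for thread in threads.values():
        reached = 0
        for _, _, name in thread:
            if reached < len(steps) and name == steps[reached]:
                reached += 1
        for i in range(reached):
            counts[i] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("c1", 1, "PaidSearch"),
        ("c1", 2, "LandingPage"),
        ("c1", 5, "CreateAccount"),
        ("c2", 3, "LandingPage"),
        ("c2", 1, "PaidSearch"),  # arrives out of order; threading fixes it
        ("c3", 4, "PaidSearch"),
    ]
    threads = build_threads(sample)
    print(funnel_counts(threads, ["PaidSearch", "LandingPage", "CreateAccount"]))
    # prints [3, 2, 1]
```

Threading first and matching second keeps the funnel logic independent of arrival order, which matters when streaming and batch sources land the same customer's events at different times.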

Speaker notes

  1. HL7: actual processing is based on "pull" requests from users, not on raw processing power. HL7 messages are large XML-based documents, much larger than equivalent JSON or other formats (roughly 800k to 900k in size), and contain significant medical information. End goal: an architecturally driven, internally owned technology stack that blends an event-based processing fabric, a real-time processing framework, a multi-destination distillation hub, "classic" BI delivery techniques, "services-based" delivery techniques and a "serendipitous" discovery environment. These are mutually supportive components that combine to deliver novel clinical solutions.
  2. How the business looks to the customer: the customer experiences the company across the entirety of the applications it has developed and deployed, and those applications, more than anything else, represent the company's brand. Most applications are not designed to solicit and extract customer experience data well. Data is obtained from applications in two main ways: web-site tagging, and very detailed logging written for engineers to support application development and operational performance. Tagging is too aggregated and difficult to administer; logging is too engineering-oriented. Furthermore, each application is designed in isolation, mostly without considering the customer's experience across other applications and channels, so stitching together the customer experience across multiple applications is difficult.
  3. The problem is big: 7 sources per client, plus the ability to customize for the consumer.
  4. Ingestion: TD has IP for each type of source. There are basically two types of sources, streaming and batch. For streaming, TD Listener is the advocated solution; for batch, TB has two pieces of IP: Light-weight Ingestion (LWI) and Buffer Server. LWI is for large third-party files such as Omniture: instead of FTPing Omniture files to a landing server, LWI connects directly to the FTP server, pulls the file and lands it in HDFS in time partitions. Buffer Server is designed to ingest large numbers of small files, concatenate them into larger, more Hadoop-friendly files, and land them in HDFS time partitions. Event Processing & Repository: TB has designed (but not yet implemented) two pieces of IP in this area. Event Processing, built on MapReduce, converts the incoming data sources into event objects in three steps: prepend an event header, prepend an event type header, and resolve the incoming ID (cookie, GUID, customer, email address, etc.) to a specific customer. It populates event records into HBase, and the Event Processing Engine handles both streaming and batch sources. Event Repository is an HBase schema that serves as central storage for all events. Dashboard Engine: TB has built IP for quickly building KPIs from the Event Repository. Using a UI, a developer can quickly aggregate metrics into an HBase schema on top of which tools like Tableau can run optimally. Guided, Metadata-driven Discovery Event Analytics
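The three Event Processing steps in note 4 (prepend an event header, prepend an event type header, resolve the incoming ID to a specific customer) can be sketched as a plain function. This is a hedged, in-memory illustration only: the note says the real engine is built on MapReduce and writes to HBase, whereas here the `Event` fields, the `ID_KINDS` lookup order, the `id_map` identity table and the `row_key` format are all hypothetical choices made for the example.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical identity map: (id kind, id value) -> customer id.
IdMap = Dict[Tuple[str, str], str]

# Hypothetical order in which incoming identifiers are tried.
ID_KINDS = ("customer", "email", "guid", "cookie")


@dataclass
class Event:
    event_id: str               # event header: generated unique id
    timestamp: int              # event header: event time
    event_type: str             # event type header, e.g. "email:EmailOpened"
    customer_id: Optional[str]  # resolved customer, or None if unresolved
    payload: dict               # the original source record, unchanged


def to_event(record: dict, source: str, id_map: IdMap) -> Event:
    """Convert one raw source record into an event object."""
    # Step 1: prepend an event header (a generated id plus the record's time).
    event_id = str(uuid.uuid4())
    timestamp = int(record["ts"])
    # Step 2: prepend an event type header, qualified by the source name.
    event_type = f"{source}:{record['type']}"
    # Step 3: resolve whichever incoming identifier is present to a customer.
    customer_id = None
    for kind in ID_KINDS:
        value = record.get(kind)
        if value is not None and (kind, value) in id_map:
            customer_id = id_map[(kind, value)]
            break
    return Event(event_id, timestamp, event_type, customer_id, record)


def row_key(event: Event) -> str:
    """Hypothetical HBase-style row key: customer, then zero-padded time."""
    # Customer first keeps one customer's events adjacent and, thanks to the
    # zero padding, in time order under a lexicographic key sort.
    customer = event.customer_id or "unknown"
    return f"{customer}|{event.timestamp:013d}|{event.event_id}"


if __name__ == "__main__":
    ids = {("cookie", "abc123"): "cust-42", ("email", "a@example.com"): "cust-42"}
    raw = {"ts": "1433808000", "type": "EmailOpened", "cookie": "abc123"}
    ev = to_event(raw, "email", ids)
    print(ev.event_type, ev.customer_id)  # prints: email:EmailOpened cust-42
    print(row_key(ev))
```

Resolving identity at ingest time, as the note describes, means downstream thread building only has to group by one customer key; a production HBase design would also have to weigh write hot-spotting against this key layout.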