SlideShare una empresa de Scribd logo
1 de 34
Interactive Analytics in Human Time
S u p r e e t h R a o , S u n i l G u p t a ⎪ J u n e 1 8 , 2 0 1 4
4 5 t h B a y A r e a H U G , S u n n yv a l e , C a l i f o r n i a
Interactive – How we see it?
2 Yahoo Confidential & Proprietary
60B events,
3.5TB of compressed data
Response 400ms
Serve an ad and get insights
< 2s
Agenda:
Motivation
Approach
Problem Deepdive- Instant Overlap
Summary
Questions
Motivation:
Lots of data
Analytics
Data restatement - batch and real time
Human time
Lots of data
~30B advertising events/day
~10s of TB of compressed data/day
Minutes to Year Grain
Multi-quarter data retention
Data Aging
Analytics
Reporting Metrics
Attribution
Multi-level hierarchical computation
Bidding/Targeting optimization
Non-additive computation
Data Restatement
Real time
Batch
Producer Consumer
quick path, lower
amount of checks or
reconciliation,
typically no lookups
high latency path,
checks and
reconciliations,
can have lookbacks
and lookups
Human Time
<1s ( 99 percentile)
Default time grain ( < 300 ms)
Instant overlap ( < 60s)
Data ingested, insights available ( < 2s)
Lots of data
Analytics
Data restatement - batch and real time
Human time
Approach:
Data Ingestion or Collection
Transformations
Data Ingestion
Persistence
Runtime Compute
Caching
API
Optional Middleware
Business API
UI
Data Collection
Data Pipelines
Data Warehouse/ Analytics and
Optimizations
Reporting Application/UI
Logical View - Scope
Transformations/Aggs
Data Ingestion
Persistence
Runtime Compute
Caching
API
Optional Middleware
Business API
UI
Data Collection
Impacts
Out of scope
Transformations/Aggs
Data Ingestion
Persistence
Runtime Compute
Caching
API
Optional Middleware
Business API
UI
Data Collection
Batch processing DAG, Real-time topology, SOX,
Traffic protection, Late processing, Retention,
Completeness Monitoring, PII cleansing/masking
Compatible with HDFS, Performance (Indexed,
Columnar, Compression, Serialization, Flexibility,
Concurrency, Grain of data stored)
Distributed/Stand-alone, Caching objects vs caching
results
Access to data with group by, order by etc..; SQL or SQL
like
Translate JSON to SQL(optional)
Logical View - Characteristics
Impacts
Out of scope
Transformations/Aggs
Data Ingestion
Persistence
Runtime Compute
Caching
API
Optional Middleware
Business API
UI
Data Collection
Hadoop MR/PIG /Oozie(Lotus)/Storm(Trident)
Druid, Shark, Hive, Oracle RAC, Mysql, Hbase, Impala
memcached_y, Redis
JSON-REST API ; JDBC; ODBC
Data Ingestion
Persistence
Runtime Compute
Caching
API
Optional Middleware
Business API
UI
Data Collection
Logical View - Choices
Impacts
Out of scope
How we do what we do? Components of Advertising Data Warehouse
Druid
JDBC/ODBC
Data Warehouse-Persistence
Hive
Metrics
Store
JSON-API Persistence and run-time compute
Computation and Ingestion
Quick cache ( using a database for now)
Upstream: API layer, MSTR,
Adhoc access, Identity Service,
Ad-Serving manifests
Data Producers; Serving,
Scoring, Booking, 3rd/1st Party
Data
Real time and batch compute engine
(Hadoop/Storm )
Data filtering/transformations:
Transformations, format
conversions
Custom Algorithms : computing
recursive uniques, indexing
Human time, How?
Druid for interactive queries
Storm-Druid for quick ingestion and
index
Specialized computation and
processing for quicker response
› Sketches
› Feature sequence based overlaps
› Custom indexing
Problem Deepdive: Instant Overlap
Users
Car
commuters
Soccer
Fans
Vegans
Users
Car
Commuters
Soccer
Fans
Vegans
Overlap
Non-additive
› Require access to raw (user level data) to compute
non-additive
• Billions of events a day
• TBs of data a day
 1-1 vs 1-n vs few-n
› Between car commuter and vegan what is the overlap
› For Car commuter which are the top overlap groups
› For Vegan, Car commuters what are the top overlap
groups
Re-stating motivation
Given two sets having identifiers, how
can we do exact overlaps in close to
real time?
( < 1 min).
Overlap is like a AND operation or a set
Existing Approaches
● Use exact compute paradigms
o Do joins for intersections which will lead to
exact results
 Hive, PIG, MR can all support efficient joins
 Exact but not real time
● Use sketches
o Approximate algorithms
 HLL, KMV, accuracy vs size, performance
 Approx, needs high perf tuning
 close to real time but not exact
Using Feature Sequences – 1/4
Feature sequence encoding
o Encode the sequence
 {Ram} - { car commuter, soccer fan,...}
 {Tom} - { soccer fan, vegan...}
 {Sam} - { car commuter, soccer fan, vegan...}
 ….
Using Feature Sequences – 2/4
Eliminate the user on encoded bitmaps
 {car commuter, soccer fan, vegan...}- count -c1- #
 {soccer fan, vegan...} - count - c2 - #
 {car commuter, vegan...} - count - c2 - #
Counts become additive now
Using Feature Sequences – 3/4
● Store row qualifications into a bitmap
o Car commuter- Row1, Row3
 1010000000
o Vegan - Row1, Row2, Row3
 1110000000
● Load the bitmap into Druid using a
custom indexer
o in-memory or memory mapped
Using Feature Sequences – 4/4
 Data Structures
› {feature_sequence}->count
› Feature->row qualification bitmaps
 AND is now an “AND” on bitmaps
› supported within Druid
› Very efficient
 Works alongside topN and
groupBys
Comparison with existing algorithm
● 1-n – Bulk Overlap on grid
o 19 hours on grid
o Few-n calls for a re-process
o 1-1 ( <1s)
● Instant Overlap
o < 60s ( pre-processing 3-4 hours)
o Supports “exact” AND
o Flexible ( few-n, 1-n)
o 1-1 ( < 1s)
Summary
● Yahoo’s Advertising Data Warehouse
o Peta Byte Scale
o Normalized view across many systems
o Analytics and optimizations with specialized
algorithms
o Data restatement - batch and realtime
o Human time
Thank You
@supreeth_
@_skgupta
We are hiring!
Reach out to us at
bigdata@yahoo-inc.com.
Data Ingestion or Collection
Transformations
Data Ingestion
Persistence
Runtime Compute
Caching
API
Optional Middleware
Business API
UI
Data Collection
Data Pipelines
Data Warehouse/ Analytics and
Optimizations
Reporting Application/UI
Logical View - Scope
Transformations/Aggs
Data Ingestion
Persistence
Runtime Compute
Caching
API
Optional Middleware
Business API
UI
Data Collection
Impacts
Out of scope
Dimension Flexibility
Many dimensions
Adding new dimensions
Time zones
Time grain
Normalized view across systems
PaidSearch
Display
Native
Programmatic buying and
selling
Ad-targeting
Hardware Configs
●High-memory boxes
●SSD preferred
●Savings due to better
compression

Más contenido relacionado

La actualidad más candente

Discovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @TwitterDiscovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @TwitterKamran Munshi
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune amrutupre
 
Translating Models to Medicine an Example of Managing Visual Communications
Translating Models to Medicine an Example of Managing Visual CommunicationsTranslating Models to Medicine an Example of Managing Visual Communications
Translating Models to Medicine an Example of Managing Visual CommunicationsDatabricks
 
Future of Data - Big Data
Future of Data - Big DataFuture of Data - Big Data
Future of Data - Big DataShankar R
 
Design for Scale - Building Real Time, High Performing Marketing Technology p...
Design for Scale - Building Real Time, High Performing Marketing Technology p...Design for Scale - Building Real Time, High Performing Marketing Technology p...
Design for Scale - Building Real Time, High Performing Marketing Technology p...Amazon Web Services
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerSri Ambati
 
Atlanta MLConf
Atlanta MLConfAtlanta MLConf
Atlanta MLConfQubole
 
Operationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at StarbucksOperationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at StarbucksDatabricks
 
Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...
Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...
Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...Big Data Spain
 
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demoDatabricks
 
MLFlow as part of ML CI/CD at Avalara
MLFlow as part of ML CI/CD at AvalaraMLFlow as part of ML CI/CD at Avalara
MLFlow as part of ML CI/CD at AvalaraManoj Mahalingam
 
AI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksAI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksDatabricks
 
How Spark Fits into Baidu's Scale-(James Peng, Baidu)
How Spark Fits into Baidu's Scale-(James Peng, Baidu)How Spark Fits into Baidu's Scale-(James Peng, Baidu)
How Spark Fits into Baidu's Scale-(James Peng, Baidu)Spark Summit
 
Optimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data WarehouseOptimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data WarehouseAttunity
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph✔ Eric David Benari, PMP
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoopdhruv_gairola
 
Real-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta CachingReal-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta CachingDatabricks
 

La actualidad más candente (20)

Discovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @TwitterDiscovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @Twitter
 
Bigdata
BigdataBigdata
Bigdata
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
 
Translating Models to Medicine an Example of Managing Visual Communications
Translating Models to Medicine an Example of Managing Visual CommunicationsTranslating Models to Medicine an Example of Managing Visual Communications
Translating Models to Medicine an Example of Managing Visual Communications
 
Future of Data - Big Data
Future of Data - Big DataFuture of Data - Big Data
Future of Data - Big Data
 
Design for Scale - Building Real Time, High Performing Marketing Technology p...
Design for Scale - Building Real Time, High Performing Marketing Technology p...Design for Scale - Building Real Time, High Performing Marketing Technology p...
Design for Scale - Building Real Time, High Performing Marketing Technology p...
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
 
Atlanta MLConf
Atlanta MLConfAtlanta MLConf
Atlanta MLConf
 
Operationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at StarbucksOperationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at Starbucks
 
Data science big data and analytics
Data science big data and analyticsData science big data and analytics
Data science big data and analytics
 
Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...
Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...
Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...
 
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
 
MLFlow as part of ML CI/CD at Avalara
MLFlow as part of ML CI/CD at AvalaraMLFlow as part of ML CI/CD at Avalara
MLFlow as part of ML CI/CD at Avalara
 
LinkedinResume
LinkedinResumeLinkedinResume
LinkedinResume
 
AI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksAI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with Databricks
 
How Spark Fits into Baidu's Scale-(James Peng, Baidu)
How Spark Fits into Baidu's Scale-(James Peng, Baidu)How Spark Fits into Baidu's Scale-(James Peng, Baidu)
How Spark Fits into Baidu's Scale-(James Peng, Baidu)
 
Optimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data WarehouseOptimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data Warehouse
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoop
 
Real-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta CachingReal-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta Caching
 

Destacado

PLEs from virtual ethnography of web 2.0
PLEs from virtual ethnography of web 2.0PLEs from virtual ethnography of web 2.0
PLEs from virtual ethnography of web 2.0Luis Torres-Yepez
 
MODUL/MEDIA PEMBELAJARAN SINARWATI LABINDA
MODUL/MEDIA PEMBELAJARAN SINARWATI LABINDAMODUL/MEDIA PEMBELAJARAN SINARWATI LABINDA
MODUL/MEDIA PEMBELAJARAN SINARWATI LABINDASINARWATILABINDA
 
Lesson Plan 20151016-Donald TANG
Lesson Plan 20151016-Donald TANGLesson Plan 20151016-Donald TANG
Lesson Plan 20151016-Donald TANGDonald Tang
 
44. Instalowanie urządzeń peryferyjnych
44. Instalowanie urządzeń peryferyjnych44. Instalowanie urządzeń peryferyjnych
44. Instalowanie urządzeń peryferyjnychLukas Pobocha
 
A casa de zaqueu
A casa de zaqueuA casa de zaqueu
A casa de zaqueuGilmar Taty
 
Illustrations from mysteries of harris burdick
Illustrations from mysteries of harris burdickIllustrations from mysteries of harris burdick
Illustrations from mysteries of harris burdickLori Larson
 

Destacado (12)

PLEs from virtual ethnography of web 2.0
PLEs from virtual ethnography of web 2.0PLEs from virtual ethnography of web 2.0
PLEs from virtual ethnography of web 2.0
 
MODUL/MEDIA PEMBELAJARAN SINARWATI LABINDA
MODUL/MEDIA PEMBELAJARAN SINARWATI LABINDAMODUL/MEDIA PEMBELAJARAN SINARWATI LABINDA
MODUL/MEDIA PEMBELAJARAN SINARWATI LABINDA
 
FIP Report
FIP ReportFIP Report
FIP Report
 
Lesson Plan 20151016-Donald TANG
Lesson Plan 20151016-Donald TANGLesson Plan 20151016-Donald TANG
Lesson Plan 20151016-Donald TANG
 
We are don bosco
We are don boscoWe are don bosco
We are don bosco
 
44. Instalowanie urządzeń peryferyjnych
44. Instalowanie urządzeń peryferyjnych44. Instalowanie urządzeń peryferyjnych
44. Instalowanie urządzeń peryferyjnych
 
iOS 7 Transition guide
iOS 7 Transition guideiOS 7 Transition guide
iOS 7 Transition guide
 
A casa de zaqueu
A casa de zaqueuA casa de zaqueu
A casa de zaqueu
 
Leslie Bill
Leslie BillLeslie Bill
Leslie Bill
 
Illustrations from mysteries of harris burdick
Illustrations from mysteries of harris burdickIllustrations from mysteries of harris burdick
Illustrations from mysteries of harris burdick
 
yoctoComm - Cellular Services over Wi-Fi
yoctoComm - Cellular Services over Wi-FiyoctoComm - Cellular Services over Wi-Fi
yoctoComm - Cellular Services over Wi-Fi
 
Un week celebration project proposals
Un week celebration project proposalsUn week celebration project proposals
Un week celebration project proposals
 

Similar a June 2014 HUG: Interactive analytics over hadoop

Interactive Analytics in Human Time
Interactive Analytics in Human TimeInteractive Analytics in Human Time
Interactive Analytics in Human TimeDataWorks Summit
 
Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlKhanderao Kand
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData
 
Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. VlijmPresentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. VlijmAlexander Oppel
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsData Con LA
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
 
Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2zhang hua
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksGuido Schmutz
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsVMware Tanzu
 
REX Hadoop et R
REX Hadoop et RREX Hadoop et R
REX Hadoop et Rpkernevez
 
JEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java WorldJEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java WorldSerg Masyutin
 
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...Amazon Web Services
 
High availability, real-time and scalable architectures
High availability, real-time and scalable architecturesHigh availability, real-time and scalable architectures
High availability, real-time and scalable architecturesJampp
 
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018VMware Tanzu
 
Data Infrastructure for a World of Music
Data Infrastructure for a World of MusicData Infrastructure for a World of Music
Data Infrastructure for a World of MusicLars Albertsson
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...Amazon Web Services
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantageAmazon Web Services
 
Building High Performance Apps with In-memory Data
Building High Performance Apps with In-memory DataBuilding High Performance Apps with In-memory Data
Building High Performance Apps with In-memory DataAmazon Web Services
 
Transforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big DataTransforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big Dataplumbee
 

Similar a June 2014 HUG: Interactive analytics over hadoop (20)

Interactive Analytics in Human Time
Interactive Analytics in Human TimeInteractive Analytics in Human Time
Interactive Analytics in Human Time
 
Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosql
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
 
Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. VlijmPresentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven Applications
 
REX Hadoop et R
REX Hadoop et RREX Hadoop et R
REX Hadoop et R
 
JEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java WorldJEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java World
 
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...
 
High availability, real-time and scalable architectures
High availability, real-time and scalable architecturesHigh availability, real-time and scalable architectures
High availability, real-time and scalable architectures
 
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
 
Data Infrastructure for a World of Music
Data Infrastructure for a World of MusicData Infrastructure for a World of Music
Data Infrastructure for a World of Music
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Building High Performance Apps with In-memory Data
Building High Performance Apps with In-memory DataBuilding High Performance Apps with In-memory Data
Building High Performance Apps with In-memory Data
 
Transforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big DataTransforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big Data
 

Más de Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

Más de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Último

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Último (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

June 2014 HUG: Interactive analytics over hadoop

  • 1. Interactive Analytics in Human Time S u p r e e t h R a o , S u n i l G u p t a ⎪ J u n e 1 8 , 2 0 1 4 4 5 t h B a y A r e a H U G , S u n n yv a l e , C a l i f o r n i a
  • 2. Interactive – How we see it? 2 Yahoo Confidential & Proprietary 60B events, 3.5TB of compressed data Response 400ms Serve an ad and get insights < 2s
  • 6. Lots of data Analytics Data restatement - batch and real time Human time
  • 7. Lots of data ~30B advertising events/day ~10s of TB of compressed data/day Minutes to Year Grain Multi-quarter data retention Data Aging
  • 8. Analytics Reporting Metrics Attribution Multi-level hierarchical computation Bidding/Targeting optimization Non-additive computation
  • 9. Data Restatement Real time Batch Producer Consumer quick path, lower amount of checks or reconciliation, typically no lookups high latency path, checks and reconciliations, can have lookbacks and lookups
  • 10. Human Time <1s ( 99 percentile) Default time grain ( < 300 ms) Instant overlap ( < 60s) Data ingested, insights available ( < 2s)
  • 11. Lots of data Analytics Data restatement - batch and real time Human time
  • 13. Data Ingestion or Collection Transformations Data Ingestion Persistence Runtime Compute Caching API Optional Middleware Business API UI Data Collection Data Pipelines Data Warehouse/ Analytics and Optimizations Reporting Application/UI Logical View - Scope Transformations/Aggs Data Ingestion Persistence Runtime Compute Caching API Optional Middleware Business API UI Data Collection Impacts Out of scope
  • 14. Transformations/Aggs Data Ingestion Persistence Runtime Compute Caching API Optional Middleware Business API UI Data Collection Batch processing DAG, Real-time topology, SOX, Traffic protection, Late processing, Retention, Completeness Monitoring, PII cleansing/masking Compatible with HDFS, Performance (Indexed, Columnar, Compression, Serialization, Flexibility, Concurrency, Grain of data stored) Distributed/Stand-alone, Caching objects vs caching results Access to data with group by, order by etc..; SQL or SQL like Translate JSON to SQL(optional) Logical View - Characteristics Impacts Out of scope
  • 15. Transformations/Aggs Data Ingestion Persistence Runtime Compute Caching API Optional Middleware Business API UI Data Collection Hadoop MR/PIG /Oozie(Lotus)/Storm(Trident) Druid, Shark, Hive, Oracle RAC, Mysql, Hbase, Impala memcached_y, Redis JSON-REST API ; JDBC; ODBC Data Ingestion Persistence Runtime Compute Caching API Optional Middleware Business API UI Data Collection Logical View - Choices Impacts Out of scope
  • 16. How we do what we do? Components of Advertising Data Warehouse Druid JDBC/ODBC Data Warehouse-Persistence Hive Metrics Store JSON-API Persistence and run-time compute Computation and Ingestion Quick cache ( using a database for now) Upstream: API layer, MSTR, Adhoc access, Identity Service, Ad-Serving manifests Data Producers; Serving, Scoring, Booking, 3rd/1st Party Data Real time and batch compute engine (Hadoop/Storm ) Data filtering/transformations: Transformations, format conversions Custom Algorithms : computing recursive uniques, indexing
  • 17. Human time, How? Druid for interactive queries Storm-Druid for quick ingestion and index Specialized computation and processing for quicker response › Sketches › Feature sequence based overlaps › Custom indexing
  • 21. Overlap Non-additive › Require access to raw (user level data) to compute non-additive • Billions of events a day • TBs of data a day  1-1 vs 1-n vs few-n › Between car commuter and vegan what is the overlap › For Car commuter which are the top overlap groups › For Vegan, Car commuters what are the top overlap groups
  • 22. Re-stating motivation Given two sets having identifiers, how can we do exact overlaps in close to real time? ( < 1 min). Overlap is like a AND operation or a set
  • 23. Existing Approaches ● Use exact compute paradigms o Do joins for intersections which will lead to exact results  Hive, PIG, MR can all support efficient joins  Exact but not real time ● Use sketches o Approximate algorithms  HLL, KMV, accuracy vs size, performance  Approx, needs high perf tuning  close to real time but not exact
  • 24. Using Feature Sequences – 1/4 Feature sequence encoding o Encode the sequence  {Ram} - { car commuter, soccer fan,...}  {Tom} - { soccer fan, vegan...}  {Sam} - { car commuter, soccer fan, vegan...}  ….
  • 25. Using Feature Sequences – 2/4 Eliminate the user on encoded bitmaps  {car commuter, soccer fan, vegan...}- count -c1- #  {soccer fan, vegan...} - count - c2 - #  {car commuter, vegan...} - count - c2 - # Counts become additive now
  • 26. Using Feature Sequences – 3/4 ● Store row qualifications into a bitmap o Car commuter- Row1, Row3  1010000000 o Vegan - Row1, Row2, Row3  1110000000 ● Load the bitmap into Druid using a custom indexer o in-memory or memory mapped
  • 27. Using Feature Sequences – 4/4  Data Structures › {feature_sequence}->count › Feature->row qualification bitmaps  AND is now an “AND” on bitmaps › supported within Druid › Very efficient  Works alongside topN and groupBys
  • 28. Comparison with existing algorithm ● 1-n – Bulk Overlap on grid o 19 hours on grid o Few-n calls for a re-process o 1-1 ( <1s) ● Instant Overlap o < 60s ( pre-processing 3-4 hours) o Supports “exact” AND o Flexible ( few-n, 1-n) o 1-1 ( < 1s)
  • 29. Summary ● Yahoo’s Advertising Data Warehouse o Peta Byte Scale o Normalized view across many systems o Analytics and optimizations with specialized algorithms o Data restatement - batch and realtime o Human time
  • 30. Thank You @supreeth_ @_skgupta We are hiring! Reach out to us at bigdata@yahoo-inc.com.
  • 31. Data Ingestion or Collection Transformations Data Ingestion Persistence Runtime Compute Caching API Optional Middleware Business API UI Data Collection Data Pipelines Data Warehouse/ Analytics and Optimizations Reporting Application/UI Logical View - Scope Transformations/Aggs Data Ingestion Persistence Runtime Compute Caching API Optional Middleware Business API UI Data Collection Impacts Out of scope
  • 32. Dimension Flexibility Many dimensions Adding new dimensions Time zones Time grain
  • 33. Normalized view across systems PaidSearch Display Native Programmatic buying and selling Ad-targeting
  • 34. Hardware Configs ●High-memory boxes ●SSD preferred ●Savings due to better compression

Notas del editor

  1. Logical view of a typical data driven application architecture Sox compliance
  2. -quick cache store for querying all metrics in a single fetch, to support one-page load UI architecture - Hive for scheduled job and for adhoc long range generic queries which are not supported on the interactive interface
  3. Bitmap indexed, columnar, can operate on compressed bitmaps, distributed -