Real-time Interactive Big Data Analysis Using In-Memory Computing
Mike Joyce – Manager Software Engineer, iCrossing
Shawn Nguyen – Lead Software Engineer, iCrossing
  
CONNECTED MARKETING PLATFORM (TECHNOLOGY) + Bid Management / Trading Desk + Data Management Platform (Core Audience)

STRATEGY & PLANNING
•  Market Research
•  Analytics
•  Strategy & Planning

PROGRAM DESIGN
•  Media Planning & Buying
•  Creative & Experience Design
•  Content Creation & Management

AUDIENCE ENGAGEMENT
•  Search Marketing Programs
•  Social Media / Mobile
•  Technology & App Development
•  Measurement & Optimization
  
DIGITAL AGENCY INSIDE A CONTENT EMPIRE
Leveraging audience insights:
•  20+ brands
•  30+ TV networks
•  50+ newspapers
•  300+ magazines
  
Big Data - Cookies!
300+ million unique cookies
•  Subscribers
•  Visitors
•  International
•  Multiple devices
  
DMP Audience Data
Attributes
•  Geographic
•  Demographic
•  Behavioral
•  Psychographic
11,000+ Unique Attributes

Cookies + Audience Attributes = Super Big Data!
•  90M+ cookies
•  Average user: 800+ attributes (e.g. Male, Age 20 - 35, Sports Enthusiasts, Iowa, High Income, iPad, iPhone, Drives Mini Van, Foodie)
•  72B+ attribute-user pairs

Audiences – Targeting vs Discovering
•  Who are you targeting?
•  How do you connect with them?
•  What describes them?
  
Data Scientists
Discovering Audience Attributes
1.  Define an audience using attributes
2.  Identify all attributes of cookies in audience
3.  Calculate highly indexing attributes
  
1) Define the Audience
Population: 90M cookies
Audience: 300K cookies
•  Age: 20-35
•  US > North Dakota
•  Gender: Male

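Defining an audience amounts to keeping only the cookies that carry every criterion attribute. The sketch below is one hedged way to express that in Java; the class name `AudienceFilter`, the method `select`, the cookie ids, and the choice to model a cookie as a set of attribute strings are all assumptions made for illustration, not the authors' implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class AudienceFilter {
    // Returns the ids of cookies that carry every attribute in the criteria set.
    // Modeling a cookie as id -> set of attribute strings is an assumption.
    static Set<String> select(Map<String, Set<String>> cookies, Set<String> criteria) {
        return cookies.entrySet().stream()
                .filter(e -> e.getValue().containsAll(criteria))
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // Hypothetical sample cookies; attribute labels follow the slide.
        Map<String, Set<String>> cookies = new HashMap<>();
        cookies.put("c1", Set.of("Age: 20-35", "US > North Dakota", "Gender: Male",
                "Interest: Moose Hunting"));
        cookies.put("c2", Set.of("Age: 20-35", "US > Iowa", "Gender: Male"));
        cookies.put("c3", Set.of("Age: 20-35", "US > North Dakota", "Gender: Male"));

        Set<String> criteria = Set.of("Age: 20-35", "US > North Dakota", "Gender: Male");
        System.out.println(select(cookies, criteria)); // prints [c1, c3]
    }
}
```

The filter touches every cookie once, which is what makes the 90M-cookie population expensive to query.
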
2) Audience Attributes
Audience: 300K cookies
Attributes of cookies in audience:
•  Interest: Sports Enthusiast
•  Interest: Moose Hunting
•  Intent: Auto Purchase > Truck
•  US > North Dakota > Fargo
•  Pet Supplies > Dog Food

3) Index the Attributes
Attributes of cookies in audience:

Attribute                        Audience Frequency   Population Frequency
Interest: Sports Enthusiast      24%                  27%
Interest: Moose Hunting          40%                  6%
Intent: Auto Purchase > Truck    17%                  4%
US > North Dakota > Fargo        30%                  2%
Pet Supplies > Dog Food          6%                   9%

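The slide does not spell out how the index is computed. A common marketing convention, assumed here, is index = audience frequency / population frequency × 100, so 100 means the audience matches the population. The sketch below applies that assumed formula to the five frequency pairs from the slide; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AttributeIndex {
    // Assumed convention (not stated in the deck):
    // index = audience frequency / population frequency * 100, rounded.
    static long index(double audiencePct, double populationPct) {
        return Math.round(audiencePct / populationPct * 100.0);
    }

    public static void main(String[] args) {
        // {audience %, population %} pairs as given on the slide.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("Interest: Sports Enthusiast", new double[] {24, 27});
        rows.put("Interest: Moose Hunting", new double[] {40, 6});
        rows.put("Intent: Auto Purchase > Truck", new double[] {17, 4});
        rows.put("US > North Dakota > Fargo", new double[] {30, 2});
        rows.put("Pet Supplies > Dog Food", new double[] {6, 9});

        for (Map.Entry<String, double[]> row : rows.entrySet()) {
            double[] f = row.getValue();
            System.out.printf("%-32s %5d%n", row.getKey(), index(f[0], f[1]));
        }
    }
}
```

Under this assumption Fargo (1500), Moose Hunting (667), and Truck (425) index highly, while Sports Enthusiast (89) and Dog Food (67) fall below 100 and do not distinguish the audience.
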
Data Scientists
Development Ask
1.  Make it accessible to “normals”
2.  Exportable visualizations & calculations
3.  Reduce query time from 1 hr to 1 sec
  
Why is this Hard?
•  90M+ cookies
•  Average user: 800+ attributes (e.g. Male, Age 20 - 35, Sports Enthusiasts, Iowa, High Income, iPad, iPhone, Drives Mini Van, Foodie)
•  72B+ attribute-user pairs
Algorithm
1. Check whether each cookie satisfies the audience criteria
2. Collect all attributes for every audience cookie
3. Calculate percentages & index
Within 1 sec!
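The three algorithm steps can be sketched as a single in-memory pass. This is a minimal illustration under stated assumptions, not the authors' code: cookies are modeled as sets of attribute strings, the names `AudienceScan`, `Row`, and `discover` are hypothetical, and the index formula (audience % / population % × 100) is an assumed convention.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class AudienceScan {
    // One result row per attribute: audience share, population share, index.
    static final class Row {
        final double audiencePct, populationPct, index;
        Row(double audiencePct, double populationPct) {
            this.audiencePct = audiencePct;
            this.populationPct = populationPct;
            this.index = audiencePct / populationPct * 100.0; // assumed formula
        }
    }

    static Map<String, Row> discover(List<Set<String>> cookies, Set<String> criteria) {
        Map<String, Integer> popCounts = new HashMap<>();
        Map<String, Integer> audCounts = new HashMap<>();
        int audienceSize = 0;
        for (Set<String> attrs : cookies) {
            boolean inAudience = attrs.containsAll(criteria); // step 1: check criteria
            if (inAudience) audienceSize++;
            for (String a : attrs) {                          // step 2: collect attributes
                popCounts.merge(a, 1, Integer::sum);
                if (inAudience) audCounts.merge(a, 1, Integer::sum);
            }
        }
        Map<String, Row> result = new TreeMap<>();
        if (audienceSize == 0) return result;                 // empty audience, nothing to index
        for (Map.Entry<String, Integer> e : audCounts.entrySet()) { // step 3: percentages & index
            double audPct = 100.0 * e.getValue() / audienceSize;
            double popPct = 100.0 * popCounts.get(e.getKey()) / cookies.size();
            result.put(e.getKey(), new Row(audPct, popPct));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical four-cookie population.
        List<Set<String>> cookies = List.of(
                Set.of("Gender: Male", "US > North Dakota", "Interest: Moose Hunting"),
                Set.of("Gender: Male", "US > North Dakota", "Interest: Moose Hunting",
                        "Intent: Auto Purchase > Truck"),
                Set.of("Gender: Male", "US > Iowa", "Intent: Auto Purchase > Truck"),
                Set.of("Gender: Female", "US > North Dakota"));
        Set<String> criteria = Set.of("Gender: Male", "US > North Dakota");
        discover(cookies, criteria).forEach((attr, row) -> System.out.printf(
                "%-32s aud=%.0f%% pop=%.0f%% index=%.0f%n",
                attr, row.audiencePct, row.populationPct, row.index));
    }
}
```

The inner loop runs once per attribute-user pair, so at the deck's scale a query visits on the order of 72B pairs, which is why the one-second target is demanding.
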
  
The Answer – Audience Discovery Tool
“Making science accessible”
•  Audience discovery
–  Cookie Attributes
–  Frequency vs Population
•  Built for non-technical users
–  Strategy
–  Sales / Account
–  Anyone
•  Flexible
–  Research tool
–  In-meeting, iterative discovery
•  Approachable
–  Real-time
–  Results in seconds
–  Simple, elegant interface
–  Multiple export formats

Data Processing R&D

Traditional Relational Databases
•  Long load time
•  Complex queries resulting in long query times
•  Rigid data model
  
Non-Traditional Databases
•  Lack of complex query features
•  Large memory footprint requirement
•  Aggregation queries exceeded the target by many tens of seconds
  
The Low Hanging Fruit
•  In-memory cache
•  Customizable query using Java code
•  Relatively low data loading time
  
The Vertical Problem

Distributed Computing Ecosystem
•  Not production ready
•  Data import fails without explanation
•  Aggregation fails to impress
  
Back to Basics
•  Pure Java code solution
•  Data and logic must exist in the same memory space
•  Capable of advanced filtering
  
•  Distributed computing, low overhead
•  Data locality
•  Minimal code migration
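Data locality here means each partition of cookies is scanned where it lives and only small per-attribute counts travel back to be merged. The deck does not name the distributed framework, so the sketch below uses plain `java.util.concurrent` threads as a stand-in for grid nodes; `PartitionedCount`, `countPartition`, and `countAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedCount {
    // Runs next to one partition's data; only the small count map leaves it.
    static Map<String, Long> countPartition(List<Set<String>> partition, Set<String> criteria) {
        Map<String, Long> counts = new HashMap<>();
        for (Set<String> attrs : partition) {
            if (attrs.containsAll(criteria)) {
                for (String a : attrs) counts.merge(a, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Threads stand in for nodes (an assumption); partial counts are merged at the caller.
    static Map<String, Long> countAll(List<List<Set<String>>> partitions, Set<String> criteria) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Map<String, Long>>> partials = new ArrayList<>();
            for (List<Set<String>> p : partitions) {
                Callable<Map<String, Long>> task = () -> countPartition(p, criteria);
                partials.add(pool.submit(task));
            }
            Map<String, Long> total = new TreeMap<>();
            for (Future<Map<String, Long>> f : partials) {
                try {
                    f.get().forEach((k, v) -> total.merge(k, v, Long::sum));
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("partition scan failed", e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two hypothetical partitions of a tiny cookie population.
        List<List<Set<String>>> partitions = List.of(
                List.of(Set.of("Gender: Male", "Interest: Moose Hunting"),
                        Set.of("Gender: Female", "Interest: Moose Hunting")),
                List.of(Set.of("Gender: Male", "Interest: Moose Hunting",
                        "Intent: Auto Purchase > Truck")));
        System.out.println(countAll(partitions, Set.of("Gender: Male")));
    }
}
```

Because the per-partition function is ordinary Java over in-memory collections, moving it from a single JVM to multiple nodes changes how tasks are dispatched rather than how they count, which is one reading of "minimal code migration".
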
  
The Distributed Solution

The Challenges
•  Tedious manual data distribution
•  Gar building and deployment issues
•  Development challenges
  
What We Learned
•  Indexed data requiring minor calculations -- databases (relational & NoSQL) are great
•  Large non-indexed data -- the data & processing need to live in the same (memory) space
  

IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Analysis Using In-Memory Computing
