HP VERTICA BI
SUB-SECOND BIG DATA ANALYTICS
YOUR USERS AND DEVELOPERS CAN
TRULY APPRECIATE
PRESENTED BY MINA NAGUIB
BIG DATA MONTRÉAL AUGUST 2015
ABOUT ME
Director, Platform Engineering @AdGear
Background:
Software hacker
Network enthusiast
Web designer, SQL weaver, kernel debugger, PM, RE, SRE, QA, ...
What I do:
Hire great people at AdGear
Offer technical leadership
Get out of their way
Observe, optimize, rinse, repeat
ABOUT ADGEAR
AdGear is a digital advertising technology company, providing platforms, ad technology and services to publishers, advertisers, media agencies and ad tech providers.
AdGear delivers a full-stack advertising platform that includes: Demand-Side Platform, Supply-Side Platform, 1st and 3rd Party Ad Server, Attribution and Analytics, and multiple retargeting offerings.
ABOUT ADGEAR
2008 year founded
40 employees
2 offices (514, 416)
~10 billion impressions served per month
0.5 Trillion Bid Requests per month
ABOUT ADGEAR
We power these fine purveyors of your favourite internet services:
ABOUT ADGEAR
And many others, sometimes in the background
ADGEAR: DATA
Internet advertising generates lots of data.
Most of it is transactional data that must be accurately accounted for.
If you can't account for it, it didn't happen. The data generated is often more important than the occurrence of the event itself.
ADGEAR: SOME NUMBERS
September 2008 First event served in production
2008 2 events / second
2010 250 events / second
2012 2,500 events / second
2014 5,500 events / second
ADGEAR: SOME NUMBERS
RTB changed the game:
September 2008 First event served in production
2008 2 events / second
2010 250 events / second
2012 80,000 events / second
2014 200,000 events / second
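As a rough sanity check, the 2014 rate is in the same ballpark as the 0.5 trillion bid requests per month quoted earlier. The sketch below assumes the rate is sustained around the clock and that a month has 30 days; the slides state neither, and the function name is made up for illustration.

```python
# Back-of-the-envelope check: sustained event rate -> monthly volume.
# Assumptions (not from the slides): the rate holds 24 hours a day
# and a month has 30 days.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30


def events_per_month(events_per_second: int) -> int:
    """Convert a sustained per-second rate into a monthly event count."""
    return events_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH


monthly_2014 = events_per_month(200_000)
print(f"{monthly_2014:,} events / month")  # 518,400,000,000
```

Under those assumptions, 200,000 events / second works out to roughly half a trillion events per month.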
ADGEAR: DATA
From Day 1:
Offer customers a self-serve reporting section in the UI to report on what happened
Make it responsive, pivotable, discoverable, useful and insightful
We're competing against dinosaurs with a closed-day banking mentality, so go for realtime and semi-realtime
Safe and correct: better to say N/A than to offer a partial metric
ADGEAR: DATA
The data architecture plan, circa 2008
Step 1: Log the event locally on the server it occurs on
Step 2: Harvest the events
Step 3: ????
Step 4: Profit!
ADGEAR: DATA
Step 1: Log the event locally on the server it occurs on
Step 2: Harvest the events
Step 3: ???? (How hard can this really be ?)
Step 4: Profit!
The data architecture plan, circa 2008
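Steps 1 and 2 can be sketched in a few lines. This is a minimal illustration, not AdGear's "Harvester" library: the file naming, the event fields and the function names are all hypothetical, and the only detail taken from the deck is that raw events were kept as .json.gz files.

```python
import gzip
import json
import tempfile
from collections import Counter
from pathlib import Path


def log_event(log_dir: Path, event: dict) -> None:
    """Step 1: append one event as a JSON line to a local .json.gz file."""
    # Hypothetical naming scheme: one file per hour.
    path = log_dir / f"events-{event['hour']}.json.gz"
    # Mode "at" appends a new gzip member; the gzip module reads
    # multi-member files back transparently.
    with gzip.open(path, "at", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def harvest(log_dir: Path):
    """Step 2: stream every logged event back from the local files."""
    for path in sorted(log_dir.glob("events-*.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    log_event(log_dir, {"hour": "2015-08-05-01", "type": "impression"})
    log_event(log_dir, {"hour": "2015-08-05-01", "type": "click"})
    log_event(log_dir, {"hour": "2015-08-05-02", "type": "impression"})
    # A stand-in for "Step 3": count events per (hour, type).
    counts = Counter((e["hour"], e["type"]) for e in harvest(log_dir))

print(counts)
```

The counting at the end is the easy part at 2 events / second; the rest of the deck is about what happens to Step 3 as the volume grows.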
ADGEAR: DATA
The elusive Step 3
2008 2009 2010
Raw event management: Home-grown "Harvester" library
Raw event warehousing: Single unix filesystem, .json.gz files, .sqlite files
Raw event analysis+aggregation: "Harvester" library streaming abstraction, custom jobs
Aggregate metrics warehousing: PostgreSQL (app-db) tables, key-value design
Reporting: Primary web-based app accessing aggregates key-values table
ADGEAR: DATA
The elusive Step 3
2009 2010 2011 2012
Raw event management: Home-grown "Harvester" library
Raw event warehousing: Single unix filesystem, .json.gz files, .sqlite CEROD files
Raw event analysis+aggregation: "Harvester" library streaming abstraction, custom jobs
Aggregate metrics warehousing: PostgreSQL (app-db) tables, key-value design
Reporting: Primary web-based app accessing aggregates key-values table
ADGEAR: DATA
The elusive Step 3
2009 2010 2011 2012
Raw event management: Home-grown "Harvester" + "DDAL" libraries
Raw event warehousing: Multiple servers, unix filesystem, .json.gz files, .sqlite CEROD files
Raw event analysis+aggregation: "Harvester" + "DDAL" libraries streaming abstraction, custom jobs
Aggregate metrics warehousing: PostgreSQL (app-db) tables, key-value design
Reporting: Primary web-based app accessing aggregates key-values table
ADGEAR: DATA
The elusive Step 3
2010 2011 2012 2013
Raw event management: Home-grown "Harvester" + "DDAL" libraries
Raw event warehousing: Multiple servers, unix filesystem, .json.gz files, .sqlite CEROD files
Raw event analysis+aggregation: "Harvester" + "DDAL" libraries streaming abstraction, custom jobs
Aggregate metrics warehousing: Dedicated MongoDB server, hourly documents
Reporting: Dedicated reporting service abstracting away MongoDB
ADGEAR: DATA
The elusive Step 3
2011 2012 2013 2014
Raw event management: Home-grown "Harvester" + "DDAL" libraries
Raw event warehousing: Multiple servers, unix filesystem, .json.gz files, .sqlite CEROD files
Raw event analysis+aggregation: "Harvester" + "DDAL" libraries streaming abstraction, custom jobs
Aggregate metrics warehousing: Dedicated PostgreSQL reporting DB, star schema
Reporting: Dedicated reporting service abstracting away PG DB
ADGEAR: DATA
The elusive Step 3
2012 2013 2014 2015
Raw event management: Home-grown push mechanism
Raw event warehousing: HDFS, .json.gz files, .avro files
Raw event analysis+aggregation: Hadoop M+R, Pig, Hive
Aggregate metrics warehousing: Dedicated PostgreSQL reporting DB, star schema
Reporting: Dedicated reporting service abstracting away PG DB
ADGEAR: DATA
The elusive Step 3
2012 2013 2014 2015
Raw event management: Home-grown push mechanism
Raw event warehousing: HDFS, .json.gz files, .avro files
Raw event analysis+aggregation: Hadoop M+R, Pig, Hive
Aggregate metrics warehousing: Vertica
Reporting: Dedicated reporting service abstracting away Vertica DB
ADGEAR: DATA
The elusive Step 3
2015
Raw event management: Home-grown push mechanism, Kafka
Raw event warehousing: HDFS, .json.gz files, .avro files
Raw event analysis+aggregation: Hadoop, HP Vertica, Hive
Aggregate metrics warehousing: HP Vertica
Reporting: Dedicated reporting service abstracting away Vertica DB
ADGEAR: DATA
Vertica = The "Secret Sauce" *
* Actual unsolicited description used by myself and other Vertica customers
From a dev/ops perspective,
Vertica is:
• A columnar database
• Built around a familiar DB/Schema/Table/Row/Column paradigm
• Distributed and horizontally scalable
• Easily accessible from the CLI and many programming languages
• Extremely fast
• Solid on SQL support: not 100% ANSI SQL-99 compliant, but more than enough for our use cases
• Stable, predictable, easy to administer
• Well documented
• Enterprise-ready, in production at many large companies
From a product perspective:
Extremely fast
At AdGear
FactTable N
Hour           Dimension1  Dimension2  Dimension3  Dimension...N  Metric1  Metric2  Metric...N
2015-08-05-01  1           55          105         9              1        0        0
2015-08-05-01  1           56          106         9              3551     6        9
2015-08-05-01  1           56          107         9              2382     6        66
2015-08-05-01  2           901         107         33             23       4        0
Growth via append-only row insertion
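The append-only fact table above can be sketched as follows. Python's built-in sqlite3 is used purely as a stand-in so the example runs anywhere; it is not Vertica and says nothing about Vertica's performance. The table and column names are lowercased versions of the slide's placeholders, the "...N" columns are left out for brevity, and append_rows is a hypothetical helper.

```python
import sqlite3

# sqlite3 is only a stand-in so the sketch is runnable; it is not Vertica.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fact_table_n (
        hour TEXT,
        dimension1 INTEGER,
        dimension2 INTEGER,
        dimension3 INTEGER,
        metric1 INTEGER,
        metric2 INTEGER
    )
    """
)


def append_rows(rows):
    """Grow the table by inserting new rows only; nothing is updated."""
    conn.executemany(
        "INSERT INTO fact_table_n VALUES (?, ?, ?, ?, ?, ?)", rows
    )


# Rows taken from the slide (minus the "...N" columns).
append_rows(
    [
        ("2015-08-05-01", 1, 55, 105, 1, 0),
        ("2015-08-05-01", 1, 56, 106, 3551, 6),
        ("2015-08-05-01", 1, 56, 107, 2382, 6),
        ("2015-08-05-01", 2, 901, 107, 23, 4),
    ]
)

# A report is an aggregate over whichever dimensions the user pivots on.
report = conn.execute(
    """
    SELECT hour, dimension1, SUM(metric1), SUM(metric2)
    FROM fact_table_n
    GROUP BY hour, dimension1
    ORDER BY dimension1
    """
).fetchall()
print(report)
```

Because rows are only ever appended, a report never reads a half-updated row; it simply sums whatever rows exist for the requested hours and dimensions.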
At AdGear
Fact tables: FactTable 1, FactTable 2, FactTable 3
Dimension tables: DimensionTable 1, DimensionTable 2, DimensionTable 3, DimensionTable 4, DimensionTable 5
Simple SQL joins
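A "simple SQL join" between a fact table and a dimension table might look like the sketch below. Again sqlite3 stands in for Vertica so the example is runnable, and the dimension names ('campaign-a', 'campaign-b') and column layout are invented for illustration; the slides do not say what the real dimensions are.

```python
import sqlite3

# sqlite3 is a runnable stand-in here, not Vertica.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dimension_table_1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_table_1 (hour TEXT, dimension1 INTEGER, metric1 INTEGER);

    -- Hypothetical dimension values, for illustration only.
    INSERT INTO dimension_table_1 VALUES (1, 'campaign-a'), (2, 'campaign-b');

    INSERT INTO fact_table_1 VALUES
        ('2015-08-05-01', 1, 3551),
        ('2015-08-05-01', 1, 2382),
        ('2015-08-05-01', 2, 23);
    """
)

# The fact table stores compact integer keys; the join resolves them
# to human-readable names at report time.
rows = conn.execute(
    """
    SELECT d.name, SUM(f.metric1) AS metric1
    FROM fact_table_1 AS f
    JOIN dimension_table_1 AS d ON d.id = f.dimension1
    GROUP BY d.name
    ORDER BY d.name
    """
).fetchall()
print(rows)
```

Keeping names in the dimension tables means the fact tables stay narrow, which suits a columnar store that scans and aggregates a few integer columns at a time.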
At AdGear
Let's see it in action
To download and try:
https://my.vertica.com/community/
Free, up to 1TB, 3 nodes, no time limit
Get in touch:
http://adgear.com/
mina@adgear.com
Mina Naguib
To learn more:
http://www.vertica.com/
Thank you

BDM39: HP Vertica BI: Sub-second big data analytics your users and developers can truly appreciate - by Mina Naguib, AdGear
