Despite how fantastic pigs look with lipstick on and how magical elephants look with wings attached, there remains a large gap between what popular big data stacks offer and what end users demand in terms of reporting agility and speed. Join us to learn how Montreal-based AdGear, an advertising technology company, faced challenges as its data volume increased. You will hear how AdGear's data stack evolved to meet these challenges, and how HP Vertica's architecture and features changed the game.
(by Mina Naguib, Technical Director of Platform Engineering at AdGear).
https://youtu.be/tzQUUCuVjVc
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers can truly appreciate - by Mina Naguib, AdGear
1. HP VERTICA BI
SUB-SECOND BIG DATA ANALYTICS
YOUR USERS AND DEVELOPERS CAN
TRULY APPRECIATE
PRESENTED BY MINA NAGUIB
BIG DATA MONTRÉAL AUGUST 2015
2. Director, Platform Engineering
@AdGear
Background:
Software hacker
Network enthusiast
Web designer, SQL weaver, kernel debugger, PM, RE, SRE, QA, ...
What I do:
Hire great people at AdGear
Offer technical leadership
Get out of their way
Observe, optimize, rinse, repeat
ABOUT ME
3. AdGear is a digital advertising technology company, providing
platforms, ad technology and services to publishers,
advertisers, media agencies and ad tech providers.
AdGear delivers a full-stack advertising platform that includes:
Demand-Side Platform, Supply-Side Platform, 1st and 3rd Party
Ad Server, Attribution and Analytics, and multiple retargeting
offerings.
ABOUT ADGEAR
4. ABOUT ADGEAR
2008 year founded
40 employees
2 offices (514, 416)
~10 billion impressions served per month
0.5 Trillion Bid Requests per month
7. ADGEAR: DATA
Internet advertising generates lots of data, the majority of which is transactional data that must be accurately accounted for.
If you can't account for it, it didn't happen. The data generated is often more important than the occurrence of the event itself.
8. ADGEAR: SOME NUMBERS
September 2008 First event served in production
2008 2 events / second
2010 250 events / second
2012 2,500 events / second
2014 5,500 events / second
9. ADGEAR: SOME NUMBERS
RTB changed the game:
September 2008 First event served in production
2008 2 events / second
2010 250 events / second
2012 80,000 events / second
2014 200,000 events / second
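As a rough sanity check on these figures, the monthly volumes quoted on the "ABOUT ADGEAR" slide can be converted to per-second rates. This is a back-of-the-envelope sketch only: it assumes a 30-day month and perfectly even traffic, neither of which the slides state.

```python
# Assumption: a 30-day month with evenly spread traffic (not stated in the deck).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

def per_second(per_month: float) -> float:
    """Convert a monthly volume into an average per-second rate."""
    return per_month / SECONDS_PER_MONTH

# Monthly figures taken from the "ABOUT ADGEAR" slide.
bid_requests_per_second = per_second(0.5e12)  # 0.5 trillion bid requests / month
impressions_per_second = per_second(10e9)     # ~10 billion impressions / month

print(f"{bid_requests_per_second:,.0f} bid requests / second")
print(f"{impressions_per_second:,.0f} impressions / second")
```

Under those assumptions the bid-request rate lands in the same order of magnitude as the 200,000 events / second quoted for 2014.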
10. ADGEAR: DATA
From Day 1:
Offer customers a self-serve reporting section in the UI to report on
what happened
Make it responsive, pivotable, discoverable, useful and insightful
We're competing against dinosaurs with closed-day banking
mentality - go for realtime and semi-realtime
Safe and correct - better say N/A than offer a partial metric
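The "better say N/A than offer a partial metric" rule can be illustrated with a small sketch. The function name, the hourly bucketing and the sample data are hypothetical and are not taken from AdGear's code; the point is only that a missing input yields no answer rather than an undercount.

```python
from typing import Dict, Optional

def total_or_na(hourly_counts: Dict[int, int], hours: range) -> Optional[int]:
    """Sum a metric over `hours`, or return None (N/A) if any hour is missing."""
    if any(h not in hourly_counts for h in hours):
        return None  # refuse to report a partial total
    return sum(hourly_counts[h] for h in hours)

complete = {0: 10, 1: 12, 2: 9}  # all three hours have been aggregated
partial = {0: 10, 2: 9}          # hour 1 has not been aggregated yet

print(total_or_na(complete, range(3)))
print(total_or_na(partial, range(3)))
```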
11. ADGEAR: DATA
The data architecture plan, circa 2008
Step 1: Log the event locally on the server it occurs on
Step 2: Harvest the events
Step 3: ????
Step 4: Profit!
12. ADGEAR: DATA
Step 1: Log the event locally on the server it occurs on
Step 2: Harvest the events
Step 3: ???? (How hard can this really be ?)
Step 4: Profit!
The data architecture plan, circa 2008
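Steps 1 and 2 of the plan could look roughly like the sketch below. It assumes events are written as one JSON object per line into gzip-compressed files (the deck mentions .json.gz files for raw events, but not their internal layout); the function names, file names and event fields are hypothetical.

```python
import gzip
import json
import tempfile
from pathlib import Path

def log_event(log_path: Path, event: dict) -> None:
    """Step 1: append one event as a JSON line to a local gzip file."""
    with gzip.open(log_path, "at", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

def harvest(log_dir: Path):
    """Step 2: stream every event from every .json.gz file in log_dir."""
    for path in sorted(log_dir.glob("*.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)

# A temporary directory stands in for the per-server log locations.
with tempfile.TemporaryDirectory() as d:
    log_dir = Path(d)
    log_event(log_dir / "server1.json.gz", {"type": "impression", "ad_id": 1})
    log_event(log_dir / "server1.json.gz", {"type": "click", "ad_id": 1})
    log_event(log_dir / "server2.json.gz", {"type": "impression", "ad_id": 2})
    events = list(harvest(log_dir))

print(len(events))
```

Step 3, turning the harvested stream into something reportable, is where the rest of the talk spends its time.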
13. ADGEAR: DATA
2008 2009 2010
The elusive Step 3
Raw event management: Home-grown "Harvester" library
Raw event warehousing: Single unix filesystem, .json.gz files, .sqlite files
Raw event analysis+aggregation: "Harvester" library streaming abstraction, custom jobs
Aggregate metrics warehousing: PostgreSQL (app-db) tables, key-value design
Reporting: Primary web-based app accessing aggregates key-values table
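A key-value aggregates table of the kind named above might be sketched as follows. The deck does not show the actual schema, so the table name, column names and key format are assumptions, and Python's built-in sqlite3 module stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3

# sqlite3 is a stand-in for the PostgreSQL app-db; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aggregates ("
    " metric_key TEXT PRIMARY KEY,"
    " metric_value INTEGER NOT NULL)"
)

def bump(metric_key: str, n: int = 1) -> None:
    """Add n to the counter stored under metric_key, creating it if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO aggregates (metric_key, metric_value) VALUES (?, 0)",
        (metric_key,),
    )
    conn.execute(
        "UPDATE aggregates SET metric_value = metric_value + ? WHERE metric_key = ?",
        (n, metric_key),
    )

def read(metric_key: str):
    """Return the stored counter, or None when the key was never written."""
    row = conn.execute(
        "SELECT metric_value FROM aggregates WHERE metric_key = ?", (metric_key,)
    ).fetchone()
    return row[0] if row else None

# Hypothetical key layout: entity, id, day and metric packed into one string.
bump("campaign:42:2008-09-01:impressions", 5)
bump("campaign:42:2008-09-01:impressions", 2)
print(read("campaign:42:2008-09-01:impressions"))
```

A design like this makes single-metric lookups trivial, but pivoting across dimensions means enumerating keys, which may be part of why later slides move away from it.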
14. ADGEAR: DATA
2009 2010 2011 2012
Raw event management: Home-grown "Harvester" library
Raw event warehousing: Single unix filesystem, .json.gz files, .sqlite CEROD files
Raw event analysis+aggregation: "Harvester" library streaming abstraction, custom jobs
Aggregate metrics warehousing: PostgreSQL (app-db) tables, key-value design
Reporting: Primary web-based app accessing aggregates key-values table
The elusive Step 3
15. ADGEAR: DATA
2009 2010 2011 2012
Raw event management: Home-grown "Harvester" + "DDAL" libraries
Raw event warehousing: Multiple servers, unix filesystem, .json.gz files, .sqlite CEROD files
Raw event analysis+aggregation: "Harvester" + "DDAL" libraries streaming abstraction, custom jobs
Aggregate metrics warehousing: PostgreSQL (app-db) tables, key-value design
Reporting: Primary web-based app accessing aggregates key-values table
The elusive Step 3
16. ADGEAR: DATA
2010 2011 2012 2013
Raw event management: Home-grown "Harvester" + "DDAL" libraries
Raw event warehousing: Multiple servers, unix filesystem, .json.gz files, .sqlite CEROD files
Raw event analysis+aggregation: "Harvester" + "DDAL" libraries streaming abstraction, custom jobs
Aggregate metrics warehousing: Dedicated MongoDB server, hourly documents
Reporting: Dedicated reporting service abstracting away MongoDB
The elusive Step 3
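The "hourly documents" approach can be pictured as one document per entity per hour, holding that hour's counters. The sketch below builds such documents as plain Python dicts; the field names and the choice of campaign as the entity are hypothetical, and no MongoDB client is involved.

```python
from datetime import datetime

def hour_bucket(ts: datetime) -> str:
    """Truncate a timestamp to the start of its hour."""
    return ts.strftime("%Y-%m-%dT%H:00")

def build_hourly_documents(events):
    """Fold raw events into one counter document per (campaign, hour)."""
    docs = {}
    for e in events:
        hour = hour_bucket(e["ts"])
        doc = docs.setdefault(
            (e["campaign_id"], hour),
            {"campaign_id": e["campaign_id"], "hour": hour, "metrics": {}},
        )
        doc["metrics"][e["type"]] = doc["metrics"].get(e["type"], 0) + 1
    return list(docs.values())

# Hypothetical raw events; only the fields used above are included.
events = [
    {"campaign_id": 1, "type": "impression", "ts": datetime(2012, 5, 1, 13, 5)},
    {"campaign_id": 1, "type": "impression", "ts": datetime(2012, 5, 1, 13, 40)},
    {"campaign_id": 1, "type": "click", "ts": datetime(2012, 5, 1, 13, 41)},
    {"campaign_id": 1, "type": "impression", "ts": datetime(2012, 5, 1, 14, 2)},
]
docs = build_hourly_documents(events)
print(docs)
```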
17. ADGEAR: DATA
2011 2012 2013 2014
Raw event management: Home-grown "Harvester" + "DDAL" libraries
Raw event warehousing: Multiple servers, unix filesystem, .json.gz files, .sqlite CEROD files
Raw event analysis+aggregation: "Harvester" + "DDAL" libraries streaming abstraction, custom jobs
Aggregate metrics warehousing: Dedicated PostgreSQL reporting DB, star schema
Reporting: Dedicated reporting service abstracting away PG DB
The elusive Step 3
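For readers unfamiliar with the term, a star schema keeps numeric measures in a central fact table and descriptive attributes in small dimension tables joined to it. The sketch below is a generic illustration with made-up table names, columns and data; sqlite3 stands in for the PostgreSQL reporting DB so the example is self-contained.

```python
import sqlite3

# sqlite3 is a stand-in for PostgreSQL; every name and value here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_campaign (campaign_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_hour (hour_id INTEGER PRIMARY KEY, day TEXT NOT NULL, hour INTEGER NOT NULL);
CREATE TABLE fact_delivery (
    campaign_id INTEGER NOT NULL REFERENCES dim_campaign(campaign_id),
    hour_id     INTEGER NOT NULL REFERENCES dim_hour(hour_id),
    impressions INTEGER NOT NULL,
    clicks      INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO dim_campaign VALUES (?, ?)",
                 [(1, "spring"), (2, "summer")])
conn.executemany("INSERT INTO dim_hour VALUES (?, ?, ?)",
                 [(1, "2014-06-01", 0), (2, "2014-06-01", 1)])
conn.executemany("INSERT INTO fact_delivery VALUES (?, ?, ?, ?)",
                 [(1, 1, 100, 3), (1, 2, 150, 5), (2, 1, 80, 1)])

# A typical report: roll hourly facts up to one row per campaign per day.
rows = conn.execute("""
    SELECT c.name, h.day, SUM(f.impressions), SUM(f.clicks)
    FROM fact_delivery f
    JOIN dim_campaign c ON c.campaign_id = f.campaign_id
    JOIN dim_hour h ON h.hour_id = f.hour_id
    GROUP BY c.name, h.day
    ORDER BY c.name
""").fetchall()
print(rows)
```

Unlike a key-value table, this layout lets the reporting layer pivot on any dimension with ordinary SQL.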
18. ADGEAR: DATA
2012 2013 2014 2015
Raw event management: Home-grown push mechanism
Raw event warehousing: HDFS, .json.gz files, .avro files
Raw event analysis+aggregation: Hadoop M+R, Pig, Hive
Aggregate metrics warehousing: Dedicated PostgreSQL reporting DB, star schema
Reporting: Dedicated reporting service abstracting away PG DB
The elusive Step 3
19. ADGEAR: DATA
2012 2013 2014 2015
Raw event management: Home-grown push mechanism
Raw event warehousing: HDFS, .json.gz files, .avro files
Raw event analysis+aggregation: Hadoop M+R, Pig, Hive
Aggregate metrics warehousing: Vertica
Reporting: Dedicated reporting service abstracting away Vertica DB
The elusive Step 3
20. ADGEAR: DATA
2015
Raw event management: Home-grown push mechanism, Kafka
Raw event warehousing: HDFS, .json.gz files, .avro files
Raw event analysis+aggregation: Hadoop, HP Vertica, Hive
Aggregate metrics warehousing: HP Vertica
Reporting: Dedicated reporting service abstracting away Vertica DB
The elusive Step 3
21. ADGEAR: DATA
HP Vertica = The "Secret Sauce" *
* Actual unsolicited description used by myself and other Vertica customers
22. From a dev/ops perspective,
Vertica is:
• A columnar database
• Offers a familiar DB/Schema/Table/Row/Column
paradigm
• Distributed + Horizontally scalable
• Easily accessible from the CLI and many programming
languages
• Extremely fast
• SOLID SQL support. Not 100% ANSI SQL-99
Compliant, but more than enough for our use cases
• Stable, predictable, easy to administer
• Well documented
• Enterprise-ready, in production at many large
companies
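To make "columnar" concrete, the sketch below stores the same three records row by row and column by column. It is a general illustration of column orientation, not a description of Vertica's internals, and the field names and values are invented.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"campaign_id": 1, "country": "CA", "impressions": 100},
    {"campaign_id": 2, "country": "US", "impressions": 250},
    {"campaign_id": 1, "country": "US", "impressions": 50},
]

# Column-oriented layout: one contiguous list of values per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Summing impressions in the row layout walks every record, unused fields included.
total_row_store = sum(r["impressions"] for r in rows)

# In the column layout, the same aggregate reads only the one column it needs.
total_column_store = sum(columns["impressions"])

print(total_row_store, total_column_store)
```

Reporting queries tend to aggregate a few columns over many rows, which is the access pattern a column layout favours.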
28. To download and try:
https://my.vertica.com/community/
Free, up to 1TB, 3 nodes, no time limit
Get in touch:
http://adgear.com/
mina@adgear.com
Mina Naguib
To learn more:
http://www.vertica.com/
Thank you