Citrusleaf
“Sifting, Sorting and Scanning for Gold –
   The Real‐Time User Data Problem”

  Brian Bulkowski, CEO and Founder


             © 2011 Citrusleaf. All rights reserved.   1
NoSQL in Digital Advertising

• Display advertising – today’s state of the art
• Real‐time data in digital advertising
• Case study
       – User data store & cookie mapping
• Market differentiation
• Moving beyond the last click
Display Advertising
• Lie:
   – Content determines the value of an impression
• Lie:
   – The value of an individual user is constant
   – The decision funnel
• Truth:
   – The right ad, to the right user, at the right time
Today’s state of the art

•   Impression logs => user segments (warehouse)
•   Third‐party data => user segments
•   Region segments
•   Real‐time per‐user factors
    • Frequency caps
    • Session management for cookie‐less users
•   Mapping external partner IDs
•   Build optimization tables based on log analysis

    Simple math determines the highest‐value ad
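The "simple math" on this slide is not spelled out in the deck. One plausible reading, sketched below, is to drop any ad whose frequency cap the user has already hit and then score the rest as bid times a per-segment multiplier taken from the optimization tables. The `Ad` fields, the `segment_lift` table, and the scoring formula are all assumptions made for illustration, not Citrusleaf's or any ad server's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    ad_id: str
    bid_cpm: float        # hypothetical: advertiser bid per thousand impressions
    frequency_cap: int    # hypothetical: max impressions allowed per user


def pick_ad(ads, segment_lift, impressions_seen):
    """Return the eligible ad with the highest score, or None if none is eligible.

    segment_lift:     ad_id -> multiplier from the (assumed) optimization table
    impressions_seen: ad_id -> how many times this user has already seen the ad
    """
    best = None
    best_score = 0.0
    for ad in ads:
        # Real-time per-user factor: skip ads that reached their frequency cap.
        if impressions_seen.get(ad.ad_id, 0) >= ad.frequency_cap:
            continue
        # Assumed scoring rule: bid scaled by the user's segment multiplier.
        score = ad.bid_cpm * segment_lift.get(ad.ad_id, 1.0)
        if best is None or score > best_score:
            best, best_score = ad, score
    return best


ads = [Ad("shoes", 2.0, 3), Ad("travel", 5.0, 1), Ad("cars", 3.0, 5)]
lift = {"shoes": 2.0, "cars": 1.5}
seen = {"travel": 1}          # this user already hit the cap on "travel"
choice = pick_ad(ads, lift, seen)
print(choice.ad_id)
```

The point of the sketch is that the per-user inputs (`seen`, and the segments behind `lift`) have to be fetched at request time, which is what drives the real-time data requirements on the next slide.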
Real‐time data requirements
• Scalable and flexible
• 100% availability
• Billion‐object support
• High performance with low hardware cost
• Sophisticated data eviction policies
Case Study

                 Map store                      User store
Key              cookie string or partner id    internal user id
Value            internal user id (8 bytes)     Segment data, frequency caps,
                                                other optimization data
Region           North America                  North America
Objects          1B ~ 1.5B                      500M ~ 800M
Write load       1k ~ 2k per second             10k ~ 20k per second
Read load        10k ~ 50k per second           20k ~ 100k per second
Configuration    DRAM backed by disk            1T ~ 4T user storage
Latency          Lowest latency (0.4 ms)        Low latency (0.8 ms)
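The two stores in the case study imply a two-step lookup: an external identifier (cookie string or partner id) resolves to an 8-byte internal user id, which in turn keys the user record. A minimal sketch of that flow follows, using in-memory dictionaries as stand-ins for the two stores. The function names, the record layout, and the big-endian unsigned encoding of the id are assumptions, since the slide gives only the 8-byte size.

```python
import struct

map_store = {}    # cookie string or partner id -> 8-byte internal user id
user_store = {}   # internal user id (int) -> user record


def pack_user_id(user_id: int) -> bytes:
    """Encode the internal id as 8 bytes (assumed: big-endian unsigned)."""
    return struct.pack(">Q", user_id)


def unpack_user_id(raw: bytes) -> int:
    return struct.unpack(">Q", raw)[0]


def register(external_id: str, user_id: int) -> None:
    """Map an external id to a user, creating an empty record if needed."""
    map_store[external_id] = pack_user_id(user_id)
    user_store.setdefault(user_id, {"segments": set(), "frequency": {}})


def lookup(external_id: str):
    """Step 1: map store. Step 2: user store. Returns None if unmapped."""
    raw = map_store.get(external_id)
    if raw is None:
        return None
    return user_store.get(unpack_user_id(raw))


# Two external ids (our cookie and a partner's id) resolve to the same user.
register("cookie:abc123", 42)
register("partner-x:9f", 42)
user_store[42]["segments"].add("sports")
print(lookup("partner-x:9f"))
```

Keeping the map store's values this small is presumably what lets it sit in DRAM: at the upper figure of 1.5B objects, the 8-byte values alone come to 12 GB, before keys and index overhead.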
Market differentiation

• Control over impressions
• Quality and variety of inventory
• Better insight generation & tracking
• And:
  – High quality user understanding

           Trend: multiple user stores
The next challenge
Weakness: “last ad takes all”
   Performance is judged externally by the last
   advertiser before a conversion

So… game the system to place strategically
   New technology allows real‐time understanding of
   user behavior

But …
   Will advertisers stand for it?
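To make the "last ad takes all" weakness concrete, the sketch below compares last-touch credit with an even split across the whole chain of ads a user saw before converting. The even split is only one possible alternative, chosen here for illustration; the deck does not name a specific model, and the network names are hypothetical.

```python
def last_touch(touches):
    """Give all conversion credit to the final ad shown before the conversion."""
    if not touches:
        return {}
    return {touches[-1]: 1.0}


def linear(touches):
    """Split credit evenly across every impression in the chain (illustrative)."""
    credit = {}
    if not touches:
        return credit
    share = 1.0 / len(touches)
    for source in touches:
        credit[source] = credit.get(source, 0.0) + share
    return credit


# Hypothetical chain of ad impressions, oldest first, ending in a conversion.
chain = ["network_a", "network_b", "network_a", "network_c"]
print(last_touch(chain))
print(linear(chain))
```

Under last-touch, `network_c` receives all the credit despite showing one of four ads, which is the incentive to "place strategically" that the slide describes.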
Indexed user behavior storage
• Store 100B+ user behavior objects (1B per day, 90 days)
• Index by user_id, giving sub‐second access to entire behavior chain
• Optimized for rotational media
• Time‐based eviction
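The combination of a user_id index and time-based eviction can be sketched with an in-memory structure: each user's events are kept oldest-first, and anything older than the retention window is dropped when the chain is read. This is a toy model under stated assumptions (the `BehaviorStore` class, its method names, and evict-on-read are all hypothetical), not a description of how the product stores data on rotational media.

```python
from collections import defaultdict, deque

RETENTION_DAYS = 90
SECONDS_PER_DAY = 86_400

# Sizing from the slide: 1B objects per day kept for 90 days is 90B objects,
# which the slide rounds up to "100B+".
objects_retained = 1_000_000_000 * RETENTION_DAYS


class BehaviorStore:
    def __init__(self, retention_seconds=RETENTION_DAYS * SECONDS_PER_DAY):
        self.retention_seconds = retention_seconds
        # user_id -> deque of (timestamp, event), oldest first
        self._by_user = defaultdict(deque)

    def record(self, user_id, timestamp, event):
        # Assumes events for a given user arrive in timestamp order.
        self._by_user[user_id].append((timestamp, event))

    def chain(self, user_id, now):
        """Return the user's retained events, evicting expired ones first."""
        events = self._by_user.get(user_id)
        if not events:
            return []
        cutoff = now - self.retention_seconds
        while events and events[0][0] < cutoff:
            events.popleft()              # time-based eviction
        if not events:
            del self._by_user[user_id]    # nothing left for this user
            return []
        return [event for _, event in events]


store = BehaviorStore()
day = SECONDS_PER_DAY
store.record("u1", 0 * day, "viewed:shoes")
store.record("u1", 50 * day, "clicked:shoes")
store.record("u1", 100 * day, "viewed:travel")
recent = store.chain("u1", now=120 * day)
print(recent)
```

Because the lookup goes straight to one user's events, reading a behavior chain does not require scanning the full log, which is the contrast with ETL and Map/Reduce scans drawn on the following slide.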
Indexed user behavior storage
•    Solves the “needle in a haystack” problem
•    Allows immediate behavioral triggering
•    Removes ETL and Map/Reduce scans
•    Allows sophisticated attribution models
     (and advanced optimization and reporting)

                   The right ad,
                  to the right user,
                   at the right time
Citrusleaf
Making Web Scale Easy and Affordable

 Contact Info: brian@citrusleaf.com
