Every Second – in over 50,000 Categories
eBay Analytics

>50 TB/day new data
>150 PB/day processed
>100k data elements
>100 trillion pairs of information
>50k chains of logic
>7500 business users & analysts
Structured/Unstructured
Turning over a TB every second
24x7x365, always online
Millions of queries/day
99.98+% availability
Near-real-time
Big
Detail
Designing for the Unknown
>85% of analytical workload is NEW & Unknown

The metrics you know are cheap

The metrics you don’t know are expensive – but high in potential ROI

Exploration & Testing are core pillars of an analytics-driven organization
[Diagram: DATA at the center of three dimensions]
Volume: incremental, storage
Velocity: processing, change
Variety: structured, semi-structured, un-structured
Value > Cost
                         $’s per year in incremental revenue




• Data Growing Faster
• Impact
Data
  questions later
  structure later

($0.04/GB, $80/2TB)

single HDFS instances >50PB

Value > Cost
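The per-gigabyte figure on this slide can be checked with a little arithmetic. The Python sketch below assumes decimal units (1 TB = 1000 GB) and counts raw disk capacity only. Replication, servers, and power are ignored, so the 50 PB figure is an illustration and not a number from the talk.

```python
# Price from the slide: $0.04/GB, kept in integer cents to avoid float drift.
CENTS_PER_GB = 4
GB_PER_TB = 1000              # assumption: decimal units
GB_PER_PB = 1000 * GB_PER_TB  # assumption: decimal units

def storage_cost_dollars(gigabytes):
    """Raw disk cost in dollars for the given capacity in gigabytes."""
    return gigabytes * CENTS_PER_GB / 100

two_tb = storage_cost_dollars(2 * GB_PER_TB)     # matches the slide's $80/2TB
fifty_pb = storage_cost_dollars(50 * GB_PER_PB)  # illustrative: raw disk for 50 PB

print(f"2 TB:  ${two_tb:,.0f}")
print(f"50 PB: ${fifty_pb:,.0f}")
```

At this price, the raw disk for a 50 PB store costs about two million dollars, which is the kind of number behind "Value > Cost".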
Synonyms derived from top queries in item query clusters

texas instruments ba ii plus          ba ii plus
brighton handbag                      brighton purse
lenovo x200                           thinkpad x200
king bedspread                        king coverlet
rockabilly dress                      swing dress
1963 ford falcon                      63 falcon
jessica simpson hair extensions       jessica simpson hairdo

Abbreviations/acronyms derived from query transitions

stanford ky                           stanford kentucky
dc sub                                dc subwoofer
snowboard helmet l                    snowboard helmet large
motorcycle cam                        motorcycle camera
diamond amp                           diamond amplifier
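The abbreviation pairs above come from query transitions, that is, a user rewriting one query into another. The Python sketch below shows one simple way to find such pairs. It is not the method used at eBay. The log format (a list of before/after query strings), the `min_count` threshold, and the prefix rule are all assumptions made for illustration.

```python
from collections import Counter

def abbreviation_candidates(transitions, min_count=2):
    """Count (short, long) token pairs seen in single-token query rewrites.

    transitions: iterable of (before, after) query strings (hypothetical format).
    A pair qualifies when both queries have the same number of tokens, differ
    in exactly one position, and the shorter differing token is a prefix of
    the longer one. Pairs seen fewer than min_count times are dropped.
    """
    counts = Counter()
    for before, after in transitions:
        a, b = before.lower().split(), after.lower().split()
        if len(a) != len(b):
            continue
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) != 1:
            continue
        # Order the pair by length so both rewrite directions count together.
        short, long_ = sorted(diffs[0], key=len)
        if short != long_ and long_.startswith(short):
            counts[(short, long_)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Toy log built from the slide's examples.
log = [
    ("dc sub", "dc subwoofer"),
    ("dc sub", "dc subwoofer"),
    ("motorcycle cam", "motorcycle camera"),
    ("motorcycle camera", "motorcycle cam"),
    ("snowboard helmet l", "snowboard helmet large"),
    ("king bedspread", "king coverlet"),
]
candidates = abbreviation_candidates(log)
print(candidates)
```

The prefix rule is deliberately narrow. It would miss "ky" for "kentucky", and it cannot find true synonyms such as "bedspread" and "coverlet", which the slide attributes to item query clusters instead.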
  
Toys and Hobbies
ATC   >   Artist trading card   in ART
ATC   >   Automatic Tool Change in Business and Industrial
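The ATC example shows that an acronym's expansion depends on the category being searched. One way to model this is a lookup keyed by both the token and the category. The table, the function name, and the "Music" category in the usage lines are hypothetical. Only the two ATC expansions come from the slide.

```python
# Hypothetical lookup table: (acronym, category) -> expansion.
# The two entries are the ATC examples from the slide.
EXPANSIONS = {
    ("atc", "Art"): "artist trading card",
    ("atc", "Business and Industrial"): "automatic tool change",
}

def expand_query(query, category):
    """Return the original query, plus an expanded variant if one applies."""
    tokens = query.lower().split()
    rewritten = " ".join(EXPANSIONS.get((t, category), t) for t in tokens)
    if rewritten == query.lower():
        return [query]  # no expansion known for this category
    return [query, rewritten]

print(expand_query("atc lot", "Art"))
print(expand_query("atc lot", "Business and Industrial"))
print(expand_query("atc lot", "Music"))  # category with no entry in the table
```

Returning the original query alongside the expansion keeps the rewrite additive, so a wrong expansion can widen the results but cannot hide items that matched the literal query.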
[Architecture diagram]
Offline: Editorial, Behavioral Logs, Document Data, Human Judgment
Online: Service, Code, Small Data, Big Data Store (NoSQL)
Clients: Search, Selling, Others…

<3 milliseconds per query
1.2 billion queries per day
1,000’s of queries per second per machine
German Compound Words
 •    German compound words can be arbitrarily created and extremely long
          Adidastrainingsanzug (Adidas track suit)
          Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz
                   (beef labeling regulation & delegation of supervision law)
 •    Syntactically, words can be combined and split in many ways.
 •    Some words shouldn’t be de-compounded.
          beiden (both) – bei(at) den(the)
 •    Too many candidates for
          Granitpflastersteine (granite paving stones)
          Granit(granite) pflastersteine(cobblestones)
          Granit(granite) pflaster(paving/band-aid) steine(stones)
 •    Binding characters
      Hochzeitsschuhe (grammatically correct, 593 hits on ebay.de)
      Hochzeitschuhe (129 hits on ebay.de).
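The points above can be illustrated with a small dictionary-based decompounder. This Python sketch is not the production approach. The vocabulary, the protected-word list, and the single binding character "s" are assumptions chosen to reproduce the slide's examples.

```python
# Toy vocabulary and protected words (assumptions for illustration only).
VOCAB = {"granit", "pflaster", "steine", "pflastersteine",
         "hochzeit", "schuhe", "bei", "den"}
NO_SPLIT = {"beiden"}   # words that must not be de-compounded
LINKERS = ("", "s")     # optional binding character between parts

def splits(word, vocab=VOCAB, no_split=NO_SPLIT):
    """Return every way to cover `word` with vocabulary entries.

    Each candidate is a list of parts. A binding "s" may sit between two
    parts and is dropped from the output. Unknown words return [].
    """
    word = word.lower()
    if word in no_split:
        return [[word]]
    results = []

    def walk(rest, parts):
        for end in range(1, len(rest) + 1):
            head, tail = rest[:end], rest[end:]
            if head not in vocab:
                continue
            if not tail:
                results.append(parts + [head])
                continue
            for linker in LINKERS:
                # Skip the linker only if something is left after it.
                if tail.startswith(linker) and len(tail) > len(linker):
                    walk(tail[len(linker):], parts + [head])

    walk(word, [])
    return results

print(splits("Granitpflastersteine"))  # more than one candidate
print(splits("Hochzeitsschuhe"))       # binding "s" is absorbed
print(splits("Hochzeitschuhe"))        # same parts without the binding "s"
print(splits("beiden"))                # protected, left whole
```

Both spellings of the wedding-shoes query reduce to the same parts, so they can share one set of results. The sketch only enumerates candidates. Choosing between "Granit + Pflastersteine" and "Granit + Pflaster + Steine" still needs a ranking signal, such as the hit counts quoted on the slide.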
Analyze & Report                                                     Discover & Explore

Structured                    Semi-Structured                        Unstructured
SQL                           SQL++                                  Java/C++/Pig/Hive
Production Data Warehousing   Contextual-Complex Analytics           Structure the Unstructured
Large Concurrent User-base    Deep, Seasonal, Consumable Data Sets   Detect Patterns
Data Warehouse                Data Warehouse + Behavioral            Hadoop
Enterprise-class System       Low End Enterprise-class System        Commodity Hardware System
8+PB                          60+PB                                  40+PB
Brian knows the satisfaction and importance of good search results,
and his team is responsible for ensuring that the millions of queries
entered onto the eBay website provide just that. The words “Did you
mean…?” are incredibly meaningful to Brian as he combs through a
universe of queries altered by synonyms, acronyms, attributes, and
expansions. He’s been doing this sort of work since he joined eBay
nine years ago. Brian has loved technology ever since junior high
school, when he played the game “Lunar Lander” on a paper
teletype before video games existed, and pulled pranks in the local
Radio Shack. When Brian gets outside, he goes backpacking on
Mount Whitney, enters triathlons, and walks on water (barefoot water
skiing).
2011 x.commerce Innovate Data Alchemy
  • 25. Analyze & Report Discover & Explore Structured Semi-Structured Unstructured SQL SQL++ Java/C++/Pig/Hive Production Data Warehousing Contextual-Complex Analytics Structure the Unstructured Large Concurrent User-base Deep, Seasonal, Consumable Data Sets Detect Patterns Data Warehouse Data Warehouse + Hadoop Behavioral Enterprise-class System Low End Enterprise-class System Commodity Hardware System 8+PB 60+PB 40+PB
  • 26.
  • 27. Brian knows the satisfaction and importance of good search results, and his team is responsible for ensuring that the millions of queries entered onto the eBay website provide just that. The words “Did you mean…?” are incredibly meaningful to Brian as he combs through a universe of queries altered by synonyms, acronyms, attributes, and expansions. He’s been doing this sort of work since he joined eBay nine years ago. Brian has loved technology ever since junior high school, when he played the game “Lunar Lander” on a paper teletype before video games existed, and pulled pranks in the local Radio Shack. When Brian gets outside, he goes backpacking on Mount Whitney, enters triathlons, and walks on water (barefoot water skiing).