SlideShare una empresa de Scribd logo
1 de 10
Descargar para leer sin conexión
The Facebook Data team
Infrastructure and insight

Jeff Hammerbacher
Engineering Manager
January 16, 2008
Agenda

1   Motivations


2   People


3   Products


4   Philosophy
Motivations

▪   Source data living on a horizontally partitioned MySQL tier
▪   Intensive historical analysis difficult
▪   Difficult to assess impact of changes to the site


▪   First try: Reporting and Analysis
▪   Second try: Data Infrastructure and Data Insight
▪   Third try: Data
People
Infrastructure
▪   Rama Ramasamy
▪   Suresh Antony
▪   Avinash Lakshman
▪   Hao Liu
▪   Joydeep Sen Sarma
▪   Ashish Thusoo
▪   Pete Wyckoff
People
Insight
▪   Rupen Parikh
▪   Itamar Rosenn
▪   Ravi Grover
▪   Roddy Lindsay
▪   Cameron Marlow
▪   Danny Ferrante
▪   Ding Zhou
Products
Infrastructure
▪   Hive
    ▪   Hive Shell
    ▪   Pyhive
    ▪   Apidictor
▪   Nest
▪   ???
Products
Insight
 ▪   Experimentation Platform
 ▪   Argus
 ▪   Insights
 ▪   InvEst
 ▪   Growth Report

 ▪   More coming soon...
Philosophy
▪   Data management as important as data analysis
▪   Know your data center: compute, storage, interconnect
▪   Shared nothing means the interconnect is the bottleneck
▪   Engineers are not analysts
▪   You must write code
▪   Don’t collect data without a purpose
▪   Basic statistics is more useful than advanced machine learning
▪   Visualization is as important as basic statistics
▪   Analysis and not reporting
▪   Developing products might be better than YAS
(c) 2008 Facebook, Inc. or its licensors.  "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0

Más contenido relacionado

Similar a 20080115yahoobrickhouse

SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
eavanesian
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
BigMine
 

Similar a 20080115yahoobrickhouse (20)

Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil Games
 
Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team Building
 
You've Got No UI?! (Agile Data Teams)
You've Got No UI?! (Agile Data Teams)You've Got No UI?! (Agile Data Teams)
You've Got No UI?! (Agile Data Teams)
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
Free your metadata
Free your metadataFree your metadata
Free your metadata
 
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
 
20080115yahoobrickhouse
20080115yahoobrickhouse20080115yahoobrickhouse
20080115yahoobrickhouse
 
How Organizations can gain Strategic Advantage when Everyone is applying AI
How Organizations can gain Strategic Advantage when Everyone is applying AIHow Organizations can gain Strategic Advantage when Everyone is applying AI
How Organizations can gain Strategic Advantage when Everyone is applying AI
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Data - Science and Engineering slide at Bandungpy Sharing Session
Data - Science and Engineering slide at Bandungpy Sharing SessionData - Science and Engineering slide at Bandungpy Sharing Session
Data - Science and Engineering slide at Bandungpy Sharing Session
 
Associate Software Engineer | Unison
Associate Software Engineer | UnisonAssociate Software Engineer | Unison
Associate Software Engineer | Unison
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
 
How to Build Data Science Teams that Deliver Business Value
How to Build Data Science Teams that Deliver Business ValueHow to Build Data Science Teams that Deliver Business Value
How to Build Data Science Teams that Deliver Business Value
 
Data Infused Product Design and Insights at LinkedIn
Data Infused Product Design and Insights at LinkedInData Infused Product Design and Insights at LinkedIn
Data Infused Product Design and Insights at LinkedIn
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science roles
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 

Más de Jeff Hammerbacher (20)

20120223keystone
20120223keystone20120223keystone
20120223keystone
 
20100714accel
20100714accel20100714accel
20100714accel
 
20100608sigmod
20100608sigmod20100608sigmod
20100608sigmod
 
20100423sage
20100423sage20100423sage
20100423sage
 
20100418sos
20100418sos20100418sos
20100418sos
 
20100201hplabs
20100201hplabs20100201hplabs
20100201hplabs
 
20100128ebay
20100128ebay20100128ebay
20100128ebay
 
20091203gemini
20091203gemini20091203gemini
20091203gemini
 
20091203gemini
20091203gemini20091203gemini
20091203gemini
 
20091110startup2startup
20091110startup2startup20091110startup2startup
20091110startup2startup
 
20091030nasajpl
20091030nasajpl20091030nasajpl
20091030nasajpl
 
20091027genentech
20091027genentech20091027genentech
20091027genentech
 
Mårten Mickos's presentation "Open Source: Why Freedom Makes a Better Busines...
Mårten Mickos's presentation "Open Source: Why Freedom Makes a Better Busines...Mårten Mickos's presentation "Open Source: Why Freedom Makes a Better Busines...
Mårten Mickos's presentation "Open Source: Why Freedom Makes a Better Busines...
 
20090622 Velocity
20090622 Velocity20090622 Velocity
20090622 Velocity
 
20090422 Www
20090422 Www20090422 Www
20090422 Www
 
20090309berkeley
20090309berkeley20090309berkeley
20090309berkeley
 
20081030linkedin
20081030linkedin20081030linkedin
20081030linkedin
 
20081022cca
20081022cca20081022cca
20081022cca
 
20081009nychive
20081009nychive20081009nychive
20081009nychive
 
2008 Ur Tech Talk Zshao
2008 Ur Tech Talk Zshao2008 Ur Tech Talk Zshao
2008 Ur Tech Talk Zshao
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Último (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 

20080115yahoobrickhouse

  • 1.
  • 2. The Facebook Data team Infrastructure and insight Jeff Hammerbacher Engineering Manager January 16, 2008
  • 3. Agenda 1 Motivations 2 People 3 Products 4 Philosophy
  • 4. Motivations ▪ Source data living on a horizontally partitioned MySQL tier ▪ Intensive historical analysis difficult ▪ Difficult to assess impact of changes to the site ▪ First try: Reporting and Analysis ▪ Second try: Data Infrastructure and Data Insight ▪ Third try: Data
  • 5. People Infrastructure ▪ Rama Ramasamy ▪ Suresh Antony ▪ Avinash Lakshman ▪ Hao Liu ▪ Joydeep Sen Sarma ▪ Ashish Thusoo ▪ Pete Wyckoff
  • 6. People Insight ▪ Rupen Parikh ▪ Itamar Rosenn ▪ Ravi Grover ▪ Roddy Lindsay ▪ Cameron Marlow ▪ Danny Ferrante ▪ Ding Zhou
  • 7. Products Infrastructure ▪ Hive ▪ Hive Shell ▪ Pyhive ▪ Apidictor ▪ Nest ▪ ???
  • 8. Products Insight ▪ Experimentation Platform ▪ Argus ▪ Insights ▪ InvEst ▪ Growth Report ▪ More coming soon...
  • 9. Philosophy ▪ Data management as important as data analysis ▪ Know your data center: compute, storage, interconnect ▪ Shared nothing means the interconnect is the bottleneck ▪ Engineers are not analysts ▪ You must write code ▪ Don’t collect data without a purpose ▪ Basic statistics is more useful than advanced machine learning ▪ Visualization is as important as basic statistics ▪ Analysis and not reporting ▪ Developing products might be better than YAS
  • 10. (c) 2008 Facebook, Inc. or its licensors.  "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0