2. Evolving a New Analytical Platform
What Works and What’s Missing
Jeff Hammerbacher
Chief Scientist, Cloudera
June 8, 2010
Saturday, June 12, 2010
3. My Background
Thanks for Asking
▪ hammer@cloudera.com
▪ Studied Mathematics at Harvard
▪ Worked as a Quant on Wall Street
▪ Conceived, built, and led Data team at Facebook
▪ Nearly 30 amazing engineers and data scientists
▪ Several open source projects and research papers
▪ Founder of Cloudera
▪ Chief Scientist
▪ Also, check out the book “Beautiful Data”
4. Presentation Outline
▪ BI: Science for Profit
▪ Need tools for whole research cycle
▪ SQL Server 2008 R2: defining the platform
▪ State of the Platform Ecosystem
▪ New Foundations: Hadoop
▪ Boiling the Frog
▪ Future developments
▪ Questions and Discussion
5. BI is looking more like science (for profit)
6. Jim Gray: Science entering Fourth Paradigm
“We have to do better at producing tools to
support the whole research cycle”
7. RDBMS only a small part of this tool set
17. Collaboration: SharePoint
MDM: Master Data Services
CEP: StreamInsight
ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
OLAP: PowerPivot
18. What do we call this unified suite?
40. 2007: Make Hadoop scale
▪ Randy Bryant’s “DISC” lecture
▪ Jim Gray’s “Fourth Paradigm” lecture
▪ Yahoo! makes Pig open source
▪ Powerset makes HBase open source
46. 2008: Make Hadoop fast
▪ “MapReduce: A Major Step Backwards”
▪ Facebook makes Hive open source
▪ First Hadoop Summit
▪ Yahoo! wins Daytona terabyte sort benchmark
▪ Yahoo! builds production webmap with Hadoop
52. 2009: Insert Hadoop into the enterprise
▪ “The Unreasonable Effectiveness of Data”
▪ Yahoo! sorts a petabyte with Hadoop
▪ First Hadoop World NYC
▪ Cloudera releases CDH
▪ Cloudera adds training, support, services
58. 2010: Integrate Hadoop into the enterprise
▪ Hive adds JDBC and ODBC
▪ Teradata, Pentaho, and others integrate
▪ Yahoo! completes enterprise-class security
▪ IBM announces InfoSphere BigInsights
▪ Datameer and Karmasphere funded
59. Hadoop will be an Analytical Data Platform