This document discusses Databricks' goal of democratizing access to Spark. It introduces the Databricks cloud platform, which provides a hosted model for Spark with rapid releases, dynamic scaling, and security controls. The platform is used for just-in-time data warehousing, advanced analytics, and real-time use cases. Many companies struggle with the steep learning curve and costs of big data projects. To empower more developers, Databricks trained thousands on Spark and launched online courses with over 100,000 students. They are announcing the Databricks Community Edition, a free version of their platform, to further democratize access to Spark through mini clusters, notebooks, APIs, and continuous delivery of learning content.
3. 3
Databricks Cloud Platform
Hosted Model
We ensureeverything works end-to-end
Rapid releases
Iterate quickly based on customer feedback
Dynamicuse
Customers scaledynamically based on needs
4. 4
Databricks Cloud Platform
FS
CLOUD HADOOP DATA WAREHOUSE
Your Storage
DBMS
Databricks Platform
OPEN SOURCE
MANAGEMENT
Security Controls
BI connectivity
24x7 SLAs
Multi-tenancy
Production Jobs
Managed Clusters
SQL
Machine Learning
R
Graph
Streaming
Integrated Workspace NOTEBOOKS COLLABORATION REST APIsDASHBOARDS
5. 5
How is this used so far?
Just-in-time DataWarehouse Use Case
• Separate compute from storage
• 3 top 10 mass media company shortenedtime fromidea-to-app
Advanced Analytics Use Case
• Machine learningand graph processing
• Top 2 gamingcompanies,Radius modeling of 20M companies
Real-time Use Case
• Data productusingspark streaming
• Top 5 creditcard company is doing loan approvals in real-time
6. 6
Main Lesson
Many companies struggle with big dataprojects
• Steep learning curve formany developers
Getting trained on big datais costly and time consuming
• Acquiringmachines
• Setting up and configuringinfrastructure
• Build systemswithoutaccess to much documentation
7. 7
How do we empower more developers?
Trained 2,000 on Sparkin 2014
Launched two Massive Open Online Courses (MOOCs) in 2015
• ~125,000 tookourcourses
• ~20,000 finishedthe course
• ~500,000 hoursspentlearningSpark
How do we multiply this to democratize access to Spark?
8. 8
Announcing
Databricks Community Edition (beta)
Free edition of Databricks Platform
• Mini Spark clusters
• Notebooks,Dashboards
• REST APIs
Continuous delivery of content
• Course and MOOC material
• Spark how-to’sand documentation
10. 10
Democratizing Big Data
for Organizations
Will provide seamless transition to production
• Large clusters
• Productionpipelines
• Security and Governance