SlideShare una empresa de Scribd logo
1 de 17
Descargar para leer sin conexión
Kim Curtis, Brian Knauss, eBay
Evolution of eBay’s Enterprise Data
Ecosystem with Apache Spark
#EntSAIS13
Evolution
2#EntSAIS13
Principles
Opportunities
Investment
Principles
• Expand Capabilities
• Increase Flexibility
• Optimize Cost/Performance
3#EntSAIS13
• Expand Capabilities
• Increase Flexibility
• Optimize Cost/Performance
Principles
4#EntSAIS13
DO MORE
WITH FEWER BARRIERS
CHEAPER/FASTER
Spark in the EDW
• Increase Flexibility
– Investment and Engineering
• Expand Capabilities
– Pre-Load Transformation
• Optimize Cost/Performance
– Let’s see…
5#EntSAIS13
TCO Comparison
6#EntSAIS13
Vendor Open Source
HW Depr. HW Depr.
HW/SW Maint.HW/SW Maint.
DC Costs
DC Costs
Opportunity Assessment
7#EntSAIS13
2014 2015 2016 2017 2018 2019
Vendor TCO Open Source
TCO
Scope Design Implement Optimize
Investment
8#EntSAIS13
WHAT HOW WHEN
THEN
WHAT?
DO IT
9#EntSAIS13
Scope
• Engage with Customers
– Isolated our impact
• Define Boundaries
– Relational Batch Processing
• Set Intermediate Targets
– Offset 2017 growth
10#EntSAIS13
Design
• Extensible Framework
– (ELnTn to ETLn)
• Optimize Hardware
– Processing and Storage Nodes
• Optimize Software
– Automated Spark SQL Tuning
11#EntSAIS13
Load/MergeLoad/MergeTransform/MergeTransform/MergeLoadLoadExtract Load Transform/Merge Extract Load/Merge
Before (ELnTn )
Load/MergeLoad/Merge
Extract Transform Load/Merge
After (ETLn)
Implement
• Production HDFS Data Environment
– Tight Alignment with Platform
• Prioritized Effort
– Minimize effort to hit goals
• Distributed Engineering (internal Open Source)
– Built and tested framework additions
14#EntSAIS13
Optimize
• Scaling/DR(DA)
– Multi-platform architecture
• Cost/Performance Optimization
– HW/SW tuning
• Feature Expansion
– ML on platform
15#EntSAIS13
Challenges
• Scale of Data and Workload
– Data Validation Between Targets
• Migration Automation
– Migration of Non-Standard Processes
• Enterprise Readiness of Open Source
– Job-Level Workload Tracking/Management
16#EntSAIS13
The End Beginning…
17#EntSAIS13
Opportunity
Investment
Optimization

Más contenido relacionado

La actualidad más candente

More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn confluent
 
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastDatabricks
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLDatabricks
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Frame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine LearningFrame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine LearningDavid Stein
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...HostedbyConfluent
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Flink Forward
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Khai Tran
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 
Spark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesSpark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesDatabricks
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLScyllaDB
 
Spark streaming
Spark streamingSpark streaming
Spark streamingWhiteklay
 
Introduction to Data Stream Processing
Introduction to Data Stream ProcessingIntroduction to Data Stream Processing
Introduction to Data Stream ProcessingSafe Software
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsDatabricks
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Spark Summit
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 

La actualidad más candente (20)

More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
 
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices Architecture
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Frame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine LearningFrame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine Learning
 
Spark at Zillow
Spark at ZillowSpark at Zillow
Spark at Zillow
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Spark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesSpark DataFrames and ML Pipelines
Spark DataFrames and ML Pipelines
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
Introduction to Data Stream Processing
Introduction to Data Stream ProcessingIntroduction to Data Stream Processing
Introduction to Data Stream Processing
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud Environments
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 

Similar a Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platform at eBay with Kim Curtis and Brian Knauss

Leveraging projectsfeaturesandfunctionalitytobillcustomersmodeledinmultiplewa...
Leveraging projectsfeaturesandfunctionalitytobillcustomersmodeledinmultiplewa...Leveraging projectsfeaturesandfunctionalitytobillcustomersmodeledinmultiplewa...
Leveraging projectsfeaturesandfunctionalitytobillcustomersmodeledinmultiplewa...Project Control | PROJ CTRL
 
Benchmark in 3 Steps - BenchmarkMyBuilding.com
Benchmark in 3 Steps - BenchmarkMyBuilding.comBenchmark in 3 Steps - BenchmarkMyBuilding.com
Benchmark in 3 Steps - BenchmarkMyBuilding.comLucid
 
Ac2017 2. added value!
Ac2017   2. added value!Ac2017   2. added value!
Ac2017 2. added value!Nesma
 
Make Design A First Class Citizen To Ensure Analytics Success
Make Design A First Class Citizen To Ensure Analytics SuccessMake Design A First Class Citizen To Ensure Analytics Success
Make Design A First Class Citizen To Ensure Analytics SuccessSiteworx LLC
 
Oracle Essbase in the Cloud A Mercer Advisors Success Story
Oracle Essbase in the Cloud A Mercer Advisors Success StoryOracle Essbase in the Cloud A Mercer Advisors Success Story
Oracle Essbase in the Cloud A Mercer Advisors Success StoryPerficient, Inc.
 
EPBCS - A New Approach to Planning Implementations
EPBCS - A New Approach to Planning ImplementationsEPBCS - A New Approach to Planning Implementations
EPBCS - A New Approach to Planning ImplementationsJoseph Alaimo Jr
 
The Case Studies for Project Control Platform - Maurizio M. Granata Business ...
The Case Studies for Project Control Platform - Maurizio M. Granata Business ...The Case Studies for Project Control Platform - Maurizio M. Granata Business ...
The Case Studies for Project Control Platform - Maurizio M. Granata Business ...Mediehuset Ingeniøren Live
 
Enablement Session - IBP Transformation Final_C.pptx
Enablement Session - IBP Transformation Final_C.pptxEnablement Session - IBP Transformation Final_C.pptx
Enablement Session - IBP Transformation Final_C.pptxKaustubh M
 
Webinar: The OpEx Business Plan for NoSQL
 Webinar: The OpEx Business Plan for NoSQL Webinar: The OpEx Business Plan for NoSQL
Webinar: The OpEx Business Plan for NoSQLMongoDB
 
AWS re:Invent 2016: How Pitney Bowes is transforming their business in the cl...
AWS re:Invent 2016: How Pitney Bowes is transforming their business in the cl...AWS re:Invent 2016: How Pitney Bowes is transforming their business in the cl...
AWS re:Invent 2016: How Pitney Bowes is transforming their business in the cl...Amazon Web Services
 
IT + Line of Business - Driving Faster, Deeper Insights Together
IT + Line of Business - Driving Faster, Deeper Insights TogetherIT + Line of Business - Driving Faster, Deeper Insights Together
IT + Line of Business - Driving Faster, Deeper Insights TogetherDATAVERSITY
 
Architecting an Open Data Lake for the Enterprise
 Architecting an Open Data Lake for the Enterprise  Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise Amazon Web Services
 
Automating Big Data Technologies for Faster Time-to-Value
 Automating Big Data Technologies for Faster Time-to-Value Automating Big Data Technologies for Faster Time-to-Value
Automating Big Data Technologies for Faster Time-to-ValueAmazon Web Services
 
AWS Summit 2013 | India - Running Lean with Optimized Architecture, Pieter Kemps
AWS Summit 2013 | India - Running Lean with Optimized Architecture, Pieter KempsAWS Summit 2013 | India - Running Lean with Optimized Architecture, Pieter Kemps
AWS Summit 2013 | India - Running Lean with Optimized Architecture, Pieter KempsAmazon Web Services
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseAmazon Web Services
 
NLB Analytics Overview
NLB Analytics OverviewNLB Analytics Overview
NLB Analytics OverviewKevin Dingle
 
NLB Services Data Analytics Overview
NLB Services Data Analytics OverviewNLB Services Data Analytics Overview
NLB Services Data Analytics OverviewKevin Dingle
 
Quantifying Business Value with ITIL® and ITSM in the Modern Software Factory
Quantifying Business Value with ITIL® and ITSM in the Modern Software FactoryQuantifying Business Value with ITIL® and ITSM in the Modern Software Factory
Quantifying Business Value with ITIL® and ITSM in the Modern Software FactoryCA Technologies
 

Similar a Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platform at eBay with Kim Curtis and Brian Knauss (20)

Leveraging projectsfeaturesandfunctionalitytobillcustomersmodeledinmultiplewa...
Leveraging projectsfeaturesandfunctionalitytobillcustomersmodeledinmultiplewa...Leveraging projectsfeaturesandfunctionalitytobillcustomersmodeledinmultiplewa...
Leveraging projectsfeaturesandfunctionalitytobillcustomersmodeledinmultiplewa...
 
Achieving Profitability on AWS
Achieving Profitability on AWSAchieving Profitability on AWS
Achieving Profitability on AWS
 
CESMII KickOff - Launch of CESMII - Ray Collett - CESMII
CESMII KickOff - Launch of CESMII - Ray Collett - CESMIICESMII KickOff - Launch of CESMII - Ray Collett - CESMII
CESMII KickOff - Launch of CESMII - Ray Collett - CESMII
 
Benchmark in 3 Steps - BenchmarkMyBuilding.com
Benchmark in 3 Steps - BenchmarkMyBuilding.comBenchmark in 3 Steps - BenchmarkMyBuilding.com
Benchmark in 3 Steps - BenchmarkMyBuilding.com
 
Ac2017 2. added value!
Ac2017   2. added value!Ac2017   2. added value!
Ac2017 2. added value!
 
Make Design A First Class Citizen To Ensure Analytics Success
Make Design A First Class Citizen To Ensure Analytics SuccessMake Design A First Class Citizen To Ensure Analytics Success
Make Design A First Class Citizen To Ensure Analytics Success
 
Oracle Essbase in the Cloud A Mercer Advisors Success Story
Oracle Essbase in the Cloud A Mercer Advisors Success StoryOracle Essbase in the Cloud A Mercer Advisors Success Story
Oracle Essbase in the Cloud A Mercer Advisors Success Story
 
EPBCS - A New Approach to Planning Implementations
EPBCS - A New Approach to Planning ImplementationsEPBCS - A New Approach to Planning Implementations
EPBCS - A New Approach to Planning Implementations
 
The Case Studies for Project Control Platform - Maurizio M. Granata Business ...
The Case Studies for Project Control Platform - Maurizio M. Granata Business ...The Case Studies for Project Control Platform - Maurizio M. Granata Business ...
The Case Studies for Project Control Platform - Maurizio M. Granata Business ...
 
Enablement Session - IBP Transformation Final_C.pptx
Enablement Session - IBP Transformation Final_C.pptxEnablement Session - IBP Transformation Final_C.pptx
Enablement Session - IBP Transformation Final_C.pptx
 
Webinar: The OpEx Business Plan for NoSQL
 Webinar: The OpEx Business Plan for NoSQL Webinar: The OpEx Business Plan for NoSQL
Webinar: The OpEx Business Plan for NoSQL
 
AWS re:Invent 2016: How Pitney Bowes is transforming their business in the cl...
AWS re:Invent 2016: How Pitney Bowes is transforming their business in the cl...AWS re:Invent 2016: How Pitney Bowes is transforming their business in the cl...
AWS re:Invent 2016: How Pitney Bowes is transforming their business in the cl...
 
IT + Line of Business - Driving Faster, Deeper Insights Together
IT + Line of Business - Driving Faster, Deeper Insights TogetherIT + Line of Business - Driving Faster, Deeper Insights Together
IT + Line of Business - Driving Faster, Deeper Insights Together
 
Architecting an Open Data Lake for the Enterprise
 Architecting an Open Data Lake for the Enterprise  Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
Automating Big Data Technologies for Faster Time-to-Value
 Automating Big Data Technologies for Faster Time-to-Value Automating Big Data Technologies for Faster Time-to-Value
Automating Big Data Technologies for Faster Time-to-Value
 
AWS Summit 2013 | India - Running Lean with Optimized Architecture, Pieter Kemps
AWS Summit 2013 | India - Running Lean with Optimized Architecture, Pieter KempsAWS Summit 2013 | India - Running Lean with Optimized Architecture, Pieter Kemps
AWS Summit 2013 | India - Running Lean with Optimized Architecture, Pieter Kemps
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
NLB Analytics Overview
NLB Analytics OverviewNLB Analytics Overview
NLB Analytics Overview
 
NLB Services Data Analytics Overview
NLB Services Data Analytics OverviewNLB Services Data Analytics Overview
NLB Services Data Analytics Overview
 
Quantifying Business Value with ITIL® and ITSM in the Modern Software Factory
Quantifying Business Value with ITIL® and ITSM in the Modern Software FactoryQuantifying Business Value with ITIL® and ITSM in the Modern Software Factory
Quantifying Business Value with ITIL® and ITSM in the Modern Software Factory
 

Más de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

Más de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Último

wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 

Último (20)

wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 

Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platform at eBay with Kim Curtis and Brian Knauss