SlideShare una empresa de Scribd logo
1 de 24
Descargar para leer sin conexión
Anahita Bhiwandiwalla – Intel Corporation
Karthik Vadla – Intel Corporation
A Predictive Analytics
Workflow on DICOM
Images using Apache
Spark
Who we are?
Agenda
• Use-case overview
• Challenges
• What is our data?
• Deriving Insights
• Machine Learning
• Tools
• Spark-tk
• Demo
• spark-tensorflow-connector
• Performance
Use-case overview
• Vast amount of medical image data available
• Meta-data accompanying the image
– Patient information
– Status of condition
– Method of image capture
– Time to events(if part of a study)
Create distributed technology needed to efficiently
store, scan and analyze data in healthcare.
Challenges
• Getting data for experimentation is hard!
• Combine data from multiple public sources for simulation
• Multiple tools – image processing libraries, visualization
tools, distributed engines, ML libraries
• Compatibility across different tools
How do we make Machine Learning easy to use?
Data: DICOM
- Digital Imaging and
COMmunications in Medicine
- International standard format to
store, exchange, and transmit
medical images
- Standards for imaging modalities -
radiography, ultrasonography,
computed tomography (CT),
magnetic resonance imaging (MRI),
and radiation therapy.
- Protocols for image exchange, image
compression, 3-D visualization,
image presentation etc.
Data: Meta-data
• Image related: storage, date, relevant parts of
the image etc.
• Patient related: age, weight, height, DOB,
medical history
• Device related: manufacturer information,
operator, method of capture
Deriving insights
• Can we develop smart filtering techniques based on
image meta-data?
• Can we conclude and predict certain conditions based
on patient meta-data?
• Can we classify image capture techniques and suggest
improvements based on device meta-data?
• Can we combine image & patient meta-data to conclude
and predict?
• Can analytics help automate/improve image capture
techniques?
Machine Learning
• Patient data
– Train classifier model on patient data to predict presence
of a condition
• Logistic Regression, Random Forest, Naïve Bayes
– Train regressor model on patient data to predict probability
of a condition
• Linear Regression, Random Forest
– Train survival analysis model on patient data to predict
time to event, compare risk of populations
• Accelerate Failure Time Model, Cox Proportional Hazards Model
Machine Learning
• Image data
– Compute the Eigen decomposition of the image
– Compute the Covariance Matrix of the image
– Compute the Principal Component Analysis of the
image pixel data
Machine Learning
• Device data:
– Predict trends in image capture techniques
• Classification/Regression techniques
– Predict device time to failure
• Cox Proportional Hazards Model, Accelerated Failure Time
– Recommendations to device capture methods
• Collaborative filtering
Tools
• pydicom
– Python package for working with DICOM files such as medical images, reports, and radiotherapy
– Single threaded
• dcm4che
– Collection of open source applications and utilities for the healthcare enterprise.
– Developed in the Java programming language for performance and portability, supporting
deployment on JDK 1.6 and up
• matplotlib
– Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats
and interactive environments across platforms.
– Can be used in Python scripts, Python and IPython shell, jupyter notebook, web application servers
• Machine Learning libraries
– Classifiers, regressors, failure analysis analysis
– Eigen decomposition
Spark-tk Introduction
• Open source library enhancing the Spark experience
• APIs for feature engineering, ETL, graph construction,
machine learning, image processing, scoring
• Abstraction level familiar to data scientists; removes
complexity of parallel processing
• Lower level Spark APIs seamlessly exposed
• Easy to build apps plugging in other tools/libraries
https://github.com/trustedanalytics/spark-tk
http://trustedanalytics.github.io/spark-tk/
Spark-tk components
• Frames
– Scalable data frame representation
– More intuitive than low level HDFS file and Spark
RDD/DataFrame/DataSet formats; schema inference
– APIs to manipulate the data frames for feature engineering and
exploration, such as joins and aggregations
– Run user-defined transformations and filters using distributed
processing
– Input to our models
– Easy to convert to/from Spark-tk Frames - Spark representations
Spark-tk components
• Graphs
– Scalable graph representation based on Frames for
vertices and edges
– In house distributed algorithms for graph analytics
using GraphX
– Supports importing/exporting to OrientDB’s scalable
graph databases – visualize, real-time graph querying
– Store massive graphs
Spark-tk components
• Machine Learning & Streaming
– Time series analysis
– Recommender systems
– Topic Modeling
– Clustering
– Classification/Regression
– Image processing
– Scikit Models
– Streaming via Scoring engine
How Spark-tk helps
• Brings in support for image analytics to Spark
– Support for ingesting and processing DICOM images in a
distributed environment
– Queries, filters, and analytics on image collections
– Integrated & distributed dcm4che3 via. Spark
– Tested against pydicom
– Used matplotlib for visualization
• Machine learning for image and health records
– Created distributed Cox Survival Proportional Hazards Model to
compare sets of populations
– Recommend algorithm and hyper-parameters
Live Demo
spark-tensorflow-connector
• Developed library for loading and storing
TensorFlow records with Apache Spark
• Implements data import from the standard
TensorFlow record format (TFRecords) into
Spark SQL DataFrames, and data export from
DataFrames to TensorFlow records
https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-
connector
Performance
0
5
10
15
20
25
30
0.25 0.5 1 2 4 8 16 32
Data(GB) Vs. Time(Min)
Processing time increases with data(Constant #Partitions=1000)
Performance
24.5
25
25.5
26
26.5
27
27.5
28
28.5
29
29.5
300 450 675 1012 2278 3147 5125 7688
#Partitions Vs. Time(Min)
Increasing partitions reduced the time and then increased time(overhead)
Performance
0
10
20
30
40
50
60
70
80
90
100
7 11 24 35 56
#Executors Vs. Time(Min)
Increasing number of executors reduces the time significantly
Performance
0
5
10
15
20
25
30
35
1 2 3 4 5
#Executor Cores Vs. Time(Min)
Increasing executor cores did not significantly change time
Thank You.
https://github.com/trustedanalytics/spark-tk
http://trustedanalytics.github.io/spark-tk/

Más contenido relacionado

La actualidad más candente

JSON-LD: JSON for the Social Web
JSON-LD: JSON for the Social WebJSON-LD: JSON for the Social Web
JSON-LD: JSON for the Social WebGregg Kellogg
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
How to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpHow to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpJoseph Arriola
 
Model Monitoring at Scale with Apache Spark and Verta
Model Monitoring at Scale with Apache Spark and VertaModel Monitoring at Scale with Apache Spark and Verta
Model Monitoring at Scale with Apache Spark and VertaDatabricks
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks
 
Beginner's Guide to APEX
Beginner's Guide to APEXBeginner's Guide to APEX
Beginner's Guide to APEXAnthony Rayner
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkSpark Summit
 
Microsoft power platform
Microsoft power platformMicrosoft power platform
Microsoft power platformJenkins NS
 
Angular 2 Essential Training
Angular 2 Essential Training Angular 2 Essential Training
Angular 2 Essential Training Patrick Schroeder
 
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiModern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiSlim Baltagi
 
Microsoft Power BI Technical Overview
Microsoft Power BI Technical OverviewMicrosoft Power BI Technical Overview
Microsoft Power BI Technical OverviewDavid J Rosenthal
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseMo Patel
 
Azure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshopAzure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshopParashar Shah
 
Building Data Lakehouse.pdf
Building Data Lakehouse.pdfBuilding Data Lakehouse.pdf
Building Data Lakehouse.pdfLuis Jimenez
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
Hydra: A Vocabulary for Hypermedia-Driven Web APIs
Hydra: A Vocabulary for Hypermedia-Driven Web APIsHydra: A Vocabulary for Hypermedia-Driven Web APIs
Hydra: A Vocabulary for Hypermedia-Driven Web APIsMarkus Lanthaler
 

La actualidad más candente (20)

JSON-LD: JSON for the Social Web
JSON-LD: JSON for the Social WebJSON-LD: JSON for the Social Web
JSON-LD: JSON for the Social Web
 
Informatica Cloud Overview
Informatica Cloud OverviewInformatica Cloud Overview
Informatica Cloud Overview
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Intro to power apps
Intro to power appsIntro to power apps
Intro to power apps
 
How to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpHow to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcp
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
 
Model Monitoring at Scale with Apache Spark and Verta
Model Monitoring at Scale with Apache Spark and VertaModel Monitoring at Scale with Apache Spark and Verta
Model Monitoring at Scale with Apache Spark and Verta
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Beginner's Guide to APEX
Beginner's Guide to APEXBeginner's Guide to APEX
Beginner's Guide to APEX
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
 
Microsoft power platform
Microsoft power platformMicrosoft power platform
Microsoft power platform
 
Angular 2 Essential Training
Angular 2 Essential Training Angular 2 Essential Training
Angular 2 Essential Training
 
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiModern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
 
Microsoft Power BI Technical Overview
Microsoft Power BI Technical OverviewMicrosoft Power BI Technical Overview
Microsoft Power BI Technical Overview
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
 
Azure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshopAzure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshop
 
Building Data Lakehouse.pdf
Building Data Lakehouse.pdfBuilding Data Lakehouse.pdf
Building Data Lakehouse.pdf
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Hydra: A Vocabulary for Hypermedia-Driven Web APIs
Hydra: A Vocabulary for Hypermedia-Driven Web APIsHydra: A Vocabulary for Hypermedia-Driven Web APIs
Hydra: A Vocabulary for Hypermedia-Driven Web APIs
 
Fastapi
FastapiFastapi
Fastapi
 

Similar a A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahita Bhiwandiwalla and Karthik Vadla

AI IN PATH final PPT.pptx
AI IN PATH final PPT.pptxAI IN PATH final PPT.pptx
AI IN PATH final PPT.pptxDivyaGaurav4
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution WSO2
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Ali Alkan
 
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...
Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...Alex Zeltov
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Databricks
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learningNEEVEE Technologies
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-Systeminside-BigData.com
 
Application and Challenges of Streaming Analytics and Machine Learning on Mu...
 Application and Challenges of Streaming Analytics and Machine Learning on Mu... Application and Challenges of Streaming Analytics and Machine Learning on Mu...
Application and Challenges of Streaming Analytics and Machine Learning on Mu...Databricks
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Databricks
 
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningMakine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningAli Alkan
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning Dr. Swaminathan Kathirvel
 
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Maurice Nsabimana
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAjaved75
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesDataWorks Summit
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpcDr Reeja S R
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 

Similar a A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahita Bhiwandiwalla and Karthik Vadla (20)

AI IN PATH final PPT.pptx
AI IN PATH final PPT.pptxAI IN PATH final PPT.pptx
AI IN PATH final PPT.pptx
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution
 
unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
 
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...
Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Application and Challenges of Streaming Analytics and Machine Learning on Mu...
 Application and Challenges of Streaming Analytics and Machine Learning on Mu... Application and Challenges of Streaming Analytics and Machine Learning on Mu...
Application and Challenges of Streaming Analytics and Machine Learning on Mu...
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
 
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningMakine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
 
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpc
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 

Más de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Más de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 

Último (20)

Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 

A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahita Bhiwandiwalla and Karthik Vadla

  • 1. Anahita Bhiwandiwalla – Intel Corporation Karthik Vadla – Intel Corporation A Predictive Analytics Workflow on DICOM Images using Apache Spark
  • 3. Agenda • Use-case overview • Challenges • What is our data? • Deriving Insights • Machine Learning • Tools • Spark-tk • Demo • spark-tensorflow-connector • Performance
  • 4. Use-case overview • Vast amount of medical image data available • Meta-data accompanying the image – Patient information – Status of condition – Method of image capture – Time to events(if part of a study) Create distributed technology needed to efficiently store, scan and analyze data in healthcare.
  • 5. Challenges • Getting data for experimentation is hard! • Combine data from multiple public sources for simulation • Multiple tools – image processing libraries, visualization tools, distributed engines, ML libraries • Compatibility across different tools How do we make Machine Learning easy to use?
  • 6. Data: DICOM - Digital Imaging and COMmunications in Medicine - International standard format to store, exchange, and transmit medical images - Standards for imaging modalities - radiography, ultrasonography, computed tomography (CT), magnetic resonance imaging (MRI), and radiation therapy. - Protocols for image exchange, image compression, 3-D visualization, image presentation etc.
  • 7. Data: Meta-data • Image related: storage, date, relevant parts of the image etc. • Patient related: age, weight, height, DOB, medical history • Device related: manufacturer information, operator, method of capture
  • 8. Deriving insights • Can we develop smart filtering techniques based on image meta-data? • Can we conclude and predict certain conditions based on patient meta-data? • Can we classify image capture techniques and suggest improvements based on device meta-data? • Can we combine image & patient meta-data to conclude and predict? • Can analytics help automate/improve image capture techniques?
  • 9. Machine Learning • Patient data – Train classifier model on patient data to predict presence of a condition • Logistic Regression, Random Forest, Naïve Bayes – Train regressor model on patient data to predict probability of a condition • Linear Regression, Random Forest – Train survival analysis model on patient data to predict time to event, compare risk of populations • Accelerate Failure Time Model, Cox Proportional Hazards Model
  • 10. Machine Learning • Image data – Compute the Eigen decomposition of the image – Compute the Covariance Matrix of the image – Compute the Principal Component Analysis of the image pixel data
  • 11. Machine Learning • Device data: – Predict trends in image capture techniques • Classification/Regression techniques – Predict device time to failure • Cox Proportional Hazards Model, Accelerated Failure Time – Recommendations to device capture methods • Collaborative filtering
  • 12. Tools • pydicom – Python package for working with DICOM files such as medical images, reports, and radiotherapy – Single threaded • dcm4che – Collection of open source applications and utilities for the healthcare enterprise. – Developed in the Java programming language for performance and portability, supporting deployment on JDK 1.6 and up • matplotlib – Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. – Can be used in Python scripts, Python and IPython shell, jupyter notebook, web application servers • Machine Learning libraries – Classifiers, regressors, failure analysis analysis – Eigen decomposition
  • 13. Spark-tk Introduction • Open source library enhancing the Spark experience • APIs for feature engineering, ETL, graph construction, machine learning, image processing, scoring • Abstraction level familiar to data scientists; removes complexity of parallel processing • Lower level Spark APIs seamlessly exposed • Easy to build apps plugging in other tools/libraries https://github.com/trustedanalytics/spark-tk http://trustedanalytics.github.io/spark-tk/
  • 14. Spark-tk components • Frames – Scalable data frame representation – More intuitive than low level HDFS file and Spark RDD/DataFrame/DataSet formats; schema inference – APIs to manipulate the data frames for feature engineering and exploration, such as joins and aggregations – Run user-defined transformations and filters using distributed processing – Input to our models – Easy to convert to/from Spark-tk Frames - Spark representations
  • 15. Spark-tk components • Graphs – Scalable graph representation based on Frames for vertices and edges – In house distributed algorithms for graph analytics using GraphX – Supports importing/exporting to OrientDB’s scalable graph databases – visualize, real-time graph querying – Store massive graphs
  • 16. Spark-tk components • Machine Learning & Streaming – Time series analysis – Recommender systems – Topic Modeling – Clustering – Classification/Regression – Image processing – Scikit Models – Streaming via Scoring engine
  • 17. How Spark-tk helps • Brings in support for image analytics to Spark – Support for ingesting and processing DICOM images in a distributed environment – Queries, filters, and analytics on image collections – Integrated & distributed dcm4che3 via. Spark – Tested against pydicom – Used matplotlib for visualization • Machine learning for image and health records – Created distributed Cox Survival Proportional Hazards Model to compare sets of populations – Recommend algorithm and hyper-parameters
  • 19. spark-tensorflow-connector • Developed library for loading and storing TensorFlow records with Apache Spark • Implements data import from the standard TensorFlow record format (TFRecords) into Spark SQL DataFrames, and data export from DataFrames to TensorFlow records https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow- connector
  • 20. Performance 0 5 10 15 20 25 30 0.25 0.5 1 2 4 8 16 32 Data(GB) Vs. Time(Min) Processing time increases with data(Constant #Partitions=1000)
  • 21. Performance 24.5 25 25.5 26 26.5 27 27.5 28 28.5 29 29.5 300 450 675 1012 2278 3147 5125 7688 #Partitions Vs. Time(Min) Increasing partitions reduced the time and then increased time(overhead)
  • 22. Performance 0 10 20 30 40 50 60 70 80 90 100 7 11 24 35 56 #Executors Vs. Time(Min) Increasing number of executors reduces the time significantly
  • 23. Performance 0 5 10 15 20 25 30 35 1 2 3 4 5 #Executor Cores Vs. Time(Min) Increasing executor cores did not significantly change time