© Cloudera, Inc. All rights reserved.
FEDERATED LEARNING
Chris J Wallace • Data Scientist • Cloudera Fast Forward Labs
@_cjwallace
Available to Fast Forward Labs clients
Play at turbofan.fastforwardlabs.com
TODAY
● WHY CARE?
● FEDERATED AVERAGING
● PROTOTYPE
● CHALLENGES
● TOOLS
WHY CARE ABOUT FEDERATED LEARNING?
REQUIREMENTS FOR FEDERATED LEARNING
● Performance improves with more data.
● Models can be meaningfully combined.
● Nodes can train models, not only predict.
[Figure: performance plotted against amount of data]
FEDERATED AVERAGING
Communication-Efficient Learning of Deep Networks from Decentralized Data
McMahan et al. 2016
A network of nodes shares models rather than training data with the server
The server has an untrained model
It sends a copy of that model to the nodes
The nodes now also have the untrained model
The nodes have data on which to train their model
Each node trains the model to fit the data it has
Each node sends a copy of its trained model back to the server
The server combines these models by taking an average
We repeat the whole process many times.
The server now has a model that captures the patterns in the training data on all the nodes.
But at no point did the nodes share their training data, which increases privacy and saves on bandwidth.
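The round trip described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code behind the prototype: the two-parameter linear model, the synthetic data, the learning rate, and the function names (local_train, average, federated_averaging, make_node) are all hypothetical choices made for this example.

```python
import random


def local_train(weights, data, lr=0.05, epochs=5):
    """Run a few epochs of SGD on one node's private data; return the new weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error of the linear model
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def average(models):
    """Combine node models by taking the element-wise mean of their parameters."""
    n = len(models)
    return tuple(sum(m[i] for m in models) / n for i in range(len(models[0])))


def federated_averaging(node_datasets, rounds=50):
    global_model = (0.0, 0.0)  # the server starts with an untrained model
    for _ in range(rounds):
        # The server sends a copy to every node; each node trains on its own data.
        local_models = [local_train(global_model, data) for data in node_datasets]
        # Only the trained models travel back to the server, never the data.
        global_model = average(local_models)
    return global_model


rng = random.Random(0)


def make_node(xs):
    """Synthetic private data for one node, drawn from y = 2x + 1 plus small noise."""
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]


# Three nodes, each seeing a different range of x values.
nodes = [
    make_node([rng.uniform(-1, 0) for _ in range(20)]),
    make_node([rng.uniform(0, 1) for _ in range(20)]),
    make_node([rng.uniform(-1, 1) for _ in range(20)]),
]

w, b = federated_averaging(nodes)
print(round(w, 2), round(b, 2))
```

Each node sees a different slice of the input range, yet the averaged model should end up close to the line y = 2x + 1 that generated the data, and only model parameters ever cross the network.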
FEDERATED AVERAGING CAN HANDLE...
● Non-IID data
○ Training data on each node can be idiosyncratic.
● Unbalanced data
○ Unequal amount of data on each node.
● Massively distributed data
○ Can have many more devices than training examples per node.
● Limited communication
○ Cannot guarantee availability of nodes.
Communication-Efficient Learning of Deep Networks from Decentralized Data
McMahan et al. 2016
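Two of these properties map onto small changes to the averaging step. In McMahan et al., each node's model is weighted by the number of examples it trained on (which addresses unbalanced data), and only a random fraction of nodes takes part in each round (which addresses limited availability). The sketch below illustrates both ideas in plain Python; the function names, node identifiers, model values, and dataset sizes are hypothetical, chosen only for the example.

```python
import random


def weighted_average(models, num_examples):
    """Average node models, weighting each by how many examples it trained on."""
    total = sum(num_examples)
    dims = len(models[0])
    return [
        sum(n * m[i] for m, n in zip(models, num_examples)) / total
        for i in range(dims)
    ]


def sample_nodes(node_ids, fraction, rng):
    """Pick a random subset of nodes for this round (always at least one)."""
    k = max(1, int(fraction * len(node_ids)))
    return rng.sample(node_ids, k)


# Hypothetical trained node models (two parameters each) and dataset sizes.
models = {"a": [1.0, 0.0], "b": [3.0, 2.0], "c": [5.0, 4.0]}
sizes = {"a": 10, "b": 30, "c": 60}

rng = random.Random(0)
chosen = sample_nodes(sorted(models), fraction=0.67, rng=rng)
combined = weighted_average(
    [models[n] for n in chosen],
    [sizes[n] for n in chosen],
)
print(chosen, combined)
```

With weighting, a node holding 60 examples pulls the combined model six times as hard as a node holding 10, so a handful of tiny datasets cannot dominate the result.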
TURBOFAN TYCOON
turbofan.fastforwardlabs.com
PREDICTIVE MAINTENANCE
● Corrective
● Preventative
● Predictive
C-MAPSS data set
https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/
CHALLENGES
SYSTEMS ISSUES
● Power consumption
● Dropped connections
● Stragglers
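One common way to cope with dropped connections and stragglers is for the server to stop waiting after a deadline and average whatever has arrived, skipping the round if too few nodes reported. The sketch below illustrates that policy in plain Python. It is an assumption made for illustration, not something the talk prescribes: the function names, the deadline, the minimum update count, and the simulated arrival times are all hypothetical.

```python
def collect_updates(arrivals, deadline, min_updates):
    """Return the updates that arrived by the deadline, or None if too few did.

    arrivals is a list of (seconds_until_arrival, update) pairs, where
    seconds_until_arrival is None when the connection dropped.
    """
    on_time = [u for t, u in arrivals if t is not None and t <= deadline]
    if len(on_time) < min_updates:
        return None  # not enough nodes reported; skip this round
    return on_time


def mean(updates):
    """Element-wise mean of a list of equally sized parameter lists."""
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]


# Simulated round: two prompt nodes, one dropped connection, one straggler.
arrivals = [
    (1.2, [1.0, 1.0]),
    (None, [9.0, 9.0]),  # dropped connection, update never arrives
    (30.0, [7.0, 7.0]),  # straggler, arrives long after the deadline
    (2.5, [3.0, 5.0]),
]

ready = collect_updates(arrivals, deadline=5.0, min_updates=2)
new_model = mean(ready) if ready is not None else None
print(new_model)
```

The trade-off is that slow or flaky nodes contribute less often, so their data is underrepresented unless the deadline is generous.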
PRIVACY
Adversary can’t inspect data, but can inspect model.
PRIVACY
Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning, Hitaj et al. (2017)
TOOLS
OpenMined
https://www.openmined.org/
“OpenMined is an open-source community focused on researching, developing, and promoting tools for secure, privacy-preserving, value-aligned artificial intelligence.”
● More than federated learning.
● PySyft is a library for privacy-preserving deep learning.
● Grid is a peer-to-peer platform for decentralized data science.
TensorFlow Federated
https://www.tensorflow.org/federated
● Federated Learning API
○ Wrap TensorFlow models in included FL implementations.
○ High level, with attention paid to separating the concerns of models, communication, and so on.
● Federated Core API
○ Low level interfaces for building novel FL algorithms.
Local simulation runtime only right now.
New, but promising.
SUMMARY
Federated Learning is machine learning on decentralized data.
YOU MIGHT HAVE A USE CASE IF …
● Privacy is needed (FL is not the whole solution)
● Bandwidth or power consumption are concerns
● Data transfer is costly
● Your model improves with more data
EXAMPLES
● Predictive maintenance/industrial IoT
● Smartphones
● Healthcare (wearables, drug discovery, prognostics, etc.)
● Enterprise/corporate IT (chat, issue trackers, email, etc.)
Cloudera Fast Forward Labs
• An introduction to Federated Learning (Cloudera VISION blog, business audience)
• Federated learning: distributed machine learning with data locality and privacy (FFL blog, more technical)
• Turbofan Tycoon (working prototype, see FFL blog post for some details)
Other blog posts
• Collaborative Machine Learning without Centralized Training Data (Google research blog)
• Federated Learning for Firefox (Firefox on florian.github.io)
• Federated Learning for wake word detection (snips.ai on medium.com)
Papers
• Communication-Efficient Learning of Deep Networks from Decentralized Data by McMahan et al. (Google, 2016)
• Practical Secure Aggregation for Privacy-Preserving Machine Learning by Bonawitz et al. (Google, 2017)
• Federated Multi-Task Learning by Smith et al. (2017)
• A generic framework for privacy preserving deep learning by Ryffel et al. (2018, and see also github.com/OpenMined/PySyft)
• Federated Learning for Mobile Keyboard Prediction by Hard et al. (Google, 2018)
THANK YOU
cffl@cloudera.com
@_cjwallace

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Último (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Federated Learning

  • 1. © Cloudera, Inc. All rights reserved. FEDERATED LEARNING Chris J Wallace • Data Scientist • Cloudera Fast Forward Labs @_cjwallace
  • 2.
  • 3. Available to Fast Forward Labs clients Play at turbofan.fastforwardlabs.com
  • 4.
  • 5.
  • 6. TODAY
      ● WHY CARE?
      ● FEDERATED AVERAGING
      ● PROTOTYPE
      ● CHALLENGES
      ● TOOLS
  • 7. WHY CARE ABOUT FEDERATED LEARNING?
  • 8.
  • 9.
  • 10.
  • 11.
  • 12. REQUIREMENTS FOR FEDERATED LEARNING
      ● Performance improves with more data.
      ● Models can be meaningfully combined.
      ● Nodes can train models, not only predict.
      [Figure: performance plotted against amount of data]
  • 13.
  • 14. FEDERATED AVERAGING: Communication-Efficient Learning of Deep Networks from Decentralized Data, McMahan et al. (2016)
  • 15. A network of nodes shares models, rather than training data, with the server.
  • 16. The server has an untrained model.
  • 17. It sends a copy of that model to the nodes.
  • 18. The nodes now also have the untrained model.
  • 19. The nodes have data on which to train their model.
  • 20. Each node trains the model to fit the data it has.
  • 21. Each node sends a copy of its trained model back to the server.
  • 22. The server combines these models by taking an average. We repeat the whole process many times.
  • 23. The server now has a model that captures the patterns in the training data on all the nodes. But at no point did the nodes share their training data, which increases privacy and saves on bandwidth.
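
The loop described in slides 15 to 23 can be sketched in a few lines of plain Python. This is a minimal illustration rather than the prototype's code: the linear model `y = w*x + b`, the learning rate, the number of rounds, and the helper names (`local_train`, `average`, `federated_averaging`, `make_node`) are all assumptions made for the example.

```python
import random

def local_train(weights, data, lr=0.05, epochs=5):
    """One node: start from the server's weights and run SGD on local data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on one example
            w -= lr * err * x       # gradient step for the slope
            b -= lr * err           # gradient step for the intercept
    return (w, b)

def average(models):
    """Server: element-wise mean of the models returned by the nodes."""
    n = len(models)
    return tuple(sum(m[i] for m in models) / n for i in range(len(models[0])))

def federated_averaging(node_datasets, rounds=50):
    global_model = (0.0, 0.0)  # the server's untrained model
    for _ in range(rounds):
        # Each node receives a copy, trains locally, and returns only its model.
        local_models = [local_train(global_model, data) for data in node_datasets]
        global_model = average(local_models)
    return global_model

# Hypothetical toy data: every node sees a different slice of the input range,
# but all data follows y = 2x + 1. The raw data never leaves make_node's output.
rng = random.Random(0)

def make_node(n, lo, hi):
    return [(x, 2.0 * x + 1.0) for x in (rng.uniform(lo, hi) for _ in range(n))]

nodes = [make_node(20, -1.0, 0.0), make_node(20, 0.0, 1.0), make_node(20, -1.0, 1.0)]
w, b = federated_averaging(nodes)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Only the `(w, b)` tuples cross the node/server boundary here, which is the point the slides make about privacy and bandwidth.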
  • 24. FEDERATED AVERAGING CAN HANDLE...
      ● Non-IID data
          ○ Training data on each node can be idiosyncratic.
      ● Unbalanced data
          ○ Unequal amount of data on each node.
      ● Massively distributed data
          ○ Can have many more devices than training examples per node.
      ● Limited communication
          ○ Cannot guarantee availability of nodes.
      Communication-Efficient Learning of Deep Networks from Decentralized Data, McMahan et al. (2016)
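
Two ingredients of the McMahan et al. algorithm speak to the unbalanced-data and limited-communication points: the server weights each node's model by its number of training examples, and it asks only a random fraction of nodes to take part in each round. The sketch below shows both pieces in isolation; the function names and the tuple representation of a model are assumptions made for illustration.

```python
import random

def weighted_average(models, sizes):
    """Average models, weighting each by the number of examples it was trained on."""
    total = sum(sizes)
    dim = len(models[0])
    return tuple(
        sum(m[i] * n for m, n in zip(models, sizes)) / total
        for i in range(dim)
    )

def sample_nodes(node_ids, fraction, rng):
    """Pick a random subset of nodes for this round (at least one)."""
    k = max(1, round(fraction * len(node_ids)))
    return rng.sample(node_ids, k)

# A node with 3 examples counts three times as much as a node with 1 example.
combined = weighted_average([(0.0, 0.0), (4.0, 8.0)], sizes=[3, 1])

# Ask 30% of ten nodes to participate in a round.
chosen = sample_nodes(list(range(10)), fraction=0.3, rng=random.Random(0))
```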
  • 25. TURBOFAN TYCOON (turbofan.fastforwardlabs.com)
  • 26. PREDICTIVE MAINTENANCE: Corrective, Preventative, Predictive
  • 27. turbofan.fastforwardlabs.com, built on the C-MAPSS data set: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/
  • 28. CHALLENGES
  • 29. SYSTEMS ISSUES: Power consumption, Dropped connections, Stragglers
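
One possible way for a server to cope with dropped connections and stragglers is to aggregate whatever arrives before a deadline and to skip the round when too few nodes report. The slides do not prescribe a policy, so the sketch below is only an assumption: the `(model, latency)` report format, the `deadline` and `min_reports` parameters, and the function names are all hypothetical.

```python
def average(models):
    """Element-wise mean of a list of equally sized model tuples."""
    n = len(models)
    return tuple(sum(m[i] for m in models) / n for i in range(len(models[0])))

def run_round(global_model, reports, deadline, min_reports=2):
    """Aggregate only the updates that arrived in time.

    reports: list of (model, latency_seconds), where latency is None
             if the node's connection dropped before it could reply.
    """
    arrived = [
        model for model, latency in reports
        if latency is not None and latency <= deadline
    ]
    if len(arrived) < min_reports:
        return global_model  # too few updates: keep the previous model
    return average(arrived)

reports = [
    ((1.0,), 0.5),    # fast node
    ((3.0,), 1.0),    # fast node
    ((100.0,), 30.0), # straggler, ignored
    ((5.0,), None),   # dropped connection, ignored
]
new_model = run_round((0.0,), reports, deadline=2.0)
unchanged = run_round((0.0,), reports, deadline=0.1)
```

Dropping slow nodes keeps rounds short, but it can bias the model toward nodes with good connectivity, so the deadline is a trade-off rather than a free choice.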
  • 30. PRIVACY: An adversary can't inspect the data, but can inspect the model.
  • 31. PRIVACY: Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning, Hitaj et al. (2017)
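
A toy case shows why access to the model alone can leak information. This is not the GAN-based attack of Hitaj et al.; it is a much simpler illustration under stated assumptions: a linear model with squared-error loss, a node that takes a single SGD step on a single private example, and an observer who sees the weights before and after. Under those assumptions the weight change is a scalar multiple of the private input.

```python
def local_step(weights, x, y, lr=0.1):
    """One SGD step of a linear model with squared-error loss on one example."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    err = pred - y
    return [w - lr * err * xi for w, xi in zip(weights, x)]

global_weights = [0.0, 0.0, 0.0]
private_x, private_y = [2.0, -1.0, 4.0], 3.0   # never sent to the server

updated = local_step(global_weights, private_x, private_y)

# The observer only sees global_weights and updated.
delta = [u - g for u, g in zip(updated, global_weights)]
recovered_direction = [d / delta[0] for d in delta]
true_direction = [xi / private_x[0] for xi in private_x]
```

Here `recovered_direction` matches `true_direction`, so the observer learns the private input up to a scale factor without ever seeing it. Real deployments train on many examples per round, which blurs this signal but does not remove the underlying concern.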
  • 32. TOOLS
  • 33. OpenMined (https://www.openmined.org/)
      “OpenMined is an open-source community focused on researching, developing, and promoting tools for secure, privacy-preserving, value-aligned artificial intelligence.”
      ● More than federated learning.
      ● PySyft is a library for privacy-preserving deep learning.
      ● Grid is a peer-to-peer platform for decentralized data science.
  • 34. TensorFlow Federated (https://www.tensorflow.org/federated)
      ● Federated Learning API
          ○ Wrap TensorFlow models in included FL implementations.
          ○ High level, with attention paid to separating the concerns of models, communication, and so on.
      ● Federated Core API
          ○ Low-level interfaces for building novel FL algorithms.
      Local simulation runtime only right now. New, but promising.
  • 35. SUMMARY
  • 36. Federated Learning is machine learning on decentralized data.
  • 37. YOU MIGHT HAVE A USE CASE IF …
      ● Privacy is needed (FL is not the whole solution)
      ● Bandwidth or power consumption is a concern
      ● Data transfer is costly
      ● Your model improves with more data
  • 38. EXAMPLES
      ● Predictive maintenance/industrial IoT
      ● Smartphones
      ● Healthcare (wearables, drug discovery, prognostics, etc.)
      ● Enterprise/corporate IT (chat, issue trackers, email, etc.)
  • 39. Cloudera Fast Forward Labs
      • An introduction to Federated Learning (Cloudera VISION blog, business audience)
      • Federated learning: distributed machine learning with data locality and privacy (FFL blog, more technical)
      • Turbofan Tycoon (working prototype, see FFL blog post for some details)
    Other blog posts
      • Collaborative Machine Learning without Centralized Training Data (Google research blog)
      • Federated Learning for Firefox (Firefox on florian.github.io)
      • Federated Learning for wake word detection (snips.ai on medium.com)
    Papers
      • Communication-Efficient Learning of Deep Networks from Decentralized Data by McMahan et al. (Google, 2016)
      • Practical Secure Aggregation for Privacy-Preserving Machine Learning by Bonawitz et al. (Google, 2017)
      • Federated Multi-Task Learning by Smith et al. (2017)
      • A generic framework for privacy preserving deep learning by Ryffel et al. (2018, and see also github.com/OpenMined/PySyft)
      • Federated Learning for Mobile Keyboard Prediction by Hard et al. (Google, 2018)
  • 40. THANK YOU cffl@cloudera.com @_cjwallace