ML on Big Data: Real-Time Analysis on Time Series
Machine Learning on Big Data
YASWANTH YADLAPALLI
Topics
 Business use case
 Training phase of the algorithm
 Tech stack
 Real time implementation
 Demonstration on a force sensor
Data Model
 We are currently working on these data models:
 Unstructured data
 Structured data
 Time series data
 For this talk we will concentrate on time series data.
Problem Statement
 Build a reactive application that trains on a limited amount of data.
Business use case
 The main use case is preventive maintenance systems.
 Calendar-based maintenance schedules and excess inventory held to reduce
downtime both lead to inefficiencies and increased costs.
 Recent machinery failures on oil rigs and in car manufacturing plants have cost their
respective industries millions of dollars in downtime and repairs.
 Condition-based monitoring systems are implemented with the goal of
eliminating unplanned downtime and reducing operating costs by maintaining the
right equipment at the right time.
 As they say, a stitch in time saves nine.
Sample Data
Our solution
Time series analytics
 An analytics algorithm should be a mathematical model that provides:
 Data compression: a compact representation of the data
 Signal processing: extraction of signals (sequences) even in the presence of noise
 Prediction: use of the model to predict future values of the time series
Terminology
 Patterns
 A block of the graph in which values stay within a
range
 Patterns are grown from pairs of sequential
points until the block conforms to the given
thresholds
 Clusters
 Groups of similar patterns
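
A minimal sketch of how a pattern could be grown, under one possible reading of the slide: a block keeps absorbing the next point as long as its spread (max minus min) stays within a threshold, and a new block starts when it would not. The function name grow_patterns and the parameter max_spread are illustrative assumptions, not the speakers' implementation. For brevity the sketch seeds each block with a single point rather than a pair.

```python
from typing import List, Tuple


def grow_patterns(values: List[float], max_spread: float) -> List[Tuple[int, int]]:
    """Split a series into consecutive blocks whose spread stays within max_spread.

    Returns (start, end) index pairs, with end exclusive.
    """
    patterns: List[Tuple[int, int]] = []
    if not values:
        return patterns
    start = 0
    lo = hi = values[0]
    for i in range(1, len(values)):
        new_lo = min(lo, values[i])
        new_hi = max(hi, values[i])
        if new_hi - new_lo > max_spread:
            # Absorbing this point would break the threshold: close the block here.
            patterns.append((start, i))
            start = i
            lo = hi = values[i]
        else:
            lo, hi = new_lo, new_hi
    patterns.append((start, len(values)))
    return patterns


series = [1.0, 1.1, 0.9, 5.0, 5.2, 5.1, 1.0, 1.05]
print(grow_patterns(series, max_spread=0.5))
```

Each returned block could then be summarized (for example by its mean and length) and grouped with similar blocks to form clusters; the choice of summary and of clustering method is left open here.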
Terminology
 Sequences
 A recurring series of patterns belonging to
a set of clusters.
 Concepts
 Sequences that are tagged as relevant to
the user.
 Knowledge Base
 Inferences drawn from concepts.
 This is the compressed representation of
the time series.
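
As a rough illustration of "a recurring series of patterns", the sketch below assumes each pattern has already been assigned a cluster label and simply counts fixed-length windows of labels, keeping those that occur at least a given number of times. The function name recurring_sequences and its parameters are hypothetical; the talk does not say how sequences are actually mined.

```python
from collections import Counter
from typing import Dict, List, Tuple


def recurring_sequences(
    labels: List[str], length: int, min_count: int
) -> Dict[Tuple[str, ...], int]:
    """Count every window of `length` consecutive cluster labels and keep the frequent ones."""
    windows = (tuple(labels[i:i + length]) for i in range(len(labels) - length + 1))
    counts = Counter(windows)
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Hypothetical cluster labels, one per pattern, in time order.
labels = ["A", "B", "C", "A", "B", "C", "A", "D"]
print(recurring_sequences(labels, length=3, min_count=2))
```

A user would then tag some of the returned sequences as concepts; that tagging step is manual in the talk's description and is not modeled here.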
Phases
 Training phase
 The objective is to build a knowledge base.
 Bulk historical data is given as input.
 Parameters of the algorithm are fine-tuned to match the use case.
 Concepts are identified and assigned an action.
 Validation phase
 Bulk
 Bulk data is given as input.
 Patterns are found and classified according to the knowledge base.
 Used to identify and tag scenarios over a known timeline.
Phases
 Decision phase
 Real time
 Data arrives from a streaming source, for example Kafka.
 Received data is processed in batches.
 Patterns spanning multiple batches are stitched together.
 If a sequence is identified as a concept, the specified action is triggered.
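
The last step of the decision phase can be sketched as a lookup, assuming the knowledge base is a mapping from a sequence of cluster labels (a concept) to an action. The names detect_concepts and kb, the label strings, and the action strings are all made up for illustration; real actions would send an SMS or an email rather than return text.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical knowledge base type: concept (tuple of cluster labels) -> action to run.
KnowledgeBase = Dict[Tuple[str, ...], Callable[[], str]]


def detect_concepts(labels: List[str], kb: KnowledgeBase) -> List[str]:
    """Scan labels in arrival order and run the action of every concept ending at each position."""
    fired: List[str] = []
    for end in range(1, len(labels) + 1):
        for concept, action in kb.items():
            window = tuple(labels[max(0, end - len(concept)):end])
            if window == concept:
                fired.append(action())
    return fired


kb: KnowledgeBase = {
    ("steady", "spike", "drop"): lambda: "sms",
    ("steady", "drift"): lambda: "email",
}
stream = ["steady", "drift", "steady", "spike", "drop"]
print(detect_concepts(stream, kb))
```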
Architecture
Source
•Google Drive
•Kafka
•Local file
Ingestion
•Filters
•Transformation
•Manipulation
•Materialization
Computation
•Patterns
•Clusters
•Sequences
•Concepts
Actions
•SMS
•Email
Tech Architecture
[Diagram. Labels: UI Server, Frontend, Backend Server]
Benchmarks
[Chart: Duration vs Size. Axes: Duration (hours), Size (GB). Other labels: Driver memory (GB), Total executor memory (GB), Slave cores (procs)]
Real-Time Analysis on Time Series
ROHITH YERAVOTHULA
Training phase output
 Knowledge base properties:
 Data compression: a compact representation of the data
 Signal processing: extraction of signals (sequences) even in the presence of noise
 Prediction: use of the model to predict future values of the time series
Real time system
 Lightweight computation framework
 Ability to handle the 3 Vs (volume, velocity, and variety) of big data
 Computation framework with a micro-batch processing architecture
Data Source
 A data source that can retain incoming data and feed it into the computation
framework, and that can:
 Take advantage of a distributed computation framework
 Store data in a fault-tolerant manner
Using Spark and Kafka
[Diagram. Labels: Frontend, Via Server]
Connecting with IoT
 Connect a mobile accelerometer to AWS IoT and stream its data.
 Train the system to predict a user's behavior from the accelerometer data.
Mobile IoT architecture
[Diagram. Latencies shown: 600 ms, 5 sec, 3 sec, 2 sec, 4 sec. Labels: Frontend, Via Server]
Bottlenecks
 Small-files issue: writing and reading a huge number of small files.
 Sharing data between batches.
Fix: Small Files Problem
 Implemented an in-memory queue that holds data for several batches, then
compiles everything into a single file and writes it to the storage system.
 UI requests can also be served from the in-memory queue.
 This eliminates the extra read calls to the storage system for UI requests.
 It also allows the writes to be asynchronous.
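
A minimal sketch of the buffering idea, assuming records are text lines and the storage system is a local directory. The class name BatchBuffer, the batches_per_file parameter, and the part-file naming are illustrative choices. For simplicity the write here is synchronous; the talk describes it as asynchronous.

```python
import os
import tempfile
from collections import deque
from typing import Deque, List


class BatchBuffer:
    """Hold several batches in memory, then write them out as one file."""

    def __init__(self, out_dir: str, batches_per_file: int) -> None:
        self.out_dir = out_dir
        self.batches_per_file = batches_per_file
        self.queue: Deque[List[str]] = deque()
        self.files_written = 0

    def add_batch(self, records: List[str]) -> None:
        self.queue.append(records)
        if len(self.queue) >= self.batches_per_file:
            self.flush()

    def recent(self) -> List[str]:
        """Serve reads (for example UI requests) from memory without touching storage."""
        return [record for batch in self.queue for record in batch]

    def flush(self) -> None:
        """Write all queued batches to a single file and empty the queue."""
        if not self.queue:
            return
        path = os.path.join(self.out_dir, f"part-{self.files_written:05d}.txt")
        with open(path, "w") as f:
            for batch in self.queue:
                for record in batch:
                    f.write(record + "\n")
        self.queue.clear()
        self.files_written += 1


out_dir = tempfile.mkdtemp()
buf = BatchBuffer(out_dir, batches_per_file=3)
for i in range(7):
    buf.add_batch([f"batch{i}-rec{j}" for j in range(2)])
print(sorted(os.listdir(out_dir)))
print(buf.recent())
```

Seven batches of two records each produce two files of six lines instead of seven small files, and the seventh batch stays in memory where a UI request could read it.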
Why Share data between batches
 In real-time data ingestion, data can be split across different batches depending
on the batch size we choose.
 We need to handle signals that overflow across batch boundaries.
Sharing Data between batches
 updateStateByKey
 ssc.remember()
 Spark accumulators
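
To show what carrying state across batches can look like, here is a plain-Python sketch of an update function that takes the new values of a batch plus the previous state and returns the new state. That is the same shape of function that Spark Streaming's updateStateByKey expects (new values for a key, previous state or None), but the Spark wiring, including the checkpoint directory that stateful operations require, is not shown and has not been tested here. The threshold MAX_SPREAD, the state layout, and the name stitch are assumptions, and the values in a batch are assumed to arrive in time order.

```python
from typing import List, Optional, Tuple

MAX_SPREAD = 0.5  # hypothetical threshold for a block's max - min

# State: (block still open at the end of the batch, blocks closed during the batch).
State = Tuple[List[float], List[List[float]]]


def stitch(new_values: List[float], state: Optional[State]) -> State:
    """Continue the block left open by the previous batch instead of starting afresh."""
    open_block = list(state[0]) if state else []
    closed: List[List[float]] = []
    for v in new_values:
        candidate = open_block + [v]
        if max(candidate) - min(candidate) > MAX_SPREAD:
            # The open block cannot absorb v: close it and start a new one at v.
            closed.append(open_block)
            open_block = [v]
        else:
            open_block = candidate
    return open_block, closed


state: Optional[State] = None
for batch in [[1.0, 1.1, 5.0], [5.1, 5.2, 1.0]]:
    state = stitch(batch, state)
    print("closed:", state[1], "still open:", state[0])
```

In this run the block that starts with 5.0 at the end of the first batch is completed by 5.1 and 5.2 from the second batch, which is the stitching across batch boundaries that the slides describe.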
Mobile IoT architecture (updated)
[Diagram. Latencies shown: 600 ms, 5 sec, 1 sec, async. Label: Frontend]
Demo
Quick demonstration with force sensor
Q & A session
Keep calm and ask questions

ML on Big Data: Real-Time Analysis on Time Series

  • 1. ML on Big Data: Real Time Analysis on Time Series
  • 2. Machine Learning on Big Data YASWANTH YADLAPALLI
  • 3. Topics  Business use case  Training phase of the algorithm  Tech stack  Real time implementation  Demonstration on a force sensor
  • 4. Data Model  We are currently working on these data models :  Unstructured data  Structured Data  Time series Data  For this talk we are going to concentrate on Time series data
  • 5. Problem Statement  To build a reactive application which trains on limited amount of data.
  • 6. Business use case  Main use case is in preventive maintenance systems.  Calendar based maintenance schedules and holding excessive inventory to reduce downtime all lead to inefficiencies and increase costs.  Recent failures in machinery of oil rigs, car manufacturing plants have cost their respective industries millions of dollars in down time and repairs.  Condition Based Monitoring systems are implemented with the goal of eliminating unplanned downtime and reducing operations cost by maintaining the proper equipment at the proper time.  As they say a stitch in time saves nine.
  • 11. Time series analytics  Any analytics algorithm should be a mathematical model that should:  Data Compression: Compact representation of data  Signal Processing: extracting signal(sequences) even in presence of noise  Prediction: using model predict the future values of time series
  • 12. Terminology  Patterns  Block of graph where values are within a range  Patterns are grown from pairs of sequential points till the block conform given thresholds  Clusters  Similar type of patterns
  • 13. Terminology  Sequences  A recurring series of patterns belonging to a set of clusters.  Concepts  Sequences which are tagged as relevant to the user.  Knowledge Base  Inference drawn from concepts.  This is the compressed representation of the time series
  • 14. Phases  Training phase  Objective is to build a Knowledge base.  Bulk historical data is given as input.  Parameters of the algorithm are fine tuned to match the use case.  Concepts are identified and assigned an action.  Validation Phase  Bulk  Bulk data is given.  Patterns are found and classified according to knowledge base.  Used to identify and tag scenarios over a known timeline.
  • 15. Phases  Decision phase  Real Time  For example a Kafka source is provided.  Received data is processed in batches.  Patterns spanning multiples batches are stitched.  If a sequence is identified as a concept, the specified action is triggered.
  • 17. Tech Architecture Ui Server FrontendBackend Server
  • 18. Benchmarks 0 2 4 6 8 10 12 0 10 20 30 40 50 60 Duration(Hours) Size (GB) Duration vs Size Driver memory (GB) Total executor memory (GB) Slave cores (procs)
  • 19. Real Time Analysis on Time Series ROHITH YERAVOTHULA
  • 20. Training phase output  Knowledge Base properties:  Data Compression: Compact representation of data  Signal Processing: extracting signal(sequences) even in presence of noise  Prediction: using model predict the future values of time series
  • 21. Real time system  Light weight Computation framework  Ability to handle 3V’s (Volume, Velocity and Variety) of Big Data  Computation framework with micro batch processing architecture
  • 22. Tech Architecture Ui Server FrontendBackend Server
  • 23. Data Source  Data source that can keep the data from the source and ingest into computation framework which can  Take Advantage of distributed computation framework  Store data in a fault tolerant manner
  • 24. Tech Architecture Ui Server FrontendBackend Server
  • 25. Using Spark and Kafka Frontend Via Server
  • 26. Connecting with IoT  Connect Mobile accelerometer to AWS IoT and stream data.  Train the system to predict an user’s behavior using accelerometer data.
  • 27. Mobile IoT architecture 600 ms 5 sec 3 sec 2 sec 4 sec Frontend Via Server
  • 28. Bottlenecks  Small File Issues: writing and reading huge number of small files.  Sharing data between batches.
  • 29. Fix: Small Files Problem
      • Implemented an in-memory queue that holds data for several batches, then compiles everything into a single file and writes it to the storage system.
      • The in-memory queue can also serve UI requests.
          • This eliminates the extra read calls to the storage system to serve UI requests.
          • It allows the writes to be asynchronous in the first place.
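The small-files fix can be sketched as a buffer that accumulates several batches, writes them out as one file, and answers UI reads from memory in between. This is a minimal illustration under stated assumptions: `BatchBuffer`, `flush_every` and the `write_file` callback are hypothetical names, the writer is a stand-in for whatever storage call is actually used, and the flush here is synchronous for simplicity even though the slide describes asynchronous writes.

```python
from collections import deque
from typing import Callable, Deque, List


class BatchBuffer:
    """Holds several batches in memory and writes them out as one combined file."""

    def __init__(self, flush_every: int, write_file: Callable[[List[dict]], None]) -> None:
        self.flush_every = flush_every      # number of batches per output file
        self.write_file = write_file        # hypothetical storage writer
        self.pending: Deque[List[dict]] = deque()

    def add_batch(self, records: List[dict]) -> None:
        self.pending.append(list(records))
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        """Merge all pending batches into a single write."""
        merged = [r for batch in self.pending for r in batch]
        self.pending.clear()
        if merged:
            self.write_file(merged)

    def recent(self) -> List[dict]:
        """Serve UI requests from memory instead of reading files back."""
        return [r for batch in self.pending for r in batch]


written_files: List[List[dict]] = []        # stands in for the storage system
buffer = BatchBuffer(flush_every=3, write_file=written_files.append)
for i in range(4):
    buffer.add_batch([{"t": i, "v": i * 10}])
print(len(written_files), buffer.recent())
```

Four batches produce one file of three records instead of four small files, and the fourth batch is still readable from the queue.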
  • 30. Why share data between batches
      • In real-time data ingestion, data can be split across different batches depending on the batch size we choose.
      • We need to take care of signals overflowing across batches.
  • 31. Sharing data between batches
      • updateStateByKey
      • ssc.remember()
      • Spark accumulators
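Of the options listed, `updateStateByKey` keeps a per-key state that an update function receives together with the new values of each batch. The sketch below shows only such an update function and simulates the batches in plain Python, so it runs without Spark. The state layout `(min, max, count)`, the threshold `MAX_RANGE` and the key `"sensor-1"` are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

MAX_RANGE = 0.5  # assumed threshold on the value range of a pattern
State = Tuple[float, float, int]  # (min, max, count) of the still-open pattern


def update_open_pattern(new_values: List[float], state: Optional[State]) -> Optional[State]:
    """Update function in the (new_values, previous_state) -> new_state shape.

    In PySpark this shape matches DStream.updateStateByKey(update_func),
    which requires checkpointing to be enabled on the streaming context.
    """
    for value in new_values:
        if state is None:
            state = (value, value, 1)
            continue
        lo, hi, count = state
        new_lo, new_hi = min(lo, value), max(hi, value)
        if new_hi - new_lo > MAX_RANGE:
            state = (value, value, 1)           # the signal moved on: start a new pattern
        else:
            state = (new_lo, new_hi, count + 1)  # the pattern overflows into this batch
    return state


def run_batches(batches: List[Dict[str, List[float]]]) -> List[Dict[str, Optional[State]]]:
    """Simulate the per-key state a streaming job would carry between batches."""
    state: Dict[str, Optional[State]] = {}
    history = []
    for batch in batches:
        for key, values in batch.items():
            state[key] = update_open_pattern(values, state.get(key))
        history.append(dict(state))
    return history


history = run_batches([
    {"sensor-1": [5.0, 5.1]},
    {"sensor-1": [5.2, 5.0]},
    {"sensor-1": [1.0]},
])
print(history)
```

Keeping only a summary of the open pattern in the state, rather than every point, keeps the carried state small; whether the real system does this is not stated in the deck.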
  • 32. Mobile IoT architecture [diagram; latency labels: 600 ms, 5 sec, 3 sec, 2 sec, 4 sec; other labels: Frontend, Via Server]
  • 33. Mobile IoT architecture (updated) [diagram; latency labels: 600 ms, 5 sec, 1 sec, async; other labels: Frontend]
  • 35. Keep calm and ask questions: Q & A session

Editor's notes

  1. In this section I am going to give a brief introduction to the architecture and business use case of our system. Our goal is to make sense of any given sensor data, be it from a pressure sensor in a valve or a camera on a self-driving car, so that we can make smart decisions or predictions about the future.
  2. Unstructured data doesn't have relations between columns.
  3. Whatever the data source, we want a generalized solution that can handle any type of variation and lets users get a specialized system for their own use case.
  4. Assume there is an oil rig with 10 machines and 100 sensors each. Say we know that a component in a machine needs maintenance every 3 months, but in many real-life situations the component may break down prematurely, which may cost the company millions in downtime. Having a person monitor all the sensor outputs and determine whether any component needs maintenance is not a viable solution. Our system is built to handle this use case.
  5. Please have a look at this data from a pressure sensor in a valve. Say that, as a user, we know the first anomaly is caused by misorientation of the spring and the second occurs when the seal of the valve is broken. Can you suggest any methods to isolate these two phenomena? Most traditional approaches would not account for a new type of pattern emerging. The next slide shows our solution.
  6. loss of similarity
  7. The knowledge base is the set of inferences drawn from the given data.
  8. This is the pipeline that all phases of our application go through. First you provide a data source; currently you can upload a local file from your computer or select it from your Google Chrome. Supported formats are CSV and TSV. The data is then ingested using a provided schema. You can type-cast variables, join columns from multiple files, etc. Using this ingested data as our time series, we can compute PCSC,
  9. A simple example is, say, a y = sin(x) time series model. Prediction on time series data is one of the use cases for real-time time series analytics. Talk 1 explained how we trained the system and taught it to make decisions. We use the trained system for real-time analytics, so we need a streaming or live computation framework.
  10. Spark Streaming is a micro-batch processing architecture. It collects stream data into small batches and processes them. Job creation and scheduling overhead is on the order of milliseconds. The batch interval can be as small as 1 second.
  11. We can't rely on the original data source, as it cannot provide recent data again once that data is lost.
  12. Apache Kafka is a distributed streaming platform. It lets you publish and subscribe to streams of records and store streams of records in a fault-tolerant way.
  13. Streams of time series data pump data into Kafka. Spark connects to the Kafka brokers and consumes the data, processes it, and stores the results in a database. The UI server pulls data for visualization. Spark is the computation layer, while Kafka acts as the data source for streaming data. We now have an end-to-end streaming application: a data source to stream from, a compute framework and a storage system. We can connect any real-time streaming source. One such demo: a simple AWS IoT setup with mobile sensors.
  14. Every batch writes a lot of small files to the storage system (HDFS). We use Parquet, as it is one of the best compressed data formats available. Writing small Parquet files from Spark adds extra overhead, and reading several small files from storage to serve UI requests adds further delay.
  15. Sharing data between batches
  16. State maintenance
  17. Make one line: the basic problem with live streaming is that data is broken into batches. Our mathematical model can't rely on a single batch; it needs to wait for the next batch to see whether the data has overflowed.