Machine Learning, Deep
Learning, Big Data
Hands-On
by Dony Riyanto
Prepared and Presented to Panin Asset Management
January 2019
Hands-on Agenda
• Machine Learning Re-Visited
• Python Example of Machine Learning
• Introduction to Deep Learning
• Implementation of 'Big Data' (Hadoop Ecosystem)
• Hadoop File System
• Hadoop Map Reduce
• Case Study
• More advanced implementations of Big Data
The Learning Problem
The essence of ML:
1. We have data
2. Patterns exist in data
3. We cannot write down a mathematical formula for them
(the formula is not yet known)
Examples:
• Movie Rating
• Credit Approval
• Handwriting Recognition
Domain Areas
• Computer Vision
• Natural Language Processing
• Business Intelligence
Components of Learning
Example in Banking: Credit Card Approval
Input : x (customer application)
Output : y (good/bad customer)
Unknown Target Function f : X → Y
Dataset {x, y} (customer records database)
Hypothesis Set H (candidate functions h : X → Y)
Final Hypothesis g
Learning Model = Hypothesis Set + Learning Algorithm
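To make these components concrete, here is a minimal sketch using a perceptron as the learning model. The perceptron is a choice made for this illustration and is not named on the slide; the customer features and labels are invented. The hypothesis set is every linear threshold function over the inputs, the learning algorithm is the perceptron update rule, and the returned weight vector plays the role of the final hypothesis g.

```python
# Toy perceptron for the credit-approval framing above.
# All feature values and labels are invented for illustration.

def predict(weights, x):
    """Hypothesis h(x) = sign(w . x), with x[0] = 1 acting as the bias input."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else -1


def train_perceptron(dataset, max_epochs=100):
    """Learning algorithm: nudge w toward each misclassified example."""
    weights = [0.0] * len(dataset[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in dataset:
            if predict(weights, x) != y:
                weights = [w + y * xi for w, xi in zip(weights, x)]
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights


# x = (bias, income in units of 10k, years in current job); y = +1 good, -1 bad
customers = [
    ((1.0, 6.0, 5.0), 1),
    ((1.0, 5.0, 7.0), 1),
    ((1.0, 1.0, 1.0), -1),
    ((1.0, 2.0, 0.5), -1),
]
g = train_perceptron(customers)  # final hypothesis, stored as a weight vector
```

The update loop only stops early when the data is linearly separable, which this toy dataset is; real credit data would usually need a more robust model.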
Machine Learning Model
Inputs: Spatial Data (Text, Image) {x, y}, or Sequence / Time Series Data {x, t}
Classifier → Class Score
Regression → Continuous Values
Main Paradigms
Automatic discovery of patterns in data through computer algorithms and the use of
those patterns to take actions such as classifying or clustering the data into
categories.
Supervised Learning: Learning by labeled example
E.g. An email spam detector
We have (input, correct output), and we can predict (new input, predicted output)
Amazingly effective if you have lots of data
Unsupervised Learning: Discovering Patterns
E.g. Data clustering
Instead of (input, correct output), we get (input, ?)
Difficult in practice but useful if we lack labeled data
Reinforcement Learning: Feedback & Error
E.g. Learning to play chess
Instead of (input, correct output), we get (input, some output, grade for this output)
Works well in some domains, becoming more important
What is Python, and why use it?
Python is an interpreted, high-level, general-purpose programming language.
Its high-level built-in data structures, combined with dynamic typing and
dynamic binding, make it very attractive for Rapid Application Development,
as well as for use as a scripting or glue language to connect existing
components together. Python's simple, easy-to-learn syntax emphasizes
readability and therefore reduces the cost of program maintenance. Python
supports modules and packages, which encourages program modularity and
code reuse. The Python interpreter and the extensive standard library are
available in source or binary form without charge for all major platforms, and
can be freely distributed.
Machine Learning with Python
• We need Python 2.7.x or 3.7.x
• Libraries, e.g.:
• numpy (fundamental package for scientific computing with Python)
• matplotlib (plotting library for the Python programming language and its
numerical mathematics extension NumPy)
• pandas (software library written for the Python programming language for
data manipulation and analysis)
• seaborn (Python data visualization library based on matplotlib)
• sklearn (Scikit-learn is a machine learning library for the Python programming
language)
• IDE, e.g. PyCharm
• Alternatively, install Anaconda (a distribution of the Python and R programming
languages for scientific computing: data science, machine learning applications,
large-scale data processing, predictive analytics, etc.)
Machine Learning with Python
• Python 3 installation
• Introduction to pip (the Python package installer)
• Install PyCharm
• or install Anaconda
Lesson 1
*data preprocessing
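The lesson's original code is not part of this transcript. As a rough idea of what a preprocessing step can look like, the sketch below standardizes a numeric column to zero mean and unit standard deviation using only the standard library. The function name `standardize` and the sample ages are assumptions made for this example; in scikit-learn, `sklearn.preprocessing.StandardScaler` performs the same kind of rescaling.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) std."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


ages = [25, 35, 45, 55]        # invented sample data
scaled = standardize(ages)
print(scaled)
```

Standardizing puts features measured in different units (age, income) on a comparable scale, which many learning algorithms rely on.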
Lesson 1 (with Anaconda)
Lesson 2
*Class labeling with preprocessing
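The slide's own code is not available here, so the following is only a sketch of the idea: class names such as "good" and "bad" are turned into integer codes that a learning algorithm can consume. The helper `encode_labels` and the sample labels are hypothetical; scikit-learn offers `sklearn.preprocessing.LabelEncoder` for the same purpose.

```python
def encode_labels(labels):
    """Map each distinct class name to an integer, in sorted order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


raw = ["good", "bad", "good", "good", "bad"]   # invented sample labels
codes, classes = encode_labels(raw)
print(codes, classes)
```

Keeping the `classes` list alongside the codes makes it possible to translate predictions back into readable class names later.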
Lesson 3
*Load CSV and data observation
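As a stand-in for the missing lesson code, this sketch loads a small CSV and makes a first observation of it: the column names, the row count, and the class balance. The CSV content, column names, and the helper `load_rows` are invented for the example, and an in-memory string replaces a file on disk so the snippet runs as-is. With pandas, `pandas.read_csv` would be the usual entry point.

```python
import csv
import io
from collections import Counter

# Inline stand-in for a CSV file on disk; columns and rows are invented.
CSV_TEXT = """age,income,label
25,3000,bad
35,5200,good
45,6100,good
"""


def load_rows(handle):
    """Read CSV rows as dictionaries keyed by the header line."""
    return list(csv.DictReader(handle))


rows = load_rows(io.StringIO(CSV_TEXT))
columns = list(rows[0].keys())
label_counts = Counter(row["label"] for row in rows)
print(columns, len(rows), dict(label_counts))
```

To read a real file, pass `open("some_file.csv", newline="")` to `load_rows` instead of the `StringIO` object; the file name here is a placeholder.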
Lesson 4
Introduction to Deep Learning
• Deep learning has produced good results in many applications
such as computer vision, language translation, image captioning,
audio transcription, molecular biology, speech recognition,
natural language processing, self-driving cars, brain tumour
detection, real-time speech translation, music composition,
automatic game playing and so on.
• Deep learning is a subset of machine learning built on multi-layer
neural networks. It is heading towards becoming an industry standard
and promises to be a game changer when dealing with raw,
unstructured data.
Introduction to Deep Learning
• Deep learning is currently one of the most effective approaches for a wide
range of real-world problems. Developers are building AI programs that,
instead of following hand-written rules, learn from examples to solve
complicated tasks. As more data scientists adopt deep learning,
deeper neural networks are delivering increasingly accurate results.
• The idea is to build deep neural networks by increasing the number of
layers in each network, so that the machine learns more about the data
and becomes more accurate. Developers can use deep learning techniques
to implement complex machine learning tasks and train AI networks to
reach high levels of perceptual recognition.
Introduction to Deep Learning
• Deep learning is especially popular in computer vision. One
of the tasks it handles is image classification, where input
images are classified as cat, dog, etc., or given the class or label that
best describes the image. We as humans learn how to do this
task very early in our lives and have these skills of quickly
recognizing patterns, generalizing from prior knowledge, and
adapting to different image environments.
Deep Learning Performance
Deep Learning with TensorFlow
• Google's TensorFlow is an open-source library with a Python API. It is a strong
choice for building commercial-grade deep learning applications.
• TensorFlow grew out of DistBelief, an earlier library from the
Google Brain project. It aims to extend the portability of
machine learning so that research models can be applied to
commercial-grade applications.
• Much like the Theano library, TensorFlow is based on computational
graphs, where a node represents persistent data or a math operation and
the edges represent the flow of data between nodes. That data is a
multidimensional array, or tensor; hence the name TensorFlow.
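The computational-graph idea can be sketched without TensorFlow at all. The toy below is not TensorFlow's API; the `Node` class and the `constant`, `add`, and `scale` helpers are invented for this illustration, and plain Python lists stand in for tensors. Building the graph only wires nodes together, and data flows along the edges when `evaluate` is called.

```python
class Node:
    """A graph node: either a constant value or an operation on input nodes."""

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def evaluate(self):
        if self.op is None:
            return self.value  # persistent data node
        # Pull data along the incoming edges, then apply this node's operation.
        args = [node.evaluate() for node in self.inputs]
        return self.op(*args)


def constant(value):
    return Node(value=value)


def add(a, b):
    return Node(op=lambda x, y: [xi + yi for xi, yi in zip(x, y)], inputs=(a, b))


def scale(a, factor):
    return Node(op=lambda x: [factor * xi for xi in x], inputs=(a,))


x = constant([1.0, 2.0, 3.0])
b = constant([0.5, 0.5, 0.5])
y = add(scale(x, 2.0), b)    # builds the graph; nothing is computed yet
result = y.evaluate()        # data now flows through the graph
print(result)
```

Separating graph construction from evaluation is what lets a framework analyze, optimize, or distribute the computation before running it.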
Deep Learning Implementation with
TensorFlow and Python
• Preparation (Python + libraries)
• Installing TensorFlow
• Running several TensorFlow built-in examples, e.g.:
• Regression
• Image Classification
Introduction to Hadoop
Hadoop
Hadoop is:
• scalable.
• a “Framework”.
• not a drop-in replacement for an RDBMS.
• great for pipelining massive amounts of data to achieve the
end result.
Hadoop
• example of file/text search
Hadoop
• Planning
• Installation step
• Using HDFS
• Using Map Reduce
Hadoop Map Reduce
• MapReduce is a processing technique and a programming model for distributed computing;
Hadoop's implementation is written in Java. A MapReduce job consists of two tasks, Map
and Reduce. Map takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key/value pairs). Reduce takes the
output of a map as its input and combines those tuples into a smaller set of tuples.
As the name MapReduce implies, the reduce task is always performed after the map task.
Hadoop Map Reduce
• The MapReduce framework operates on <key, value> pairs: it views the input to a job
as a set of <key, value> pairs and produces a set of <key, value> pairs as the output, conceivably
of different types.
• The key and value classes must be serializable by the framework, so they need to
implement the Writable interface. Additionally, the key classes have to implement the WritableComparable
interface to facilitate sorting by the framework. Input and output types of a MapReduce job: (Input) <k1, v1>
→ map → <k2, v2> → reduce → <k3, v3> (Output).
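The <k1, v1> → map → <k2, v2> → reduce → <k3, v3> flow can be imitated in a few lines of single-process Python. This is only a sketch of the data flow, not Hadoop's API: `run_mapreduce`, the temperature records, and both callbacks are invented for the example, and a dictionary plays the part of the shuffle step that groups values by key.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Apply mapper, group values by key (the shuffle), then reduce per key."""
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    output = []
    for k2 in sorted(groups):  # the framework sorts keys before reducing
        output.extend(reducer(k2, groups[k2]))
    return output


def temperature_mapper(line_no, line):
    """<k1, v1> = (line number, text line)  ->  <k2, v2> = (year, temperature)."""
    year, temp = line.split(",")
    yield year, int(temp)


def max_reducer(year, temps):
    """<k2, [v2, ...]>  ->  <k3, v3> = (year, maximum temperature)."""
    yield year, max(temps)


# Invented input: "year,temperature" lines keyed by their line number.
records = list(enumerate(["2017,31", "2018,29", "2017,35", "2018,33"]))
result = run_mapreduce(records, temperature_mapper, max_reducer)
print(result)
```

Note how the key and value types change at each stage, which is what "conceivably of different types" refers to.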
Hadoop Map Reduce
• Word Count (without map-reduce)
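The slide's code is not included in this transcript. A plain, single-machine word count might look like the sketch below; the function name and sample sentence are assumptions for illustration.

```python
from collections import Counter


def word_count(text):
    """Count how often each whitespace-separated word occurs."""
    return Counter(text.lower().split())


counts = word_count("the quick brown fox jumps over the lazy dog the end")
print(counts.most_common(3))
```

This works well while the text fits in one machine's memory, which is the limit that the mapper and reducer versions are meant to lift.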
Hadoop Map Reduce
• Word Count (mapper)
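A mapper for this job could be sketched as follows, assuming the Hadoop Streaming convention of reading lines from standard input and writing tab-separated key/value records to standard output. The names `map_words` and `main` are choices made for this example, not taken from the slide.

```python
import sys


def map_words(lines):
    """Emit one tab-separated (word, 1) record per word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def main():
    """Stream records from stdin to stdout, as a streaming mapper would."""
    for record in map_words(sys.stdin):
        print(record)

# When saved as a mapper script, finish the file with a call to main().
```

The mapper does no counting of its own; it only tags each word with a 1 and leaves the aggregation to the reducer.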
Hadoop Map Reduce
• Word Count (reducer)
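A matching reducer could be sketched like this, assuming its input lines are tab-separated word/count records that arrive sorted by word, which is what the shuffle-and-sort phase provides between map and reduce. The names `reduce_counts` and `main` are hypothetical.

```python
import sys
from itertools import groupby


def reduce_counts(lines):
    """Sum the counts per word; assumes lines arrive sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def main():
    """Stream records from stdin to stdout, as a streaming reducer would."""
    for record in reduce_counts(sys.stdin):
        print(record)

# When saved as a reducer script, finish the file with a call to main().
```

Because equal keys are adjacent in sorted input, `groupby` can sum each word's counts in a single pass without holding the whole dataset in memory.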
Hadoop Map Reduce
• Run on Hadoop MapReduce
input file from local disk or HDFS
mapper application (see previous slides)
reducer application (see previous slides)
*mapper and reducer apps can be written in Python, R, Java, Scala, etc.
Hadoop Map Reduce
• MapReduce is not magic. It is a method.
• MapReduce is not always about big data (e.g. estimating the value of pi)
• MapReduce is not a silver bullet (e.g. batch vs. streaming data)
• MapReduce is usually suited to:
• Batch processing flows
• Unstructured/semi-structured data
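The pi example shows that the map/reduce pattern is about splitting work, not about data volume. The sketch below estimates pi with a Monte Carlo method: each map task throws random points at the unit square and counts those inside the quarter circle, and the reduce step adds up the partial counts. The task sizes, seeds, and function names are arbitrary choices for this illustration, and everything runs in one process rather than on a cluster.

```python
import random
from functools import reduce


def map_count_hits(task):
    """Map step: count random points that fall inside the unit quarter-circle."""
    seed, samples = task
    rng = random.Random(seed)  # a separate generator per task
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, samples


def reduce_sum(left, right):
    """Reduce step: add up partial (hits, samples) pairs."""
    return left[0] + right[0], left[1] + right[1]


tasks = [(seed, 50_000) for seed in range(4)]  # four independent map tasks
hits, samples = reduce(reduce_sum, map(map_count_hits, tasks))
pi_estimate = 4.0 * hits / samples
print(pi_estimate)
```

Each task is independent of the others, so the map calls could run on different machines and only the small (hits, samples) pairs would need to travel to the reducer.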
Bigger Image of Hadoop (Hadoop Ecosystem)
Data Stream
Why Stream Processing?
• Processing unbounded data sets, or "stream processing", is a new way
of looking at what has always been done as batch in the past. Whilst
intra-day ETL and frequent batch executions have brought latencies
down, they are still independent executions with optional bespoke code
in place to handle intra-batch accumulations. With a platform such as
Spark Streaming we have a framework that natively supports
processing both within-batch and across-batch (windowing).
• By taking a stream processing approach we can benefit in several ways.
The most obvious is reducing latency between an event occurring and
taking an action driven by it, whether automatic or via analytics
presented to a human. Other benefits include a more smoothed out
resource consumption profile.
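Windowing can be illustrated without Spark. The sketch below groups timestamped events into fixed, non-overlapping (tumbling) windows and counts them per key. It is plain Python over a finite list, not Spark Streaming's API; the event data, the 10-second window, and the function name are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key) for fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Invented events: (timestamp in seconds, event type)
events = [(0, "buy"), (3, "sell"), (7, "buy"), (11, "buy"), (14, "buy"), (19, "sell")]
counts = tumbling_window_counts(events, window_seconds=10)
print(counts)
```

A real stream processor applies the same grouping continuously to unbounded input and emits each window's result once that window closes.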
Introducing Spark
• Faster than Hadoop MapReduce for many workloads
• Minimizes disk reads and writes (in-memory processing)
• Comes with Spark Streaming (Hadoop Streaming, despite its name, is a utility
for running mappers and reducers written in any language, not a stream processor)
• Still part of the Hadoop ecosystem
Data Stream with Spark Streaming
Simple Spark Streaming Implementation Example
near realtime dashboard
data stream processing and analytics
(bigger/reliable capabilities)
multiple channel/type of data
Different programming style.
Spark libraries included in app
returned data of processing/analytics
Infinite run
Spark Streaming Implementation
• Review some Spark Streaming examples
• Review some Spark Streaming architectures
Example of Bukalapak
• Save all data from 2014 'til now
• >1.5PB data including:
• Product images
• Products data
• Messaging
Bukalapak 'Big Data' Implementation
Example: Application Health Monitoring
Example: Recommender Engine
Example: Gojek Data Visualization
Example Gojek Problem

Big Data Analytics (ML, DL, AI) hands-on

  • 1. Machine Learning, Deep Learning, Big Data Hands-On by Dony Riyanto Prepared and Presented to Panin Asset Management January 2019
  • 2. Hands-on Agenda • Machine Learning Re-Visited • Python Example of Machine Learning • Introduction to Deep Learning • Immentation of 'Big Data' (Hadoop Ecosystem) • Hadoop File System • Hadoop Map Reduce • Case Study • More advance implementation of Big Data
  • 3. The Learning Problem The essence of ML: 1. We have data 2. Patterns exist in data 3. We can't do math formula (don't know the formula yet) Examples:  Movie Rating  Credit Approval  Hand Written Recognition Domain Areas  Computer Vision  Natural Language Processing  Business Intelligence
  • 4. Components of Learning Example in Banking: Credit Card Approval Input : x (customer application) Output : y (good/bad customer) Unknown Target Function f :XY Dataset {x, y} (customers record database) Hypothesis Set: H : X Y Final Hypothesis g Learning Model = Hypothesis Set + Learning Algorithm
  • 5. Machine Learning Model Spatial Data (Text, Image) {x, y} Sequence or Time Series Data {x, t} Classifier Class Score Regression Cont. Values
  • 6. Main Paradigms Automatic discovery of patterns in data through computer algorithms and the use of those patterns to take actions such as classifying or clustering the data into categories. Supervised Learning: Learning by labeled example E.g. An email spam detector We have (input, correct output), and we can predict (new input, predicted output) Amazingly effective if you have lots of data Unsupervised Learning: Discovering Patterns E.g. Data clustering Instead of (input, correct output), we get (input, ?) Difficult in practices but useful if we lack labeled data Reinforcement Learning: Feedback & Error E.g. Learning to play chess Instead of (input, correct output), we get (input, only some output, grade of this output) Works well in some domains, becoming more important
  • 7. What/why is Python Python is an interpreted, high-level programming language, general- purpose programming language. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.
  • 8. Machine Learning with Python • We need Python 2.7.x or 3.7.x • Libraries, ex.: • numpy (fundamental package for scientific computing with Python) • matplotlib (plotting library for the Python programming language and its numerical mathematics extension NumPy) • pandas (software library written for the Python programming language for data manipulation and analysis) • seaborn (Python data visualization library based on matplotlib) • sklearn (Scikit-learn is a machine learning library for the Python programming language) • IDE, ex: pycharm • Alternatives, install Anaconda (distribution of the Python programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.)
  • 9. Machine Learning with Python • Python 3 installation • Introduction to pip (python package installer) • Install PyCharm • or install Anaconda
  • 11. Lesson 1 (with Anaconda)
  • 12. Lesson 2 *Class labeling with preprocessing
  • 13. Lesson 3 *Load CSV and data observation
  • 15. Introduction to Deep Learning • Deep learning has produced good results for a few applications such as computer vision, language translation, image captioning, audio transcription, molecular biology, speech recognition, natural language processing, self-driving cars, brain tumour detection, real-time speech translation, music composition, automatic game playing and so on. • Deep learning is the next big leap after machine learning with a more advanced implementation. Currently, it is heading towards becoming an industry standard bringing a strong promise of being a game changer when dealing with raw unstructured data.
  • 16. Introduction to Deep Learning • Deep learning is currently one of the best solution providers for a wide range of real-world problems. Developers are building AI programs that, instead of using previously given rules, learn from examples to solve complicated tasks. With deep learning being used by many data scientists, deeper neural networks are delivering results that are ever more accurate. • The idea is to develop deep neural networks by increasing the number of layers in each network; the machine learns more about the data until it is as accurate as possible. Developers can use deep learning techniques to implement complex machine learning tasks, and train AI networks to have high levels of perceptual recognition.
  • 17. Introduction to Deep Learning • Deep learning owes much of its popularity to computer vision. One of the tasks achieved here is image classification, where input images are classified as cat, dog, etc., or with the class or label that best describes the image. We as humans learn how to do this task very early in our lives and have these skills of quickly recognizing patterns, generalizing from prior knowledge, and adapting to different image environments.
  • 19. Deep Learning with TensorFlow • Google's TensorFlow is a Python library. This library is a great choice for building commercial-grade deep learning applications. • TensorFlow grew out of DistBelief, an earlier library that was part of the Google Brain project. TensorFlow aims to extend the portability of machine learning so that research models can be applied to commercial-grade applications. • Much like the Theano library, TensorFlow is based on computational graphs, where a node represents persistent data or a math operation and the edges represent the flow of data between nodes. That data is a multidimensional array, or tensor; hence the name TensorFlow.
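To make the computational graph idea concrete, here is a toy sketch in plain Python. It is not TensorFlow code and does not use the TensorFlow API; the `Node` class and the `"const"`, `"add"` and `"mul"` operation names are assumptions made for illustration only, and tensors are simplified to flat lists of numbers.

```python
class Node:
    """A graph node: either constant data or an element-wise math operation."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add" or "mul" (hypothetical op names)
        self.inputs = inputs  # edges: the nodes whose output flows into this one
        self.value = value    # data held by a "const" node

    def evaluate(self):
        if self.op == "const":
            return self.value
        # Pull the data along the incoming edges first.
        left, right = (node.evaluate() for node in self.inputs)
        if self.op == "add":
            return [a + b for a, b in zip(left, right)]
        if self.op == "mul":
            return [a * b for a, b in zip(left, right)]
        raise ValueError(f"unknown op: {self.op}")


# Build the graph for y = (a * b) + c, then run it.
a = Node("const", value=[1, 2, 3])
b = Node("const", value=[4, 5, 6])
c = Node("const", value=[10, 10, 10])
y = Node("add", inputs=(Node("mul", inputs=(a, b)), c))

result = y.evaluate()
print(result)
```

Building the graph and evaluating it are separate steps here, which is the property that lets a real framework optimize or distribute the computation before running it.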
  • 20. Deep Learning Implementation with TensorFlow and Python • Preparation (Python + libraries) • Installing TensorFlow • Running several built-in TensorFlow examples, e.g.: • Regression • Image Classification
  • 22. Hadoop Hadoop is: • scalable. • a “Framework”. • not a drop-in replacement for RDBMS. • great for pipelining massive amounts of data to achieve the end result.
  • 32. Hadoop • example of file/text search
  • 33. Hadoop • Planning • Installation step • Using HDFS • Using Map Reduce
  • 34. Hadoop Map Reduce • MapReduce is a processing technique and a programming model for distributed computing based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. The map task takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The reduce task then takes the output from a map as its input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce task is always performed after the map job.
  • 35. Hadoop Map Reduce • The MapReduce framework operates on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types. • The key and value classes must be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework. Input and output types of a MapReduce job: (Input) <k1, v1> → map → <k2, v2> → reduce → <k3, v3> (Output).
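The <k1, v1> → map → <k2, v2> → reduce → <k3, v3> flow can be imitated in a single process. The sketch below is only an illustration of the data flow, not Hadoop code: `run_map_reduce`, `sales_mapper`, `sales_reducer` and the sample sales records are all hypothetical. Sorting the intermediate keys stands in for the framework's shuffle/sort step, which is the reason keys need to be comparable.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Run map, shuffle/sort and reduce over (key, value) pairs in memory."""
    # Map: every (k1, v1) pair may emit any number of (k2, v2) pairs.
    intermediate = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            intermediate[k2].append(v2)

    # Shuffle/sort: values are grouped by key and keys are visited in order.
    output = []
    for k2 in sorted(intermediate):
        # Reduce: each (k2, [v2, ...]) group emits (k3, v3) pairs.
        output.extend(reducer(k2, intermediate[k2]))
    return output


def sales_mapper(line_number, line):
    """(line number, 'customer,amount') -> (customer, amount)."""
    customer, amount = line.split(",")
    yield customer, float(amount)


def sales_reducer(customer, amounts):
    """(customer, [amount, ...]) -> (customer, total)."""
    yield customer, sum(amounts)


# Hypothetical input: k1 is a line number, v1 is the line text.
records = [(0, "alice,10.0"), (1, "bob,5.5"), (2, "alice,2.5")]
totals = run_map_reduce(records, sales_mapper, sales_reducer)
print(totals)
```

Note how the key and value types change at each stage: integer and string on input, string and float after the map, and string and float total after the reduce.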
  • 36. Hadoop Map Reduce • Words Count (without map-reduce)
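The slide's code is not preserved in the transcript. As a baseline for comparison, a single-machine word count might look like the sketch below; the sample sentence and the simple tokenizing rule (lowercase runs of letters, digits and apostrophes) are assumptions for illustration.

```python
import re
from collections import Counter


def count_words(text):
    """Count word occurrences in one pass over text held in memory."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


sample = "the quick brown fox jumps over the lazy dog the end"
counts = count_words(sample)

print(counts.most_common(3))
```

This works well while the whole input fits on one machine, which is exactly the limit that the mapper/reducer split is meant to remove.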
  • 37. Hadoop Map Reduce • Words Count (mapper)
  • 38. Hadoop Map Reduce • Words Count (reducer)
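The mapper and reducer code from these two slides is missing from the transcript. A possible sketch, assuming the common convention of tab-separated `word<TAB>count` text lines passed between the two stages, is shown below. The stages are written as functions over lists of lines so they can be tried locally; in a real job each would be a separate script reading standard input, and the `sorted` call here only stands in for Hadoop's shuffle/sort.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same word."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical input lines; a real job would read them from a file or HDFS.
input_lines = ["big data big", "data stream"]

mapped = list(mapper(input_lines))
shuffled = sorted(mapped)  # stand-in for the framework's shuffle/sort
reduced = list(reducer(shuffled))

for line in reduced:
    print(line)
```

The reducer relies on its input being sorted by word, so that all counts for one word arrive next to each other.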
  • 39. Hadoop Map Reduce • Run on HadoopMR: • input file from local or HDFS • mapper application (see prev. slide) • reducer application (see prev. slide) *mapper and reducer apps can be written in Python, R, Java, Scala, etc.
  • 41. Hadoop Map Reduce • Map Reduce is not magic. It's a method • Map Reduce is not always about big data (e.g. finding the value of pi) • Map Reduce is not a silver bullet (e.g. batch vs. streaming data) • Map Reduce is usually used for: • Batch processing flow • Unstructured/semi-structured data
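The "find pi value" remark can be illustrated with a Monte Carlo estimate, where the input is tiny and the work is pure computation. This is a single-process sketch using Python's built-in `map` and `functools.reduce`, not a Hadoop job; the task list, seeds and sample sizes are arbitrary choices for the example.

```python
import random
from functools import reduce


def count_hits(task):
    """Map step: throw random points and count those inside the unit circle."""
    seed, samples = task
    rng = random.Random(seed)  # one independent generator per task
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, samples


def combine(left, right):
    """Reduce step: add up hit counts and sample counts."""
    return left[0] + right[0], left[1] + right[1]


# Four independent tasks that a cluster could run on different machines.
tasks = [(seed, 50_000) for seed in range(4)]
total_hits, total_samples = reduce(combine, map(count_hits, tasks))

# The quarter circle covers pi/4 of the unit square.
pi_estimate = 4.0 * total_hits / total_samples
print(pi_estimate)
```

Each task is independent and the combine step is associative, which is what makes the problem a natural fit for the map/reduce method even though no large dataset is involved.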
  • 42. Bigger Image of Hadoop (Hadoop Ecosystem)
  • 43. Data Stream Why Stream Processing? • Processing unbounded data sets, or "stream processing", is a new way of looking at what has always been done as batch in the past. Whilst intra-day ETL and frequent batch executions have brought latencies down, they are still independent executions with optional bespoke code in place to handle intra-batch accumulations. With a platform such as Spark Streaming we have a framework that natively supports processing both within-batch and across-batch (windowing). • By taking a stream processing approach we can benefit in several ways. The most obvious is reducing latency between an event occurring and taking an action driven by it, whether automatic or via analytics presented to a human. Other benefits include a more smoothed out resource consumption profile.
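The difference between within-batch and across-batch (windowed) processing can be sketched without any streaming framework. The example below is plain Python rather than Spark Streaming code; the function name `windowed_counts`, the event names and the window size are assumptions for illustration.

```python
from collections import Counter, deque


def windowed_counts(batches, window_size):
    """Yield (counts for this batch, counts over the last `window_size` batches)."""
    window = deque(maxlen=window_size)  # the oldest batch drops out automatically
    for batch in batches:
        batch_counts = Counter(batch)
        window.append(batch_counts)
        window_counts = sum(window, Counter())
        yield batch_counts, window_counts


# Hypothetical micro-batches of events arriving over time.
batches = [["buy", "view"], ["view", "view"], ["buy"]]
results = list(windowed_counts(batches, window_size=2))

for batch_counts, window_counts in results:
    print(dict(batch_counts), dict(window_counts))
```

Because `windowed_counts` is a generator, it would keep producing results for as long as batches arrive, which mirrors the unbounded nature of a stream.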
  • 44. Introducing Spark • Better speed compared to HadoopMR • Minimized disk read-write (in-memory processing) • Comes with Spark Streaming (later, Hadoop also created Hadoop Stream) • Still in the Hadoop Ecosystem
  • 45. Data Stream with Spark Streaming
  • 46. Simple Spark Streaming Implementation Example • near-realtime dashboard • data stream processing and analytics (bigger/reliable capabilities) • multiple channels/types of data
  • 47. Different programming style. • Spark libraries included in app • returned data of processing/analytics • Infinite run
  • 48. Spark Streaming Implementation • Review some Spark Streaming examples • Review some Spark Streaming architecture
  • 49. Example of Bukalapak • Save all data from 2014 'til now • >1.5PB data including: • Product images • Products data • Messaging
  • 50. Bukalapak 'Big Data' Implementation
  • 54. Example: Gojek Data Visualization