1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Real-Time Ingesting and
Transforming Sensor Data and
Social Data with NiFi and
TensorFlow
Timothy Spann
Hortonworks
@PaaSDev
Agenda
• What do we want to do?
• Why?
• How?
• Apache NiFi
• TensorFlow
• Natural Language Processing
• Demo
• Questions
What do we want to do?
• MiNiFi ingests camera images and
sensor data
• Run TensorFlow Inception v3 to
recognize objects in images
• NiFi stores images, metadata and
enriched data in Hadoop
• NiFi ingests social data and feeds
• NiFi analyzes sentiment of textual
data
Why Gather and Analyze Social Media Streams?
- Automate processes to maximize the Social
Media team’s time
- Improve response time to requests,
complaints and emergencies on social
media
- Use predictive analytics to know when and
where problems will happen
- Learn where unhappy customers are and
address their issues instantly
Collect: Bring Together
Aggregate all data from sensors, geo-location devices, machines and social feeds.
Conduct: Mediate the Data Flow
Mediate point-to-point and bi-directional data flows, delivering data reliably to HBase, Hive, Slack and Email.
Curate: Gain Insights
Parse, filter, join, transform, fork, query, sort, dissect; enrich with weather, location, NLP and TensorFlow.
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow-specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull
models
• Hundreds of processors
• Visual command and
control
• Over fifty sources
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
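A toy illustration of two items on this list, backpressure and prioritized queuing. This is only a sketch in plain Python, not how NiFi implements its connections; the class name, `max_items`, and the sample items are all hypothetical. In NiFi itself these behaviours are configured per connection (backpressure thresholds and prioritizers) rather than coded by hand.

```python
import heapq


class BoundedPriorityQueue:
    """Toy connection queue: a lower priority number is served first."""

    def __init__(self, max_items):
        self.max_items = max_items  # hypothetical backpressure threshold
        self._heap = []
        self._seq = 0  # tie-breaker keeps FIFO order within one priority

    def offer(self, priority, item):
        """Return False instead of accepting work when the queue is full."""
        if len(self._heap) >= self.max_items:
            return False  # the upstream producer should pause: backpressure
        heapq.heappush(self._heap, (priority, self._seq, item))
        self._seq += 1
        return True

    def poll(self):
        """Remove and return the highest-priority item, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = BoundedPriorityQueue(max_items=2)
accepted = [
    queue.offer(5, "sensor reading"),
    queue.offer(1, "emergency tweet"),
    queue.offer(3, "camera image"),  # rejected: the queue is already full
]
first = queue.poll()
print(accepted, first)
```

The third offer is refused because the queue is full, and the item with the lowest priority number is served first even though it arrived second.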
[Diagram labels: Data Enrichment, Data Discovery, Inception v3, Predictive Analytics, Sentiment Analysis]
Why TensorFlow?
• Google
• Multiple platform
support
• Hadoop integration
• Spark integration
• Keras
• Large Community
• Python and Java APIs
• GPU Support
• Mobile Support
• Inception v3
• Clustering
• Fully functional demos
• Open Source
• Apache Licensed
• Large Model Library
• Buzz
• Extensive Documentation
• Raspberry Pi Support
Apache NiFi Integration with TensorFlow Options
• TensorFlow (C++, Python, Java) via ExecuteStreamCommand
• TensorFlow NiFi Java Custom Processor
• TensorFlow Running on Edge Nodes (MiNiFi)
Apache NiFi Integration with TensorFlow Options
• TensorFlow Mobile (iOS, Android, RPi)
• TensorFlow on Spark (Yahoo) via Livy, S2S, Kafka
• TensorFlow Running in Containers in YARN 3.0 on Hadoop
• gRPC Call to TensorFlow Serving
ExecuteStreamCommand To TensorFlow
https://community.hortonworks.com/articles/58265/analyzing-images-in-hdf-20-using-tensorflow.html
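A rough sketch of the shape a script called by ExecuteStreamCommand can take. By default the processor pipes the incoming flowfile content to the command's standard input and turns whatever the command writes to standard output into the outgoing flowfile. The `classify` function here is a hypothetical placeholder that only reports the payload size; it is not a TensorFlow call, and the linked article may wire things differently (for example by passing a file path as an argument).

```python
import io
import json


def classify(image_bytes):
    """Hypothetical placeholder for the model call; it only reports the payload size."""
    return {"image_bytes": len(image_bytes)}


def handle(in_stream, out_stream):
    """Read one flowfile's content from in_stream and write a JSON result to out_stream."""
    image_bytes = in_stream.read()
    out_stream.write(json.dumps(classify(image_bytes)) + "\n")


# Local demonstration with in-memory streams instead of a real NiFi flow.
fake_flowfile = io.BytesIO(b"\xff\xd8\xff\xe0 not a real JPEG")
result = io.StringIO()
handle(fake_flowfile, result)
print(result.getvalue(), end="")
```

When run under NiFi, the same function would presumably be called as `handle(sys.stdin.buffer, sys.stdout)`, so the script needs no knowledge of NiFi itself.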
TensorFlow via Python
python classify_image.py --image_file /dir/solarroofpanel.jpg
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
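To store results like these as metadata, the text output has to become structured data at some point. One possible way, sketched here with a regular expression written against the line format shown on this slide; `parse_classification` and `LINE_PATTERN` are illustrative names, not part of classify_image.py.

```python
import re

# Matches lines such as "window screen (score = 0.00196)".
LINE_PATTERN = re.compile(r"^(?P<labels>.+) \(score = (?P<score>[0-9.]+)\)$")


def parse_classification(output):
    """Turn classify_image.py style lines into a list of dicts, best score first."""
    results = []
    for line in output.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip blank lines or unrelated log output
        results.append({
            "labels": [label.strip() for label in match.group("labels").split(",")],
            "score": float(match.group("score")),
        })
    return sorted(results, key=lambda r: r["score"], reverse=True)


sample = """solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)"""

top = parse_classification(sample)[0]
print(top["labels"][0], top["score"])
```

Each result keeps every synonym for the class as a list, so a downstream step can match on any of them.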
TensorFlow Java Processor in NiFi
https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html
https://github.com/tspannhw/nifi-tensorflow-processor
TensorFlow Running on Edge Nodes (MiNiFi)
Installing TextBlob for Python
pip install -U textblob
python -m textblob.download_corpora
Installing spaCy for Python
https://community.hortonworks.com/articles/76935/using-sentiment-analysis-and-nlp-tools-with-hdp-25.html
pip install -U spacy
python -m spacy.en.download all
Installing NLTK for Python 2.7
http://www.nltk.org/install.html
pip install -U nltk
pip install -U numpy
Local Sentiment Analysis via Python
run.sh
python sentiment.py "$@"
sentiment.py
# Requires the VADER lexicon: python -m nltk.downloader vader_lexicon
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import sys
sid = SentimentIntensityAnalyzer()
# Score the text passed as the first command-line argument
ss = sid.polarity_scores(sys.argv[1])
print('Compound {0} Negative {1} Neutral {2} Positive {3} '.format(
    ss['compound'], ss['neg'], ss['neu'], ss['pos']))
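The four scores are easier to route on once they are reduced to a label and emitted as JSON. A possible follow-on step, sketched with the standard library only: the ±0.05 cutoff on the compound score is a commonly used VADER convention rather than anything stated in this deck, and `label_sentiment`, `to_record`, and the sample scores are made up for illustration.

```python
import json


def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def to_record(text, scores):
    """Build a JSON string from a dict shaped like the polarity_scores() result."""
    return json.dumps({
        "text": text,
        "compound": scores["compound"],
        "label": label_sentiment(scores["compound"]),
    }, sort_keys=True)


# Hand-written scores for illustration; real values come from polarity_scores().
example_scores = {"compound": -0.6, "neg": 0.5, "neu": 0.5, "pos": 0.0}
print(to_record("The service was terrible", example_scores))
```

Emitting JSON rather than the space-separated string makes it simpler for a later step in the flow to pull the label into an attribute and route unhappy messages to the right team.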
Installing Apache OpenNLP NiFi Processor
Apache OpenNLP for Entity Resolution Processor
https://github.com/tspannhw/nifi-nlp-processor
Requires installation of NAR and Apache OpenNLP BINs
This is a non-supported processor that I wrote and put into the community.
https://community.hortonworks.com/articles/80418/open-nlp-example-apache-nifi-processor.html
Installing Stanford CoreNLP Processor
Stanford CoreNLP Processor
https://github.com/tspannhw/nifi-corenlp-processor
Requires install of NAR and Stanford English Models
http://nlp.stanford.edu/software/stanford-english-corenlp-2017-06-09-models.jar
This is a non-supported processor that I wrote and put into the community.
https://community.hortonworks.com/articles/81270/adding-stanford-corenlp-to-big-data-pipelines-apac-1.html
Code and Demo
Contact:
Timothy Spann
@PaaSDev
http://www.meetup.com/futureofdata-princeton
https://dzone.com/users/297029/bunkertor.html
http://community.hortonworks.com/users/9304/tspann.html
Hortonworks Community Connection
Read access for everyone; join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
Community Engagement
Participate now at: community.hortonworks.com
4,000+ Registered Users
10,000+ Answers
15,000+ Technical Assets
One Website!



Editor's notes

  1. Monitor time. Follow-ups and Q/A at end; defer additional questions to later, we are short on time. Ingest: multiple options, different types of data (RDBMS, streams, files). HDF, Sqoop, Flume, Kafka. Streaming. Script vs. UI + Mgmt. Data movement tool. Streamlined.
  2. https://community.hortonworks.com/content/kbentry/108966/minifi-for-sensor-data-ingest-from-devices.html
  3. https://github.com/USCDataScience/dl4j-kerasimport-examples/tree/master/dl4j-import-example Also: https://github.com/adatao/tensorspark https://arimo.com/machine-learning/deep-learning/2016/arimo-distributed-tensorflow-on-spark/ https://caffe2.ai/docs/AI-Camera-demo-android