© Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information.
Machine Learning and Deep Learning
on HDP 3.0.1 and HDF 3.2
Timothy Spann, Senior Solutions Engineer
Hortonworks @PaaSDev
November 14, 2018
Future of Data – Princeton
Disclaimer
• This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be
developed.
• Technical feasibility, market demand, user feedback, and the Apache Software
Foundation community development process can all affect timing and final delivery.
• This document’s description of these features and technology directions does not
represent a contractual commitment, promise or obligation from Hortonworks to deliver
these features in any generally available product.
• Product features and technology directions are subject to change, and must not be
included in contracts, purchase orders, or sales agreements of any kind.
• Since this document contains an outline of general product development plans,
customers should not rely upon it when making a purchase decision.
Apache Deep Learning Flow
Global Data Management With Hortonworks
Globally Manage, Secure, Govern, Consume
• DataPlane Service (DPS): manage, govern, secure
• Data Lifecycle Manager, Data Steward Studio, Data Analytics Studio
• Extensible services: ISV services, IBM DSX, Cloudbreak
• Connected data platforms
• Hortonworks Data Platform (HDP®): data-at-rest
• Hortonworks DataFlow (HDF™): data-in-motion
• Modern data use cases: EDW optimization, cybersecurity, data science, advanced analytics, IoT/streaming analytics
• Hortonworks Connection: enterprise support, premier support, educational services, professional services, community connection
• Hortonworks Platform Services: operational services, SmartSense™
• Data sources: data center, cloud, edge
• Also shown in the diagram: Exception Monitoring; 360 View of Operations; Cyber Security; Telemetry – Connected Devices; Time Series; Sensors, Control Systems
NiFi - Terminology
• FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
• Processor
• Performs the work, can access FlowFiles
• Connection
• Links between processors
• Queues that can be dynamically prioritized
• Process Group
• Set of processors and their connections
• Receive data via input ports, send data via output ports
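The terminology above can be sketched in a few lines of Python. This is only an illustrative model, not NiFi's actual API: the class names mirror the slide's terms, and `run_processor` and `tag_size` are hypothetical helpers invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict


@dataclass
class FlowFile:
    """Unit of data: a content payload plus key/value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """Link between two processors that queues FlowFiles in arrival order."""

    def __init__(self) -> None:
        self._queue: Deque[FlowFile] = deque()

    def put(self, flowfile: FlowFile) -> None:
        self._queue.append(flowfile)

    def take(self) -> FlowFile:
        return self._queue.popleft()

    def __len__(self) -> int:
        return len(self._queue)


def run_processor(work: Callable[[FlowFile], FlowFile],
                  incoming: Connection, outgoing: Connection) -> None:
    """Drain the incoming connection, apply the work, pass results on."""
    while len(incoming) > 0:
        outgoing.put(work(incoming.take()))


def tag_size(flowfile: FlowFile) -> FlowFile:
    """Hypothetical processor: records the content length as an attribute."""
    attrs = dict(flowfile.attributes, size=str(len(flowfile.content)))
    return FlowFile(flowfile.content, attrs)


source, sink = Connection(), Connection()
source.put(FlowFile(b"cat.jpg bytes", {"filename": "cat.jpg"}))
run_processor(tag_size, source, sink)
result = sink.take()
print(result.attributes)
```

The processor never touches the queues directly beyond taking and putting FlowFiles, which is the separation of concerns the terminology describes.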
NiFi is based on Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of entirely new components simply by composition of other components.
Visual Command and Control
• Drag and drop processors to build a flow
• Start, stop, and configure components in real time
• View errors and corresponding error messages
• View statistics and health of data flow
• Create templates of common processors & connections
Provenance/Lineage
Prioritization
• Configure a prioritizer per
connection
• Determine what is important for
your data – time based, arrival
order, importance of a data set
• Funnel many connections down to
a single connection to prioritize
across data sets
• Develop your own prioritizer if
needed
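A per-connection prioritizer can be pictured as a function that assigns each queued item a sort key. The sketch below is a plain-Python illustration under that assumption, not NiFi's prioritizer interface: `PrioritizedConnection`, `by_priority_attribute`, and the `priority` attribute convention (lower number first, default 10) are all invented for the example.

```python
import heapq
import itertools
from typing import Any, Callable, Dict, List, Tuple

FlowFile = Dict[str, Any]


class PrioritizedConnection:
    """Queue whose dequeue order is decided by a pluggable prioritizer."""

    def __init__(self, prioritizer: Callable[[FlowFile], Any]) -> None:
        self._prioritizer = prioritizer
        self._heap: List[Tuple[Any, int, FlowFile]] = []
        # Arrival counter breaks ties so equal priorities stay in arrival order.
        self._arrival = itertools.count()

    def put(self, flowfile: FlowFile) -> None:
        key = self._prioritizer(flowfile)
        heapq.heappush(self._heap, (key, next(self._arrival), flowfile))

    def take(self) -> FlowFile:
        return heapq.heappop(self._heap)[2]


def by_priority_attribute(flowfile: FlowFile) -> int:
    """Hypothetical prioritizer: lower 'priority' attribute is served first."""
    return int(flowfile["attributes"].get("priority", "10"))


queue = PrioritizedConnection(by_priority_attribute)
queue.put({"name": "bulk-logs", "attributes": {"priority": "5"}})
queue.put({"name": "alarm", "attributes": {"priority": "1"}})
queue.put({"name": "metrics", "attributes": {"priority": "5"}})
order = [queue.take()["name"] for _ in range(3)]
print(order)  # ['alarm', 'bulk-logs', 'metrics']
```

Swapping in a different function (for example one keyed on a timestamp attribute) changes the policy without touching the queue, which is the idea behind developing your own prioritizer.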
Latency vs. Throughput
• Choose between lower latency, or higher throughput on each processor
NiFi UI
Edge Intelligence with Apache MiNiFi
Key Features
• Guaranteed delivery
• Data buffering
‒ Backpressure
‒ Pressure release
• Prioritized queuing
• Flow specific QoS
‒ Latency vs. throughput
‒ Loss tolerance
• Data provenance
• Recovery / recording a rolling log of fine-grained history
• Designed for extension
Different from Apache NiFi
• Design and Deploy
• Warm re-deploys
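Backpressure on a buffered connection can be illustrated with a bounded queue that refuses new items once a threshold is reached. This is a minimal sketch of the idea, not MiNiFi code: `BoundedConnection`, `object_threshold`, and the sample readings are hypothetical names chosen for the example.

```python
from collections import deque
from typing import Any, Deque


class BoundedConnection:
    """Buffer that signals backpressure once it holds `object_threshold` items."""

    def __init__(self, object_threshold: int) -> None:
        self.object_threshold = object_threshold
        self._queue: Deque[Any] = deque()

    def is_full(self) -> bool:
        return len(self._queue) >= self.object_threshold

    def offer(self, item: Any) -> bool:
        """Accept the item unless backpressure is engaged."""
        if self.is_full():
            return False  # upstream should pause and retry later
        self._queue.append(item)
        return True

    def take(self) -> Any:
        return self._queue.popleft()


connection = BoundedConnection(object_threshold=2)
accepted = [connection.offer(reading) for reading in ("r1", "r2", "r3")]
print(accepted)  # [True, True, False]

connection.take()  # downstream consumes one item, relieving the pressure
accepted_after_drain = connection.offer("r3")
print(accepted_after_drain)  # True
```

The producer learns about congestion from the return value rather than by losing data, so it can slow down until the consumer catches up.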
Integrating TensorFlow with Streaming
https://community.hortonworks.com/articles/198855/executing-tensorflow-classifications-from-apache-n.html
https://community.hortonworks.com/articles/116803/building-a-custom-processor-in-apache-nifi-12-for.html
https://community.hortonworks.com/articles/224268/running-tensorflow-on-yarn-31-with-or-without-gpu.html
https://community.hortonworks.com/articles/183806/using-a-tensorflow-person-blocker-with-apache-nifi.html
http://mxnet.incubator.apache.org/
• Cloud ready
• Experienced team (XGBoost)
• AWS, Microsoft, NVIDIA, Baidu, Intel backing
• Apache Incubator Project
• Run distributed on YARN
• In my early tests, faster than TensorFlow.
• Runs on Raspberry Pi, NVIDIA Jetson TX1 and other constrained devices
• Great documentation
• Gluon
• Great Python Interaction
• Model Server Available
• ONNX Support
• Now in Version 1.1!
• Great Model Zoo
https://mxnet.incubator.apache.org/how_to/cloud.html
https://github.com/apache/incubator-mxnet/tree/1.1.0/example
• Apache MXNet Running in Apache Zeppelin Notebooks
• Apache MXNet Running on YARN 3.1 In Hadoop 3.1 In Dockerized Containers
• Apache MXNet Running on YARN
Apache NiFi Integration with Apache Hadoop Options
https://community.hortonworks.com/articles/176789/apache-deep-learning-101-using-apache-mxnet-in-apa.html
https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html
https://www.slideshare.net/Hadoop_Summit/deep-learning-on-yarn-running-distributed-tensorflow-etc-on-hadoop-cluster-v3
Instance Segmentation: Mask RCNN with GluonCV
net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=True)
Mask RCNN model trained on COCO dataset with ResNet-50 backbone
https://gluon-cv.mxnet.io/build/examples_instance/demo_mask_rcnn.html
https://arxiv.org/abs/1703.06870
https://github.com/matterport/Mask_RCNN
Semantic Segmentation: Fully Convolutional Networks
model = gluoncv.model_zoo.get_model('fcn_resnet101_voc', pretrained=True)
GluonCV FCN model on PASCAL VOC dataset
https://gluon-cv.mxnet.io/build/examples_segmentation/demo_fcn.html
run1.sh demo_fcn_webcam.py
https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
Introducing HDP 3.0
FASTER
Faster Time to Deployment: Containerization
Why containerization?
• Overcomes limits of data architecture
• Allows for agility and elasticity to process data
• Developers can build data intensive apps quickly
• Ensure apps deploy quickly, reliably and consistently across deployment environments
Result: Faster time to deployment and increased developer productivity -> competitive advantage
Easy to create and access a
containerized HBase service
New Abstraction: YARN Container Runtimes
• Challenge: Run existing process-based containers in the same cluster as Docker containers
• Solution: Container Runtimes – specify the container runtime to use at application submission time.
• DefaultLinuxContainerRuntime: Existing Linux process-based execution.
• DockerLinuxContainerRuntime: Using Docker to run and monitor a container.
Early versions shipped in Apache Hadoop 2.8
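Choosing the runtime at submission time comes down to passing a couple of environment variables with the application. The sketch below assumes the `YARN_CONTAINER_RUNTIME_TYPE` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` variables described in the Hadoop documentation for Docker on YARN and the distributed shell's `-shell_env` option; the image name `example/mxnet:latest` and both helper functions are hypothetical, and the exact variables should be checked against the Hadoop version on the cluster.

```python
from typing import Dict, List, Optional


def container_runtime_env(docker_image: Optional[str] = None) -> Dict[str, str]:
    """Return environment variables that request a container runtime.

    With no image, nothing is set and the cluster's default runtime applies.
    """
    if docker_image is None:
        return {}
    return {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": docker_image,
    }


def shell_env_args(env: Dict[str, str]) -> List[str]:
    """Render the variables as repeated -shell_env KEY=VALUE arguments."""
    args: List[str] = []
    for key, value in env.items():
        args += ["-shell_env", f"{key}={value}"]
    return args


# Hypothetical image name, used only for illustration.
docker_args = shell_env_args(container_runtime_env("example/mxnet:latest"))
default_args = shell_env_args(container_runtime_env())
print(docker_args)
print(default_args)
```

Because the choice travels with the submission, process-based and Docker-based applications can share one cluster without any node-level reconfiguration per job.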
Key Improvements in HDP 3.0
• Improved container lifecycle management – reliably run, stop, and remove Docker containers
• Delayed deletion of exited containers for debugging
• ACLs for privileged containers, with the ability to disable privileged containers system wide
• Default untrusted mode for running unmodified images out of the box
• Support for bind mounting host files into containers, validated against an admin supplied whitelist
• Ambari Integration for configuring YARN containerization features and security
Use Case—Cloud Portability with Containers
Any company that wants faster time to deployment
• Containerization helps companies
• Move apps from on-prem to cloud, or between cloud environments
• Deploy apps quickly
• Helpful for migrating to cloud or looking to adopt a hybrid cloud strategy
Result: Maximizes portability: self-contained runtime environments can be taken
anywhere
Introducing HDP 3.0
SMARTER
Smarter Decisions Made Based on Support for Deep Learning Workloads
Why GPU support?
• Enhances the performance of computations needed for enterprise ML/DL apps
• DL requires intense computational algorithms
• Containerized software powered by GPUs helps data processing at scale
Result: Data Scientists can run DL models in days vs months, hours vs days, minutes vs hours
Use Case: Autonomous Driving Car
• Edge: Nvidia Drive PX 2
• Core: Capture billions of images in data lake
• Pool GPUs and CPUs - think a giant super computer for 100x faster processing
• Train deep learning models using GPUs & images in data lake
• Deploy data intensive containerized deep learning micro-services in minutes
Easy to enable GPUs with a slider
Use Case - Manufacturing
Machine Learning & Deep Learning Using GPUs
• Monitor factory equipment in real-time using ML/DL apps
• Capture sensor data including temperature, vibration, internal pressures to perform preventative maintenance
• Minimize costly downtime on machinery before problems occur
Result: Improve bottom line - less downtime
Data Science and In-Memory Performance Improvements
• Securely & seamlessly integrate with other services including Ranger & Atlas
• TensorFlow Tech Preview will complement GPU pooling to support deep learning use cases
• Spark testing with S3Guard to support cloud
• Spark/Hive integration to connect easily to the cloud
Enable TensorFlow
TensorFlow training metrics &
TensorFlow on YARN
Apache MXNet on Apache YARN 3.1 Native No Spark
yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar \
  -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar \
  -shell_command python3.6 \
  -shell_args "/opt/demo/analyzex.py /opt/images/cat.jpg" \
  -container_resources memory-mb=512,vcores=1
Uses: Python Any
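When the same job is launched repeatedly with different scripts or images, it can help to assemble the distributed shell command in code rather than by hand. The following is a small sketch, assuming the jar path and flags shown on this slide; `distributed_shell_command` is a hypothetical helper, and the result is meant to be passed to something like `subprocess.run(argv)` on a node with the YARN client installed.

```python
import shlex
from typing import List

# Jar path as shown on the slide; adjust for the local HDP layout.
DSHELL_JAR = (
    "/usr/hdp/current/hadoop-yarn-client/"
    "hadoop-yarn-applications-distributedshell.jar"
)


def distributed_shell_command(script: str, image_path: str,
                              memory_mb: int = 512, vcores: int = 1) -> List[str]:
    """Build the argv list for running a Python script via YARN distributed shell."""
    return [
        "yarn", "jar", DSHELL_JAR,
        "-jar", DSHELL_JAR,
        "-shell_command", "python3.6",
        # Script and its argument travel as a single -shell_args value.
        "-shell_args", f"{script} {image_path}",
        "-container_resources", f"memory-mb={memory_mb},vcores={vcores}",
    ]


argv = distributed_shell_command("/opt/demo/analyzex.py", "/opt/images/cat.jpg")
# shlex.quote keeps the space-containing -shell_args value as one shell word.
command_line = " ".join(shlex.quote(part) for part in argv)
print(command_line)
```

Keeping the command as a list avoids quoting mistakes around the `-shell_args` value, which contains a space.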
Apache MXNet on Apache YARN 3.1 Native No Spark
https://community.hortonworks.com/content/kbentry/222242/running-apache-mxnet-deep-learning-on-yarn-31-hdp.html
https://github.com/tspannhw/ApacheDeepLearning101/blob/master/analyzehdfs.py
Apache MXNet on YARN 3.2 in Docker Using “Submarine”
https://github.com/apache/hadoop/tree/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine
yarn jar hadoop-yarn-applications-submarine-<version>.jar job run \
  --name xyz-job-001 --docker_image <your docker image> \
  --input_path hdfs://default/dataset/cifar-10-data \
  --checkpoint_path hdfs://default/tmp/cifar-10-jobdir \
  --num_workers 1 \
  --worker_resources memory=8G,vcores=2,gpu=2 \
  --worker_launch_cmd "shell for Apache MXNet"
Wangda Tan (wangda@apache.org)
Hadoop {Submarine} Project: Running deep learning workloads on YARN
https://issues.apache.org/jira/browse/YARN-8135
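The `--worker_resources` value is a comma-separated list of `name=value` pairs. A quick way to sanity-check such a string before submitting a job is sketched below; `parse_worker_resources` and `requested_gpus` are hypothetical helpers written for this example and are not part of Submarine.

```python
from typing import Dict


def parse_worker_resources(spec: str) -> Dict[str, str]:
    """Split a spec such as 'memory=8G,vcores=2,gpu=2' into a dictionary."""
    resources: Dict[str, str] = {}
    for pair in spec.split(","):
        name, separator, value = pair.strip().partition("=")
        if not separator or not name or not value:
            raise ValueError(f"expected name=value, got {pair!r}")
        resources[name] = value
    return resources


def requested_gpus(spec: str) -> int:
    """Number of GPUs requested per worker; zero when 'gpu' is absent."""
    return int(parse_worker_resources(spec).get("gpu", "0"))


resources = parse_worker_resources("memory=8G,vcores=2,gpu=2")
print(resources)  # {'memory': '8G', 'vcores': '2', 'gpu': '2'}
print(requested_gpus("memory=4G,vcores=1"))  # 0
```

A check like this catches a malformed spec locally instead of after the job has been queued on the cluster.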
Introducing HDP 3.0
HYBRID DATA
Cloud Native
A hybrid architecture is a key requirement of a modern data architecture, and is composed of on-prem + multi-cloud + edge
A MODERN DATA ARCHITECTURE requires
- DATA AT REST + DATA IN MOTION
- MANAGES THE ENTIRE LIFECYCLE OF DATA
- SPANS ACROSS ON PREMISES, CLOUD AND MULTI-CLOUD
- PROCESSES AND DRIVES INSIGHT
- CONSISTENT SECURITY, GOVERNANCE AND OPERATIONS
SOLUTION
MODERN HYBRID DATA ARCHITECTURE
Cloud-native Data Architecture
Extend to The Edge
Seamless Architecture
Consistent Security and Governance
HORTONWORKS DATA PLATFORM
HORTONWORKS DATAFLOW
HORTONWORKS DATAPLANE
NEW, OPEN HYBRID ARCHITECTURE
INITIATIVE
CORE REQUIREMENTS
Stronger Together
Focus on extending data science and machine learning to analyze the data in Apache Hadoop systems
Provides Data Science & Machine Learning + Provides Open Hadoop Data Platform
Make our clients competitive in their markets using advanced analytics faster and at scale
... Deliver Data Science at Scale
• Top Technology
– Top Hadoop Engine
– Top SQL on Hadoop Engine
– Top Data Science Platform
• Open Source Approach Leads to Future Integration
– 100% Pure Open Source Hadoop Distribution
– Big SQL Maintains the Integrity of Open Source Hadoop
– Data Science leverages Open Source Analytics
Solve Real Business Problems
Provides Best in Class Technology
Gives Clients Freedom Today and Innovation Tomorrow
Together we Lead
HDP Technical Differentiators
Hybrid Cloud deployment
• Cloudbreak: Deployment made easy – USABILITY
• Elasticity with long running
• Supports GCP, AWS, Azure AND private cloud (OpenStack)
Hive 3.0
• Support for Tez, LLAP and Druid integration
• Supports ACID – PERFORMANCE is there
• Complete coverage of ANSI SQL 2011 – SIMPLIFY ETL development
• Competitors use Hive 2.x and support only Hive on Spark, not Tez
Security
• Knox: SSO across ecosystem - SIMPLIFY security management
• Ranger: Attribute based tagging – SIMPLIFY security deployment
• Complete platform integration: HDFS, YARN, Hive, HBase, Storm, Atlas, Kafka
HDP Technical Differentiators
Hadoop 3.1
• Full support for Docker containers in YARN - INTELLIGENT
• Make the most of your applications with GPU support!
Data Science
• Zeppelin: Powerful notebook made available with HDP
• No need for an advanced science subscription
• Broad partner integration
• TensorFlow can run on the cluster (Docker)
• IBM Watson is a market leader
HDF Technical Differentiators
Data Ingestion
• NiFi: Fully Open Source project
• Very mature (est. 2006 - NSA) - USABILITY
• Home grown – no OEM
• Edge deployment via MiNiFi - PORTABILITY
Stream Processing
• Schema registry: Ability to apply same schema to NiFi, Kafka, Storm, SAM - PRODUCTIVITY
• Solve Kafka blindness with SMM!
• Single Monitoring Dashboard for all your Kafka Clusters across 4 entities: Broker, Producer, Topic, Consumer
Streaming Analytics
• SAM: UI for Storm – USABILITY improvement
• Druid: OLAP cube – REAL TIME analytics
HCC – community.hortonworks.com
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Gallery and Samples
Read access for everyone, join to participate and be recognized
Community Engagement
• 9,000+ Registered Users
• 25,000+ Answers
• 40,000+ Technical Assets
One Website!
https://community.hortonworks.com
Thanks!!!
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
https://twitter.com/PaaSDev
https://github.com/tspannhw/ApacheDeepLearning201
https://www.youtube.com/watch?v=ksDKNp6Z4BE
https://community.hortonworks.com/articles/193835/detecting-language-with-apache-nifi.html
https://community.hortonworks.com/content/kbentry/189213/etl-with-lookups-with-apache-hbase-and-apache-nifi.html

 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 

Machine Learning and Deep Learning on HDP 3.0.1 and HDF 3.2

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved. © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Machine Learning and Deep Learning on HDP 3.0.1 and HDF 3.2 Timothy Spann, Senior Solutions Engineer Hortonworks @PaaSDev November 14, 2018 Future of Data – Princeton
  • 2. Disclaimer
      • This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed.
      • Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery.
      • This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
      • Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
      • Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  • 3. Apache Deep Learning Flow
  • 4. Global Data Management With Hortonworks
      Globally Manage, Secure, Govern, Consume
      • DATAPLANE SERVICE (DPS): MANAGE, GOVERN, SECURE
      • EXTENSIBLE SERVICES: DATA LIFECYCLE MANAGER, DATA STEWARD STUDIO, ISV SERVICES, IBM DSX, CLOUDBREAK, DATA ANALYTICS STUDIO
      • CONNECTED DATA PLATFORMS: HORTONWORKS DATA PLATFORM (HDP®) DATA-AT-REST, HORTONWORKS DATAFLOW (HDF™) DATA-IN-MOTION
      • MODERN DATA USE CASES: EDW OPTIMIZATION, CYBERSECURITY, DATA SCIENCE, ADVANCED ANALYTICS, IOT/STREAMING ANALYTICS
      • HORTONWORKS CONNECTION: ENTERPRISE SUPPORT, PREMIER SUPPORT, EDUCATIONAL SERVICES, PROFESSIONAL SERVICES, COMMUNITY CONNECTION
      • HORTONWORKS PLATFORM SERVICES: OPERATIONAL SERVICES, SMARTSENSE™
      • DATA SOURCES: DATA CENTER, CLOUD, EDGE
      • Exception Monitoring 360 View of Operations Cyber Security Telemetry – Connected Devices Time Series Sensors, Control Systems
  • 5. NiFi - Terminology
      • FlowFile
          • Unit of data moving through the system
          • Content + Attributes (key/value pairs)
      • Processor
          • Performs the work, can access FlowFiles
      • Connection
          • Links between processors
          • Queues that can be dynamically prioritized
      • Process Group
          • Set of processors and their connections
          • Receive data via input ports, send data via output ports
  • 6. NiFi is based on Flow Based Programming (FBP)
      FBP Term | NiFi Term | Description
      Information Packet | FlowFile | Each object moving through the system.
      Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
      Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
      Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
      Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
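The FBP terms above can be sketched in a few lines of plain Python. This is only an illustration of the concepts, not NiFi's Java API: the class names mirror the slide, while `offer`, `poll`, `is_full`, `on_trigger`, and `tag_size` are hypothetical names chosen for this sketch. It assumes a connection is a bounded queue and that a processor declines to run when its downstream connection is full.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Optional


@dataclass
class FlowFile:
    """Information packet: opaque content plus key/value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """Bounded buffer between two processors (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._queue: Deque[FlowFile] = deque()

    def is_full(self) -> bool:
        return len(self._queue) >= self.capacity

    def offer(self, flow_file: FlowFile) -> bool:
        # Returning False models backpressure: the caller has to retry later.
        if self.is_full():
            return False
        self._queue.append(flow_file)
        return True

    def poll(self) -> Optional[FlowFile]:
        return self._queue.popleft() if self._queue else None


class Processor:
    """Black box: takes one FlowFile from `source`, emits one to `sink`."""

    def __init__(self, work: Callable[[FlowFile], FlowFile],
                 source: Connection, sink: Connection) -> None:
        self.work, self.source, self.sink = work, source, sink

    def on_trigger(self) -> bool:
        # Check the sink first so a FlowFile is never taken and then dropped.
        if self.sink.is_full():
            return False
        flow_file = self.source.poll()
        if flow_file is None:
            return False
        return self.sink.offer(self.work(flow_file))


def tag_size(ff: FlowFile) -> FlowFile:
    """Example work function: record the content length as an attribute."""
    return FlowFile(ff.content, {**ff.attributes, "size": str(len(ff.content))})


inbound, outbound = Connection(capacity=2), Connection(capacity=2)
inbound.offer(FlowFile(b"cat.jpg bytes", {"filename": "cat.jpg"}))
Processor(tag_size, inbound, outbound).on_trigger()
result = outbound.poll()
print(result.attributes)
```

In this sketch a process group would simply be a set of such processors and connections wired together, with its input and output ports being connections exposed to the outside.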
  • 7. Visual Command and Control
      • Drag and drop processors to build a flow
      • Start, stop, and configure components in real time
      • View errors and corresponding error messages
      • View statistics and health of data flow
      • Create templates of common processors & connections
  • 8. Provenance/Lineage
  • 9. Prioritization
      • Configure a prioritizer per connection
      • Determine what is important for your data – time based, arrival order, importance of a data set
      • Funnel many connections down to a single connection to prioritize across data sets
      • Develop your own prioritizer if needed
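As a rough illustration of per-connection prioritization, the sketch below funnels FlowFiles from several sources into one queue and releases them by an attribute-based priority, with ties broken by arrival order. It is a plain-Python model, not NiFi's prioritizer interface: `PrioritizedConnection`, `by_priority_attribute`, the dictionary-shaped FlowFile, and the default priority of 10 are all assumptions made for the example.

```python
import heapq
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple

# Stand-in FlowFile for this sketch: {"attributes": {...}, "content": b"..."}
FlowFile = Dict[str, Any]


class PrioritizedConnection:
    """Single queue that orders FlowFiles with a pluggable prioritizer."""

    def __init__(self, prioritizer: Callable[[FlowFile], Any]) -> None:
        self._prioritizer = prioritizer
        self._heap: List[Tuple[Any, int, FlowFile]] = []
        self._arrival = itertools.count()  # tie-breaker: first in, first out

    def offer(self, flow_file: FlowFile) -> None:
        key = (self._prioritizer(flow_file), next(self._arrival), flow_file)
        heapq.heappush(self._heap, key)

    def poll(self) -> Optional[FlowFile]:
        return heapq.heappop(self._heap)[2] if self._heap else None


def by_priority_attribute(flow_file: FlowFile) -> int:
    """Lower number is released first; 10 is an assumed default."""
    return int(flow_file["attributes"].get("priority", 10))


# Funnel three hypothetical sources into one prioritized connection.
queue = PrioritizedConnection(by_priority_attribute)
queue.offer({"attributes": {"source": "logs", "priority": "5"}, "content": b"a"})
queue.offer({"attributes": {"source": "alerts", "priority": "1"}, "content": b"b"})
queue.offer({"attributes": {"source": "metrics"}, "content": b"c"})

order = [queue.poll()["attributes"]["source"] for _ in range(3)]
print(order)
```

Swapping `by_priority_attribute` for a function that returns an arrival timestamp or a data-set rank gives the time-based and importance-based orderings the slide mentions.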
  • 11. Latency vs. Throughput
      • Choose between lower latency, or higher throughput on each processor
  • 12. NiFi UI
  • 13. Edge Intelligence with Apache MiNiFi
      Key Features
      • Guaranteed delivery
      • Data buffering
          ‒ Backpressure
          ‒ Pressure release
      • Prioritized queuing
      • Flow specific QoS
          ‒ Latency vs. throughput
          ‒ Loss tolerance
      • Data provenance
      • Recovery / recording a rolling log of fine-grained history
      • Designed for extension
      Different from Apache NiFi
      • Design and Deploy
      • Warm re-deploys
  • 14. Integrating TensorFlow with Streaming
      https://community.hortonworks.com/articles/198855/executing-tensorflow-classifications-from-apache-n.html
      https://community.hortonworks.com/articles/116803/building-a-custom-processor-in-apache-nifi-12-for.html
      https://community.hortonworks.com/articles/224268/running-tensorflow-on-yarn-31-with-or-without-gpu.html
      https://community.hortonworks.com/articles/183806/using-a-tensorflow-person-blocker-with-apache-nifi.html
  • 15. http://mxnet.incubator.apache.org/
      • Cloud ready
      • Experienced team (XGBoost)
      • AWS, Microsoft, NVIDIA, Baidu, Intel backing
      • Apache Incubator Project
      • Run distributed on YARN
      • In my early tests, faster than TensorFlow.
      • Runs on Raspberry Pi, NVIDIA Jetson TX1 and other constrained devices
      • Great documentation
      • Gluon
      • Great Python Interaction
      • Model Server Available
      • ONNX Support
      • Now in Version 1.1!
      • Great Model Zoo
      https://mxnet.incubator.apache.org/how_to/cloud.html
      https://github.com/apache/incubator-mxnet/tree/1.1.0/example
  • 16. Apache NiFi Integration with Apache Hadoop Options
      • Apache MXNet Running in Apache Zeppelin Notebooks
      • Apache MXNet Running on YARN 3.1 In Hadoop 3.1 In Dockerized Containers
      • Apache MXNet Running on YARN
      https://community.hortonworks.com/articles/176789/apache-deep-learning-101-using-apache-mxnet-in-apa.html
      https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html
      https://www.slideshare.net/Hadoop_Summit/deep-learning-on-yarn-running-distributed-tensorflow-etc-on-hadoop-cluster-v3
  • 17. Instance Segmentation: Mask RCNN with GluonCV
      net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=True)
      Mask RCNN model trained on COCO dataset with ResNet-50 backbone
      https://gluon-cv.mxnet.io/build/examples_instance/demo_mask_rcnn.html
      https://arxiv.org/abs/1703.06870
      https://github.com/matterport/Mask_RCNN
  • 18. Semantic Segmentation: Fully Convolutional Networks
      model = gluoncv.model_zoo.get_model('fcn_resnet101_voc', pretrained=True)
      GluonCV FCN model on PASCAL VOC dataset
      https://gluon-cv.mxnet.io/build/examples_segmentation/demo_fcn.html
      run1.sh demo_fcn_webcam.py
      https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
  • 19. Introducing HDP 3.0: FASTER
  • 20. Faster Time to Deployment: Containerization
      Why containerization?
      • Overcomes limits of data architecture
      • Allows for agility and elasticity to process data
      • Developers can build data intensive apps quickly
      • Ensure apps deploy quickly, reliably and consistently across deployment environments
      Result: Faster time to deployment and increased developer productivity -> competitive advantage
  • 21. Easy to create and access a containerized HBase service
  • 22. New Abstraction: YARN Container Runtimes
      • Challenge: Run existing process containers in the same cluster as Docker containers
      • Solution: Container Runtimes – specify the container runtime to use at application submission time.
      • DefaultLinuxContainerRuntime: Existing Linux process-based execution.
      • DockerLinuxContainerRuntime: Using Docker to run and monitor a container.
      Early versions shipped in Apache Hadoop 2.8
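A small sketch of what "specify the container runtime at application submission time" can look like: on Docker-enabled YARN clusters the runtime is typically selected through environment variables attached to the container launch. The helper names and the image name below are hypothetical, and the sketch assumes the cluster has the Docker runtime enabled and treats an unset runtime type as the default process-based runtime.

```python
from typing import Dict, List, Optional


def container_runtime_env(docker_image: Optional[str] = None) -> Dict[str, str]:
    """Environment that selects the YARN container runtime.

    With no image, nothing is set and the container runs under the
    default Linux process-based runtime (assumed cluster default).
    """
    if docker_image is None:
        return {}
    return {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": docker_image,
    }


def as_shell_env_args(env: Dict[str, str]) -> List[str]:
    """Render the environment as distributed-shell `-shell_env` arguments."""
    args: List[str] = []
    for key, value in env.items():
        args += ["-shell_env", f"{key}={value}"]
    return args


# "example/mxnet:latest" is a placeholder image name, not one from the deck.
docker_args = as_shell_env_args(container_runtime_env("example/mxnet:latest"))
print(docker_args)
```

The same application can therefore run as a plain process or in Docker without code changes; only the submission-time environment differs.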
  • 23. Key Improvements in HDP 3.0
      • Improved container lifecycle management – reliably run, stop, and remove Docker containers
      • Delayed deletion of exited containers for debugging
      • ACLs for privileged containers, with the ability to disable privileged containers system wide
      • Default untrusted mode for running unmodified images out of the box
      • Support for bind mounting host files into containers, validated against an admin supplied whitelist
      • Ambari Integration for configuring YARN containerization features and security
  • 24. Use Case: Cloud Portability with Containers
      Any company that wants faster time to deployment
      • Containerization helps companies
          • Move apps from on-prem to cloud, or between cloud environments
          • Deploy apps quickly
      • Helpful for migrating to cloud or looking to adopt a hybrid cloud strategy
      Result: Maximizes portability: self-contained runtime environments can be taken anywhere
  • 25. Introducing HDP 3.0: SMARTER
  • 26. Smarter Decisions Made Based on Support for Deep Learning Workloads
      Why GPU support?
      • Enhances the performance of computations needed for enterprise ML/DL apps
      • DL requires intense computational algorithms
      • Containerized software powered by GPUs helps data processing at scale
      Result: Data Scientists can run DL models in days vs months, hours vs days, minutes vs hours
  • 27. Use Case: Autonomous Driving Car
      • Capture: Billions of images in data lake in Core
      • Pool GPUs and CPUs - think a giant super computer for 100x faster processing
      • Train deep learning models using GPUs & images in data lake
      • Deploy data intensive containerized deep learning micro-services in minutes
      • Edge: Nvidia Drive PX 2
  • 28. Enable GPUs
      Easy to enable GPUs with a slider
  • 29. Use Case - Manufacturing
      Machine Learning & Deep Learning Using GPUs
      • Monitor factory equipment in real-time using ML/DL apps
      • Capture sensor data including temperature, vibration, internal pressures to perform preventative maintenance
      • Minimize costly downtime on machinery before problems occur
      Result: Improve bottom line - less downtime
  • 30. Data Science and In-Memory Performance Improvements
      • Securely & seamlessly integrate with other services including Ranger & Atlas
      • TensorFlow Tech Preview will complement GPU pooling to support deep learning use cases
      • Spark testing with S3Guard to support cloud
      • Spark/Hive integration to connect easily to the cloud
  • 31. Enable TensorFlow
      TensorFlow training metrics & TensorFlow on YARN
  • 32. Apache MXNet on Apache YARN 3.1, Native, No Spark
      yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command python3.6 -shell_args "/opt/demo/analyzex.py /opt/images/cat.jpg" -container_resources memory-mb=512,vcores=1
      Uses: Python Any
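When this submission is scripted, for example from a scheduler or a flow, it can help to assemble the command as an argument list rather than a hand-quoted string. The sketch below rebuilds the command from the slide using only the flags and paths shown there; the function name `build_dshell_command` and its parameters are hypothetical, and `shlex.join` requires Python 3.8 or later. The command is only printed here, not executed.

```python
import shlex
from typing import List

# Jar path as shown on the slide (HDP client layout).
DSHELL_JAR = (
    "/usr/hdp/current/hadoop-yarn-client/"
    "hadoop-yarn-applications-distributedshell.jar"
)


def build_dshell_command(script: str, image_path: str,
                         memory_mb: int = 512, vcores: int = 1,
                         python: str = "python3.6") -> List[str]:
    """Assemble the YARN distributed-shell command as an argv list."""
    return [
        "yarn", "jar", DSHELL_JAR,
        "-jar", DSHELL_JAR,
        "-shell_command", python,
        # Script and its argument travel together as one quoted value.
        "-shell_args", f"{script} {image_path}",
        "-container_resources", f"memory-mb={memory_mb},vcores={vcores}",
    ]


cmd = build_dshell_command("/opt/demo/analyzex.py", "/opt/images/cat.jpg")
print(shlex.join(cmd))
# On a cluster node this list could be passed to subprocess.run(cmd, check=True).
```

Keeping the arguments as a list avoids the quoting mistakes that are easy to make around `-shell_args`, whose value contains a space.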
  • 33. Apache MXNet on Apache YARN 3.1, Native, No Spark
      https://community.hortonworks.com/content/kbentry/222242/running-apache-mxnet-deep-learning-on-yarn-31-hdp.html
      https://github.com/tspannhw/ApacheDeepLearning101/blob/master/analyzehdfs.py
  • 34. Apache MXNet on YARN 3.2 in Docker Using “Submarine”
      https://github.com/apache/hadoop/tree/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine
      yarn jar hadoop-yarn-applications-submarine-<version>.jar job run --name xyz-job-001 --docker_image <your docker image> --input_path hdfs://default/dataset/cifar-10-data --checkpoint_path hdfs://default/tmp/cifar-10-jobdir --num_workers 1 --worker_resources memory=8G,vcores=2,gpu=2 --worker_launch_cmd "shell for Apache MXNet"
      Wangda Tan (wangda@apache.org), Hadoop {Submarine} Project: Running deep learning workloads on YARN
      https://issues.apache.org/jira/browse/YARN-8135
  • 35. Introducing HDP 3.0: HYBRID DATA
      Cloud Native
  • 36. A hybrid architecture is a key requirement of a modern data architecture, and is composed of on prem + multi cloud + edge
      A MODERN DATA ARCHITECTURE requires
      - DATA AT REST + DATA IN MOTION
      - MANAGES THE ENTIRE LIFECYCLE OF DATA
      - SPANS ACROSS ON PREMISES, CLOUD AND MULTI-CLOUD
      - PROCESS AND DRIVES INSIGHT
      - CONSISTENT SECURITY, GOVERNANCE AND OPERATIONS
  • 37. MODERN HYBRID DATA ARCHITECTURE
      • CORE REQUIREMENTS: Cloud-native Data Architecture, Extend to The Edge, Seamless Architecture, Consistent Security and Governance
      • SOLUTION: HORTONWORKS DATA PLATFORM, HORTONWORKS DATAFLOW, HORTONWORKS DATAPLANE
      • NEW, OPEN HYBRID ARCHITECTURE INITIATIVE
  • 38. Stronger Together
      • Focus on extending data science and machine learning to analyze the data in Apache Hadoop systems
      • Provides Data Science & Machine Learning
      • Provides Open Hadoop Data Platform
      • Make our clients competitive in their markets using advanced analytics faster and at scale
      • Deliver Data Science at Scale
  • 39. Together we Lead
      • Top Technology
          – Top Hadoop Engine
          – Top SQL on Hadoop Engine
          – Top Data Science Platform
      • Open Source Approach Leads to Future Integration
          – 100% Pure Open Source Hadoop Distribution
          – Big SQL Maintains the Integrity of Open Source Hadoop
          – Data Science leverages Open Source Analytics
      Solve Real Business Problems
      Provides Best in Class Technology
      Gives Clients Freedom Today and Innovation Tomorrow
  • 40. HDP Technical Differentiators
      Hybrid Cloud deployment
      • Cloudbreak: Deployment made easy – USABILITY
      • Elasticity with long running
      • Supports GCP, AWS, Azure AND Private cloud (OpenStack)
      Hive 3.0
      • Support for Tez, LLAP and Druid integration
      • Supports ACID – PERFORMANCE is there
      • Complete coverage of ANSI SQL 2011 – SIMPLIFY ETL development
      • Competition uses Hive 2.x, supports only Hive on Spark, and does not allow Tez
      Security
      • Knox: SSO across ecosystem - SIMPLIFY security management
      • Ranger: Attribute based tagging – SIMPLIFY security deployment
      • Complete platform integration: HDFS, YARN, Hive, HBase, Storm, Atlas, Kafka
  • 41. HDP Technical Differentiators
      Hadoop 3.1
      • Full support for Docker Containers in YARN - INTELLIGENT
      • Make the most of your applications with GPU support!
      Data Science
      • Zeppelin: Powerful notebook made available with HDP
      • No need for an advanced science subscription
      • Broad Partner integration
          • TensorFlow can run on the cluster (Docker)
          • IBM Watson is a market leader
  • 42. HDF Technical Differentiators
      Streaming Analytics
      • SAM: UI for Storm – USABILITY improvement
      • Druid: OLAP cube – REAL TIME analytics
      Data Ingestion
      • NiFi: Fully Open Source project
      • Very mature (est. 2006 - NSA) - USABILITY
      • Home grown – no OEM
      • Edge deployment via MiNiFi - PORTABILITY
      Stream Processing
      • Schema registry: Ability to apply same schema to NiFi, Kafka, Storm, SAM - PRODUCTIVITY
      • Solve Kafka blindness with SMM!
      • Single Monitoring Dashboard for all your Kafka Clusters across 4 entities: Broker, Producer, Topic, Consumer
  • 43. HCC – community.hortonworks.com
      • Full Q&A Platform (like StackOverflow)
      • Knowledge Base Articles
      • Code Gallery and Samples
      Read access for everyone, join to participate and be recognized
  • 44. Community Engagement
      • 9,000+ Registered Users
      • 25,000+ Answers
      • 40,000+ Technical Assets
      One Website! https://community.hortonworks.com
  • 45. Thanks!!!
      https://community.hortonworks.com/users/9304/tspann.html
      https://dzone.com/users/297029/bunkertor.html
      https://www.meetup.com/futureofdata-princeton/
      https://twitter.com/PaaSDev
      https://github.com/tspannhw/ApacheDeepLearning201
      https://www.youtube.com/watch?v=ksDKNp6Z4BE
      https://community.hortonworks.com/articles/193835/detecting-language-with-apache-nifi.html
      https://community.hortonworks.com/content/kbentry/189213/etl-with-lookups-with-apache-hbase-and-apache-nifi.html