Hadoop Summit 2016


  1. Deep Learning using Spark and DL4J for fun and profit. Adam Gibson and Dhruv Kumar, 2015, Version 1.0
  2. Who are we? Adam Gibson: co-founder of Skymind; wrote DeepLearning4J and ND4J. Dhruv Kumar: Sr. Solutions Architect, Hortonworks; MS UMass; Mahout, ASF.
  3. In this talk: What is deep learning? Architectures. Implementation and libraries in real life. Demo!
  4. Deep learning: one of the many pattern recognition techniques in data science. Excels at rich media applications: image recognition, speech translation, voice recognition. Loosely inspired by models of the human brain. Often used synonymously with artificial neural networks and multi-layer networks.
  5. Enterprise use cases
  6. Doing this in real life for the enterprise
  7. Modern data applications in the enterprise: connected, fast, intelligent. (Diagram: HDF for data in motion yielding perishable insights; HDP for data at rest yielding historical insights; the Internet of Anything feeding modern data apps that deliver actionable intelligence.)
  8. How do we realize modern data apps in a Hadoop-centric world? (Diagram: HDF delivers raw network streams, network metadata streams, syslog, raw application logs, and other streaming telemetry into data stores on a Hadoop cluster running HDFS, HBase, Hive, SOLR, Storm, and Spark on YARN, with service management/workflow and SIEM integration.)
  9. www.hortonworks.com (Deployment diagram: sources 1..N feed NiFi nodes (HDF); edge nodes run Kafka 1-3 and Storm 1-3; the HDP cluster comprises five master nodes, two client nodes, and worker nodes co-locating DataNodes 1-32 with HBase region servers; zones labeled "World" and "Azure".)
  10. Storm/Spark Streaming: detailed reference architecture. (Diagram: source data from server logs, application logs, firewall logs, CRM/ERP, and sensors streams into HDF and Kafka for high-speed ingest, then forwards to Storm for real-time processing: event enrichment, JMS alerts, bolts to HDFS, and real-time storage in HBase/Phoenix. Flume and Sqoop sink to HDFS for batch transforms with Pig; Hive, HiveServer, and Spark Thrift serve interactive queries to a dashboard UI framework (Silk) and BI/reporting tools; Spark ML builds iterative machine learning models.)
  11. For model building: a typical workflow (a minimal DL4J sketch follows this list)
      1. Ingest training data and store it.
      2. Split the data set into training, test, and validation sets.
      3. Vectorize and extract features for the next step.
      4. Architect a multi-layer network and initialize it.
      5. Feed data and train.
      6. Test and validate.
      7. Repeat steps 4 and 5 until satisfied.
      8. Store the model.
      9. Put the model in an app and start generalizing on real data.
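      A minimal sketch of steps 4 through 7 using DL4J, assuming a DL4J release of roughly this era; the iris-like 4-input/3-class shape, layer sizes, and hyperparameters are illustrative, not from the talk:

          import org.deeplearning4j.eval.Evaluation;
          import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
          import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
          import org.deeplearning4j.nn.conf.layers.DenseLayer;
          import org.deeplearning4j.nn.conf.layers.OutputLayer;
          import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
          import org.nd4j.linalg.dataset.DataSet;
          import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
          import org.nd4j.linalg.lossfunctions.LossFunctions;

          public class TrainSketch {
              public static void trainAndEvaluate(DataSetIterator train, DataSetIterator test) {
                  // Step 4: architect a multi-layer network and initialize it
                  MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                          .seed(123)                                  // reproducible weight init
                          .learningRate(0.01)
                          .list()
                          .layer(0, new DenseLayer.Builder()
                                  .nIn(4).nOut(10).activation("relu").build())
                          .layer(1, new OutputLayer.Builder(
                                  LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                                  .nIn(10).nOut(3).activation("softmax").build())
                          .build();
                  MultiLayerNetwork model = new MultiLayerNetwork(conf);
                  model.init();

                  // Steps 5 and 6: feed data and train, then test and validate
                  for (int epoch = 0; epoch < 30; epoch++) {  // step 7: repeat until satisfied
                      model.fit(train);
                      train.reset();
                  }
                  Evaluation eval = new Evaluation(3);        // 3 output classes
                  while (test.hasNext()) {
                      DataSet ds = test.next();
                      eval.eval(ds.getLabels(), model.output(ds.getFeatureMatrix()));
                  }
                  System.out.println(eval.stats());           // accuracy, precision, recall, F1
              }
          }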
  12. So what do you get?
      1. Ingest training data and store it using NiFi or other ingest tools.
      2. Split the data set into training, test, and validation sets.
      3. Vectorize and extract features for the next step.
      4. Architect a multi-layer network and initialize it.
      5. Feed data and train.
      6. Test and validate.
      7. Repeat steps 4 and 5 until satisfied.
      8. Store the model.
      9. Put the model in an app and start generalizing on real data.
      Steps 2, 3, 4, and 5: use libraries such as Deeplearning4j (a Spark training sketch follows below).
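      Because the talk pairs DL4J with Spark, here is a hedged sketch of distributed training by parameter averaging; SparkDl4jMultiLayer and ParameterAveragingTrainingMaster are DL4J's Spark entry points of this era, while the batch size and averaging frequency shown are placeholder values:

          import org.apache.spark.api.java.JavaRDD;
          import org.apache.spark.api.java.JavaSparkContext;
          import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
          import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
          import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
          import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;
          import org.nd4j.linalg.dataset.DataSet;

          public class SparkTrainSketch {
              public static MultiLayerNetwork train(JavaSparkContext sc,
                                                    MultiLayerConfiguration conf,
                                                    JavaRDD<DataSet> trainingData) {
                  // Each DataSet object in the RDD holds 32 examples (placeholder value)
                  ParameterAveragingTrainingMaster tm =
                          new ParameterAveragingTrainingMaster.Builder(32)
                                  .batchSizePerWorker(32)   // minibatch size on each executor
                                  .averagingFrequency(5)    // average parameters every 5 minibatches
                                  .build();
                  SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);
                  sparkNet.fit(trainingData);               // distributed training over the RDD
                  return sparkNet.getNetwork();             // trained network, ready to store (step 8)
              }
          }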
  13. Deeplearning4j architecture (diagram)
  14. DL4J: Canova for vectorization and ingest. Canova uses an input/output format system (similar to how Hadoop uses MapReduce). It supports all major types of input data (text, CSV, audio, image, and video), can be extended for specialized input formats, and connects to Kafka. A small ingest sketch follows below.
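      A small sketch of the Canova record-reader pattern for vectorizing CSV input; the file name, label column, batch size, and class count here are illustrative assumptions:

          import java.io.File;
          import org.canova.api.records.reader.RecordReader;
          import org.canova.api.records.reader.impl.CSVRecordReader;
          import org.canova.api.split.FileSplit;
          import org.deeplearning4j.datasets.canova.RecordReaderDataSetIterator;
          import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

          public class IngestSketch {
              public static DataSetIterator csvIterator() throws Exception {
                  // Canova input format: skip 0 header lines, comma-delimited
                  RecordReader reader = new CSVRecordReader(0, ",");
                  reader.initialize(new FileSplit(new File("training-data.csv")));
                  // Wrap as a DataSetIterator: batch size 50, label in column 0, 3 classes
                  return new RecordReaderDataSetIterator(reader, 50, 0, 3);
              }
          }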
  15. ND4J: an n-dimensional array library; scientific computing for the JVM. DL4J uses it for the linear algebra in backpropagation. Supports GPUs via CUDA and native execution via jblas. Deploys on Android. DL4J code remains unchanged whether running on GPU or CPU (see the sketch below).
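      To make the ND4J point concrete, a short INDArray example; the same code runs on GPU or CPU depending only on which ND4J backend is on the classpath (the matrices here are toy values):

          import org.nd4j.linalg.api.ndarray.INDArray;
          import org.nd4j.linalg.factory.Nd4j;

          public class Nd4jSketch {
              public static void main(String[] args) {
                  // 2x2 weight matrix and 2x1 input vector from flat arrays
                  INDArray w = Nd4j.create(new double[]{1, 2, 3, 4}, new int[]{2, 2});
                  INDArray x = Nd4j.create(new double[]{0.5, -0.5}, new int[]{2, 1});

                  INDArray y = w.mmul(x);           // matrix-vector product, as in a forward pass
                  INDArray grad = y.mul(0.1);       // elementwise scaling, as in a gradient step
                  INDArray updated = w.sub(grad.mmul(x.transpose())); // rank-1 weight update

                  System.out.println(updated);
              }
          }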
  16. How to choose a neural net in DL4J core?
  17. Demo!
  18. Thank you. hortonworks.com

Editor's notes

  • TALK TRACK
    I’m about to go over the products, consulting and training that Hortonworks offers, and I want you to keep this image in mind.

    Remember:
    The Internet of Anything is doubling the amount of data in the world every 2 years.
    Connected Data Platforms deliver an open-architected solution to manage data, both in motion and at rest, empowering your organization to gain Actionable Intelligence delivered to your end users through Modern Data Apps.
    Hortonworks DataFlow (aka HDF) manages your data in motion—bringing it to where you need it for real-time analysis to capture perishable insights or into storage for historical analysis.
    Hortonworks Data Platform (aka HDP) stores the data at rest and provides historical insights through deep, detailed analysis of everything that’s already happened.
    Those historical insights from HDP help optimize your data ingest with HDF, which in turn optimizes your data at rest.
    This is how HDF, HDP, and Modern Data Applications deliver actionable intelligence to your end users.
    And Actionable Intelligence is the beating heart animating the Future of Data.

    [NEXT SLIDE]
  • CapOne: ingesting from everywhere

    Email, syslog, app logs, NetFlow…

    Moving to a "cloud-only" model… even looking to use Docker containers in Amazon…
  • The team puts together a detailed architecture of the proposed solution using HDP and HDF. The architecture takes in data from numerous sources, including server logs, application logs, XML, and sensor data. This data is easily ingested into the flexible schema of HDP using HDF and Sqoop. The data is processed using Pig and analyzed using Spark. Then the data is made available in a real-time dashboard as well as to visualization and reporting tools.

    [NEXT SLIDE]
