
Big Data and Predictive Analysis


IDEAS Live Webinar 2019
May 4 2019
Jongwook Woo, PhD
Big Data AI Center (BigDAI)
California State University Los Angeles

Published in: Data & Analytics

Big Data and Predictive Analysis

  1. Big Data and Predictive Analysis
      – IDEAS Live Webinar 2019, May 4, 2019
      – Jongwook Woo, PhD, jwoo5@calstatela.edu
      – Big Data AI Center (BigDAI), California State University Los Angeles
      – Jongwook Woo, HiPIC, CalStateLA
  2. Contents (Big Data Artificial Intelligence Center (BigDAI), Jongwook Woo, CalStateLA)
      – Myself
      – Introduction to Big Data
      – Big Data Predictive Analysis
      – Summary
  3. Myself
      – Experience: Professor at California State University Los Angeles since 2002
      – PhD in Computer Science and Engineering, USC, 2001
  4. Universities in Los Angeles (map labels: West, North)
  5. Universities in Los Angeles
  6. California State University Los Angeles
  7. Myself: S/W Development Lead, http://www.mobygames.com/game/windows/matrix-online/credits
  8. Myself: Isaac Engineering, HDP, CDH, Oracle using Hadoop Big Data, https://www.cloudera.com/more/customers/csula.html
  9. Myself: Partners for Services
  10. Myself: Collaborations
  11. Contents: Myself; Introduction to Big Data; Big Data Predictive Analysis; Summary
  12. Data Issues
      – Large-scale data: terabytes (10^12 bytes), petabytes (10^15 bytes)
        • Driven by the web
        • Sensor data (IoT), bioinformatics, social computing, streaming data, smartphones, online games, …
      – Legacy approach
        • What it can do: increase CPU speed and increase storage size
        • The only problem: too expensive
  13. Data Handling: Traditional Way
  14. Data Handling: Traditional Way Becomes Too Expensive
  15. Data Issues
      – Cannot be handled with the legacy approach: the data is too big and non-/semi-structured
        • 3 Vs, 4 Vs, …: Velocity, Volume, Variety
      – Traditional systems can handle them, but again, at too high a cost
      – New, inexpensive systems are needed
  16. Two Cores in Big Data
      – How to store Big Data
      – How to compute Big Data
      – Google
        • How to store Big Data: GFS, distributed systems on inexpensive commodity computers
        • How to compute Big Data: MapReduce, parallel computing with inexpensive computers
        • Its own supercomputers
        • Papers published in 2003 and 2004
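The MapReduce idea on slide 16 can be sketched in a few lines. This is a single-process illustration only, not Google's or Hadoop's implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and each input string merely stands in for a data split that a real cluster would process on a separate commodity machine.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)

def word_count(documents):
    # Each document stands in for a split handled by a separate worker.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["big data big cluster", "data on commodity computers"])
```

Because the map and reduce steps only ever see one split or one key at a time, they can run in parallel on many inexpensive machines, which is the point the slide makes.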
  17. Data Handling: Another Way, Not Expensive (image from the 2017 Korean blockbuster movie "The Fortress" (남한산성))
  18. Data Handling: Another Way, But Works Well with a Massive Data Set (image: Battle of Nagashino, 1575, Japan)
  19. Data Handling: Another Way, Needs Resource Management
  20. What is Hadoop?
      – Apache Hadoop project split from Nutch in January 2006
      – Hadoop founder: Doug Cutting
      – Apache committer: Lucene, Nutch, …
  21. Super Computer vs Hadoop vs Cloud
      – Figure: "Parallel vs. Distributed file systems" by Michael Malak, updated by Jongwook Woo
      – Figure labels: Cluster for Store; Cluster for Compute/Store; Cluster for Compute
      – Cloud computing adopts this architecture, with a high-speed network
  22. Definition: Big Data
      – An inexpensive platform of distributed parallel systems that can store large-scale data and process it in parallel [1, 2]
      – Hadoop
        • An inexpensive supercomputer
        • More accessible than traditional supercomputers: you can store data and run your applications in university labs, small companies, and research centers
      – Others with storage and computing services
        • Spark: normally integrated into Hadoop by the Hadoop community
        • NoSQL DBs (Cassandra, MongoDB, Redis, HBase, …)
        • Elasticsearch
  23. Big Data: Data Analysis & Visualization. Sentiment map of AlphaGo (positive, negative)
  24. K-Election 2017 (April 29 – May 9)
  25. Businesses popular within 5 miles of CalStateLA, USC, UCLA
  26. Jams and other traffic incidents reported by users, Dec 2017 – Jan 2018 (Dalyapraz Dauletbak)
  27. Contents: Myself; Introduction to Big Data; Big Data Predictive Analysis; Summary
  28. Big Data Analysis and Prediction
      – Big Data Analysis
        • Hadoop, Spark, NoSQL DB, SAP HANA, Elasticsearch, …
        • Big Data for data analysis: how to store, compute, and analyze massive datasets?
      – Big Data Science
        • How to predict future trends and patterns from massive datasets? => Machine Learning
  29. Spark
      – Limitations of MapReduce
        • Hard to program in Java
        • Batch processing: not interactive
        • Disk storage for intermediate data: performance issue
      – Spark, from the UC Berkeley AMPLab
        • Started by Matei Zaharia in 2009 and open sourced in 2010
        • In-memory storage for intermediate data
        • 20 to 100 times faster than MapReduce
        • Good for machine learning (iterative algorithms) => Big Data Science
  30. Spark (Cont'd)
      – Spark ML supports machine learning libraries
      – Processes massive data sets to build prediction models
  31. Deep Learning
      – Machine learning
        • Has been popular since Google TensorFlow
      – Multiple cores in a GPU, even with multiple GPUs and CPUs
        • Parallel computing
      – GPU (Nvidia GTX 1660 Ti)
        • 1280 CUDA cores
      – Deep learning libraries
        • TensorFlow
        • PyTorch
        • Keras
        • Caffe, Caffe2
        • Microsoft Cognitive Toolkit (previously CNTK)
        • Apache MXNet
        • Deeplearning4j
        • …
  32. From Neural Networks to Deep Learning
      – Deep learning: different types of architectures
        • Neural Networks (NN)
        • Convolutional Neural Networks (CNN)
        • Recurrent Neural Networks (RNN) & Long Short-Term Memory (LSTM)
        • Generative Adversarial Networks (GAN)
      – Ref: SAP, Enterprise Deep Learning with TensorFlow (© 2017 SAP SE or an SAP affiliate company. All rights reserved.)
  33. Deep Learning
      – CNN: image recognition; video analysis; NLP for classification and prediction
      – RNN: time series prediction; speech recognition/synthesis; image/video captioning; text analysis (conversation, Q&A)
      – GAN: media generation (photorealistic images); human image synthesis (fake faces)
  34. Deep Learning with Spark: What if we combine Deep Learning and Spark?
  35. Deep Learning with Spark
      – Deep Learning Pipelines for Apache Spark: Databricks
      – TensorFlowOnSpark: Yahoo! Inc
      – BigDL (Distributed Deep Learning Library for Apache Spark): Intel
      – DL4J (Deeplearning4j on Spark): Skymind
      – Distributed Deep Learning with Keras & Spark: Elephas
  36. Contents: Myself; Introduction to Big Data; Big Data Predictive Analysis: Use Case; Summary
  37. Use Case in Spark
      – "Predicting AD click fraud using Azure and Spark ML", accepted at the 14th Asia Pacific International Conference on Information Science and Technology (APIC-IST 2019), June 23-26, 2019, Beijing, China
        • By Neha Gupta, Hai Anh Le, Maria Boldina
      – Machine learning with distributed parallel computing, using Spark with Hadoop and cloud computing
      – Not deep learning
  38. Ad Click Fraud
      – A person, automated script, or computer program imitates a legitimate user clicking on an ad without any actual interest in the target of the ad's link, resulting in misleading click data and wasted money
      – Companies suffer from huge volumes of fraudulent traffic, especially in the mobile market worldwide
      – Goal: predict who will download the apps
        • Using a classification model
        • Traditional and Big Data approaches
  39. Ad Click Fraud (Cont'd)
      – TalkingData
        • China's largest independent big data service platform, covering over 70% of active mobile devices nationwide
        • Handles 3 billion clicks per day, 90% of which are potentially fraudulent
      – Goal of the predictive analysis
        • Predict whether a user will download an app after clicking on a mobile app advertisement
        • To better target the audience, avoid fraudulent practices, and save money
  40. Data Set
      – Dataset: TalkingData AdTracking Fraud Detection, https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data
      – Dataset properties
        • Original dataset size: 7 GB, containing 200 million clicks over a 4-day period
        • Dataset format: CSV
        • Fields: 8
        • Target column to predict: 'is_attributed'
  41. Data Set Details
  42. Experiment Environment: Traditional and Big Data Systems
  43. Experiment Environment: Traditional
      – Azure ML Studio: traditional system for small data sets
        • Free Workspace
        • 10 GB storage
        • Single node
      – Implement fundamental prediction models
        • Using sample data: 80 MB (1.1% of the original data set)
      – Select the best model among a number of classification algorithms
  44. Experiment Environment: Spark
      – Spark ML
      – Data filtering: 1 GB from 8 GB
        • Implemented Python code to reduce the size to 1 GB (15%)
        • Experimental results with the 8 GB data set exist as well, for another publication
      – Databricks subscription
        • Cluster 4.0 (includes Apache Spark 2.3.0, Scala 2.11)
        • 2 Spark workers with a total of 16 GB memory and 4 cores
        • Python 2.7
        • File system: Databricks File System
  45. Experiment Environment: Spark (Cont'd)
      – Oracle Big Data Spark Cluster
        • Oracle BDCE, Python 2.7.x, Spark 2.1.x
        • 10 nodes: 20 OCPUs, 300 GB memory, 1,154 GB storage
  46. Work Flow in Azure ML
      – Relatively easy to build and test: drag-and-drop GUI
      – Work flow
        1. Data Engineering
           • Understanding data
           • Data preparation
           • Balancing data statistically
        2. Data Science: Machine Learning (ML)
           • Model building and validation (classification algorithms)
           • Model evaluation
           • Model interpretation
  47. Data Engineering
      – Unbalanced dataset
        • 1: 0.19%, app downloaded
        • 0: 99.81%, app not downloaded
      – The 1 GB filtered dataset is still too large for the traditional system, Azure ML Studio
      – More sampling is needed for Azure ML
  48. Data Engineering
      – SMOTE (Synthetic Minority Over-sampling Technique) takes a subset of data from the minority class and creates new, similar synthetic instances
        • Helps balance data and avoid overfitting
        • Increased the share of the minority class (1) from 0.19% to 11%
      – Stratified split ensures that the output dataset contains a representative sample of the values in the selected column
        • Ensures that the random sample does not contain only rows with 0s
        • 8% sample used = 80 MB
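The SMOTE step on slide 48 can be illustrated with a small pure-Python sketch. It is not the Azure ML Studio module the authors used; it only shows the core idea of creating synthetic minority rows by interpolating between a minority point and one of its nearest minority neighbors. The function names, the `k` and `seed` parameters, and the sample points are all assumptions for illustration.

```python
import random

def euclidean(a, b):
    """Straight-line distance between two numeric feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the chosen point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: euclidean(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Hypothetical minority-class rows with two numeric features each
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=8, k=2)
```

Every synthetic point lies on a line segment between two real minority points, so the new rows resemble the minority class rather than duplicating it exactly.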
  49. Algorithms in Azure ML Studio
      – Two-class classification: classify the elements of a given set into two groups
        • either downloaded, is_attributed (1)
        • or not downloaded, is_attributed (0)
      – Decision trees often perform well on imbalanced datasets, as their hierarchical structure allows them to learn signals from both classes
      – Tree ensembles almost always outperform single decision trees
        • Algorithm #1: Two-Class Decision Jungle
        • Algorithm #2: Two-Class Decision Forest
  50. Selecting Performance Metrics
      – A false positive (FP) means the model predicted that an app was downloaded when in fact it was not
      – Goal: minimize FP => to save $$$
  51. Azure ML Model #1: Two-Class Decision Jungle
      – 8% sample
      – SMOTE 5000%
      – 70:30 train/test split
      – Cross-validation
      – Tune Model Hyperparameters
      – Features used: all 7
  52. Azure ML Model #1: Tune Model Hyperparameters
      – Comparison without and with Tune Model Hyperparameters: AUC = 0.905 vs 0.606
      – Precision = 1.0
      – TP = 35, FP = 0
  53. Azure ML Model #2: Two-Class Decision Forest
      – 8% sample
      – SMOTE 5000%
      – 70:30 train/test split
      – Cross-validation
      – Tune Model Hyperparameters
      – Permutation Feature Importance
  54. Azure ML Model #2: Improving Precision
      – By increasing the threshold from 0.5 to 0.8:
        • Precision increased to 0.992
        • FP decreased from 1,659 to 377
        • FN increased from 1,834 to 5,142
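The threshold trade-off on slide 54 can be reproduced with a short sketch. The first part uses made-up scores and labels (hypothetical, not the authors' data) to show why raising the threshold lowers FP and raises FN. The second part reworks the slide's own counts, under the assumption that both thresholds were evaluated on the same test set, so that TP + FN stays constant at 47,199 + 5,142 (the TP and FN reported for the Two-Class Decision Forest in the comparison table).

```python
def confusion_at(scores, labels, threshold):
    """Count (TP, FP, FN, TN) when predicting 1 for score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical model scores and true labels
scores = [0.95, 0.85, 0.75, 0.65, 0.60, 0.55, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
low = confusion_at(scores, labels, 0.5)   # more positives predicted
high = confusion_at(scores, labels, 0.8)  # fewer, but cleaner, positives

# Slide figures, assuming the same test set at both thresholds
positives = 47199 + 5142               # TP + FN at threshold 0.8
tp_at_05 = positives - 1834            # FN was 1,834 at threshold 0.5
precision_at_05 = tp_at_05 / (tp_at_05 + 1659)
precision_at_08 = 47199 / (47199 + 377)
```

Under that assumption, precision rises from roughly 0.968 to the reported 0.992, at the cost of more missed positives.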
  55. Experimental Results in Azure ML Studio
      – Performance: execution time with the 1 GB sample data set
        • Decision Forest: 2.5 hours
        • Decision Jungle: 3 hours 19 min
      – The Azure ML Studio models are a good guide for adopting the 2 similar algorithms in Spark ML
        • Decision Tree
        • Random Forest
  56. Experimental Results in Azure ML: Two-Class Decision Forest is the best model!
  57. Experiment with Spark ML in Databricks
      1. Load the data source: 1.03 GB, the same filtered data set as Azure ML
      2. Train and build the models, with statistically balanced data
      3. Evaluate
  58. Data Engineering: Generate Features
      – Feature 1: extract day of the week and hour of the day from the click time
      – Feature 2: group clicks by (Ip, Day_of_week_number, Hour)
      – Feature 3: group clicks by (Ip, App, Operating System, Day_of_week_number, Hour)
      – Feature 4: group clicks by (App, Day_of_week_number, Hour)
      – Feature 5: group clicks by (Ip, App, Device, Operating System)
      – Feature 6: group clicks by (Ip, Device, Operating System)
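The feature generation on slide 58 can be sketched in plain Python. This is an illustration rather than the authors' Spark code: it assumes the Kaggle column names `ip`, `device`, `os`, and `click_time`, assumes timestamps in the form `YYYY-MM-DD HH:MM:SS`, and interprets "group clicks by" as counting the clicks that share a key. Only Features 1, 2, and 6 are shown; the others follow the same pattern with different key tuples. The output field names are hypothetical.

```python
from collections import Counter
from datetime import datetime

def key_ip_day_hour(row):
    """Grouping key for Feature 2."""
    return (row["ip"], row["day_of_week"], row["hour"])

def key_ip_device_os(row):
    """Grouping key for Feature 6."""
    return (row["ip"], row["device"], row["os"])

def add_click_features(clicks):
    """Return copies of the click records with time and group-count features."""
    rows = []
    for click in clicks:
        t = datetime.strptime(click["click_time"], "%Y-%m-%d %H:%M:%S")
        # Feature 1: day of the week (Monday = 0) and hour of the day
        rows.append({**click, "day_of_week": t.weekday(), "hour": t.hour})
    counts_f2 = Counter(key_ip_day_hour(r) for r in rows)
    counts_f6 = Counter(key_ip_device_os(r) for r in rows)
    for r in rows:
        r["clicks_ip_day_hour"] = counts_f2[key_ip_day_hour(r)]
        r["clicks_ip_device_os"] = counts_f6[key_ip_device_os(r)]
    return rows

# Hypothetical click records
clicks = [
    {"ip": 1, "app": 3, "device": 1, "os": 13, "click_time": "2017-11-07 09:30:38"},
    {"ip": 1, "app": 3, "device": 1, "os": 13, "click_time": "2017-11-07 09:45:00"},
    {"ip": 2, "app": 9, "device": 1, "os": 19, "click_time": "2017-11-07 10:00:00"},
]
features = add_click_features(clicks)
```

An IP address that produces many clicks within the same hour gets a high `clicks_ip_day_hour` value, which is the kind of signal a tree-based classifier can split on.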
  59. Spark ML Model #1: Decision Tree Classifier, Confusion Matrix
  60. Spark ML Model #2: Random Forest Classifier, Confusion Matrix
  61. Spark ML Result Comparison
      – Decision Tree Classifier is relatively the better model!

      Metric      Decision Tree Classifier   Random Forest Classifier
      AUC         0.815                      0.746
      Precision   0.822                      0.878
      Recall      0.633                      0.495
      TP          86,683                     67,726
      FP          18,727                     9,408
      TN          7,112,961                  7,122,280
      FN          50,074                     69,031
      RMSE        0.0972                     0.1038

  62. Experiment in Oracle Cluster
      – Oracle Big Data Spark Cluster: 10 nodes, 20 OCPUs, 300 GB memory, 1,154 GB storage
      1. Load the data source: 1.03 GB
      2. Sample the balanced data based on Downloaded: 116 MB
      3. Train and build the models, with statistically balanced data
      4. Evaluate
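The balanced sampling in step 2 of slide 62 could look like the following sketch. The slides do not show how the sample was drawn, so this is an assumption: keep every downloaded row (`is_attributed == 1`) and a random subset of the non-downloaded rows. The function name and the `ratio` and `seed` parameters are hypothetical. A `ratio` of 4 is used in the example because the Oracle columns of the comparison table total roughly four negatives per positive (553,014 vs 137,791).

```python
import random

def undersample(rows, label_key="is_attributed", ratio=1.0, seed=0):
    """Keep every positive row and a random subset of the negative rows.

    ratio is the number of negative rows kept per positive row.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    n_keep = min(len(negatives), int(ratio * len(positives)))
    sample = positives + rng.sample(negatives, n_keep)
    rng.shuffle(sample)  # avoid all positives sitting at the front
    return sample

# Hypothetical data: 5 downloads among 1,000 clicks
rows = [{"id": i, "is_attributed": 1 if i < 5 else 0} for i in range(1000)]
balanced = undersample(rows, ratio=4.0)
```

Shrinking the majority class this way reduces both the class imbalance and the data volume, which is consistent with the drop from 1.03 GB to 116 MB reported on the slide.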
  63. Azure ML Studio and Spark ML Result Comparison

      Model                                                    AUC    Precision  Recall  TP       FP      TN         FN       Run time
      Two-Class Decision Jungle (Azure ML)                     0.905  1.0        0.001   35       0       52,306     406,605  2 hrs
      Two-Class Decision Forest (Azure ML)                     0.997  0.992      0.902   47,199   377     406,228    5,142    2-3 hrs
      Decision Tree Classifier (Databricks)                    0.815  0.822      0.633   86,683   18,727  7,112,961  50,074   22 mins
      Random Forest Classifier (Databricks)                    0.746  0.878      0.495   67,726   9,408   7,122,280  69,031   50 mins
      Decision Tree Classifier (Balanced Sample Data, Oracle)  0.896  0.935      0.807   111,187  7,712   545,302    26,604   24 sec
      Random Forest Classifier (Balanced Sample Data, Oracle)  0.893  0.934      0.800   110,220  7,791   545,223    27,571   2 mins

  64. Azure ML Studio and Spark ML Result Comparison (same table as slide 63)
      – Azure ML Two-Class Decision Forest is the best model!
      – Spark ML code needs to be updated for better accuracy
      – Balanced sampling based on the fraud label in Oracle:
        • Decision Tree has a precision of 0.935
        • Execution time: 24 sec
  65. Contents: Myself; Introduction to Big Data; Big Data Predictive Analysis; Summary
  66. Summary
      – Introduction to Big Data
      – Ad click prediction models in traditional and Big Data systems
      – Azure ML Studio shows the best accuracy, with the Two-Class Decision Forest model
      – Spark ML is 3.5 to 7 times faster than Azure ML Studio on the 1 GB data set, but less accurate
        • With a 2-node Spark cluster
      – The balanced sample data in Oracle gives accuracy close to the traditional system while running 300 times faster
        • With a 10-node Spark cluster
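The speed-up figures in the summary can be checked against the run times in the result comparison table. The slides do not say which runs were paired, so the pairing below is an assumption: the 2-hour Azure ML Two-Class Decision Jungle run is taken as the baseline and compared with the 22-minute Databricks and 24-second Oracle Decision Tree runs.

```python
HOUR, MINUTE = 3600, 60

def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds

azure_baseline = 2 * HOUR       # Two-Class Decision Jungle (Azure ML), "2 hrs"
databricks_tree = 22 * MINUTE   # Decision Tree Classifier (Databricks)
oracle_tree = 24                # Decision Tree Classifier (Oracle), "24 sec"

databricks_speedup = speedup(azure_baseline, databricks_tree)
oracle_speedup = speedup(azure_baseline, oracle_tree)
```

With this pairing the Oracle run comes out at 300 times faster, and the Databricks run falls inside the 3.5 to 7 range quoted in the summary.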
  67. Questions?
  68. Data Set Details (Cont'd)
  69. Precision vs Recall
      – Positive: the event occurs (fraud); Negative: the event does not occur (non-fraud)
      – True Positive (TP): predicted fraud, and it is
      – False Negative (FN): predicted no fraud, but it is
      – False Positive (FP): predicted fraud, but it is not
      – Precision = TP / (TP + FP)
      – Recall = TP / (TP + FN)
      – Ref: https://en.wikipedia.org/wiki/Precision_and_recall
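The two formulas on slide 69 translate directly into code. As a worked check, the sketch below applies them to the Decision Tree Classifier (Databricks) counts reported in the Spark ML result comparison (TP = 86,683, FP = 18,727, FN = 50,074). The function names are illustrative, and the functions assume a non-zero denominator.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)

# Decision Tree Classifier (Databricks) counts from the comparison slide
dt_precision = precision(tp=86683, fp=18727)
dt_recall = recall(tp=86683, fn=50074)
```

Both values agree with the 0.822 precision and 0.633 recall shown on the slide to within rounding.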
  70. References
      1. Priyanka Purushu, Niklas Melcher, Bhagyashree Bhagwat, Jongwook Woo, "Predictive Analysis of Financial Fraud Detection using Azure and Spark ML", Asia Pacific Journal of Information Systems (APJIS), Vol. 28, No. 4, December 2018, pp. 308-319
      2. Jongwook Woo, DMKD-00150, "Market Basket Analysis Algorithms with MapReduce", Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Oct 28 2013, Volume 3, Issue 6, pp. 445-452, ISSN 1942-4795
      3. Jongwook Woo, "Big Data Trend and Open Data", UKC 2016, Dallas, TX, Aug 12 2016
      4. How to choose algorithms for Microsoft Azure Machine Learning, https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice
      5. Manik Katyal, Parag Chhadva, Shubhra Wahi, Jongwook Woo, "Big Data Analysis using Spark for Collision Rate Near CalStateLA", https://globaljournals.org/GJCST_Volume16/1-Big-Data-Analysis-using-Spark.pdf
      6. Spark Programming Guide, http://spark.apache.org/docs/latest/programming-guide.html
      7. TensorFrames: Google Tensorflow on Apache Spark, https://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark
      8. Deep learning and Apache Spark, https://www.slideshare.net/QuantUniversity/deep-learning-and-apache-spark
  71. References
      9. Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark, https://www.slideshare.net/SparkSummit/which-is-deeper-comparison-of-deep-learning-frameworks-on-spark
      10. Accelerating Machine Learning and Deep Learning At Scale with Apache Spark, https://www.slideshare.net/SparkSummit/accelerating-machine-learning-and-deep-learning-at-scalewith-apache-spark-keynote-by-ziya-ma
      11. Deep Learning with Apache Spark and TensorFlow, https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html
      12. Tensor Flow Deep Learning Open SAP
      13. Overview of Smart Factory, https://www.slideshare.net/BrendanSheppard1/overview-of-smart-factory-solutions-68137094/6
      14. https://dzone.com/articles/sqoop-import-data-from-mysql-tohive
      15. https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data
      16. https://blogs.msdn.microsoft.com/andreasderuiter/2015/02/09/performance-measures-in-azure-ml-accuracy-precision-recall-and-f1-score/
