
HPCC Systems Engineering Summit Presentation - Collaborative Research with FAU and LexisNexis Deck 2

Presenter: Victor Herrera, PhD Candidate and Research Assistant, Florida Atlantic University

NOTE: This is the 2nd presentation in this session and also the 2nd presentation shown in the accompanying YouTube video


  1. Developing Machine Learning Algorithms on the HPCC/ECL Platform. Industry/University Cooperative Research, LexisNexis/Florida Atlantic University. Victor Herrera – FAU – Fall 2014
  2. Developing ML Algorithms on the HPCC/ECL Platform: LexisNexis/Florida Atlantic University Cooperative Research. Diagram slide: key themes include the HPCC ECL-ML Library, a high-level data-centric declarative language, scalability, a dictionary approach, open source, the LexisNexis HPCC Platform, Big Data management, optimized code, parallel processing, and machine learning algorithms.
  3. Project Team. FAU Principal Investigator: Taghi Khoshgoftaar. Data mining and machine learning researchers: Victor Herrera, Maryam Najafabadi. LexisNexis: Flavio Villanustre (VP Technology & Product), Edin Muharemagic (Data Scientist and Architect).
  4. Project Scope. Implement machine learning algorithms in the HPCC ECL-ML library, focusing on: supervised learning for classification, Big Data, and parallel computing. Diagram slide: training data feeds a machine learning algorithm (learning phase) to produce a classification model, which then classifies new cases (e.g., constructor, businessman, doctor).
  5. Project Starting Point (April 2012). ECL-ML Learning Library contents at that time: Supervised Learning: classifiers (Naïve Bayes, Perceptron, Logistic Regression) and regression (Linear Regression); Unsupervised Learning: clustering (KMeans) and association analysis (AprioriN, EclatN).
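The classifiers listed above are invoked through the library's common LearnC/ClassifyC pattern (the same pattern the deck uses in its Decision Tree and Random Forest example). A minimal sketch for the Naïve Bayes classifier that was already present at the project start; the record layout, dataset values, and field numbering are illustrative assumptions, not taken from the deck:

    IMPORT ML;
    // Illustrative record layout and toy data (not from the deck)
    irisRec := RECORD
      UNSIGNED id;
      REAL sepal_l; REAL sepal_w; REAL petal_l; REAL petal_w;
      UNSIGNED class; // 1 = Iris-Setosa, 0 = other
    END;
    trainingRecords := DATASET([{1, 5.1, 3.5, 1.4, 0.2, 1},
                                {2, 6.4, 3.2, 4.5, 1.5, 0}], irisRec);
    // Convert to the library's NumericField layout
    ML.ToField(trainingRecords, fields);
    indepData := fields(number <= 4);                          // independent variables
    depData   := ML.Discretize.ByRounding(fields(number = 5)); // class label
    // Learning Phase
    learner := ML.Classify.NaiveBayes;
    model   := learner.LearnC(indepData, depData);
    // Classification Phase
    results := learner.ClassifyC(indepData, model);
    OUTPUT(results);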
  6. ECL-ML Open Source Blueprint. The community uses and contributes to the ECL-ML Library, supported by HPCC-ECL training, HPCC-ECL documentation, and forums. Contributor workflow: review code, comment, commit, open a pull request, and process comments; the ML-ECL Library administrator merges accepted contributions.
  7. Experience and Progress. Algorithm complexity grew from Naïve Bayes to Decision Trees, then KNN and Random Forest. Language knowledge gained: transformations, aggregations, data distribution, and parallel optimization. Implementation complexity: understand existing code, fix existing modules, add functionality, and design new modules. Data types handled: categorical (discrete), continuous, and mixed (future work).
  8. Example: Classifying Iris-Setosa, Decision Tree vs. Random Forest
  9. Example: Classifying Iris-Setosa, Decision Tree vs. Random Forest
     // Learning Phase
     minNumObj := 2; maxLevel := 5; // DecTree parameters
     learner1 := ML.Classify.DecisionTree.C45Binary(minNumObj, maxLevel);
     tmodel := learner1.LearnC(indepData, depData); // DecTree model
     // Classification Phase
     results1 := learner1.ClassifyC(indepData, tmodel);
     // Comparing to Original
     compare1 := Classify.Compare(depData, results1);
     OUTPUT(compare1.CrossAssignments, ALL); // CrossAssignment results

     // Learning Phase
     treeNum := 25; fsNum := 3; Pure := 1.0; maxLevel := 5; // RndForest parameters
     learner2 := ML.Classify.RandomForest(treeNum, fsNum, Pure, maxLevel);
     rfmodel := learner2.LearnC(indepData, depData); // RndForest model
     // Classification Phase
     results2 := learner2.ClassifyC(indepData, rfmodel);
     // Comparing to Original
     compare2 := Classify.Compare(depData, results2);
     OUTPUT(compare2.CrossAssignments, ALL); // CrossAssignment results
     Diagram: the resulting decision tree, with splits at thresholds 0.6, 4.7, 1.5, 1.8, and 1.7.
  10. Example: Classifying Iris-Setosa, Decision Tree vs. Random Forest (code repeated verbatim from the previous slide).
  11. Project Contribution. Supervised learning algorithms added: Naïve Bayes (maximum-likelihood probabilistic model), Decision Tree (top-down induction, recursive data partitioning, decision tree model), KNN (case based, KD-tree partitioning, nearest-neighbor search, lazy learning with no model), and Ensemble methods (data sampling, tree bagging, feature sampling, random subspace, ensemble tree model). Supporting functionality: classification, class probability, missing-value handling, and Area Under ROC evaluation.
  12. Overall Summary
      • Development of machine learning algorithms in the ECL language, focused on the implementation of classification algorithms, Big Data, and parallel processing.
      • Added functionality to the Naïve Bayes classifier and implemented Decision Tree, KNN, and Random Forest classifiers in the ECL-ML library.
      • Currently working on Big Data learning and evaluation:
        – Experiment design and implementation of a Big Data case study
        – Analysis of results
        – Classifier enhancements based on the analysis of results
  13. Thank you.
