SlideShare una empresa de Scribd logo
1 de 13
Descargar para leer sin conexión
Developing Machine Learning Algorithms on HPCC/ECL Platform 
Industry/University Cooperative Research 
LexisNexis/Florida Atlantic University 
Victor Herrera – FAU – Fall 2014
Developing ML Algorithms On HPCC/ECL Platform 
LexisNexis/Florida Atlantic University Cooperative Research 
HPCC ECL ML Library 
High Level 
Data Centric Declarative Language 
Scalability Dictionary Approach 
Open Source 
LexisNexis HPCC Platform 
Big Data Management 
Optimized Code 
Parallel Processing 
Machine Learning Algorithms Victor Herrera – FAU – Fall 2014
FAU 
Principal Investigator: Taghi Khoshgoftaar Data Mining and Machine Learning 
Researchers: Victor Herrera Maryam Najafabadi 
LexisNexis 
Flavio Villanustre VP Technology & Product 
Edin Muharemagic Data Scientist and Architect 
Project Team Victor Herrera – FAU – Fall 2014
Victor Herrera – FAU – Fall 2014 
Machine Learning Algorithm 
Classification Model 
Training Data 
Constructor 
Businessman 
Doctor 
Learning 
Classification 
Constructor 
Project Scope 
• 
Supervised Learning for Classification 
• 
Big Data 
• 
Parallel Computing 
Implement Machine Learning Algorithms in HPCC ECL-ML library, focusing on:
ECL-ML Learning Library 
ML 
Supervised Learning 
Classifiers 
•Naïve Bayes 
•Perceptron 
•Logistic Regression 
Regression 
Linear Regression 
Unsupervised Learning 
Clustering 
KMeans 
Association Analysis 
•APrioriN 
•EclatN 
Project Starting Point April 2012 
Victor Herrera – FAU – Fall 2014
Community 
Machine Learning Documentation 
Review Code 
• 
Comment 
• 
Merge 
• 
Commit 
• 
Pull Request 
Process comments 
ML-ECL Library Contributor 
ML-ECL Library 
Administrator 
• 
Use 
• 
Contribute 
ECL – ML Library 
HPCC-ECL Training 
HPCC-ECL Documentation Forums 
ECL-ML Open Source Blueprint Victor Herrera – FAU – Fall 2014
Victor Herrera – FAU – Fall 2014 
Algorithm Complexity 
Naïve Bayes 
Decision Trees 
•KNN 
•Random Forest 
Language Knowledge 
•Transformation 
•Aggregations 
Data Distribution 
Parallel Optimization 
Implementation Complexity 
•Understand Existing Code 
•Fix Existing Modules 
Add Functionality 
Design Modules 
Data 
Categorical (Discrete) 
Continuous 
Mixed (Future Work) 
Experience Progress
Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
// Learning Phase 
minNumObj:= 2; maxLevel := 5; // DecTree Parameters 
learner1:= ML.Classify.DecisionTree.C45Binary(minNumObj, maxLevel); 
tmodel:= learner1.LearnC(indepData, depData); // DecTree Model 
// Classification Phase 
results1:= learner1.ClassifyC(indepData, tmodel); 
// Comparing to Original 
compare1:= Classify.Compare(depData, results1); 
OUTPUT(compare1.CrossAssignments), ALL); // CrossAssignment results 
// Learning Phase 
treeNum:=25; fsNum:=3; Pure:=1.0; maxLevel:=5; // RndForest Parameters 
learner2 := ML.Classify.RandomForest(treeNum, fsNum, Pure, maxLevel); 
rfmodel:= learner2.LearnC(indepData, depData); // RndForest Model 
// Classification Phase 
results2:= trainer2.ClassifyC(indepData, rfmodel); 
// Comparing to Original 
compare2:= Classify.Compare(depData, results2); 
OUTPUT(compare2.CrossAssignments), ALL); // CrossAssignment results 
A40A4A4112A3A412<= 0.6> 0.6<= 4.7> 4.7<= 1.5> 1.5<= 1.8> 1.8<= 1.7> 1.7 
Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
// Learning Phase 
minNumObj:= 2; maxLevel := 5; // DecTree Parameters 
learner1:= ML.Classify.DecisionTree.C45Binary(minNumObj, maxLevel); 
tmodel:= learner1.LearnC(indepData, depData); // DecTree Model 
// Classification Phase 
results1:= learner1.ClassifyC(indepData, tmodel); 
// Comparing to Original 
compare1:= Classify.Compare(depData, results1); 
OUTPUT(compare1.CrossAssignments), ALL); // CrossAssignment results 
// Learning Phase 
treeNum:=25; fsNum:=3; Pure:=1.0; maxLevel:=5; // RndForest Parameters 
learner2 := ML.Classify.RandomForest(treeNum, fsNum, Pure, maxLevel); 
rfmodel:= learner2.LearnC(indepData, depData); // RndForest Model 
// Classification Phase 
results2:= trainer2.ClassifyC(indepData, rfmodel); 
// Comparing to Original 
compare2:= Classify.Compare(depData, results2); 
OUTPUT(compare2.CrossAssignments), ALL); // CrossAssignment results 
Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
Victor Herrera – FAU – Fall 2014 
Supervised Learning 
Naïve Bayes 
Maximum Likelihood 
Probabilistic Model 
Decision Tree 
Top-Down Induction 
Recursive Data Partitioning 
Decision Tree Model 
Case Based 
KD-Tree Partitioning 
Nearest Neighbor Search 
No Model – Lazy KNN Algorithm 
Ensemble 
Data Sampling Tree Bagging 
F. Sampling Rdn Subspace 
Ensemble Tree Model 
Classification 
Class Probability 
Missing Values Classification 
Area Under ROC 
Project Contribution
• 
Development of Machine Learning Algorithms in ECL Language, focused on the implementation of Classification Algorithms, Big Data and Parallel Processing. 
• 
Added functionality to Naïve Bayes Classifier and implemented Decision Trees, KNN and Random Forest Classifiers into ECL-ML library. 
• 
Currently working on Big Data Learning & Evaluation : 
– 
Experiment design and implementation of a Big Data CASE Study 
– 
Analysis of Results. 
– 
Classifiers enhancements based upon analysis of results. Victor Herrera – FAU – Fall 2014 
Overall Summary
Thank you. Victor Herrera – FAU – Fall 2014

Más contenido relacionado

La actualidad más candente

Finding Similar Files in Large Document Repositories
Finding Similar Files in Large Document RepositoriesFinding Similar Files in Large Document Repositories
Finding Similar Files in Large Document Repositories
feiwin
 
AjayBhullar_Resume (5)
AjayBhullar_Resume (5)AjayBhullar_Resume (5)
AjayBhullar_Resume (5)
Ajay Bhullar
 

La actualidad más candente (15)

Finding Similar Files in Large Document Repositories
Finding Similar Files in Large Document RepositoriesFinding Similar Files in Large Document Repositories
Finding Similar Files in Large Document Repositories
 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GAN
 
Mit102 data & file structures
Mit102  data & file structuresMit102  data & file structures
Mit102 data & file structures
 
Ml programming with python
Ml programming with pythonMl programming with python
Ml programming with python
 
Lecture 5
Lecture 5Lecture 5
Lecture 5
 
Exposé Ontology
Exposé OntologyExposé Ontology
Exposé Ontology
 
Intellectual technologies
Intellectual technologiesIntellectual technologies
Intellectual technologies
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
 
AjayBhullar_Resume (5)
AjayBhullar_Resume (5)AjayBhullar_Resume (5)
AjayBhullar_Resume (5)
 
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | EdurekaTensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
 
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 
Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked DataDedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
 
ModelDR - the tool that untangles complex information
ModelDR - the tool that untangles complex informationModelDR - the tool that untangles complex information
ModelDR - the tool that untangles complex information
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...
Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...
Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Compet...
 

Destacado

20130313 het abc van sociale media ename
20130313 het abc van sociale media ename20130313 het abc van sociale media ename
20130313 het abc van sociale media ename
kwb_eensgezind
 
20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels
kwb_eensgezind
 
jkintl profile
jkintl profilejkintl profile
jkintl profile
Jaya Kumar
 
Leveraging SMEs’ strengths for INSPIRE
Leveraging SMEs’ strengths for INSPIRELeveraging SMEs’ strengths for INSPIRE
Leveraging SMEs’ strengths for INSPIRE
smespire
 

Destacado (20)

 
Manual
ManualManual
Manual
 
20130313 het abc van sociale media ename
20130313 het abc van sociale media ename20130313 het abc van sociale media ename
20130313 het abc van sociale media ename
 
20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels
 
INSPIRED Magazine Vol 02 Issue 02
INSPIRED Magazine Vol 02 Issue 02INSPIRED Magazine Vol 02 Issue 02
INSPIRED Magazine Vol 02 Issue 02
 
Daily routine
Daily routineDaily routine
Daily routine
 
jkintl profile
jkintl profilejkintl profile
jkintl profile
 
2015 HPCC Systems Engineering Summit Community Day Wrap-up
2015 HPCC Systems Engineering Summit Community Day Wrap-up2015 HPCC Systems Engineering Summit Community Day Wrap-up
2015 HPCC Systems Engineering Summit Community Day Wrap-up
 
Presentación defensa proyecto
Presentación defensa proyecto Presentación defensa proyecto
Presentación defensa proyecto
 
LEVICK Weekly - Jan 11 2013
LEVICK Weekly - Jan 11 2013LEVICK Weekly - Jan 11 2013
LEVICK Weekly - Jan 11 2013
 
HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...
HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...
HPCC Systems Engineering Summit Presentation - Collaborative Research with FA...
 
What is SPF record good for? | Part 7#17
What is SPF record good for? | Part 7#17What is SPF record good for? | Part 7#17
What is SPF record good for? | Part 7#17
 
Starfile0001
Starfile0001Starfile0001
Starfile0001
 
Challenges and opportunities of the Mexican Space Agency
Challenges and opportunities of the Mexican Space Agency Challenges and opportunities of the Mexican Space Agency
Challenges and opportunities of the Mexican Space Agency
 
第一次 概念發想
第一次 概念發想第一次 概念發想
第一次 概念發想
 
LEVICK Weekly - Mar 8 2013
LEVICK Weekly - Mar 8 2013LEVICK Weekly - Mar 8 2013
LEVICK Weekly - Mar 8 2013
 
The Laws of the Christian Life
The Laws of the Christian LifeThe Laws of the Christian Life
The Laws of the Christian Life
 
Linea del tiempo
Linea del tiempoLinea del tiempo
Linea del tiempo
 
Training program
Training programTraining program
Training program
 
Leveraging SMEs’ strengths for INSPIRE
Leveraging SMEs’ strengths for INSPIRELeveraging SMEs’ strengths for INSPIRE
Leveraging SMEs’ strengths for INSPIRE
 

Similar a HPCC Systems Engineering Summit Presentation - Collaborative Research with FAU and LexisNexis Deck 2

kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
butest
 
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
Databricks
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
MLconf
 
Mark_Yashar_Resume_2017
Mark_Yashar_Resume_2017Mark_Yashar_Resume_2017
Mark_Yashar_Resume_2017
Mark Yashar
 
Mark_Yashar_Resume_Fall_2016
Mark_Yashar_Resume_Fall_2016Mark_Yashar_Resume_Fall_2016
Mark_Yashar_Resume_Fall_2016
Mark Yashar
 
Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...
Sujit Pal
 

Similar a HPCC Systems Engineering Summit Presentation - Collaborative Research with FAU and LexisNexis Deck 2 (20)

kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
 
End-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceEnd-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in Finance
 
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRGData Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
 
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developments
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Random forest using apache mahout
Random forest using apache mahoutRandom forest using apache mahout
Random forest using apache mahout
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
Mark_Yashar_Resume_2017
Mark_Yashar_Resume_2017Mark_Yashar_Resume_2017
Mark_Yashar_Resume_2017
 
Mark_Yashar_Resume_Fall_2016
Mark_Yashar_Resume_Fall_2016Mark_Yashar_Resume_Fall_2016
Mark_Yashar_Resume_Fall_2016
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Enhancing Domain Specific Language Implementations Through Ontology
Enhancing Domain Specific Language Implementations Through OntologyEnhancing Domain Specific Language Implementations Through Ontology
Enhancing Domain Specific Language Implementations Through Ontology
 
Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Basic ideas on keras framework
Basic ideas on keras frameworkBasic ideas on keras framework
Basic ideas on keras framework
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
 

Más de HPCC Systems

Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
HPCC Systems
 

Más de HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

HPCC Systems Engineering Summit Presentation - Collaborative Research with FAU and LexisNexis Deck 2

  • 1. Developing Machine Learning Algorithms on HPCC/ECL Platform Industry/University Cooperative Research LexisNexis/Florida Atlantic University Victor Herrera – FAU – Fall 2014
  • 2. Developing ML Algorithms On HPCC/ECL Platform LexisNexis/Florida Atlantic University Cooperative Research HPCC ECL ML Library High Level Data Centric Declarative Language Scalability Dictionary Approach Open Source LexisNexis HPCC Platform Big Data Management Optimized Code Parallel Processing Machine Learning Algorithms Victor Herrera – FAU – Fall 2014
  • 3. FAU Principal Investigator: Taghi Khoshgoftaar Data Mining and Machine Learning Researchers: Victor Herrera Maryam Najafabadi LexisNexis Flavio Villanustre VP Technology & Product Edin Muharemagic Data Scientist and Architect Project Team Victor Herrera – FAU – Fall 2014
  • 4. Victor Herrera – FAU – Fall 2014 Machine Learning Algorithm Classification Model Training Data Constructor Businessman Doctor Learning Classification Constructor Project Scope • Supervised Learning for Classification • Big Data • Parallel Computing Implement Machine Learning Algorithms in HPCC ECL-ML library, focusing on:
  • 5. ECL-ML Learning Library ML Supervised Learning Classifiers •Naïve Bayes •Perceptron •Logistic Regression Regression Linear Regression Unsupervised Learning Clustering KMeans Association Analysis •APrioriN •EclatN Project Starting Point April 2012 Victor Herrera – FAU – Fall 2014
  • 6. Community Machine Learning Documentation Review Code • Comment • Merge • Commit • Pull Request Process comments ML-ECL Library Contributor ML-ECL Library Administrator • Use • Contribute ECL – ML Library HPCC-ECL Training HPCC-ECL Documentation Forums ECL-ML Open Source Blueprint Victor Herrera – FAU – Fall 2014
  • 7. Victor Herrera – FAU – Fall 2014 Algorithm Complexity Naïve Bayes Decision Trees •KNN •Random Forest Language Knowledge •Transformation •Aggregations Data Distribution Parallel Optimization Implementation Complexity •Understand Existing Code •Fix Existing Modules Add Functionality Design Modules Data Categorical (Discrete) Continuous Mixed (Future Work) Experience Progress
  • 8. Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
  • 9. // Learning Phase minNumObj:= 2; maxLevel := 5; // DecTree Parameters learner1:= ML.Classify.DecisionTree.C45Binary(minNumObj, maxLevel); tmodel:= learner1.LearnC(indepData, depData); // DecTree Model // Classification Phase results1:= learner1.ClassifyC(indepData, tmodel); // Comparing to Original compare1:= Classify.Compare(depData, results1); OUTPUT(compare1.CrossAssignments), ALL); // CrossAssignment results // Learning Phase treeNum:=25; fsNum:=3; Pure:=1.0; maxLevel:=5; // RndForest Parameters learner2 := ML.Classify.RandomForest(treeNum, fsNum, Pure, maxLevel); rfmodel:= learner2.LearnC(indepData, depData); // RndForest Model // Classification Phase results2:= trainer2.ClassifyC(indepData, rfmodel); // Comparing to Original compare2:= Classify.Compare(depData, results2); OUTPUT(compare2.CrossAssignments), ALL); // CrossAssignment results A40A4A4112A3A412<= 0.6> 0.6<= 4.7> 4.7<= 1.5> 1.5<= 1.8> 1.8<= 1.7> 1.7 Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
  • 10. // Learning Phase minNumObj:= 2; maxLevel := 5; // DecTree Parameters learner1:= ML.Classify.DecisionTree.C45Binary(minNumObj, maxLevel); tmodel:= learner1.LearnC(indepData, depData); // DecTree Model // Classification Phase results1:= learner1.ClassifyC(indepData, tmodel); // Comparing to Original compare1:= Classify.Compare(depData, results1); OUTPUT(compare1.CrossAssignments), ALL); // CrossAssignment results // Learning Phase treeNum:=25; fsNum:=3; Pure:=1.0; maxLevel:=5; // RndForest Parameters learner2 := ML.Classify.RandomForest(treeNum, fsNum, Pure, maxLevel); rfmodel:= learner2.LearnC(indepData, depData); // RndForest Model // Classification Phase results2:= trainer2.ClassifyC(indepData, rfmodel); // Comparing to Original compare2:= Classify.Compare(depData, results2); OUTPUT(compare2.CrossAssignments), ALL); // CrossAssignment results Example: Classifying Iris-Setosa Decision Tree vs. Random Forest Victor Herrera – FAU – Fall 2014
  • 11. Victor Herrera – FAU – Fall 2014 Supervised Learning Naïve Bayes Maximum Likelihood Probabilistic Model Decision Tree Top-Down Induction Recursive Data Partitioning Decision Tree Model Case Based KD-Tree Partitioning Nearest Neighbor Search No Model – Lazy KNN Algorithm Ensemble Data Sampling Tree Bagging F. Sampling Rdn Subspace Ensemble Tree Model Classification Class Probability Missing Values Classification Area Under ROC Project Contribution
  • 12. • Development of Machine Learning Algorithms in ECL Language, focused on the implementation of Classification Algorithms, Big Data and Parallel Processing. • Added functionality to Naïve Bayes Classifier and implemented Decision Trees, KNN and Random Forest Classifiers into ECL-ML library. • Currently working on Big Data Learning & Evaluation : – Experiment design and implementation of a Big Data CASE Study – Analysis of Results. – Classifiers enhancements based upon analysis of results. Victor Herrera – FAU – Fall 2014 Overall Summary
  • 13. Thank you. Victor Herrera – FAU – Fall 2014