Practical Machine Learning
(Part I)
Traian Rebedea
Faculty of Automatic Control and Computers, UPB
traian.rebedea@cs.pub.ro
Outline
• About me
• Why machine learning?
• Basic notions of machine learning
• Sample problems and solutions
– Finding vandalism on Wikipedia
– Sexual predator detection
– Identification of buildings on a map
– Change in the meaning of concepts over time
• Practical ML
– R
– Python (sklearn)
– Java (Weka)
– Others
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
About me
• Started working in 2003 (year 2 of studies)
– J2EE developer
• Then created my own company with a colleague (in 2005)
– Web projects (PHP, Javascript, etc.), involved in some startups for the
technological aspects (e.g. Happyfish TV)
• Started teaching at A&C, UPB (in 2006)
– TA for Algorithms, Natural Language Processing
• Started my PhD (in 2007)
– Natural Language Processing, Discourse Analysis, Technology-Enhanced
Learning
• Now I am teaching several lectures: Algorithm Design, Algorithm Design
and Complexity, Symbolic and Statistical Learning, Information Retrieval
• Working on several topics in NLP: opinion mining, conversational agents,
question answering, culturomics
• Interested in all new projects that use NLP, ML and IR as well as any other
“smart” applications
Why machine learning?
• “Simply put, machine learning is the part of artificial
intelligence that actually works” (Forbes, link)
• Most popular course on Coursera
• Popular?
– Meaning practical: ML results are nice and easy to show to
others
– Meaning well-paid/in demand: companies are increasing
demand for ML specialists
• Large volumes of data, corpora, etc.
• Computer science, data science, almost any other
domain (especially in research)
Why ML?
• Mix of Computer science and Math (Statistics)
skills
• Why is math essential to ML?
– http://courses.washington.edu/css490/2012.Winter/lecture_slides/02_math_essentials.pdf
• “To get really useful results, you need good
mathematical intuitions about certain general
machine learning principles, as well as the
inner workings of the individual algorithms.”
Basic Notions of ML
• Different tasks in ML
• Supervised
– Regression
– Classification
• Unsupervised
– Clustering
– Dimensionality reduction / visualization
• Semi-supervised
– Label propagation
• Reinforcement learning
• Deep learning
Supervised learning
• Dataset consisting of labeled examples
• Each example has one or several attributes
and a dependent variable (label, output,
response, predicted variable)
• After building, selecting and assessing the
model on the labeled data, it is used to predict
labels for new data
Supervised learning
• Dataset is usually split into three parts
– A common rule of thumb is 50-25-25 (training-validation-test)
• Training set
– Used to build models
– Compute the basic parameters of the model
• Holdout/validation set
– Used to find out the best model and adjust some parameters
of the model (usually, in order to minimize some error)
– Also called model selection
• Test set
– Used to assess the final model on new data, after model
building and selection
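The three-way split described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `split_dataset`, the fixed random seed and the toy dataset of 100 integers are assumptions made for the example, not part of the talk.

```python
import random

def split_dataset(examples, train=0.50, validation=0.25, seed=0):
    """Shuffle the examples and cut them into train / validation / test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # shuffle so each part is a random sample
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])      # whatever remains is the test set

# Toy dataset: 100 examples identified by their index (hypothetical data).
train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

The test set is only touched once, after the model has been chosen on the validation set.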
• Source: http://blogs.sas.com/content/jmp/2010/07/06/train-validate-and-test-for-data-mining-in-jmp/
Cross-validation
• If the number of partitions (S) equals the number of training instances →
leave-one-out (only for small datasets)
• Main disadvantages:
– The number of runs is increased by a factor of S
– Different runs yield different parameter values for the same model
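A minimal sketch of how the S partitions can be generated, assuming the examples are addressed by index. The helper name `k_fold_indices` and the interleaved assignment of examples to folds are choices made for this illustration; other partitioning schemes are equally valid.

```python
def k_fold_indices(n_examples, n_folds):
    """Yield (training_indices, validation_indices) once per fold."""
    indices = list(range(n_examples))
    for fold in range(n_folds):
        validation = indices[fold::n_folds]   # every n_folds-th example, offset by fold
        held_out = set(validation)
        training = [i for i in indices if i not in held_out]
        yield training, validation

# S = 5 partitions over 10 examples: 5 runs, each validating on 2 examples.
folds = list(k_fold_indices(n_examples=10, n_folds=5))

# S = number of examples gives leave-one-out: 10 runs, 1 validation example each.
leave_one_out = list(k_fold_indices(n_examples=10, n_folds=10))
```

Each example is used for validation exactly once, which is why the number of training runs grows by a factor of S.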
Regression
• Supervised learning when the output variable
is continuous
– This is the usual textbook definition
– Also may be used for binary output variables (e.g.
Not-spam=0, Spam=1) or output variables that
have an ordering (e.g. integer numbers, sentiment
levels, etc.)
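As a small worked example of regression, the sketch below fits a straight line to one input attribute with the closed-form least-squares solution. The function name `fit_line` and the four data points are invented for the illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 4.0 + intercept   # predicted output for a new example x = 4
```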
• Source: http://smlv.cc.gatech.edu/2010/10/06/linear-regression-and-least-squares-estimation/
Classification
• Supervised learning when the output variable
is discrete (classes)
– Can also be used for binary or integer outputs,
even if they are ordered, although classification
usually does not account for the order
• CIFAR dataset: http://www.cs.toronto.edu/~kriz/cifar.html (also image source)
Classification
• Several important types
– Binary vs. multiple classes
– Hard vs. soft
– Single-label vs. multi-label (per example)
– Balanced vs. unbalanced classes
• Extensions that are semi-supervised
– One-class classification
– PU (Positive and Unlabeled) learning
Unsupervised learning
• For an unlabeled dataset, infer properties (find
“hidden” structure) about the distribution of the
objects in the dataset
– without any help related to correct answers
• Generally, much more data than for supervised
learning
• There are several techniques that can be applied
– Clustering (cluster analysis)
– Dimensionality reduction (visualization of data)
– Association rules, frequent itemsets
Clustering
• Partition dataset into clusters (groups) of
objects such that the ones in the same cluster
are more similar to each other than to objects
in different clusters
• Similarity
– The most important notion when clustering
– “Inverse of a distance”
• Possible objective: approximate the modes of
the input dataset distribution
Clustering
• Lots of different cluster models:
– Connectivity (distance models) (e.g. hierarchical)
– Centroid (e.g. K-means)
– Distribution (e.g. GMM)
– Density of data (e.g. DBSCAN)
– Graph (e.g. cliques, community detection)
– Subspace (e.g. biclustering)
– …
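To make the centroid model concrete, here is a minimal K-means sketch in plain Python. It is an illustration under simple assumptions (squared Euclidean distance, a fixed number of iterations, random initial centroids drawn from the data); the function name `k_means` and the six toy points are made up for the example.

```python
import random

def k_means(points, k, n_iterations=20, seed=0):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initial centroids taken from the data
    clusters = []
    for _ in range(n_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        for i, members in enumerate(clusters):
            if members:                        # an empty cluster keeps its old centroid
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters

# Two well-separated groups of 2-D points (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, k=2)
```

Similarity here is the inverse of the squared distance: a point joins the cluster whose centroid is closest.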
• Source: http://genomics10.bu.edu/terrence/gems/gems.html
• Source: https://sites.google.com/site/statsr4us/advanced/multivariate/cluster
Dimensionality reduction
• Usually, unsupervised datasets are large both
in number of examples and in number of
attributes (features)
• Reduce the number of features in order to
improve (human) readability and
interpretation of data
• Mainly linked to visualization
– Also used for feature selection, data compression
or denoising
• Source: http://www.ifs.tuwien.ac.at/ifs/research/pub_pdf/rau_ecdl01.pdf
Semi-supervised learning
• Mix of labeled and unlabeled data in the dataset
• Enlarge the labeled dataset with unlabeled instances for which we
might determine the correct label with a high probability
• Unlabeled data is much cheaper or simpler to get than labeled data
– Human annotation either requires experts or is not challenging (it is
boring)
– Annotation may also require specific software or other types of
devices
– Students (or Amazon Mturk users) are not always good annotators
• More information:
http://pages.cs.wisc.edu/~jerryzhu/pub/sslicml07.pdf
Label propagation
• Start with the labeled dataset
• Want to assign labels to some (or all) of the unlabeled
instances
• Assumption: “data points that are close have similar labels”
• Build a fully connected graph with both labeled and unlabeled
instances as vertices
• The weight of an edge represents the similarity between the
points (or inverse of a distance)
• Repeat until convergence
– Propagate labels through edges (larger weights allow
quicker propagation)
– Nodes will have soft labels
• More info: http://lvk.cs.msu.su/~bruzz/articles/classification/zhu02learning.pdf
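The steps above can be sketched as follows. This is a simplified illustration, not the exact algorithm from the referenced paper: it assumes a Gaussian similarity with a hypothetical `sigma` parameter, keeps the self-loop of each node, clamps labeled nodes to their known label, and runs a fixed number of iterations instead of testing for convergence.

```python
import math

def propagate_labels(points, labels, sigma=1.0, n_iterations=100):
    """labels[i] is a class for labeled points and None for unlabeled ones.
    Returns the class list and one soft label (probability per class) per node."""
    classes = sorted({label for label in labels if label is not None})
    n = len(points)
    # Fully connected graph: the edge weight decays with the squared distance.
    weights = [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / sigma ** 2)
                for q in points] for p in points]

    def initial(label):
        if label is None:                      # unlabeled: start from a uniform guess
            return [1.0 / len(classes)] * len(classes)
        return [1.0 if c == label else 0.0 for c in classes]

    soft = [initial(label) for label in labels]
    for _ in range(n_iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:          # labeled nodes keep their label
                updated.append(soft[i])
                continue
            total = sum(weights[i])
            updated.append([sum(weights[i][j] * soft[j][c] for j in range(n)) / total
                            for c in range(len(classes))])
        soft = updated
    return classes, soft

# One labeled point at each end of two groups on a line (hypothetical data).
points = [(0.0,), (1.0,), (2.0,), (8.0,), (9.0,), (10.0,)]
labels = [0, None, None, None, None, 1]
classes, soft = propagate_labels(points, labels)
predicted = [classes[row.index(max(row))] for row in soft]
```

Taking the most probable class of each soft label turns the result into hard labels for the unlabeled instances.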
• Source: http://web.ornl.gov/sci/knowledgediscovery/Projects.htm
Basic Notions of ML
• How to assess the performance of a model?
• Used to choose the best model
• Supervised learning
– Assessing errors of the model on the validation
and test sets
Performance measures - supervised
• Regression
– Mean error, mean squared error (MSE), root MSE, etc.
• Classification
– Accuracy
– Precision / recall (possibly weighted)
– F-measure
– Confusion matrix
– TP, TN, FP, FN
– AUC / ROC
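Most of the measures listed above follow directly from the four confusion counts. The sketch below computes them for a binary problem, plus MSE and root MSE for regression; the function names and the toy label vectors are assumptions made for the illustration.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def classification_scores(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical gold labels and predictions (1 = spam, 0 = not spam).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)        # (TP, TN, FP, FN)
scores = classification_scores(y_true, y_pred)

mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])
rmse = math.sqrt(mse)
```

On unbalanced classes accuracy alone can be misleading, which is why precision, recall and the F-measure are usually reported alongside it.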
• Source: http://www.resacorp.com/lsmse2.htm
• Source: http://en.wikipedia.org/wiki/Sensitivity_and_specificity
Basic Notions of ML
• Overfitting
– Low error on training data
– High error on (unseen/new) test data
– The model does not generalize well from the training data
to unseen data
– Possible causes:
• Too few training examples
• Model too complex (too many parameters)
• Underfitting
– High error both on training and test data
– Model is too simple to capture the information in the
training data
• Source: http://gerardnico.com/wiki/data_mining/overfitting
Sample problems and solutions
• Chosen from prior experience
• May not be the most interesting problems for all
(any) of you
• I have tried to present 4 problems that are
fairly diverse in their solutions and simple to
explain
• All solutions are state-of-the-art (that means not
the very best one, but among the top)
– Very best solution may be found in research papers
Finding vandalism on Wikipedia
• Work with Dan Cioiu
• Vandalism = editing in a malicious manner that is intentionally
disruptive
• Dataset used at PAN 2011
– Real articles extracted from Wikipedia in 2009 and 2010
– Training data: 32,000 edits
– Test data: 130,000 edits
• An edit consists of:
– the old and new content of the article
– author information
– revision comment and timestamp
• We chose to use only the actual text in the revision (no author and
timestamp information) to predict vandalism
Features
Using several classifiers
Cross-validation results
Combining results using a meta-classifier
Final results on test set
• Source: http://link.springer.com/chapter/10.1007%2F978-3-642-40495-5_32
Main conclusions
• Feature extraction is important in ML
– Which are the best features?
• Syntactic and context analysis are the most important
• Semantic analysis is slow but increases detection by
10%
• Using features related to users would further improve
our results
• A meta-classifier sometimes improves the results of
several individual classifiers
– Not always! Maybe another discussion on this…
Sexual predator detection
• Work with Claudia Cardei
• Automatically detect sexual predators in a chat
discussion in order to alert the parents or the police
• Corpus from PAN 2012
– 60000+ chat conversations (around 10000 after removing
outliers)
– 90000+ users
– 142 sexual predators
• Sexual predator is used “to describe a person seen as
obtaining or trying to obtain sexual contact with
another person in a metaphorically predatory manner”
(Wikipedia)
Problem description
• Decide whether a user is a predator or not
• Use only the chats that have a predator
(balanced corpus: 50% predators - 50%
victims)
• One document for each user
• How to determine the features?
Simple solution
• “Bag of words” (BoW) features with tf-idf
weighting
• Feature selection using mutual information
• MLP (neural network) with 500 features: 89%
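A minimal sketch of bag-of-words features with tf-idf weighting, assuming one tokenized document per user. tf-idf has several variants; this one uses relative term frequency and a natural-log inverse document frequency, which may differ from the weighting used in the original experiments. The function name `tf_idf` and the toy chat lines are invented for the example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of non-empty token lists.
    Returns one {term: weight} dictionary per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return vectors

# Three tiny "user documents" (hypothetical data).
documents = [
    ["hi", "how", "are", "you"],
    ["hi", "what", "is", "your", "age"],
    ["hi", "nice", "day"],
]
vectors = tf_idf(documents)
```

A term that appears in every document, such as "hi" here, gets weight 0 because it does not help tell the documents apart.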
Going beyond
• There should be other features that are discriminative for predators
• Investigated behavioral features:
– Percentage of questions and inquiries initiated by a user
– Percentage of negations
– Expressions that might denote an underage user (“don’t know”, “never
did”,…)
– The age of the user if found in a chat reply (“asl” like answers)
– Percentage of slang words
– Percentage of words with a sexual connotation
– Readability scores (Flesch)
– Who started the conversation
• AdaBoost (with DecisionStump): 90%
• Random Forest: 93%
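Two of the behavioral features listed above can be approximated with very simple heuristics. The sketch below is only a guess at how such features might be computed: the `NEGATIONS` word list, the rule "a message ending in a question mark is a question" and the function name are all hypothetical, not the definitions used in the original work.

```python
# Hypothetical, deliberately small list of negation words.
NEGATIONS = {"no", "not", "never", "don't", "can't", "won't"}

def behavioral_features(messages):
    """messages: non-empty list of chat lines written by one user."""
    words = [w.strip(".,!?").lower() for m in messages for w in m.split()]
    n_questions = sum(1 for m in messages if m.strip().endswith("?"))
    n_negations = sum(1 for w in words if w in NEGATIONS)
    return {
        "question_ratio": n_questions / len(messages),  # share of messages that are questions
        "negation_ratio": n_negations / len(words),     # share of words that are negations
    }

# Invented chat lines for one user.
messages = ["how old are you?", "I don't know", "where do you live?", "never did that"]
features = behavioral_features(messages)
```

A handful of such ratios per user replaces thousands of bag-of-words dimensions, at the cost of having to design each feature by hand.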
Main conclusions
• Feature selection is useful when the number of
features is large
– Reduced from 10k+ features to hundreds/thousands
– We used MI; there are other techniques
• Fewer features can be as descriptive as (or even
more descriptive than) the BoW model
– Difficult to choose
– Need additional computations and information
– Need to understand the dataset
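Mutual information (MI) between a feature and the class label can be estimated from co-occurrence counts; features with the highest MI are kept. The sketch below assumes discrete feature values (for example "term present / absent") and measures MI in bits. The function name and the toy vectors are made up for the illustration.

```python
import math
from collections import Counter

def mutual_information(feature_values, labels):
    """Estimate I(feature; label) in bits from paired observations."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))   # counts of (feature, label) pairs
    p_feature = Counter(feature_values)
    p_label = Counter(labels)
    return sum((count / n) * math.log2((count / n) /
                                       ((p_feature[x] / n) * (p_label[y] / n)))
               for (x, y), count in joint.items())

labels = [1, 1, 0, 0]                 # hypothetical classes for four users
informative = [1, 1, 0, 0]            # term occurs exactly for class 1
uninformative = [1, 0, 1, 0]          # term is independent of the class

mi_informative = mutual_information(informative, labels)
mi_uninformative = mutual_information(uninformative, labels)
```

Ranking all candidate terms by this score and keeping the top few hundred is one way to reduce a 10k+ vocabulary.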
Identification of buildings on a map
• Work with Alexandra Vieru, Paul Vlase
• Given an aerial image, identify the buildings
• Non-military purposes:
– Estimate population density
– Identify main landmarks (shopping malls, etc.)
– Identify new buildings over time
• Features
– HOG descriptors – 3x3 cells
– Histogram of RGB – 5 bins, normalized
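The RGB histogram feature can be sketched as below, assuming 8-bit channel values and 5 equal-width bins per channel, concatenated into a 15-dimensional vector. The slide only says "normalized"; dividing by the number of pixels, so that each channel sums to 1, is an assumption of this example, as are the function name and the four toy pixels.

```python
def rgb_histogram(pixels, n_bins=5):
    """pixels: iterable of (r, g, b) tuples with values in 0..255.
    Returns n_bins values per channel, each channel summing to 1."""
    histogram = [0.0] * (3 * n_bins)
    n_pixels = 0
    for pixel in pixels:
        n_pixels += 1
        for channel, value in enumerate(pixel):
            bin_index = min(value * n_bins // 256, n_bins - 1)  # equal-width bins
            histogram[channel * n_bins + bin_index] += 1
    return [count / n_pixels for count in histogram]

# A tiny hypothetical image patch of four pixels.
pixels = [(0, 0, 0), (255, 255, 255), (128, 0, 255), (60, 200, 10)]
features = rgb_histogram(pixels)
```

Such a vector could serve as the input of the baseline classifier, with the HOG descriptors appended for the full feature set.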
Preliminary results
• SVM classifier
• Train: 2276 images, Test: 637 images
Main conclusions
• Verify which are the best features
– If this is possible
• Must check for overfitting using the training and
test corpora
• Each task can have a baseline (a simple
classification) method
– Here it might be SVM with RGB histogram features
• Actually this could be treated as a one-class
classification (or PU learning) problem
– There are many more negative instances than positive
ones
Change in the meaning of concepts
over time
• Work with Costin Chiru, Bianca Dinu, Diana Mincu
• Assumption:
– Different meanings of words should be identifiable
based on the other words that are associated with
them
– These other words are changing over time thus
denoting a change in the meaning (or usage of the
meanings)
• Using the Google Books n-grams corpus
– Dependency information
– 5-grams information centered on the analyzed word
• Source: https://books.google.com/ngrams (Query: “gay=>*_NOUN”)
• Clustering using tf-idf over dependencies of the
analyzed word
• Each distinct year is considered a different document
and each dependency is a feature/term
• Example for the word “gay”:
– Cluster 1 (1600 - 1925): signature {more, lively, happy,
brilliant, cheerful}
– Cluster 2 (1923-1969): signature {more, lively, happy,
cheerful, be}
– Cluster 3 (1981 - today): signature {lesbian, bisexual, anti,
enolum, straight}
Practical ML
• Brief discussion of alternatives that might be
used by a programmer
• The main idea is to show that it is simple to
start working on a ML task
• However, it is more difficult to achieve very
good results
– Experience
– Theoretical underpinnings of models
– Statistical background
R language
• Designed for “statistical computing”
• Lots of packages and functions for ML & statistics
• Simple to output visualizations
• Easy to write code, short
• It is kind of slow on some tasks
• Need to understand some new concepts: data frame,
factor, etc.
• You may find the same functionality in several
packages (e.g. NaïveBayes, Cohen’s Kappa, etc.)
Python (sklearn)
• sklearn (scikit-learn) package developed for
ML
• Uses existing packages numpy, scipy
• Started as a GSoC (Google Summer of Code) project in 2007
• It is new, implemented efficiently (time and
memory), lots of functionality
• Easy to write code
– Naming conventions, parameters, etc.
• Source: http://peekaboo-vision.blogspot.ro/2013/01/machine-learning-cheat-sheet-for-scikit.html
Java (Weka)
• Similar to scikit-learn for Python
• Older (since 1997) and more stable than scikit-learn
• Some implementations are (were) somewhat less
efficient, especially in memory consumption
• However, usually it is fast, accurate, and has lots of
other useful utilities
• It also has a GUI for getting some results fast without
writing any code
• Lots of people are debating whether Weka or scikit-learn is
better. I don’t want to get into this discussion as it depends on
your task and resources.
Not discussed
• RapidMiner
– Easy to use
– GUI interface for building the processing pipeline
– Can write some code/plugins
• Matlab/Octave
– Numeric computing
– Paid vs. free
• SPSS/SAS/Stata
– Somewhat similar to R, faster, more robust, expensive
• Many others:
– Apache Mahout / MALLET / NLTK / Orange / MLPACK
– Specific solutions for various ML models
Conclusions
• Lots of interesting ML tasks and data available
– Do you have access to new and interesting data in your
business?
• Several easy to use programming alternatives
• Understand the basics
• Enhance your maths (statistics) skills for a better ML
experience
• Hands-on experience and communities of practice
– www.kaggle.com (and other similar initiatives)
Thank you!
traian.rebedea@cs.pub.ro
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...Traian Rebedea
 
Importanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriImportanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriTraian Rebedea
 
Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...Traian Rebedea
 
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Traian Rebedea
 
Conclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyConclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyTraian Rebedea
 
Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Traian Rebedea
 
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Traian Rebedea
 
Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Traian Rebedea
 

Más de Traian Rebedea (20)

An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
 
AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profiles
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...
 
A focused crawler for romanian words discovery
A focused crawler for romanian words discoveryA focused crawler for romanian words discovery
A focused crawler for romanian words discovery
 
Detecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaDetecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large Corpora
 
Propunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitarePropunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitare
 
Automatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaAutomatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corpora
 
Relevance based ranking of video comments on YouTube
Relevance based ranking of video comments on YouTubeRelevance based ranking of video comments on YouTube
Relevance based ranking of video comments on YouTube
 
Opinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianOpinion mining for social media and news items in Romanian
Opinion mining for social media and news items in Romanian
 
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
 
Importanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriImportanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuri
 
Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...
 
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
 
Conclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyConclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD Survey
 
Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009
 
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
 
Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009
 

Último

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Último (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Practical machine learning - Part 1

  • 1. Practical Machine Learning (Part I) Traian Rebedea Faculty of Automatic Control and Computers, UPB traian.rebedea@cs.pub.ro
  • 2. Outline • About me • Why machine learning? • Basic notions of machine learning • Sample problems and solutions – Finding vandalism on Wikipedia – Sexual predator detection – Identification of buildings on a map – Change in the meaning of concepts over time • Practical ML – R – Python (sklearn) – Java (Weka) – Others 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 3. About me • Started working in 2003 (year 2 of studies) – J2EE developer • Then created my own company with a colleague (in 2005) – Web projects (PHP, JavaScript, etc.), involved in some startups for the technological aspects (e.g. Happyfish TV) • Started teaching at A&C, UPB (in 2006) – TA for Algorithms, Natural Language Processing • Started my PhD (in 2007) – Natural Language Processing, Discourse Analysis, Technology-Enhanced Learning • Now I am teaching several lectures: Algorithm Design, Algorithm Design and Complexity, Symbolic and Statistical Learning, Information Retrieval • Working on several topics in NLP: opinion mining, conversational agents, question-answering, culturomics • Interested in all new projects that use NLP, ML and IR as well as any other “smart” applications
  • 4. Why machine learning? • “Simply put, machine learning is the part of artificial intelligence that actually works” (Forbes) • Most popular course on Coursera • Popular? – Meaning practical: ML results are nice and easy to show to others – Meaning well-paid/in demand: demand from companies for ML specialists is increasing • Large volumes of data, corpora, etc. • Computer science, data science, almost any other domain (especially in research)
  • 5. Why ML? • Mix of Computer science and Math (Statistics) skills • Why is math essential to ML? – http://courses.washington.edu/css490/2012.Winter/lecture_slides/02_math_essentials.pdf • “To get really useful results, you need good mathematical intuitions about certain general machine learning principles, as well as the inner workings of the individual algorithms.”
  • 6. Basic Notions of ML • Different tasks in ML • Supervised – Regression – Classification • Unsupervised – Clustering – Dimensionality reduction / visualization • Semi-supervised – Label propagation • Reinforcement learning • Deep learning
  • 7. Supervised learning • Dataset consisting of labeled examples • Each example has one or several attributes and a dependent variable (label, output, response, predicted variable) • After building, selecting and assessing the model on the labeled data, it is used to find labels for new data
  • 8. Supervised learning • Dataset is usually split into three parts – A common rule of thumb is 50-25-25 • Training set – Used to build models – Compute the basic parameters of the model • Holdout/validation set – Used to find out the best model and adjust some parameters of the model (usually, in order to minimize some error) – Also called model selection • Test set – Used to assess the final model on new data, after model building and selection
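The 50-25-25 split described on this slide can be sketched in a few lines of plain Python. The function name `split_dataset`, the fixed seed and the toy dataset of 100 integers are assumptions made for illustration; they do not come from the talk.

```python
import random

def split_dataset(examples, train=0.5, validation=0.25, seed=0):
    """Shuffle the examples and split them into train, validation and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for reproducibility
    n_train = int(len(shuffled) * train)
    n_validation = int(len(shuffled) * validation)
    train_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_validation]
    test_set = shuffled[n_train + n_validation:]  # whatever remains (here 25%)
    return train_set, validation_set, test_set

# Toy dataset: 100 examples identified by an integer.
train_set, validation_set, test_set = split_dataset(range(100))
```

The test set is only touched once, after model building and selection are finished on the other two parts.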
  • 10. Cross-validation If the number of partitions (S) = the number of training instances → leave-one-out (only for small datasets) Main disadvantages: - The number of runs is increased by a factor of S - Multiple parameters for the same model for different runs
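A minimal sketch of how the S partitions could be generated, assuming instances are addressed by index; `k_fold_indices` is a hypothetical helper, not something shown in the slides. Setting the number of partitions equal to the number of instances gives leave-one-out.

```python
def k_fold_indices(n_instances, n_partitions):
    """Yield (training_indices, validation_indices) for each of the S runs."""
    indices = list(range(n_instances))
    # Spread the instances as evenly as possible over the S partitions.
    folds = [indices[i::n_partitions] for i in range(n_partitions)]
    for held_out in range(n_partitions):
        validation = folds[held_out]
        training = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield training, validation

runs = list(k_fold_indices(10, 5))           # S = 5: five runs, 8 train / 2 validation each
leave_one_out = list(k_fold_indices(4, 4))   # S = number of instances: leave-one-out
```

Each run trains a separate model, which is where the factor-of-S cost and the multiple parameter sets mentioned on the slide come from.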
  • 11. Regression • Supervised learning when the output variable is continuous – This is the usual textbook definition – Also may be used for binary output variables (e.g. Not-spam=0, Spam=1) or output variables that have an ordering (e.g. integer numbers, sentiment levels, etc.)
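As an illustration of regression with a continuous output, here is a sketch of simple (one-attribute) linear regression fitted by ordinary least squares. The function name `fit_line` and the toy data are assumptions; the slides do not prescribe a particular model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data generated from y = 2x + 1 without noise.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
prediction = slope * 4 + intercept  # predict the output for a new example
```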
  • 12. • Source: http://smlv.cc.gatech.edu/2010/10/06/linear-regression-and-least-squares-estimation/
  • 13. Classification • Supervised learning when the output variable is discrete (classes) – Can also be used for binary or integer outputs, even if they are ordered, but classification usually does not account for the order
  • 14. • CIFAR dataset: http://www.cs.toronto.edu/~kriz/cifar.html (also image source)
  • 15. Classification • Several important types – Binary vs. multiple classes – Hard vs. soft – Single-label vs. multi-label (per example) – Balanced vs. unbalanced classes • Extensions that are semi-supervised – One-class classification – PU (Positive and Unlabeled) learning
  • 16. Unsupervised learning • For an unlabeled dataset, infer properties (find “hidden” structure) about the distribution of the objects in the dataset – without any help related to correct answers • Generally, much more data than for supervised learning • There are several techniques that can be applied – Clustering (cluster analysis) – Dimensionality reduction (visualization of data) – Association rules, frequent itemsets
  • 17. Clustering • Partition dataset into clusters (groups) of objects such that the ones in the same cluster are more similar to each other than to objects in different clusters • Similarity – The most important notion when clustering – “Inverse of a distance” • Possible objective: approximate the modes of the input dataset distribution
  • 18. Clustering • Lots of different cluster models: – Connectivity (distance models) (e.g. hierarchical) – Centroid (e.g. K-means) – Distribution (e.g. GMM) – Density of data (e.g. DBSCAN) – Graph (e.g. cliques, community detection) – Subspace (e.g. biclustering) – …
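To make the centroid model concrete, the following is a bare-bones K-means sketch in plain Python. The function name, the random initialization from the data points, the fixed iteration count and the toy 2-D points are all assumptions made for this example.

```python
import random

def k_means(points, k, n_iterations=20, seed=0):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Similarity is the inverse of a distance: pick the closest centroid.
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, k=2)
```

On data with less clearly separated groups, the result depends on the initialization, so several restarts are usually tried.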
  • 19. • Source: http://genomics10.bu.edu/terrence/gems/gems.html
  • 20. • Source: https://sites.google.com/site/statsr4us/advanced/multivariate/cluster
  • 21. Dimensionality reduction • Usually, unsupervised datasets are large both in number of examples and in number of attributes (features) • Reduce the number of features in order to improve (human) readability and interpretation of data • Mainly linked to visualization – Also used for feature selection, data compression or denoising
  • 22. • Source: http://www.ifs.tuwien.ac.at/ifs/research/pub_pdf/rau_ecdl01.pdf
  • 23. Semi-supervised learning • Mix of labeled and unlabeled data in the dataset • Enlarge the labeled dataset with unlabeled instances for which we might determine the correct label with a high probability • Unlabeled data is much cheaper or simpler to get than labeled data – Human annotation either requires experts or is not challenging (it is boring) – Annotation may also require specific software or other types of devices – Students (or Amazon MTurk users) are not always good annotators • More information: http://pages.cs.wisc.edu/~jerryzhu/pub/sslicml07.pdf
  • 24. Label propagation • Start with the labeled dataset • Want to assign labels to some (or all) of the unlabeled instances • Assumption: “data points that are close have similar labels” • Build a fully connected graph with both labeled and unlabeled instances as vertices • The weight of an edge represents the similarity between the points (or inverse of a distance) • Repeat until convergence – Propagate labels through edges (larger weights allow quicker propagation) – Nodes will have soft labels • More info: http://lvk.cs.msu.su/~bruzz/articles/classification/zhu02learning.pdf
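The steps on this slide can be sketched as follows. This is a simplified variant, assuming a Gaussian similarity with a hand-picked `sigma`, a fixed number of iterations instead of a convergence test, and labeled nodes that are clamped to their known label; it is not necessarily the exact algorithm from the linked paper.

```python
import math

def propagate_labels(points, labels, n_classes, sigma=1.0, n_iterations=100):
    """labels[i] is a class index for labeled points and None for unlabeled ones."""
    n = len(points)
    # Fully connected graph: edge weight = similarity (Gaussian of the squared distance).
    weights = [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / sigma ** 2)
                if i != j else 0.0
                for j in range(n)] for i in range(n)]
    # Soft labels: one-hot for labeled nodes, uniform for unlabeled nodes.
    soft = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
            if labels[i] is not None else [1.0 / n_classes] * n_classes
            for i in range(n)]
    for _ in range(n_iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(soft[i])  # clamp the known labels
                continue
            total = sum(weights[i])
            # Each unlabeled node takes the weighted average of its neighbours' soft labels.
            updated.append([sum(weights[i][j] * soft[j][c] for j in range(n)) / total
                            for c in range(n_classes)])
        soft = updated
    return soft

# Two labeled points (classes 0 and 1) and four unlabeled ones on a line.
points = [(0.0,), (1.0,), (2.0,), (8.0,), (9.0,), (10.0,)]
labels = [0, None, None, None, None, 1]
soft_labels = propagate_labels(points, labels, n_classes=2)
hard_labels = [max(range(2), key=lambda c: s[c]) for s in soft_labels]
```

The soft labels can be thresholded so that only instances labeled with high probability are added to the labeled set.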
  • 25. • Source: http://web.ornl.gov/sci/knowledgediscovery/Projects.htm
  • 26. Basic Notions of ML • How to assess the performance of a model? • Used to choose the best model • Supervised learning – Assessing errors of the model on the validation and test sets
  • 27. Performance measures - supervised • Regression – Mean error, mean squared error (MSE), root MSE, etc. • Classification – Accuracy – Precision / recall (possibly weighted) – F-measure – Confusion matrix – TP, TN, FP, FN – AUC / ROC
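A short sketch of how several of these measures follow from the TP, TN, FP and FN counts, plus MSE and root MSE for regression. The function names and the toy predictions are assumptions for illustration.

```python
import math

def classification_measures(y_true, y_pred, positive=1):
    """Binary confusion-matrix counts and the measures derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

measures = classification_measures([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0, 0, 1])
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
rmse = math.sqrt(mse)
```

With unbalanced classes, accuracy alone can look good for a useless model, which is why precision, recall and the F-measure are reported alongside it.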
  • 28. • Source: http://www.resacorp.com/lsmse2.htm
  • 29. • Source: http://en.wikipedia.org/wiki/Sensitivity_and_specificity
  • 30. Basic Notions of ML • Overfitting – Low error on training data – High error on (unseen/new) test data – The model does not generalize well from the training data to unseen data – Possible causes: • Too few training examples • Model too complex (too many parameters) • Underfitting – High error both on training and test data – Model is too simple to capture the information in the training data
  • 31. • Source: http://gerardnico.com/wiki/data_mining/overfitting
  • 32. Sample problems and solutions • Chosen from prior experience • May not be the most interesting problems for all (any) of you • I have tried to present 4 problems that are somewhat diverse in solution and simple to explain • All solutions are state-of-the-art (that means not the very best one, but among the top) – Very best solution may be found in research papers
  • 33. Finding vandalism on Wikipedia • Work with Dan Cioiu • Vandalism = editing in a malicious manner that is intentionally disruptive • Dataset used at PAN 2011 – Real articles extracted from Wikipedia in 2009 and 2010 – Training data: 32,000 edits – Test data: 130,000 edits • An edit consists of: – the old and new content of the article – author information – revision comment and timestamp • We chose to use only the actual text in the revision (no author and timestamp information) to predict vandalism
  • 34. Features
  • 35. Using several classifiers
  • 36. Cross-validation results
  • 37. Combining results using a meta-classifier
  • 38. Final results on test set • Source: http://link.springer.com/chapter/10.1007%2F978-3-642-40495-5_32
  • 39. Main conclusions • Feature extraction is important in ML – Which are the best features? • Syntactic and context analysis are the most important • Semantic analysis is slow but increases detection by 10% • Using features related to users would further improve our results • A meta-classifier sometimes improves the results of several individual classifiers – Not always! Maybe another discussion on this…
  • 40. Sexual predator detection • Work with Claudia Cardei • Automatically detect sexual predators in a chat discussion in order to alert the parents or the police • Corpus from PAN 2012 – 60000+ chat conversations (around 10000 after removing outliers) – 90000+ users – 142 sexual predators • Sexual predator is used “to describe a person seen as obtaining or trying to obtain sexual contact with another person in a metaphorically predatory manner” (Wikipedia)
  • 41. Problem description • Decide whether a user is a predator or not • Use only the chats that have a predator (balanced corpus: 50% predators - 50% victims) • One document for each user • How to determine the features?
  • 42. Simple solution • “Bag of words” (BoW) features with tf-idf weighting • Feature selection using mutual information • MLP (neural network) with 500 features: 89%
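As a rough illustration of bag-of-words features with tf-idf weighting, here is a plain-Python sketch. The slides do not say which tf-idf variant was used, so the formula below (relative term frequency times log(N / document frequency)), the whitespace tokenizer and the toy documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dictionary per document (one document per user)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    document_frequency = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * math.log(n_docs / document_frequency[term])
                        for term, count in counts.items()})
    return vectors

vectors = tf_idf(["the cat sat", "the dog sat", "the bird flew"])
```

Terms that occur in every document get weight 0, while rare terms get the largest weights; feature selection then keeps only the most informative terms.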
  • 43. Going beyond • There should be other features that are discriminative for predators • Investigated behavioral features: – Percentage of questions and inquiries initiated by a user – Percentage of negations – Expressions that might denote an underage user (“don’t know”, “never did”,…) – The age of the user if found in a chat reply (“asl”-like answers) – Percentage of slang words – Percentage of words with a sexual connotation – Readability scores (Flesch) – Who started the conversation • AdaBoost (with DecisionStump): 90% • Random Forest: 93%
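Two of the behavioral features listed above could be computed along these lines. The negation word list, the "message ends with a question mark" heuristic and the function name are hypothetical choices for this sketch; the features actually used in the work may have been defined differently.

```python
# Illustrative negation list (an assumption, not the list used in the original work).
NEGATIONS = {"no", "not", "never", "don't", "didn't", "won't", "can't"}

def behavioral_features(messages):
    """Compute simple per-user ratios from the list of messages written by that user."""
    words = [w.strip(".,!?").lower() for m in messages for w in m.split()]
    n_questions = sum(1 for m in messages if m.rstrip().endswith("?"))
    n_negations = sum(1 for w in words if w in NEGATIONS)
    return {
        "question_ratio": n_questions / len(messages) if messages else 0.0,
        "negation_ratio": n_negations / len(words) if words else 0.0,
    }

features = behavioral_features([
    "hi, how old are you?",
    "i don't know",
    "where do you live?",
    "never did that",
])
```

A handful of such ratios per user forms a much smaller feature vector than the bag-of-words model.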
  • 44. Main conclusions • Feature selection is useful when the number of features is large – Reduced from 10k+ features to hundreds/thousands – We used MI, there are others techniques • Fewer features can be as descriptive as (or even more descriptive than) the BoW model – Difficult to choose – Need additional computations and information – Need to understand the dataset 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 45. Identification of buildings on a map • Work with Alexandra Vieru, Paul Vlase • Given an aerial image, identify the buildings • Non-military purposes: – Estimate population density – Identify main landmarks (shopping malls, etc.) – Identify new buildings over time • Features – HOG descriptors – 3x3 cells – Histogram of RGB – 5 bins, normalized
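The RGB histogram feature above can be sketched in a few lines. This is an assumed reading of "5 bins, normalized": five equal-width bins per channel, each channel's histogram divided by the pixel count so that it sums to 1. The function name `rgb_histogram` and the toy pixels are illustrative only.

```python
def rgb_histogram(pixels, bins=5):
    """Return a 3 * bins feature vector from (r, g, b) pixels with values 0..255."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("at least one pixel is required")
    counts = [[0] * bins for _ in range(3)]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map 0..255 onto bin indices 0..bins-1 with equal-width bins.
            index = min(value * bins // 256, bins - 1)
            counts[channel][index] += 1
    total = len(pixels)
    # Concatenate the normalized R, G and B histograms.
    return [c / total for channel_counts in counts for c in channel_counts]


# Toy image patch: two red pixels, one blue pixel, one mid-gray pixel.
hist = rgb_histogram([(255, 0, 0), (255, 0, 0), (0, 0, 255), (128, 128, 128)])
```

The resulting 15-value vector is independent of patch size, so patches of different dimensions can be fed to the same classifier.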
  • 47. Preliminary results • SVM classifier • Train: 2276 images, Test: 637 images
  • 48. Main conclusions • Verify which are the best features – If this is possible • Must check for overfitting using separate train and test corpora • Each task can have a baseline (a simple classification) method – Here it might be SVM with RGB histogram features • Actually this could be treated as a one-class classification (or PU learning) problem – There are many more negative instances than positive ones
  • 49. Change in the meaning of concepts over time • Work with Costin Chiru, Bianca Dinu, Diana Mincu • Assumption: – Different meanings of words should be identifiable based on the other words that are associated with them – These other words change over time, thus denoting a change in the meaning (or usage of the meanings) • Using the Google Books n-grams corpus – Dependency information – 5-grams information centered on the analyzed word
  • 50. • Source: https://books.google.com/ngrams (Query: “gay=>*_NOUN”)
  • 51. • Clustering using tf-idf over dependencies of the analyzed word • Each distinct year is considered a different document and each dependency is a feature/term • Example for the word “gay”: – Cluster 1 (1600 - 1925): signature {more, lively, happy, brilliant, cheerful} – Cluster 2 (1923-1969): signature {more, lively, happy, cheerful, be} – Cluster 3 (1981 - today): signature {lesbian, bisexual, anti, enolum, straight}
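The "one document per year" representation above can be sketched as a plain tf-idf computation. This is only an assumed formulation (relative term frequency times the natural log of inverse document frequency); the slides do not give the exact weighting, and the per-year word lists below are invented, not taken from the Google Books corpus. The clustering step itself is not shown.

```python
import math
from collections import Counter


def tfidf_by_year(dependencies_by_year):
    """Map {year: [dependent words]} to {year: {term: tf-idf weight}}."""
    n_docs = len(dependencies_by_year)
    # Document frequency: in how many years does each term occur?
    doc_freq = Counter()
    for terms in dependencies_by_year.values():
        doc_freq.update(set(terms))
    weights = {}
    for year, terms in dependencies_by_year.items():
        counts = Counter(terms)
        total = sum(counts.values())
        weights[year] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Invented toy data: words found as dependencies of the analyzed word.
weights = tfidf_by_year({
    1900: ["lively", "happy", "more"],
    1950: ["happy", "cheerful", "more"],
    2000: ["lesbian", "straight", "more"],
})
```

A term that occurs in every year, such as "more" here, receives weight zero, so the year vectors are dominated by the terms that distinguish one period from another. Those vectors are what a clustering algorithm would then group.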
  • 52. Practical ML • Brief discussion of alternatives that might be used by a programmer • The main idea is to show that it is simple to start working on an ML task • However, it is more difficult to achieve very good results – Experience – Theoretical underpinnings of models – Statistical background
  • 53. R language • Designed for “statistical computing” • Lots of packages and functions for ML & statistics • Simple to output visualizations • Code is short and easy to write • It is kind of slow on some tasks • Need to understand some new concepts: data frame, factor, etc. • You may find the same functionality in several packages (e.g. NaïveBayes, Cohen’s Kappa, etc.)
  • 54. Python (sklearn) • sklearn (scikit-learn) package developed for ML • Uses existing packages numpy, scipy • Started as a GSOC project in 2007 • It is new, implemented efficiently (time and memory), lots of functionality • Easy to write code – Naming conventions, parameters, etc.
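To illustrate the consistent naming conventions mentioned above, here is a short sketch of the usual scikit-learn workflow: split the data, `fit` a model, then `score` it on held-out data. The choice of the bundled Iris dataset and of a Random Forest is an assumption made for the example and is not from the talk; it requires scikit-learn to be installed.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Small built-in dataset: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data to check for overfitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Every estimator exposes the same fit / predict / score methods.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # mean accuracy on the test split
```

Swapping in another classifier usually means changing only the line that constructs `model`, because the estimators share the same interface.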
  • 55. • Source: http://peekaboo-vision.blogspot.ro/2013/01/machine-learning-cheat-sheet-for-scikit.html
  • 56. Java (Weka) • Similar to scikit-learn for Python • Older (since 1997) and more stable than scikit-learn • Some implementations are (were) a little less efficient, especially in terms of memory consumption • However, usually it is fast, accurate, and has lots of other useful utilities • It also has a GUI for getting some results fast without writing any code • Lots of people are debating whether Weka or scikit-learn is better. I don’t want to get into this discussion as it depends on your task and resources.
  • 57. Not discussed • RapidMiner – Easy to use – GUI for building the processing pipeline – Can write some code/plugins • Matlab/Octave – Numeric computing – Paid vs. free • SPSS/SAS/Stata – Somewhat similar to R, faster, more robust, expensive • Many others: – Apache Mahout / MALLET / NLTK / Orange / MLPACK – Specific solutions for various ML models
  • 58. Conclusions • Lots of interesting ML tasks and data available – Do you have access to new and interesting data in your business? • Several easy-to-use programming alternatives • Understand the basics • Enhance your maths (statistics) skills for a better ML experience • Hands-on experience and communities of practice – www.kaggle.com (and other similar initiatives)
  • 59. Thank you! traian.rebedea@cs.pub.ro