SlideShare a Scribd company logo
1 of 17
Social @tsunamiide tsunami.io Earthquake Enterprises
K-Means Clustering
Social @tsunamiide tsunami.io Earthquake Enterprises
 Two parts
 Simple Clustering Algorithm
 Using ML with Large Datasets
Social @tsunamiide tsunami.io Earthquake Enterprises
 Very elegant
 Scales to large datasets
 It is simple and easy to learn
 Works with unsupervised data
Social @tsunamiide tsunami.io Earthquake Enterprises
 Competitive Analysis
 Compare products from Company A with
Company B by clustering them into groups
 Semi-Structured Search Engine
 Show different results to different users
depending on how they are classified
▪ What Google thinks about you:
https://www.google.com/settings/ads/onweb/
Social @tsunamiide tsunami.io Earthquake Enterprises
 Multivariate data set
 (i.e. each row is a float[])
 Classification is
labeled
 Not linearly
separable
 Popular for testing
ML Algorithms
Social @tsunamiide tsunami.io Earthquake Enterprises
 Iris data in (n-1)! charts
Social @tsunamiide tsunami.io Earthquake Enterprises
 E.g. Classifying text documents
 Charting no longer makes sense
 Need to rely derived metrics
Social @tsunamiide tsunami.io Earthquake Enterprises
 Euclidian
 Manhattan Distance
 Angle between
 Correlation
Social @tsunamiide tsunami.io Earthquake Enterprises
 Many ML algorithms rely on the features
to be in the range of [-1,1] or [0,1]
 K-means will work with any range but for
many distance functions larger ranges will
crowed out smaller ones
 We can use this to emphasize some
factors over others
Social @tsunamiide tsunami.io Earthquake Enterprises
 select the number of clusters (K)
 select a seed for each cluster (centroid)
 Do {
 assign each item in the training set to the
closest centroid
 update each centroid to the mean of the
assigned items }
 while (any of the centroids have moved)
Social @tsunamiide tsunami.io Earthquake Enterprises
 Number of clusters are known (3)
 Pick seed by randomly selecting 3 rows
from dataset
 We intentionally pick 3 close together for
demonstration
Social @tsunamiide tsunami.io Earthquake Enterprises
 Number of clusters
 Distance functions
 Feature scaling
 Datasets
 E.g. included abalone and breast cancer
datasets
Social @tsunamiide tsunami.io Earthquake Enterprises
Social @tsunamiide tsunami.io Earthquake Enterprises
 Faster algorithms
with more data will
often beat slower
algorithms with less
data.
Social @tsunamiide tsunami.io Earthquake Enterprises
 Some algorithms do not scale well
 e.g. Layered NN
 can take many days (not suited to tutorials)
 ML algorithms need to be run repeatedly
 Tuning hyper-parameters
 K-fold cross validation
 Feature discovery
Social @tsunamiide tsunami.io Earthquake Enterprises
 Random Forest
 Built in, popular and effective
 Leave one out
 My preferred
Social @tsunamiide tsunami.io Earthquake Enterprises
 Use a fast algorithm for factor discovery
 Use a slow algorithm for final solution
 Many competitions are won on starting the
slow algorithm as soon as possible

More Related Content

What's hot

Probabilistic data structures
Probabilistic data structuresProbabilistic data structures
Probabilistic data structuresYoav chernobroda
 
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based Model
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based ModelProbabilistic Programming for Dynamic Data Assimilation on an Agent-Based Model
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based ModelNick Malleson
 
Partial Binomial Distribution method for Generation capacity outage using Spr...
Partial Binomial Distribution method for Generation capacity outage using Spr...Partial Binomial Distribution method for Generation capacity outage using Spr...
Partial Binomial Distribution method for Generation capacity outage using Spr...vivatechijri
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsPooyan Jamshidi
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsAlbert Bifet
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themAlbert Bifet
 
Gray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark ApplicationsGray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark ApplicationsATMOSPHERE .
 
Meteoio Introduction given by Mathias Bavey in Bozen
Meteoio Introduction given by Mathias Bavey in BozenMeteoio Introduction given by Mathias Bavey in Bozen
Meteoio Introduction given by Mathias Bavey in BozenRiccardo Rigon
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013MLconf
 
"Machine Learning and Internet of Things, the future of medical prevention", ...
"Machine Learning and Internet of Things, the future of medical prevention", ..."Machine Learning and Internet of Things, the future of medical prevention", ...
"Machine Learning and Internet of Things, the future of medical prevention", ...Dataconomy Media
 

What's hot (15)

Application of web ontology to harvest estimation of rice in thailand
Application of web ontology to harvest estimation of rice in thailandApplication of web ontology to harvest estimation of rice in thailand
Application of web ontology to harvest estimation of rice in thailand
 
Probabilistic data structures
Probabilistic data structuresProbabilistic data structures
Probabilistic data structures
 
Topology in ArcGIS
Topology in ArcGISTopology in ArcGIS
Topology in ArcGIS
 
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based Model
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based ModelProbabilistic Programming for Dynamic Data Assimilation on an Agent-Based Model
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based Model
 
Partial Binomial Distribution method for Generation capacity outage using Spr...
Partial Binomial Distribution method for Generation capacity outage using Spr...Partial Binomial Distribution method for Generation capacity outage using Spr...
Partial Binomial Distribution method for Generation capacity outage using Spr...
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of Robots
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data Streams
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
 
Gray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark ApplicationsGray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark Applications
 
Meteoio Introduction given by Mathias Bavey in Bozen
Meteoio Introduction given by Mathias Bavey in BozenMeteoio Introduction given by Mathias Bavey in Bozen
Meteoio Introduction given by Mathias Bavey in Bozen
 
Terra Populus Overview Poster
Terra Populus Overview PosterTerra Populus Overview Poster
Terra Populus Overview Poster
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013
 
Internship
InternshipInternship
Internship
 
"Machine Learning and Internet of Things, the future of medical prevention", ...
"Machine Learning and Internet of Things, the future of medical prevention", ..."Machine Learning and Internet of Things, the future of medical prevention", ...
"Machine Learning and Internet of Things, the future of medical prevention", ...
 
Chap02 01
Chap02 01Chap02 01
Chap02 01
 

Similar to Machine Learning - Matt Moloney

Data science technology overview
Data science technology overviewData science technology overview
Data science technology overviewSoojung Hong
 
Course 3 : Types of data and opportunities by Nikolaos Deligiannis
Course 3 : Types of data and opportunities by Nikolaos DeligiannisCourse 3 : Types of data and opportunities by Nikolaos Deligiannis
Course 3 : Types of data and opportunities by Nikolaos DeligiannisBetacowork
 
Jane Recommendation Engines
Jane Recommendation EnginesJane Recommendation Engines
Jane Recommendation EnginesAdam Rogers
 
Mastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and AnalysisMastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and AnalysisTeradata Aster
 
Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345AkhilSinghal21
 
XL-MINER:Introduction To Xl Miner
XL-MINER:Introduction To Xl MinerXL-MINER:Introduction To Xl Miner
XL-MINER:Introduction To Xl Minerxlminer content
 
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...Alex Liu
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarinn5712036
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...theijes
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data miningDatamining Tools
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6Rod Soto
 
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005Data Mining with SQL Server 2005
Data Mining with SQL Server 2005Dean Willson
 
Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Datawaheed751
 

Similar to Machine Learning - Matt Moloney (20)

Data science technology overview
Data science technology overviewData science technology overview
Data science technology overview
 
Course 3 : Types of data and opportunities by Nikolaos Deligiannis
Course 3 : Types of data and opportunities by Nikolaos DeligiannisCourse 3 : Types of data and opportunities by Nikolaos Deligiannis
Course 3 : Types of data and opportunities by Nikolaos Deligiannis
 
Democratizing Data Science in the Cloud
Democratizing Data Science in the CloudDemocratizing Data Science in the Cloud
Democratizing Data Science in the Cloud
 
Jane Recommendation Engines
Jane Recommendation EnginesJane Recommendation Engines
Jane Recommendation Engines
 
Mastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and AnalysisMastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and Analysis
 
Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345
 
kdd2015
kdd2015kdd2015
kdd2015
 
XL-MINER:Introduction To Xl Miner
XL-MINER:Introduction To Xl MinerXL-MINER:Introduction To Xl Miner
XL-MINER:Introduction To Xl Miner
 
Introduction To XL-Miner
Introduction To XL-MinerIntroduction To XL-Miner
Introduction To XL-Miner
 
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarin
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6
 
Poster
PosterPoster
Poster
 
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
 
Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Data
 

More from Phillip Trelford

How to be a rock star developer
How to be a rock star developerHow to be a rock star developer
How to be a rock star developerPhillip Trelford
 
FSharp eye for the Haskell guy - London 2015
FSharp eye for the Haskell guy - London 2015FSharp eye for the Haskell guy - London 2015
FSharp eye for the Haskell guy - London 2015Phillip Trelford
 
Beyond lists - Copenhagen 2015
Beyond lists - Copenhagen 2015Beyond lists - Copenhagen 2015
Beyond lists - Copenhagen 2015Phillip Trelford
 
F# for C# devs - Copenhagen .Net 2015
F# for C# devs - Copenhagen .Net 2015F# for C# devs - Copenhagen .Net 2015
F# for C# devs - Copenhagen .Net 2015Phillip Trelford
 
Generative Art - Functional Vilnius 2015
Generative Art - Functional Vilnius 2015Generative Art - Functional Vilnius 2015
Generative Art - Functional Vilnius 2015Phillip Trelford
 
24 hours later - FSharp Gotham 2015
24 hours later - FSharp Gotham  201524 hours later - FSharp Gotham  2015
24 hours later - FSharp Gotham 2015Phillip Trelford
 
Building cross platform games with Xamarin - Birmingham 2015
Building cross platform games with Xamarin - Birmingham 2015Building cross platform games with Xamarin - Birmingham 2015
Building cross platform games with Xamarin - Birmingham 2015Phillip Trelford
 
Beyond Lists - Functional Kats Conf Dublin 2015
Beyond Lists - Functional Kats Conf Dublin 2015Beyond Lists - Functional Kats Conf Dublin 2015
Beyond Lists - Functional Kats Conf Dublin 2015Phillip Trelford
 
FSharp On The Desktop - Birmingham FP 2015
FSharp On The Desktop - Birmingham FP 2015FSharp On The Desktop - Birmingham FP 2015
FSharp On The Desktop - Birmingham FP 2015Phillip Trelford
 
Ready, steady, cross platform games - ProgNet 2015
Ready, steady, cross platform games - ProgNet 2015Ready, steady, cross platform games - ProgNet 2015
Ready, steady, cross platform games - ProgNet 2015Phillip Trelford
 
F# for C# devs - NDC Oslo 2015
F# for C# devs - NDC Oslo 2015F# for C# devs - NDC Oslo 2015
F# for C# devs - NDC Oslo 2015Phillip Trelford
 
F# for C# devs - Leeds Sharp 2015
F# for C# devs -  Leeds Sharp 2015F# for C# devs -  Leeds Sharp 2015
F# for C# devs - Leeds Sharp 2015Phillip Trelford
 
Build a compiler in 2hrs - NCrafts Paris 2015
Build a compiler in 2hrs -  NCrafts Paris 2015Build a compiler in 2hrs -  NCrafts Paris 2015
Build a compiler in 2hrs - NCrafts Paris 2015Phillip Trelford
 
24 Hours Later - NCrafts Paris 2015
24 Hours Later - NCrafts Paris 201524 Hours Later - NCrafts Paris 2015
24 Hours Later - NCrafts Paris 2015Phillip Trelford
 
Machine learning from disaster - GL.Net 2015
Machine learning from disaster  - GL.Net 2015Machine learning from disaster  - GL.Net 2015
Machine learning from disaster - GL.Net 2015Phillip Trelford
 
F# for Trading - QuantLabs 2014
F# for Trading -  QuantLabs 2014F# for Trading -  QuantLabs 2014
F# for Trading - QuantLabs 2014Phillip Trelford
 

More from Phillip Trelford (20)

How to be a rock star developer
How to be a rock star developerHow to be a rock star developer
How to be a rock star developer
 
Mobile F#un
Mobile F#unMobile F#un
Mobile F#un
 
F# eXchange Keynote 2016
F# eXchange Keynote 2016F# eXchange Keynote 2016
F# eXchange Keynote 2016
 
FSharp eye for the Haskell guy - London 2015
FSharp eye for the Haskell guy - London 2015FSharp eye for the Haskell guy - London 2015
FSharp eye for the Haskell guy - London 2015
 
Beyond lists - Copenhagen 2015
Beyond lists - Copenhagen 2015Beyond lists - Copenhagen 2015
Beyond lists - Copenhagen 2015
 
F# for C# devs - Copenhagen .Net 2015
F# for C# devs - Copenhagen .Net 2015F# for C# devs - Copenhagen .Net 2015
F# for C# devs - Copenhagen .Net 2015
 
Generative Art - Functional Vilnius 2015
Generative Art - Functional Vilnius 2015Generative Art - Functional Vilnius 2015
Generative Art - Functional Vilnius 2015
 
24 hours later - FSharp Gotham 2015
24 hours later - FSharp Gotham  201524 hours later - FSharp Gotham  2015
24 hours later - FSharp Gotham 2015
 
Building cross platform games with Xamarin - Birmingham 2015
Building cross platform games with Xamarin - Birmingham 2015Building cross platform games with Xamarin - Birmingham 2015
Building cross platform games with Xamarin - Birmingham 2015
 
Beyond Lists - Functional Kats Conf Dublin 2015
Beyond Lists - Functional Kats Conf Dublin 2015Beyond Lists - Functional Kats Conf Dublin 2015
Beyond Lists - Functional Kats Conf Dublin 2015
 
FSharp On The Desktop - Birmingham FP 2015
FSharp On The Desktop - Birmingham FP 2015FSharp On The Desktop - Birmingham FP 2015
FSharp On The Desktop - Birmingham FP 2015
 
Ready, steady, cross platform games - ProgNet 2015
Ready, steady, cross platform games - ProgNet 2015Ready, steady, cross platform games - ProgNet 2015
Ready, steady, cross platform games - ProgNet 2015
 
F# for C# devs - NDC Oslo 2015
F# for C# devs - NDC Oslo 2015F# for C# devs - NDC Oslo 2015
F# for C# devs - NDC Oslo 2015
 
F# for C# devs - Leeds Sharp 2015
F# for C# devs -  Leeds Sharp 2015F# for C# devs -  Leeds Sharp 2015
F# for C# devs - Leeds Sharp 2015
 
Build a compiler in 2hrs - NCrafts Paris 2015
Build a compiler in 2hrs -  NCrafts Paris 2015Build a compiler in 2hrs -  NCrafts Paris 2015
Build a compiler in 2hrs - NCrafts Paris 2015
 
24 Hours Later - NCrafts Paris 2015
24 Hours Later - NCrafts Paris 201524 Hours Later - NCrafts Paris 2015
24 Hours Later - NCrafts Paris 2015
 
Real World F# - SDD 2015
Real World F# -  SDD 2015Real World F# -  SDD 2015
Real World F# - SDD 2015
 
F# for C# devs - SDD 2015
F# for C# devs - SDD 2015F# for C# devs - SDD 2015
F# for C# devs - SDD 2015
 
Machine learning from disaster - GL.Net 2015
Machine learning from disaster  - GL.Net 2015Machine learning from disaster  - GL.Net 2015
Machine learning from disaster - GL.Net 2015
 
F# for Trading - QuantLabs 2014
F# for Trading -  QuantLabs 2014F# for Trading -  QuantLabs 2014
F# for Trading - QuantLabs 2014
 

Recently uploaded

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

Recently uploaded (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Machine Learning - Matt Moloney

  • 1. Social @tsunamiide tsunami.io Earthquake Enterprises K-Means Clustering
  • 2. Social @tsunamiide tsunami.io Earthquake Enterprises  Two parts  Simple Clustering Algorithm  Using ML with Large Datasets
  • 3. Social @tsunamiide tsunami.io Earthquake Enterprises  Very elegant  Scales to large datasets  It is simple and easy to learn  Works with unsupervised data
  • 4. Social @tsunamiide tsunami.io Earthquake Enterprises  Competitive Analysis  Compare products from Company A with Company B by clustering them into groups  Semi-Structured Search Engine  Show different results to different users depending on how they are classified ▪ What Google thinks about you: https://www.google.com/settings/ads/onweb/
  • 5. Social @tsunamiide tsunami.io Earthquake Enterprises  Multivariate data set  (i.e. each row is a float[])  Classification is labeled  Not linearly separable  Popular for testing ML Algorithms
  • 6. Social @tsunamiide tsunami.io Earthquake Enterprises  Iris data in (n-1)! charts
  • 7. Social @tsunamiide tsunami.io Earthquake Enterprises  E.g. Classifying text documents  Charting no longer makes sense  Need to rely derived metrics
  • 8. Social @tsunamiide tsunami.io Earthquake Enterprises  Euclidian  Manhattan Distance  Angle between  Correlation
  • 9. Social @tsunamiide tsunami.io Earthquake Enterprises  Many ML algorithms rely on the features to be in the range of [-1,1] or [0,1]  K-means will work with any range but for many distance functions larger ranges will crowed out smaller ones  We can use this to emphasize some factors over others
  • 10. Social @tsunamiide tsunami.io Earthquake Enterprises  select the number of clusters (K)  select a seed for each cluster (centroid)  Do {  assign each item in the training set to the closest centroid  update each centroid to the mean of the assigned items }  while (any of the centroids have moved)
  • 11. Social @tsunamiide tsunami.io Earthquake Enterprises  Number of clusters are known (3)  Pick seed by randomly selecting 3 rows from dataset  We intentionally pick 3 close together for demonstration
  • 12. Social @tsunamiide tsunami.io Earthquake Enterprises  Number of clusters  Distance functions  Feature scaling  Datasets  E.g. included abalone and breast cancer datasets
  • 13. Social @tsunamiide tsunami.io Earthquake Enterprises
  • 14. Social @tsunamiide tsunami.io Earthquake Enterprises  Faster algorithms with more data will often beat slower algorithms with less data.
  • 15. Social @tsunamiide tsunami.io Earthquake Enterprises  Some algorithms do not scale well  e.g. Layered NN  can take many days (not suited to tutorials)  ML algorithms need to be run repeatedly  Tuning hyper-parameters  K-fold cross validation  Feature discovery
  • 16. Social @tsunamiide tsunami.io Earthquake Enterprises  Random Forest  Built in, popular and effective  Leave one out  My preferred
  • 17. Social @tsunamiide tsunami.io Earthquake Enterprises  Use a fast algorithm for factor discovery  Use a slow algorithm for final solution  Many competitions are won on starting the slow algorithm as soon as possible