Introduction
Linnaean Taxonomy Example
Phenomenon: Technology Space
Patent Classification Taxonomy
http://uspto.gov/go/classification/selectnumwithtitle.htm
http://www.uspto.gov/web/offices/opc/documents/classescombined.pdf
USCL Hierarchy (Class and above)
USCL Class 704 (Subclass Level)
Traditional Distance Measures
Limitations of Traditional Measures
[Figure: equality (=) and inequality (≠) comparisons among classification codes A21B, A21C, B60F and A01M]
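The bullet text of this slide is not available, so the following is only a minimal sketch of one limitation the surviving codes point to. It assumes the traditional measure is a cosine similarity over flat classification counts, which is an assumption and not confirmed by the deck. Under that assumption, two sibling codes (A21B and A21C) score exactly the same as two unrelated codes (A21B and B60F), because a flat comparison only asks whether the codes match exactly. The function name `flat_cosine` is hypothetical.

```python
import math
from collections import Counter

def flat_cosine(codes_a, codes_b):
    """Cosine similarity over exact classification codes, ignoring the hierarchy."""
    count_a, count_b = Counter(codes_a), Counter(codes_b)
    dot = sum(count_a[code] * count_b[code] for code in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Sibling codes and unrelated codes are indistinguishable to a flat measure.
print("A21B vs A21C:", flat_cosine(["A21B"], ["A21C"]))
print("A21B vs B60F:", flat_cosine(["A21B"], ["B60F"]))
print("A21B vs A21B:", flat_cosine(["A21B"], ["A21B"]))
```

A hierarchy-aware measure would be expected to place the first pair somewhere between the second and third.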
Taxonomically Appropriate Measure
Quantities defined on the slide:
  Number of times classification i is assigned to entity A
  Number of times classification i is assigned to entity B
  Frequency of patents classified within the subtree subsumed by the parent of classification i
  Frequency of patents classified within the subtree subsumed by classification i
Class & Subclass Array Expansion
Dimension / Level | Classification (Dimension Name) | Description
1 | G1-02 | COMMUNICATIONS, RADIANT ENERGY, WEAPONS, ELECTRICAL, AND COMPUTER ARTS
2 | G1-02/G2-05 | … / CALCULATORS, COMPUTERS, OR DATA PROCESSING SYSTEMS
3 | G1-02/G2-05/704 | … / DATA PROCESSING: SPEECH SIGNAL PROCESSING, LINGUISTICS, LANGUAGE TRANSLATION, AND AUDIO COMPRESSION-DECOMPRESSION
4 | G1-02/G2-05/704/200 | … / SPEECH SIGNAL PROCESSING
5 | G1-02/G2-05/704/200/231 | … / Recognition
6 | G1-02/G2-05/704/200/231/232 | … / Neural network
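The expansion on this slide can be sketched in a few lines: a single classification is turned into one dimension per level of the hierarchy, each named by its full path from the root. The function name `expand_classification` and the slash-separated path format are assumptions made for illustration, based on the dimension names shown on the slide.

```python
def expand_classification(path: str) -> list[str]:
    """Return one dimension name per hierarchy level, from the root down to the leaf."""
    parts = [part.strip() for part in path.split("/") if part.strip()]
    return ["/".join(parts[: depth + 1]) for depth in range(len(parts))]

# The neural-network subclass from the slide expands into six dimensions.
dimensions = expand_classification("G1-02/G2-05/704/200/231/232")
for level, name in enumerate(dimensions, start=1):
    print(level, name)
```

Two patents that share only part of a path then share the dimensions for the common ancestors, which is what lets a vector-based measure see partial overlap.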
IDF Weighting
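The exact weighting formula is not available in this transcript, so the following is a sketch under stated assumptions and not the author's formula. It assumes the weight of a classification is the log of the ratio between the two frequencies named on the "Taxonomically Appropriate Measure" slide (patents within the parent's subtree over patents within the classification's own subtree), and that similarity is a cosine over the weighted, hierarchy-expanded count vectors. All function names and the four-patent toy universe are hypothetical.

```python
import math
from collections import Counter

def expand(path):
    """Return every ancestor prefix of a slash-separated classification path."""
    parts = path.split("/")
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]

def subtree_frequencies(patents):
    """Count, for each taxonomy node, the patents classified within its subtree."""
    freq = Counter()
    for paths in patents.values():
        # A patent counts once per node, however many of its codes fall below it.
        freq.update({node for p in paths for node in expand(p)})
    return freq

def weight(node, freq, total):
    """Assumed IDF-style weight: log(parent subtree frequency / own subtree frequency)."""
    parent_freq = freq[node.rsplit("/", 1)[0]] if "/" in node else total
    return math.log(parent_freq / freq[node])

def weighted_vector(paths, freq, total):
    """Assignment counts per expanded dimension, scaled by the assumed weight."""
    counts = Counter(node for p in paths for node in expand(p))
    return {n: c * weight(n, freq, total) for n, c in counts.items()}

def similarity(u, v):
    """Cosine similarity between two sparse weighted vectors."""
    dot = sum(w * v.get(n, 0.0) for n, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy universe of four patents (not data from the deck).
patents = {"P1": ["X/1/a"], "P2": ["X/1/b"], "P3": ["X/2/c"], "P4": ["Y/3/d"]}
freq = subtree_frequencies(patents)
vectors = {k: weighted_vector(v, freq, len(patents)) for k, v in patents.items()}

for other in ("P2", "P3", "P4"):
    print("P1 vs", other, round(similarity(vectors["P1"], vectors[other]), 3))
```

In this toy universe P1 scores highest against P2 (same parent), lower against P3 (same top-level branch only), and zero against P4 (no shared ancestor). Heavily populated nodes receive small weights, so sharing them contributes little to similarity.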
Patent Example
Dataset #1: Traditional Methods
[Figure panels: Primary Only, All Classifications; Class Level, Class-Subclass Level]
Graphs show the frequency of similarity calculations within the samples. The leftmost bar is similarity = 0; the rightmost is similarity = 1.
Dataset #1: Taxonomical Method
[Figure panels: Primary Only, All Classifications]
Dataset #1: Traditional vs. Taxonomical
[Figure panels: Subclass Level, Class Level]
Dataset #2: Traditional Methods
[Figure panels: Class Level, Jaffe Subcategory Level, Jaffe Category Level]
Dataset #2: Taxonomical Method vs. Class vs. Subcategory vs. Category
Conclusions

Similarity and Distance Measures for Hierarchical Taxonomies

Editor's Notes

  1. This paper introduces to the management domain a methodology for calculating distance and similarity using hierarchical taxonomies.
  2. Although my example is based on technology space and the patent classification system, it applies to any theoretically complex space that can be characterized by a hierarchical classification taxonomy, e.g. industry (SIC/NAICS codes) or culture (demographic variables).
  3. If we look at the theoretical phenomenon we are examining, we should use constructs and measures that are as faithful to this view as possible.
  4. I extracted the data from the USPTO website and the classes combined document. Then I extracted the count of patents classified into each of the 150,000+ class/subclasses.
  5. This extension draws on established methods from fields like machine learning and information search and retrieval.
  6. Interestingly, the original quote accurately assumes that subclasses with more classifications are more important within the technology space; however, these subclasses are actually less important in establishing the similarity of two patents.
  7. Because this uses probabilities computed over the entire universe of patents, it effectively normalizes distance across all levels of analysis and across the whole universe; i.e., similarity within an industry will be quite high, whereas similarity between industries should be much lower.
  8. The top-right panel actually shows somewhat arbitrary results driven by non-704 classes in the sample; it does not describe the technology space within field 704 but rather something like the interplay of class 704 with other fields.