STL: A Similarity Measure Based on Semantic, Terminological and Linguistic Information. Nitish Aggarwal, joint work with Tobias Wunner and Mihael Arcan. DERI, NUI Galway, firstname.lastname@deri.org. DERI Friday Meeting, Friday, 19 Aug 2011.
Overview: Motivation & Applications; Why STL? (Semantic, Terminology, Linguistic); Evaluation; Conclusion and future work.
Motivation & Applications: Semantic Annotation. Similarity between corpus data and ontology concepts. Example: the sentence “SAP AG held €1615 million in short-term liquid assets (2009)” is annotated with “dbpedia:SAP_AG”, “xEBR:LiquidAssets” at “dbpedia:year:2009”.
Motivation & Applications: Semantic Search. Similarity between query and index object. Example queries: “SAP liquid asset in 2010”, “Current asset of SAP last year”, “Net cash of SAP in 2010”, “SAP total amount received in 2010”. Index object: “dbpedia:SAP_AG”, “xEBR:liquid asset” at “dbpedia:year:2010”.
Motivation & Applications: Ontology Matching & Alignment. Similarity between ontology concepts. [Figure: two concept hierarchies side by side, with candidate cross-ontology links marked “Similarity = ?”. IFRS concepts shown: ifrs:StatementOfFinancialPosition, ifrs:Assets, ifrs:BiologicalAssets, ifrs:CurrentAssets, ifrs:NonCurrentAssets, ifrs:PropertyPlantAndEquipment, ifrs:CashAndCashEquivalents, ifrs:TradeAndOtherCurrentReceivables, ifrs:Inventories. xEBR concepts shown: xebr:KeyBalanceSheet, Assets, xebr:SubscribedCapitalUnpaid, xebr:FixedAssets, xebr:CurrentAssets, xebr:TangibleFixedAssets, xebr:IntangibleFixedAssets, xebr:Amount Receivable, xebr:Liquid Assets.]
Classical Approaches. String similarity: Levenshtein distance, Dice coefficient. Corpus-based: LSA, ESA, Google distance, vector-space model. Ontology-based: path distance, information content. Syntax similarity: word order, part of speech.
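The two string measures named on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function names are hypothetical, and the Dice coefficient is computed over character bigrams, which is one common choice among several.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def dice_coefficient(a: str, b: str) -> float:
    """Dice coefficient over sets of character bigrams (an assumed variant)."""
    x = {a[i:i + 2] for i in range(len(a) - 1)}
    y = {b[i:i + 2] for i in range(len(b) - 1)}
    if not x and not y:
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


print(levenshtein("Fixed Assets", "Current Assets"))
print(dice_coefficient("Fixed Assets", "Current Assets"))
```

Both measures look only at surface characters, which is the gap the semantic, terminological and linguistic components are meant to fill.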
Why STL? Semantic: semantic structure and relations. Terminology: complex terms expressing the same concept. Linguistic: phrase and dependency structure.
STL Definition: a linear combination of the semantic, terminological and linguistic similarity scores, with weights obtained by linear regression. Formula used: STL = w1*S + w2*T + w3*L + Constant, where w1, w2 and w3 represent the contribution of each component.
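How the weights might be obtained can be sketched as an ordinary least-squares fit. This is a sketch under stated assumptions: the slide only says "linear regression", so the solver, the function name `fit_stl_weights` and the toy data below are all hypothetical and not taken from the talk.

```python
def fit_stl_weights(rows, targets):
    """Least-squares fit of target ~ w1*S + w2*T + w3*L + c.

    rows: list of (S, T, L) component scores; targets: gold similarity scores.
    Returns [w1, w2, w3, c].
    """
    X = [[s, t, l, 1.0] for s, t, l in rows]  # last column models the constant
    n = 4
    # Augmented normal equations: (X^T X | X^T y)
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("component scores are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Hypothetical toy data: (S, T, L) per term pair and a made-up gold score.
toy_rows = [(0.1, 0.9, 0.3), (0.8, 0.2, 0.5), (0.4, 0.4, 0.9),
            (0.7, 0.6, 0.1), (0.3, 0.1, 0.2), (0.9, 0.8, 0.7)]
toy_gold = [0.2 * s + 0.5 * t + 0.1 * l + 0.15 for s, t, l in toy_rows]
print(fit_stl_weights(toy_rows, toy_gold))
```

In practice the gold scores would come from a human-rated benchmark, and a statistics library would normally replace the hand-written solver.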
Semantic. Wu-Palmer: 2*depth(MSCA) / (depth(c1) + depth(c2)). Resnik’s Information Content: IC(c) = -log p(c). Intrinsic Information Content (Pirro09): avoids the need to analyse large corpora.
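The Wu-Palmer formula can be tried out on a small hand-built taxonomy. The sketch below is illustrative only: the hierarchy is a hypothetical fragment loosely modelled on the asset concepts named in the talk, MSCA is read as the deepest concept that subsumes both inputs, and the root is given depth 1 (other depth conventions change the scores).

```python
# Hypothetical child -> parent map; the exact xEBR structure is an assumption.
PARENT = {
    "Assets": None,
    "Fixed Assets": "Assets",
    "Current Assets": "Assets",
    "Tangible Fixed Assets": "Fixed Assets",
    "Amount Receivable": "Current Assets",
    "Stocks": "Current Assets",
}


def ancestors(concept):
    """Path from the concept up to the root, concept itself first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Number of nodes on the path to the root (root has depth 1)."""
    return len(ancestors(concept))


def msca(c1, c2):
    """Deepest concept that is an ancestor of (or equal to) both inputs."""
    seen = set(ancestors(c1))
    for candidate in ancestors(c2):  # walks upward, so the first hit is deepest
        if candidate in seen:
            return candidate
    raise ValueError("concepts do not share a root")


def wu_palmer(c1, c2):
    return 2 * depth(msca(c1, c2)) / (depth(c1) + depth(c2))


print(round(wu_palmer("Stocks", "Amount Receivable"), 2))                 # 0.67
print(round(wu_palmer("Tangible Fixed Assets", "Amount Receivable"), 2))  # 0.33
```

Siblings under "Current Assets" score higher than concepts that only meet at the root, which is the behaviour the path-based measure is meant to capture.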
Cont. Intrinsic information content (iIC): [formula not preserved], where sub(c) is the number of sub-concepts of a given concept c. Pirro_Similarity: [formula not preserved].
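Since the slide's own formula did not survive, the sketch below uses a widely cited formulation of intrinsic information content (Seco et al.), in which a concept's IC falls as its number of sub-concepts grows. Treat this as an assumption: the talk may have used a different variant, so the values it produces need not match the ones quoted in the talk. The function name and parameters are hypothetical.

```python
import math


def intrinsic_ic(num_subconcepts: int, total_concepts: int) -> float:
    """Assumed variant: iIC(c) = 1 - log(sub(c) + 1) / log(total_concepts).

    num_subconcepts: sub(c), the number of concepts below c in the hierarchy.
    total_concepts: size of the whole vocabulary (must be greater than 1).
    """
    if total_concepts <= 1:
        raise ValueError("need at least two concepts")
    return 1.0 - math.log(num_subconcepts + 1) / math.log(total_concepts)


# A leaf concept is maximally informative; broad concepts carry less information.
print(intrinsic_ic(0, 269))
print(intrinsic_ic(9, 269))
print(intrinsic_ic(48, 269))
```

No corpus counts are needed, only the shape of the hierarchy, which is the advantage the previous slide attributes to intrinsic IC.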
Cont. [Figure: excerpt of the xEBR asset hierarchy annotated with IC values. MSCA: subconcepts = 48, IC = 0.32 (labelled “IC (TFA)” on the slide). Amount Receivable (AR): subconcepts = 6, IC (AR) = 0.69. Tangible Fixed Assets (TFA): subconcepts = 9, IC (TFA) = 0.60. One concept pair is marked Pirro_Sim = 0.33 and another Pirro_Sim = ?. Upper-level concepts shown: Assets, Subscribed Capital Unpaid, Fixed Assets, Current Assets, Stocks, Tangible Fixed Assets, Amount Receivable. Lower-level concepts shown: Amount Receivable [total], Amount Receivable within one year, Amount Receivable after more than one year, Trade Debtors, Other Debtors, Other Tangible Fixed Assets, Property, Plant and Equipment, Payments on account and asset in construction, Furniture Fixture and Equipment, Other Fixture, Land and Building, Plant and Machinery, Other Property, Plant and Equipment, Property, Plant and Equipment [Total].]
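The IC values on this slide can be plugged into the similarity usually attributed to Pirró (2009), sim = 3*IC(MSCA) - IC(c1) - IC(c2) for distinct concepts. That formula is an assumption here, since the slide's own definition was not preserved, and reading the 0.33 as the score for the Tangible Fixed Assets / Amount Receivable pair is also an assumption.

```python
def pirro_sim(ic_msca: float, ic_c1: float, ic_c2: float) -> float:
    """Assumed formula: shared information minus what is specific to each concept."""
    return 3 * ic_msca - ic_c1 - ic_c2


# IC values as quoted on the slide: MSCA 0.32, AR 0.69, TFA 0.60.
score = pirro_sim(ic_msca=0.32, ic_c1=0.69, ic_c2=0.60)
print(round(score, 2))  # -0.33
```

Under these assumptions the magnitude agrees with the 0.33 shown on the slide; the sign differs, which may reflect a different normalisation in the talk or a character lost in extraction.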
Limitation. Does semantic structure reflect a good similarity? Not necessarily. For example, xEBR also uses the parent-child relation to describe the layout of concepts: “Work in progress” is not a type of asset, although both are linked via the parent-child relationship.
Terminology. Definition: common naming convention. N-grams vs. subterms: in the financial domain, the bigram “Intangible Fixed” is a substring of “Other Intangible Fixed Assets” but not a subterm. Terminological similarity: maximal subterm overlap.
Cont. Example: “Trade Debts Payable After More Than One Year” decomposes into [[Trade][Debts]][Payable][After More Than One Year], with subterms found in external resources: [FinanceDict:Trade Debts], [Investopedia:Trade], [Investoword:Debt], [SAP:Payable], [Ifrs:After More Than One Year]. “Financial Debts Payable After More Than One Year” decomposes into Financial[Debts][Payable][After More Than One Year].
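The decomposition above can be sketched as a lookup of contiguous word sequences in a subterm lexicon, followed by an overlap score. This is a minimal sketch: the lexicon contents, the function names and the Dice-style overlap are assumptions, since the slides say only "maximal subterm overlap" and do not give the scoring formula.

```python
# Hypothetical lexicon of known subterms (lower-cased), standing in for the
# external terminology resources named on the slide.
LEXICON = {
    "trade debts", "trade", "debts", "payable", "after more than one year",
}


def subterms(term: str, lexicon=LEXICON) -> set:
    """All lexicon entries that occur as contiguous word sequences in the term."""
    words = term.lower().split()
    found = set()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            candidate = " ".join(words[i:j])
            if candidate in lexicon:
                found.add(candidate)
    return found


def terminological_similarity(t1: str, t2: str, lexicon=LEXICON) -> float:
    """Dice-style overlap of the two subterm sets (an assumed scoring choice)."""
    a, b = subterms(t1, lexicon), subterms(t2, lexicon)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


t1 = "Trade Debts Payable After More Than One Year"
t2 = "Financial Debts Payable After More Than One Year"
print(sorted(subterms(t2)))               # ['after more than one year', 'debts', 'payable']
print(terminological_similarity(t1, t2))  # 0.75
```

Because matching is done against the lexicon rather than raw n-grams, a word sequence such as "Intangible Fixed" contributes nothing unless it is itself a known term.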
Multilingual Subterms. Translated subterms: available in other languages. Advantage: they reflect terminological similarities that may be visible in one language but not in others. Example: “Property Plant and Equipment”@en and “Tangible Fixed Asset”@en are linked through “Sachanlagen”@de.
Linguistic. Syntactic information beyond simple word order: phrase structure and dependency structure. Phrase structure: “Intangible fixed” : adj adj > ??; “Intangible fixed assets” : adj adj n > NP. Dependency structure: “Amounts receivable” : N Adv : receive:mod, amounts:head; “Received amounts” : V N : receive:mod, amounts:head.
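The phrase-structure example can be illustrated with a toy check over part-of-speech tag sequences. The rule used here, any number of adjectives followed by at least one noun, is an assumption chosen to reproduce the two cases on the slide; a real system would rely on a parser rather than a hand-written pattern.

```python
def is_noun_phrase(tags) -> bool:
    """True if the tag sequence matches the assumed pattern adj* n+."""
    i = 0
    while i < len(tags) and tags[i] == "adj":  # skip leading adjectives
        i += 1
    nouns = 0
    while i < len(tags) and tags[i] == "n":    # require one or more nouns
        i += 1
        nouns += 1
    return nouns > 0 and i == len(tags)


print(is_noun_phrase(["adj", "adj"]))       # False: "Intangible fixed"
print(is_noun_phrase(["adj", "adj", "n"]))  # True: "Intangible fixed assets"
```

A check of this kind separates well-formed terms from arbitrary word n-grams, which plain word-order comparison cannot do.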
Evaluation. Data set: xEBR finance vocabulary, 269 terms (concept labels), 72,361 (269 × 269) term pairs. Benchmarks: SimSem59, a sample of 59 term pairs; SimSem200, a sample of 200 term pairs (under construction).
Experiment. [Figure: an overview of similarity measures.]
Experiment Results (SimSem59). STL formula used: STL = 0.1531 * S + 0.5218 * T + 0.1041 * L + 0.1791. [Figure: correlation between similarity scores and SimSem59, showing the semantic, terminology and linguistic contributions.]
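The fitted formula can be applied directly once the three component scores are available. The coefficients below are the ones reported on the slide; the function name is hypothetical, and the assumption that S, T and L each lie in [0, 1] is not stated in the talk.

```python
# Coefficients as reported for the SimSem59 benchmark.
W_S, W_T, W_L, INTERCEPT = 0.1531, 0.5218, 0.1041, 0.1791


def stl_score(s: float, t: float, l: float) -> float:
    """Combine semantic (s), terminological (t) and linguistic (l) scores."""
    return W_S * s + W_T * t + W_L * l + INTERCEPT


# If every component lies in [0, 1], the score ranges from the intercept
# (all components 0) to the sum of all four coefficients (all components 1).
print(round(stl_score(0.0, 0.0, 0.0), 4))  # 0.1791
print(round(stl_score(1.0, 1.0, 1.0), 4))  # 0.9581

# Share of the total weight carried by the terminological component.
print(round(W_T / (W_S + W_T + W_L), 2))   # 0.67
```

The terminological weight accounts for roughly two thirds of the combined weight.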
Conclusion. STL outperforms more traditional similarity measures. The largest contribution comes from T (terminological analysis). Multilingual subterms perform better than monolingual ones.
Future work. Evaluation on larger data sets and vocabularies (IFRS): 3000+ terms, 9M term pairs. A richer set of linguistic operations, e.g. “recognise” => “recognition” by the derivation rule verb_lemma+“ion”. Similarity between subterms, e.g. “Staff Costs” and “Wages And Salaries”.
