10/15/18 Heiko Paulheim 1
How much is a Triple?
Heiko Paulheim
Yesterday’s News...
Was that a good deal?
...and Today’s Calculations
• So what would have been a good price for Freebase?
• Some back-of-the-envelope calculations...
What Do We Know?
• Number of facts in Freebase: 3B
• Cost of creating a single fact: unknown
• Freebase was edited much like a wiki
– assumption: adding a fact is as expensive as adding a sentence to Wikipedia
Cost of Manual Triple Creation
• Assumption: adding a fact is as expensive as adding a sentence to Wikipedia
– English Wikipedia up to April 2011: 41M working hours (Geiger and Halfaker, 2013)
● size in April 2011: 3.6M pages, avg. 36.4 sentences each
→ 18.7 minutes per sentence
● using the US minimum wage ($7.25/hour): $2.25 per sentence
→ $2.25 per statement
• Result: total cost of creating Freebase would be $6.75B (3B facts × $2.25)
• Cyc
– Total development cost: $120M (according to a presentation by Lenat in 2017)
– Total number of statements: 21M
→ $5.71 per statement
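The manual-cost arithmetic on this slide can be checked with a short Python sketch. It assumes that "US minimum wage" means the US federal minimum wage of $7.25/hour (the slide does not state the rate); all variable and function names are illustrative. With unrounded inputs it gives about 18.77 minutes and $2.27 per sentence, in line with the slide's 18.7 minutes and $2.25; the Freebase total uses the slide's rounded $2.25.

```python
# Back-of-the-envelope reproduction of the manual-creation estimates.
# Assumption (not on the slide): "US minimum wage" = federal rate of $7.25/hour.

WIKI_HOURS = 41e6           # English Wikipedia working hours up to April 2011
WIKI_PAGES = 3.6e6          # pages in April 2011
SENTENCES_PER_PAGE = 36.4   # average sentences per page
MIN_WAGE_PER_HOUR = 7.25    # assumed US federal minimum wage (USD)

FREEBASE_FACTS = 3e9        # facts in Freebase
CYC_COST = 120e6            # Cyc total development cost (USD)
CYC_STATEMENTS = 21e6       # Cyc statements


def minutes_per_sentence(hours: float, pages: float, sentences_per_page: float) -> float:
    """Average editing time per Wikipedia sentence, in minutes."""
    return hours * 60 / (pages * sentences_per_page)


def cost_per_sentence(minutes: float, wage_per_hour: float) -> float:
    """Labour cost of one sentence at the given hourly wage (USD)."""
    return minutes / 60 * wage_per_hour


minutes = minutes_per_sentence(WIKI_HOURS, WIKI_PAGES, SENTENCES_PER_PAGE)
sentence_cost = cost_per_sentence(minutes, MIN_WAGE_PER_HOUR)

# Scale up with the slide's rounded per-statement figure of $2.25.
freebase_total = FREEBASE_FACTS * 2.25
cyc_per_statement = CYC_COST / CYC_STATEMENTS

print(f"{minutes:.2f} min per sentence")
print(f"${sentence_cost:.2f} per sentence")
print(f"${freebase_total / 1e9:.2f}B for Freebase")
print(f"${cyc_per_statement:.2f} per Cyc statement")
```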
Cost of Automatic/Heuristic Creation
• DBpedia
– 4.9M LOC, 2.2M LOC for mappings
● software project development: ~37 LOC per hour (Devanbu et al., 1996)
● we use German PhD salaries as a cost estimate
→ 1.85c per statement
– YAGO: 1.6M LOC
● uses WordNet: 117k synsets, we treat each synset like a Wikipedia page
→ 0.83c per statement
– NELL: 103k LOC
→ 14.25c per statement
• Compared to manual curation: saving factor of 16 to 250
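The LOC-based reasoning on this slide can be sketched as a small cost model. Only the LOC counts, the 37 LOC/hour figure, the 117k synsets and Wikipedia's 41M hours over 3.6M pages come from the slides. The slides give neither the hourly rate derived from German PhD salaries nor the statement counts, so the per-statement figures are not reproduced here; the toy call at the end uses hypothetical numbers, and pricing each synset at the average effort of one Wikipedia page (about 11.4 hours) is an assumed reading of "we treat each synset like a Wiki page".

```python
# Hypothetical LOC-based cost model for heuristically built KGs.
# From the slides: LOC counts, 37 LOC/hour, 117k WordNet synsets,
# and Wikipedia's 41M hours over 3.6M pages.
# Assumptions: hourly rate, statement counts, and page-equivalent pricing
# of synsets are illustrative only.

LOC_PER_HOUR = 37                     # Devanbu et al. (1996), as cited on the slide
WIKI_HOURS_PER_PAGE = 41e6 / 3.6e6    # ~11.4 h of editing per Wikipedia page


def development_hours(loc: float, loc_per_hour: float = LOC_PER_HOUR) -> float:
    """Person-hours needed to write `loc` lines of code."""
    return loc / loc_per_hour


def page_equivalent_hours(n_pages: float) -> float:
    """Manual effort if each unit (e.g. a synset) costs as much as a Wikipedia page."""
    return n_pages * WIKI_HOURS_PER_PAGE


def cost_per_statement_cents(hours: float, hourly_rate: float, n_statements: float) -> float:
    """Total labour cost spread over all statements, in US cents."""
    return hours * hourly_rate / n_statements * 100


nell_hours = development_hours(103_000)        # NELL: 103k LOC
synset_hours = page_equivalent_hours(117_000)  # WordNet: 117k synsets
print(f"NELL code: {nell_hours:,.0f} person-hours")
print(f"WordNet as pages: {synset_hours:,.0f} person-hours")

# Toy numbers (hypothetical, not from the slides): 37k LOC -> 1,000 hours,
# at $50/hour spread over 100k statements -> 50 cents per statement.
toy = cost_per_statement_cents(development_hours(37_000), 50.0, 100_000)
print(f"toy example: {toy:.1f}c per statement")
```

Plugging in a real hourly rate and statement count for DBpedia, YAGO or NELL would turn this into the per-statement estimates on the slide, under whatever salary assumption one adopts.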
Cost vs. Quality
• Graph error rate against cost
– we can pay for accuracy
– NELL is a bit of an outlier
• Error rates taken from Färber et al. (2018) and Mitchell et al. (2015)
Summary
• We can estimate the cost of KG creation
• A manually curated triple costs about $2 to $6
• An automatically/heuristically created triple costs about 1c-15c
– saving factor: around 100
• We can observe a relation between cost and quality
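The saving factor is the manual cost per triple divided by the automatic one. A minimal sketch using the slides' own per-statement figures ($2.25 manual, 1.85c for DBpedia, 14.25c for NELL) gives roughly 122x for DBpedia and 16x for NELL, consistent with the "around 100" above and with the lower end of the 16 to 250 range. Names are illustrative.

```python
# Saving factor = manual cost per triple / automatic cost per triple.
# Inputs are the per-statement figures from the slides.

MANUAL_USD = 2.25                                   # manual estimate, USD per statement
AUTOMATIC_CENTS = {"DBpedia": 1.85, "NELL": 14.25}  # US cents per statement


def saving_factor(manual_usd: float, automatic_cents: float) -> float:
    """How many times cheaper an automatically created triple is."""
    return manual_usd / (automatic_cents / 100)


factors = {kg: saving_factor(MANUAL_USD, c) for kg, c in AUTOMATIC_CENTS.items()}
for kg, f in factors.items():
    print(f"{kg}: {f:.1f}x cheaper than manual curation")
```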
Open Questions
• Debatable approximations
– can we do better?
• Rate KG refinement approaches by their cost
• What about Wikidata?
• What about the provision and maintenance cost?
• ...and...
...back to the Initial Question
Was that a good deal?
Acquisition by Google: estimated at $60M-$300M!
Estimated value of Freebase: ~$6.75B
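Taking the slide's figures at face value, the gap between price and estimated value is a simple ratio. This sketch assumes that creation cost is a fair proxy for value, which is itself one of the debatable approximations listed under Open Questions; names are illustrative. It yields an estimated value of 22.5x to 112.5x the reported price range.

```python
# Compare Freebase's estimated creation cost with the estimated acquisition price.
ESTIMATED_VALUE_USD = 6.75e9     # 3B facts x $2.25 per fact
PRICE_RANGE_USD = (60e6, 300e6)  # estimated acquisition price range


def value_to_price(value: float, price: float) -> float:
    """How many times the estimated value exceeds the price paid."""
    return value / price


low_price, high_price = PRICE_RANGE_USD
ratios = (value_to_price(ESTIMATED_VALUE_USD, high_price),
          value_to_price(ESTIMATED_VALUE_USD, low_price))
print(f"estimated value is {ratios[0]:.1f}x to {ratios[1]:.1f}x the price")
```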