Q: Is it possible
                           to automate

                            METADATA
                            CREATION?

Thursday, March 8, 2012
Or, alternatively:
                          Will I be replaced by a computer?
                                          -or-
                           Should I have gone to school
                                for computer science?


How does it work?

                          There are 2 ways of automatically
                          creating metadata:
                          1) Text mining/clustering “Extraction”
                          2) Machine learning techniques
                          “Harvesting”



Extraction vs.
                                  Harvesting
                          Metadata extraction involves the mining of
                          resource content (text-mining) and employs
                          sophisticated automatic indexing techniques to
                          produce structured (“labelled”) metadata for object
                          representation.
                          Metadata harvesting relies on machine
                          capabilities to collect tagged metadata previously
                          created by humans, machine processing, or both.


                                  Library of Congress, AMeGA Project Report
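
The distinction above can be illustrated with a minimal sketch. It assumes a small HTML page (the `SAMPLE_HTML` string, the `MetaHarvester` class, and the `extract_keywords` helper are all hypothetical names invented for illustration). Harvesting reads the `<meta>` tags someone already supplied; extraction mines the body text, here with a naive word count standing in for the "sophisticated automatic indexing techniques" the report describes.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical sample page; not taken from the slides.
SAMPLE_HTML = """<html><head>
<title>Automating Metadata</title>
<meta name="author" content="A. Librarian">
<meta name="keywords" content="metadata, automation">
</head><body>
<p>Metadata creation takes time. Automatic metadata generation can reduce
the time spent on metadata creation in libraries.</p>
</body></html>"""

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "on", "in", "can", "a", "of", "and", "to"}


class MetaHarvester(HTMLParser):
    """Harvesting: collect metadata that is already tagged in the page."""

    def __init__(self):
        super().__init__()
        self.harvested = {}
        self.body_text = []
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs and "content" in attrs:
            self.harvested[attrs["name"]] = attrs["content"]
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Keep body text so the extraction step has content to mine.
        if self._in_body:
            self.body_text.append(data)


def extract_keywords(text, top_n=3):
    """Extraction: derive candidate keywords from the content itself."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


parser = MetaHarvester()
parser.feed(SAMPLE_HTML)
harvested = parser.harvested
extracted = extract_keywords(" ".join(parser.body_text))

print("Harvested:", harvested)
print("Extracted:", extracted)
```

Note that the harvested keywords are only as good as whoever tagged the page, while the extracted ones depend entirely on the quality of the mining step.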
What kind of metadata can be
                    automatically created?
           Best: Technical or Structural
           (format, date, page #s)*
           OK: Descriptive (title, abstract)*
           Not-so-good: Semantic (keywords, subject
           matter)

           *Less effective when documents have special
           layouts or structures.
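
As a rough illustration of why technical metadata is the easiest kind to automate, the sketch below reads format, size, and date straight from the file system using only the Python standard library. The `technical_metadata` function and the `report.txt` file are hypothetical; real collections would typically rely on format-specific tools as well.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def technical_metadata(path):
    """Return format, size, and modification date for a file on disk."""
    info = os.stat(path)
    guessed_type, _ = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "format": guessed_type or "unknown",
        "size_bytes": info.st_size,
        "modified": modified.date().isoformat(),
    }


# Demonstrate on a throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "report.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("hello metadata")
    record = technical_metadata(sample_path)

print(record)
```

No interpretation of the content is needed here, which is what separates this tier from descriptive and semantic metadata.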

Why bother?
           Lessens time and effort required (Burk et al.,
           2007).
           “The enormous volume of online and digital
           resources makes semi-automatic metadata
           generation a critical need” (Park & Lu, 2009).
           Alleviates the problems associated with the
           “metadata bottleneck”.
           Better to start with something rather than
           nothing.
Tim Berners-Lee
            Inventor of the World Wide Web




“It’s really important to have a lot of data.”
                           “We haven’t got data on the Web as data.”

                          “Data can... help us understand the world.”
                       Tim Berners-Lee. (2009, February). “Tim Berners-Lee on the next Web.”
            TED Talk. <http://www.ted.com/talks/lang/eng/tim_berners_lee_on_the_next_web.html>




A more efficient way to
                     present data
                          An example of the
                          automatic creation of
                          data to be reused.

                          Dates are extracted
                          by Google and
                          rearranged into a
                          timeline.
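
How Google builds its timeline is not described in the slides. The following is only a minimal sketch of the general idea, assuming dates written as "Month D, YYYY" and using made-up snippets. The names `SNIPPETS`, `extract_dates`, and `build_timeline` are hypothetical.

```python
import re
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Matches dates such as "March 3, 1999" (one assumed format among many).
DATE_PATTERN = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")

# Hypothetical snippets standing in for text found on web pages.
SNIPPETS = [
    "The branch library opened on March 3, 1999.",
    "A new catalog system went live on July 15, 2005.",
    "The first reading room was built on May 20, 1987.",
]


def extract_dates(text):
    """Return every date in the text that matches the assumed format."""
    found = []
    for month, day, year in DATE_PATTERN.findall(text):
        found.append(date(int(year), MONTHS.index(month) + 1, int(day)))
    return found


def build_timeline(snippets):
    """Pair each extracted date with its snippet, sorted chronologically."""
    events = [(d, s) for s in snippets for d in extract_dates(s)]
    return sorted(events)


timeline = build_timeline(SNIPPETS)
for when, text in timeline:
    print(when.isoformat(), "-", text)
```

The extracted dates become reusable data: once they exist as structured values, the same records could be sorted, filtered, or plotted without rereading the source text.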




A: Kind of/
                          it depends...


Conclusions
           No artificial intelligence yet!
           Automated metadata creation can be used, but
           only with human intervention.
           Some metadata types are easier to automate.
           Automation of metadata creation is not widely
           used in libraries yet.


Questions?



