Richard Sapon-White
     March 18, 2013
 The growth of the Web
 Metadata in the context of the Web
 Important metadata schemes: XML, HTML, MARC
   From 1996 to 2007:
    ◦ 77,138 Web sites → 125 million Web sites
   Access provided through search engines
    ◦ Google is the most used search engine
   Search engines use Web crawlers (a.k.a. spiders
    or robots) to collect information on Web
    sites
    ◦ Crawlers copy Web pages and their locations (URLs) to build a catalog of
      indexed pages
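
The crawling process described above can be sketched roughly as follows. This is a minimal illustration, not how any particular search engine works: the `PAGES` dictionary is a hypothetical in-memory stand-in for the Web, and the `LinkExtractor` and `crawl` names are assumptions made for this example.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "Web": location -> HTML.
# A real crawler would fetch each page over HTTP instead.
PAGES = {
    "/": '<a href="/about">About</a> <a href="/catalog">Catalog</a>',
    "/about": '<a href="/">Home</a>',
    "/catalog": '<a href="/catalog/item1">Item 1</a>',
    "/catalog/item1": "<p>Item 1 record</p>",
    "/orphan": "<p>No page links here</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, pages):
    """Breadth-first crawl from `start`; returns {location: copied HTML}."""
    catalog = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in catalog or url not in pages:
            continue  # already indexed, or the link leads nowhere
        html = pages[url]
        catalog[url] = html  # copy the page and record its location
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(parser.links)  # follow links found on the page
    return catalog


catalog = crawl("/", PAGES)
print(sorted(catalog))
```

In this sketch the crawler only reaches pages that some other page links to, so `/orphan` never enters the catalog.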
 The Invisible Web (also called the Deep Web)
 Web crawlers cannot:
    ◦   submit queries to databases,
    ◦   parse file formats that they do not recognize,
    ◦   click buttons on Web forms, or
    ◦   log in to sites requiring authentication
   Therefore, much of the information on the Web is
    invisible!
    ◦ How much is invisible?
    ◦ By Bergman's 2001 estimate, 400 to 550 times larger than the indexed (visible) Web
•  Topic Databases — subject-specific aggregations of information, such as SEC corporate filings, medical
   databases, patent records, etc.
•  Internal site — searchable databases for the internal pages of large sites that are dynamically created,
   such as the knowledge base on the Microsoft site.
•  Publications — searchable databases for current and archived articles.
•  Shopping/Auction.
•  Classifieds.
•  Portals — broader sites that included more than one of these other categories in searchable databases.
•  Library — searchable internal holdings, mostly for university libraries.
•  Yellow and White Pages — people and business finders.
•  Calculators — while not strictly databases, many do include an internal data component for calculating
   results. Mortgage calculators, dictionary look-ups, and translators between languages are examples.
•  Jobs — job and resume postings.
•  Message or Chat.
•  General Search — searchable databases most often relevant to Internet search topics and information.
From: Michael K. Bergman, "The Deep Web: Surfacing Hidden Value," Journal of Electronic Publishing 7, no.
   1 (August 2001). http://www.press.umich.edu/jep/07-01/bergman.html.
 Poor site design results in invisible web sites
 To create web sites for human and machine retrieval:
    ◦ Use hyperlinked hierarchies of categories
    ◦ Contribute Deep Web collections’ metadata to union
      catalogs (which can then be indexed by search engines)
   A Sitemap (a protocol introduced by Google) can provide a detailed list of
    pages on a site
      http://www.sitemaps.org/
      http://en.wikipedia.org/wiki/Help:Contents/Site_map
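
A Sitemap file of the kind mentioned above could be generated along these lines. This is a sketch under stated assumptions: the `build_sitemap` helper and the example.org addresses are hypothetical, and only the required `urlset`, `url`, and `loc` elements of the sitemaps.org format are written.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol at sitemaps.org.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap XML string with one <url><loc> entry per address."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page addresses, for illustration only.
pages = [
    "http://www.example.org/",
    "http://www.example.org/collections/maps",
]
print(build_sitemap(pages))
```

Listing database-backed pages this way gives a crawler their locations directly, without requiring it to submit a query or fill in a form.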
 Create conventional, MARC-based metadata
 Access via library catalogs, union catalogs
 Problems:
    ◦ Creating MARC records is labor-intensive, slow,
      expensive
    ◦ Web sites are dynamic (content, URLs), so MARC
      records must be revised as sites change
   Solutions:
    ◦ Dublin Core
    ◦ META tags
    ◦ Resource Description Framework
    Dublin Core PowerPoint
   Embed two metadata elements in the HTML <head>
    section of a Web page
    ◦ Keywords
    ◦ Description
   Example:
    ◦ <META NAME="KEYWORDS" CONTENT="data standards, metadata, Web resources, World Wide Web, cultural heritage information, digital resources, Dublin Core, RDF, Semantic Web">
      <META NAME="DESCRIPTION" CONTENT="Version 3.0 of the site devoted to metadata: what it is, its types and uses, and how it can improve access to Web resources; includes a crosswalk.">
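
To show how a program might read these two elements back out of a page, here is a minimal sketch using Python's standard `html.parser` module. The `MetaTagReader` class, the `read_meta` helper, and the shortened sample page are assumptions made for illustration; they do not describe how any real search engine treats META tags.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects NAME/CONTENT pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # so <META NAME=...> and <meta name=...> look the same here.
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


def read_meta(html):
    """Return a dict such as {"keywords": ..., "description": ...}."""
    reader = MetaTagReader()
    reader.feed(html)
    return reader.meta


# A shortened, hypothetical page modeled on the example above.
page = """<html><head>
<META NAME="KEYWORDS" CONTENT="metadata, Dublin Core, RDF">
<META NAME="DESCRIPTION" CONTENT="A site devoted to metadata.">
</head><body></body></html>"""

meta = read_meta(page)
keywords = [k.strip() for k in meta["keywords"].split(",")]
print(keywords)
print(meta["description"])
```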
Metadata and the web
