
Zeng marcia ifla-subjectaccesssmartdatadh


This presentation discusses how structured data, together with the semantically indexed or mined entities in semi-structured and unstructured data, contribute to research beyond libraries, especially in digital humanities. It aims to explore opportunities and strategies to use, reuse, share, and effectively develop the smart data, generated or yet to be generated, in libraries.


  1. Subject Access, Smart Data, and Digital Humanities – Finding Unlimited Opportunities through their Intersections. Marcia Lei Zeng, Kent State University. Keynote @ IFLA Classification & Indexing Satellite Conference 2016, August 11-12, Columbus, OH, USA. http://marciazeng.slis.kent.edu/
  2. Outline • I. Background • II. Subject Access -- Finding the Unlimited Opportunities • III. The Importance of Knowledge Organization Systems (KOS) for Effective Subject Access. M Zeng - IFLA Classification & Indexing Satellite Conference 2016, 8/11/16
  3. I. Background. What do I mean… 1) Subject access – in the context of today’s environments 2) Smart data – in the context of Big Data 3) Digital Humanities – in the context of heritage institutions’ data
  4. I. Background: What is happening around us? • The 2nd generation of the Web: the Semantic Web – search engines’ involvement, – maturing of Linked Data technologies, – non-traditional databases • “Big Data” – government funding opportunities, – blooming of the ‘data analytics’ profession • Modern AI (artificial intelligence) – machine learning – contextual computing • Participatory culture – social media – engaging end users in the workflow. https://www.w3.org/2002/Talks/www2002-w3ct-swintro-em/slide7-0.html
  5. Web 1.0: connecting information and getting on the net. Web 2.0: connecting people, putting the “I” in user interface and the “we” into Webs of social participation. Web 3.0: connecting knowledge -- representing meanings, connecting knowledge, and putting these to work in ways that make our experience of the internet more relevant, useful, and enjoyable. Web 4.0: connecting intelligence -- connecting intelligences in a ubiquitous Web where both people and things reason and communicate together. Source: Nova Spivack, Radar Networks; John Breslin, DERI; & Mills Davis, Project10X, 2007, 2008. Copyright MILLS•DAVIS
  6. I. Background: Big data • Volume (data quantity) • Velocity (data speed) • Variety (data types & nature) • Variability (data consistency) • Veracity (data quality) • Complexity. Smart Data = ability to achieve big insights from such data at any scale, great or small. Sources: Kobielus, James. 2016. The Evolution of Big Data to Smart Data. Keynote at Smart Data Online 2016; Big Data. Wikipedia; SAS Institute Inc. [2014]. Big Data: What it is and why it matters.
  7. Why Smart Data • “However, in its raw form, data is just like crude oil; it needs to be refined and processed in order to generate real value. Data has to be cleaned, transformed, and analyzed to unlock its hidden potential.” (TiECON East. Data is new oil.) • Once tamed through organizing and integrating processes, large volumes of unstructured, semi-structured, and structured data are turned into “smart data” that reflect the research priorities of a particular discipline or field. • Smart data inquiries can then be used to provide comprehensive analyses and generate new products and services. Sources: Gardner, D., 2012; Prithwis Mukerjee, 2014; Schöch, 2013; TiECON East, 2014.
  8. Big Data to Smart Data: Asthma example. Understanding relationships between health signals and asthma attacks for providing actionable information. Volume, Velocity, Variety, Veracity: real-time health signals from the personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, hospital EMR), and population level (e.g., pollen level, CO2), arriving continuously in fine-grained samples, potentially with missing information and uneven sampling frequencies. Value (semantics, the WHY): What can we do to avoid an asthma episode? What risk factors influence asthma control? What is the contribution of each risk factor? Slide from: Sheth, Amit. 2014. Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web.
  9. Data (in humanities). Big data: unstructured; messy; implicit; relatively large in volume; varied in form. Smart data: semi-structured or structured; clean; explicit and enriched (raw data + markup, annotations, and metadata); relatively small in volume; of limited heterogeneity. The creation involves human agency and demands time; the process of modeling the data is essential. What about LAMs? Compiled based on Schöch, Christof. 2013. Big? Smart? Clean? Messy? Data in the humanities. Journal for Digital Humanities. 2(3)
  10. LAM data examples, by degree of structure (Structured, Semi-structured, Unstructured): • National bibliographies • Catalogs • Special collection portals • Registries • Metadata for datasets • … • Text Encoding Initiative (TEI) files • Finding Aids • Value added/tagged resources • Unstructured portion within metadata descriptions • Digitized materials, textual or non-textual • Original information-bearing objects • Documents in all kinds of formats • … • Data from Web crawling that need to be cleaned • … “Smart Data” emphasizes the organizing and integrating processes from unstructured data to structured and semi-structured data, to make the big data smarter. - Schöch, 2013
  11. I. Background: Digital humanities • the field is still expanding, • the definitions are being debated, and • the multifaceted landscape is yet to be fully understood. • Most agree that initiatives and activities in digital humanities are at the intersection between the humanities and digital information technology. • The field applies big data mathematical research techniques to the description and analysis of cultural objects, including art, literature, and technological artifacts themselves. Image source: Katherine Hayles http://dtc-wsuv.org/wp/dtc375-scodi/katherine-hayles/ • Svensson, P. 2010. The Landscape of Digital Humanities. Digital Humanities Quarterly. 4(1). • Svensson, P. 2009. Humanities Computing as Digital Humanities. Digital Humanities Quarterly. 3(3)
  12. Advanced technologies now allow researchers (under the umbrella of Big Data and the Semantic Web): • to access and reuse large volumes of diverse data, • to discover patterns and connections formerly hidden from view, • to reconstruct the past, • to discover impacts in real and virtual environments, and • to bring the complex intricacies of innovations to light, all as never before. Image source: http://goo.gl/a4gZsd Image source: Schöch, 2013.
  13. Think: • What kind of data did the project use? Data sources: • Freebase (now Wikidata) • Union List of Artist Names (ULAN®) • Allgemeines Künstlerlexikon / Artists of the World. Schich, M. et al. 2014. “A Network Framework of Cultural History.” Science, 345(6196), 558-562. Nature Video. (2014, July 31). Charting culture. https://www.youtube.com/watch?v=4gIhRkCcD4U
  14. Advanced technologies now allow researchers (under the umbrella of Big Data and the Semantic Web): • to access and reuse large volumes of diverse data, • to discover patterns and connections formerly hidden from view, • to reconstruct the past, • to discover impacts in real and virtual environments, and • to bring the complex intricacies of innovations to light, all as never before. Data provided by LAMs and cultural heritage institutions are treasures for all humanities researchers. Trending: • Machine-readable, understandable data • Machine-readable, actionable data • Accurate (error-free) data in the processes of interlinking, citing, transferring, rights-permission, use and reuse, etc. • One-to-many uses and high-efficiency data processing. http://goo.gl/a4gZsd
  15. Digital humanities – Librarian Survey Results, December 2015. http://americanlibrariesmagazine.org/2016/01/04/special-report-digital-humanities-libraries/ Source: http://americanlibrariesmagazine.org/wp-content/uploads/2016/01/digital-humanities-faculty.pdf
  16. Digital humanities – Faculty Survey Results, December 2015. Source: http://americanlibrariesmagazine.org/wp-content/uploads/2016/01/digital-humanities-faculty.pdf
  17. II. Subject Access -- Finding the Unlimited Opportunities. LAM data examples, by degree of structure (Structured, Semi-structured, Unstructured): • National bibliographies • Catalogs • Special collection portals • Registries • Metadata for datasets • … • Text Encoding Initiative (TEI) files • Finding Aids • Value added/tagged resources • Unstructured portion within metadata descriptions • Digitized materials, textual or non-textual • Original information-bearing objects • Documents in all kinds of formats • … • Data from Web crawling that need to be cleaned • …
  18. Figure. Overview of Relationships (draft). FRBR-Library Reference Model (LRM), world-wide review version. RES: “Any entity in the universe of discourse”. Source: http://www.ifla.org/files/assets/cataloguing/frbr-lrm/frbr-lrm_20160225.pdf + revision draft
  19. Three Perspectives -- from the creation of the structured data (based on Rose, Gillian. 2013. Visual Methodologies, 3rd Ed.): 1 Production; 2 Content; 3 Audiences’ receiving interests (liked, cited, researched, tagged, searched, shared, followed, time spent, …). Structured data shown: • Index • Markup • Ontology • Knowledge base • Metadata (Descriptive, Administrative, Structural/technical). Image sources: http://www.mahalo.com/how-to-understand-perspective-in-drawing/ http://judithlondono.com/ http://www.smrfoundation.org/nodexl/
  20. Linking THINGS, not strings. Access through the linked things. OCLC WorldCat Works: 197 Million Nuggets of Linked Data (since 2014). “The bibliographic metadata found in WorldCat contains a rich set of objects that can be represented in linked data.” WorldCat Linked Data: https://www.oclc.org/developer/develop/linked-data.en.html Figure 1.1: A bibliographic description as a record and a graph. Read more: Godby, Wang, Mixter. 2015. Library Linked Data in the Cloud – OCLC’s Experiments with New Models of Resource Description. ISBN 9781627052191.
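The record-versus-graph idea on this slide can be sketched in a few lines. This is only an illustration: the field names and the `example.org` URIs are hypothetical and do not reflect WorldCat's actual data model.

```python
# Hypothetical flat record; field names and example.org URIs are
# illustrative assumptions, not WorldCat's actual data model.
record = {
    "id": "http://example.org/work/1",
    "title": "Example Title",
    "creator": "http://example.org/person/42",
    "about": "http://example.org/concept/classification",
}


def record_to_triples(rec):
    """Turn a flat record into (subject, predicate, object) triples."""
    subject = rec["id"]
    return [(subject, field, value)
            for field, value in rec.items() if field != "id"]


triples = record_to_triples(record)
for triple in triples:
    print(triple)
```

Once the creator and subject are URIs rather than text strings, other datasets that use the same URIs can be connected to this description, which is roughly what "things, not strings" refers to.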
  21. Linking THINGS, not strings. Access through the linked things. BIBFRAME-based. Read: https://www.denverlibrary.org/blog/rachel-f/dpl-announces-linked-data-launch (2015-06). Try: http://labs.libhub.org/denverpl/
  22. Linking THINGS, not strings. Access through the linked things. Schema.org-based; linking out. http://www.worldcat.org/oclc/922220005 http://worldcat.org/entity/work/id/2534768 http://worldcat.org/entity/person/id/2631227899
  23. Linking THINGS, not strings. Access through the linked things. Go to http://dbpedia.org/page/Lois_Mai_Chan and follow knownFor and about.
  24. Linking THINGS, not strings. Access through the linked things. The connected structured data for THINGs from Perspectives #2 and #3.
  25. There are many hidden access points that can bring in much richer information and knowledge through LAM data. LAM data examples, by degree of structure (Structured, Semi-structured, Unstructured): • National bibliographies • Catalogs • Special collection portals • Registries • Metadata for datasets • … • Text Encoding Initiative (TEI) files • Finding Aids • Value added/tagged resources • Unstructured portion within metadata descriptions • Digitized materials, textual or non-textual • Original information-bearing objects • Documents in all kinds of formats • … • Data from Web crawling that need to be cleaned • …
  26. Big text in humanities. “Big text”: the text version of “big data”. – Where? • special collections, • archives, • oral histories, • annual reports, • provenance indexes, • inventories, • … etc. – How? • Fact mining, analytics – What is needed? Tools • to ‘mine’ the text, • to manage extracted entities as new access points, and • to connect with the outside data. Source: MarkLogic webinar http://www.marklogic.com/webinars/
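As a rough sketch of "managing extracted entities as new access points", the snippet below groups entity mentions into an index from entity to documents. The input rows are hypothetical stand-ins for the output of a fact-mining tool; no real tool's output format is assumed.

```python
from collections import defaultdict

# Hypothetical (document id, entity type, surface form) rows, standing in
# for whatever an entity-extraction tool would return.
extracted = [
    ("doc-1", "Person", "Lois Mai Chan"),
    ("doc-2", "Person", "lois mai chan "),
    ("doc-2", "Organization", "Kent State University"),
]


def build_access_points(rows):
    """Map each normalized (type, name) pair to the documents mentioning it."""
    index = defaultdict(set)
    for doc_id, entity_type, name in rows:
        # Collapse whitespace and case so trivial variants share one entry.
        key = (entity_type, " ".join(name.split()).casefold())
        index[key].add(doc_id)
    return {key: sorted(docs) for key, docs in index.items()}


access_points = build_access_points(extracted)
print(access_points)
```

The normalization here only handles spacing and case; real name variants would still need authority control or human review.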
  27. 2) Taking archival finding aids as an example. Three perspectives: 1 Production; 2 Content; 3 Audiences’ receiving interests (liked, cited, researched, tagged, searched, shared, followed, time spent, …). Structured data shown: • Index • Markup • Ontology • Knowledge base • Metadata (Descriptive, Administrative, Structural/technical). Image source: http://alelemuseum.tripod.com/Archives.html Image source: https://libraries.usc.edu/article/inside-usc-libraries-grand-avenue-library
  28. • Finding aids • provide detailed descriptions of a collection's component parts, • summarize the overall scope of the content, • convey details about the individuals and organizations involved, • list box and folder headings. • The ‘subject’ access is to the whole archive (= Perspective #1). • Few provide access to the ‘things’ contained in the contents through index terms. [Images from a finding aid: title page, content page, and index terms page.] Image source: https://libraries.usc.edu/article/inside-usc-libraries-grand-avenue-library
  29. Finding aids semantic analysis using ontology-based tools, by KSU SLIS LOD-LAM team. http://lod-lam.slis.kent.edu/SemanticAnalysis.html • 45 archival finding aids • drawn from 16 repositories • From OpenCalais: extracted 8,096 entities and 336 suggested social tags. OpenCalais and COGITO are • semantic analysis/fact-mining tools, • taxonomy- and ontology-supported, • with machine learning and natural language processing behind them.
  30. [Images: text before and after processing] http://www.opencalais.com/
  31. Structured data produced by Calais (RDF/XML). http://www.opencalais.com/
  32. [Images: text before and after processing] http://www.intelligenceapi.com
  33. Structured data for THINGs. [Images: one before view and two after views] http://www.intelligenceapi.com
  34. The same approach can be used for oral history collections, to enhance access to the contents of oral history transcript files. • Currently many are managed at collection level only. • Only some have deep indexes of great quality. • The indexes usually exist in a ‘back-of-the-book’ style and stay within downloadable PDF files. • The indexes could be used in the collection's subject searching. • Indexed THINGs could be linked to external resources. [image of a back-of-the-book style index to an oral history transcript] [image of a page of the oral history transcript]
  35. The same approach can be used for library catalogs. Tool used: OpenCalais. Note: the tool only assists extraction; a human cleaning process is still needed.
  36. The same approach can be used for museum object labels and descriptions. Tools: COGITO, OpenCalais. http://www.metmuseum.org/toah/works-of-art/2010.312/
  37. Boson: [Images: text before and after processing, showing entities and keywords] http://bosonnlp.com/demo
  38. 3) Taking non-textual objects as examples. Three perspectives: 1 Production; 2 Content; 3 Audiences’ receiving interests (liked, cited, researched, tagged, searched, shared, followed, time spent, …). Structured data shown: • Index • Markup • Ontology • Knowledge base • Metadata (Descriptive, Administrative, Structural/technical)
  39. Online Coins of the Roman Empire (OCRE): ontology-based knowledge base. Example: Portrait of Marcus Aurelius. http://numismatics.org/ocre/results • Modeling in an ontology (formed of classes, properties, relationships) • Following Linked Data principles • Using RDF triples for entities • Querying in the SPARQL language
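To make "RDF triples" and pattern-based querying concrete, here is a toy in-memory version. It is a sketch only: the triples, the `ex:` names, and the `match` helper are hypothetical, and this is neither OCRE's data nor a SPARQL engine. It only mimics how a single triple pattern with variables selects matching statements.

```python
# Hypothetical triples; the ex: names are illustrative, not OCRE identifiers.
triples = [
    ("ex:coin1", "ex:hasPortrait", "ex:MarcusAurelius"),
    ("ex:coin1", "ex:material", "ex:silver"),
    ("ex:coin2", "ex:hasPortrait", "ex:MarcusAurelius"),
    ("ex:coin3", "ex:hasPortrait", "ex:Hadrian"),
]


def match(data, pattern):
    """Return triples matching an (s, p, o) pattern; None acts as a variable."""
    return [
        triple for triple in data
        if all(want is None or want == got
               for want, got in zip(pattern, triple))
    ]


# Roughly "which coins carry a portrait of Marcus Aurelius?"
coins = [s for s, _, _ in match(triples, (None, "ex:hasPortrait", "ex:MarcusAurelius"))]
print(coins)
```

A real SPARQL query joins several such patterns and runs against the published endpoint rather than a Python list.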
  40. Online Coins of the Roman Empire (OCRE) http://numismatics.org/ocre/ • Using SPARQL queries to find • Output as CSV files • Auto-visualizing using FusionTable • Just needs a few seconds
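The "query, then output as CSV" step might look like the sketch below. It assumes the results arrive in the W3C SPARQL 1.1 Query Results JSON layout (`head.vars` plus `results.bindings`); the sample data is hand-written and hypothetical, and no request is sent to any endpoint.

```python
import csv
import io

# Hand-written sample in the SPARQL 1.1 Query Results JSON layout.
# The variable names and values are hypothetical.
sample = {
    "head": {"vars": ["coin", "portrait"]},
    "results": {"bindings": [
        {"coin": {"type": "uri", "value": "http://example.org/coin/1"},
         "portrait": {"type": "literal", "value": "Marcus Aurelius"}},
        # A row with an unbound variable, as OPTIONAL patterns can produce.
        {"coin": {"type": "uri", "value": "http://example.org/coin/2"}},
    ]},
}


def bindings_to_csv(results):
    """Flatten SPARQL JSON results into CSV text, one row per solution."""
    columns = results["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in results["results"]["bindings"]:
        writer.writerow([row.get(col, {}).get("value", "") for col in columns])
    return buffer.getvalue()


csv_text = bindings_to_csv(sample)
print(csv_text)
```

Many endpoints can also return CSV directly when asked for it, in which case a conversion like this is unnecessary.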
  41. Online Coins of the Roman Empire (OCRE) http://numismatics.org/ocre/
  42. Deep Image Annotation. http://www.synaptica.com/oasis/
  43. Clarke, David. 2015. Deep image annotation and Knowledge Organization. ISKO-UK 2015. /content/deep-image-annotation-and-knowledge-organization
  44. Clarke, David. 2015. Deep image annotation and Knowledge Organization. ISKO-UK 2015. /content/deep-image-annotation-and-knowledge-organization
  45. IIIF Image API. IIIF = International Image Interoperability Framework. API = application programming interface, a set of routines, protocols, and tools for building software applications. See API specifications at: http://iiif.io/technical-details.html Sanderson, Rob. 2014. Open Repositories 2014: Crowdsourced Transcription via IIIF, slide 9.
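A IIIF Image API request is a URL that follows the pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The helper below is a minimal sketch of assembling such a URL. The function name, the `example.org` server, and the identifier are hypothetical, and the defaults assume version 2.x of the Image API (version 3.0 uses `max` instead of `full` for size).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble an Image API URL; defaults follow Image API 2.x conventions."""
    # Identifiers must be percent-encoded, including any slashes.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Whole image, then a cropped detail scaled to 150 pixels wide.
full_url = iiif_image_url("https://example.org/iiif", "page 1")
detail_url = iiif_image_url("https://example.org/iiif", "page 1",
                            region="100,200,300,400", size="150,")
print(full_url)
print(detail_url)
```

Because a region is addressable by URL alone, an annotation can point at a detail of an image without storing a separate cropped copy.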
  46. III. The Importance of Knowledge Organization Systems (KOS) for Effective Subject Access. Various Types of KOS; Fundamental KOS Approaches: 1. Eliminating ambiguity 2. Controlling synonyms or equivalents 3. Making explicit semantic relationships between/among concepts (hierarchical relationships; hierarchical + other associative relationships) 4. Presenting relationships between/among concepts as well as properties of concepts. See full picture at http://nkos.slis.kent.edu/KOS_taxonomy.htm
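Two of these functions, synonym control and explicit hierarchy, can be shown with a tiny made-up thesaurus. The terms and the `USE` and `BROADER` tables are hypothetical; a real KOS would typically be published in a format such as SKOS rather than as Python dictionaries.

```python
# Hypothetical mini-thesaurus, for illustration only.
USE = {"automobiles": "cars", "motorcars": "cars"}   # non-preferred -> preferred
BROADER = {                                          # term -> broader term
    "cars": "motor vehicles",
    "trucks": "motor vehicles",
    "motor vehicles": "vehicles",
}


def preferred(term):
    """Resolve a synonym or variant to its preferred term."""
    term = term.lower().strip()
    return USE.get(term, term)


def broader_chain(term):
    """Walk up the hierarchy from a term to its top concept."""
    chain = []
    current = preferred(term)
    while current in BROADER:
        current = BROADER[current]
        chain.append(current)
    return chain


def narrower(term):
    """List the terms directly below a concept."""
    target = preferred(term)
    return sorted(t for t, parent in BROADER.items() if parent == target)


print(preferred("Automobiles"))
print(broader_chain("motorcars"))
print(narrower("motor vehicles"))
```

A search interface could use `preferred` to merge results for variant terms and `narrower` to offer a way to drill down.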
  47. Dealing with the Problem of Semantic Conflicts (inconsistencies in terminology and meanings). Figure. FRBR-LRM Overview of Relationships (draft). Source: http://www.ifla.org/files/assets/cataloguing/frbr-lrm/frbr-lrm_20160225.pdf + revision draft
  48. Dealing with the Problem of Information Overload. Traditional filters -- “Filter-out” • Site (physical or digital) organization and navigation support • Advanced search functions • “Umbrella” structures of classification and taxonomy from which to extend content • Browsing support: hierarchical structures. Beyond traditional filters -- “Filter-forward” • Browsing and filtering to the front, using faceted structure • Connecting things via semantic relations • Enabling rediscovery – data mining, semantic analysis, machine learning through expert feedback, machine reasoning • LOD KOS datasets become knowledge bases – obtaining special graphs or datasets for very complicated questions, and – revealing unknown relationships (e.g., http://vocab.getty.edu/queries#Top-level_Subjects) • From machine-readable to machine-understandable/processable
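One possible reading of "filter-forward" with a faceted structure is sketched below: after each selection, the remaining values of the other facets are counted and shown up front. The items, facet names, and helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical collection items described by two facets.
items = [
    {"title": "Coin A", "material": "silver", "period": "Roman"},
    {"title": "Coin B", "material": "gold", "period": "Roman"},
    {"title": "Vase C", "material": "clay", "period": "Greek"},
]


def filter_items(collection, **selected):
    """Keep items whose facet values match every selected facet."""
    return [item for item in collection
            if all(item.get(facet) == value for facet, value in selected.items())]


def facet_counts(collection, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(item[facet] for item in collection if facet in item)


roman = filter_items(items, period="Roman")
material_counts = facet_counts(roman, "material")
print(material_counts)
```

Showing the counts before the user commits to a choice is what separates this from a plain advanced-search form, which only filters out.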
  49. Fact: The Increasing Need for KOS. In the BARTOC registry, KOS registered: 1836. In the Datahub, LOD KOS registered: 1251 (about a half are ontologies). http://bartoc.org/ https://datahub.io/ (2016.05.27 data) (2016.03.15 data)
  50. Conclusion. Initiatives in digital humanities have demonstrated a paradigm shift in how cultural heritage materials can be searched, mined, displayed, taught, and analyzed utilizing digital technologies. Data provided by LAMs and cultural heritage institutions are treasures for all humanities researchers. When subject access, smart data, and digital humanities interact, the opportunities for effective and innovative services and contributions can be endless. Let’s embrace the new and changing concepts and make these happen. Thank you!
  51. References • Gardner, D. 2012. An ocean of data [Introduction]. In: Smolan, R., Erwitt, J. (eds.) The human face of big data, pp. 14-17. Sausalito, CA: Against All Odds Productions. • Joshi, Kunal. 2013. Big data, data science & fast data. http://www.slideshare.net/kunaljoshi111/big-data-data-science-fast-data • Kobielus, James. 2016. The Evolution of Big Data to Smart Data. Keynote at Smart Data Online 2016. • Rose, Gillian. 2013. Visual Methodologies, 3rd Edition. SAGE Publications Ltd. • Sanderson, Rob. 2014. Open Repositories 2014: Crowdsourced Transcription via IIIF, slide 9. http://www.slideshare.net/azaroth42/open-repositories-2014-crowdsourced-transcription-via-iiif • SAS Institute Inc. [2014]. Big Data: What it is and why it matters. http://www.sas.com/big-data/ • Schöch, Christof. 2013. Big? Smart? Clean? Messy? Data in the humanities. Journal for Digital Humanities. 2(3): 2-13. http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/ • Sheth, Amit. 2014. Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web. Keynote at 30th IEEE International Conference on Data Engineering (ICDE) 2014. • Svensson, Patrik. 2010. The landscape of digital humanities. Digital Humanities Quarterly. 4(1) http://digitalhumanities.org/dhq/vol/4/1/000080/000080.html • Svensson, Patrik. 2009. Humanities computing as digital humanities. Digital Humanities Quarterly. 3(3) http://digitalhumanities.org/dhq/vol/3/3/000065/000065.html • TiECON East. 2014. Data is new oil. http://www.tieconeast.org/2014/big-data-analytics
