A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recognition
Matthias Arnold, HRA, Universität Heidelberg
Agenda
• Project history
• Approaching the material
• Achievements
• Automatic object detection
• New system and user annotation
Matthias Arnold, HRA, Uni Heidelberg - Chinese Comics Database | Workshop Comics Annotation | Potsdam 2018-06-19
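“Leihbibliothek für Kinder an der Straße” (Street lending library for children), She Zeh Tschi, Der Holzschnitt im Neuen China, catalogue, Dresden 1951, p. 101
The picture shows children in the early 1980s lingering at a picture-book (xiaorenshu) stall. http://www.dili360.com/ch/article/p5350c3d9d48d394.htm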
Project History
2009-11 Main digitisation project
Funding: Cluster of Excellence “Asia and Europe” and Institute of Chinese Studies
Scanning: MediaLab at Cluster
First database: eXist-db, metadata in MODS XML
Presentation at Cartoon Museum Basel: “Visual Words - Comics from China” (2010/11)
Content expansion and separation of books and stories
Image analysis project with Computer Vision (radiances)
Enduser interface (2010)
Project History
2018: Data migration (ongoing):
MongoDB, ingest in XML, image service (IIIF), browse - search - filter,
user annotation through the Mirador viewer
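The "standardized IIIF image call" used by the new image service can be illustrated with a minimal sketch. It follows the URL pattern of the IIIF Image API ({identifier}/{region}/{size}/{rotation}/{quality}.{format}); the server address and the page identifier are hypothetical and not taken from the project. The default `size="full"` follows Image API 2.x; version 3.0 uses `max` instead.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier is percent-encoded, including any slashes it contains.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and page identifier, for illustration only.
BASE = "https://iiif.example.org/image"

# The whole page at full size.
full_page = iiif_image_url(BASE, "book0001/page_012")

# A detail: region given as x,y,w,h in pixels, scaled to 800 px width.
detail = iiif_image_url(BASE, "book0001/page_012",
                        region="100,200,1500,900", size="800,")

print(full_page)
print(detail)
```

Because the region and size are part of the URL, a viewer such as Mirador can request only the part of a 600 dpi scan that is currently on screen.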
Approaching the material
趙百萬 Zhao Baiwan (1951)
红灯记 Hong deng ji (1970)
鲁迅和青年的故事 Lu Xun he qingnian de gushi (1976)
1. Spreadsheets
2. XML database (MODS records) and frontend
3. Refined metadata schema (books & stories, agents), SQL database
4. MongoDB database, IIIF image service, user annotations in Mirador
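As an illustration of the MODS records in step 2, the following is a minimal sketch of a bibliographic record for one of the books shown above. Only the MODS namespace and element names (titleInfo, title, originInfo, dateIssued) come from the MODS schema; the record ID, the use of the `transliteration` attribute, and the choice of fields are assumptions and do not reproduce the project's actual records.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Return the namespace-qualified name of a MODS element."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods(title, transliterated_title, year, record_id):
    """Build a minimal MODS record (illustrative field selection only)."""
    mods = ET.Element(q("mods"), {"ID": record_id})

    # Title in the original script.
    original = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(original, q("title")).text = title

    # Transliterated title; the attribute value is an assumption.
    translit = ET.SubElement(mods, q("titleInfo"), {"transliteration": "pinyin"})
    ET.SubElement(translit, q("title")).text = transliterated_title

    # Year of publication as printed on the book.
    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateIssued"), {"encoding": "w3cdtf"}).text = year
    return mods


# "comic_0001" is a hypothetical record ID.
record = build_mods("红灯记", "Hong deng ji", "1970", "comic_0001")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Separating books from stories and agents, as in step 3, would mean splitting such a record into linked records rather than adding more fields to this one.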
Special cases
生死緣 Sheng si yuan (1953)
高歌猛进 Gaoge mengjin (1952)
Project Achievements
1. Scans from 5 collections (1250/1031 books)
2. Digitize material (greyscale .tif @600 dpi, ca. 4 TB)
3. Record all metadata (as provided on books)
4. Provide online access to metadata
5. Provide online access to full books (read books)
6. Open data for research annotations (Mirador)
7. Explore automatic content analysis (computer vision)
8. *Link data to authorities (agents)
9. **Generate fulltext
10. **Explore generic content description (XML)
Automatic object detection
• Chinese comics from the second half of the Cultural Revolution
• Over 1200 books (~120,000 pages)
• Grayscale .tif images @600 dpi, ~4.5 TB of data
• Focus: comic book production of the late 1960s and 1970s in China
• Shows the diversity of Chinese comic production in general
• Special type of emphasis (heroes, symbolic objects or idols)
Specific type of emphasis: radiance
Monroy et al. (2011)
Outcome of experiment
• Automatic detection of objects using radiances
• No training data, multiple categories, multiple scales, intense clutter, high object variability
• However: no tool for re-use
• …for object detection, text-image separation, auto-fulltext
New system and user annotation
Website:
http://comics.freizo.org
Use of Mirador for (manual) annotation
Arnold/Decker - Annotationssysteme für Bild- und Videomedien, 2016-01-26
Standards - Interoperability
1. Image service:
IIIF Image API using standardized IIIF image call, http://iiif.io
2. Image annotation:
Web Annotation using Mirador IIIF viewer
http://w3c.github.io/web-annotation/
3. Bibliographic metadata: MODS XML for library catalogs
4. Agents' data: link to authorities (e.g. GND or VIAF)
5. Textual data (ideas):
Publish metadata on research data platform (HeiDATA)
Texts (stories, pre-/postface, paratext) in TEI XML
Structure in CBML?
6. Re-use and enhancements:
Include samples in Graphic Narrative Corpus or Visual Language Research Corpus?
Find partners for automatic (Chinese) comics analysis
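For point 2 in the list above, a region annotation in the W3C Web Annotation data model could look like the following minimal sketch. The JSON-LD context, the TextualBody type and the FragmentSelector with an `xywh` media fragment come from the W3C specifications; the annotation ID, canvas URI, coordinates and text are hypothetical, and the serialization Mirador actually produces may differ depending on its version.

```python
import json


def make_annotation(anno_id, canvas_uri, xywh, text, language="en"):
    """Build a Web Annotation that comments on a rectangular page region."""
    x, y, w, h = xywh
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {
            "type": "TextualBody",
            "value": text,
            "format": "text/plain",
            "language": language,
        },
        "target": {
            # The annotated page (a IIIF canvas in a Mirador setup).
            "source": canvas_uri,
            # The region on that page, as a media fragment in pixels.
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


# All URIs and values below are hypothetical examples.
annotation = make_annotation(
    "https://example.org/annotations/1",
    "https://example.org/iiif/book0001/canvas/p12",
    (120, 340, 600, 450),
    "Radiance around the hero figure",
)
print(json.dumps(annotation, indent=2, ensure_ascii=False))
```

Since the target is expressed as a URI plus a pixel region, annotations made manually in a viewer and regions found by an automatic detector could in principle be stored in the same form.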
References
• Seifert, Andreas. Bildgeschichten für Chinas Massen: Comic und Comicproduktion im 20. Jahrhundert. Köln: Böhlau, 2008.
• Monroy, Antonio, Tobias Kröger, Matthias Arnold, and Björn Ommer. “Parametric Object Detection for Iconographic Analysis.” In SCCH11, 1-8. Heidelberg, 2011.
• “Reddition : Reddition 63.” Edition Alfons, Dec. 2015. https://www.reddition.de/index.php/shop/reddition/reddition-63-detail.
• Database: http://comics.freizo.org
Contact
Matthias Arnold
Heidelberg Research Architecture
Cluster of Excellence “Asia and Europe in a Global Context”
Heidelberg Centre for Transcultural Studies | HCTS
Karl Jaspers Centre
Voßstr. 2 | Building 4400 | Room 005b
69115 Heidelberg, Germany
arnold@asia-europe.uni-heidelberg.de
http://www.asia-europe.uni-heidelberg.de

Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 

Jingjing Zhang, zhang@asia-europe.uni-heidelberg.de

  • 1. A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recognition Matthias Arnold, HRA, Universität Heidelberg
  • 2. Agenda
      • Project history
      • Approaching the material
      • Achievements
      • Automatic object detection
      • New system and user annotation
  • 3. "Leihbibliothek für Kinder an der Straße" (street lending library for children), She Zeh Tschi, Der Holzschnitt im Neuen China, Katalog, Dresden 1951, p. 101
  • 6. Project History
      2009-11 Main digitisation project
      • Funding: Cluster of Excellence "Asia and Europe" and Institute of Chinese Studies
      • Scanning: MediaLab at Cluster
      • First database: eXist-db, metadata in MODS XML
      • Presentation at Cartoon Museum Basel, "Visual Words - Comics from China" (2010/11)
      • Content expansion and separation of books and stories
      • Image analysis project with Computer Vision (radiances)
  • 7. End-user interface (2010)
  • 13. Project History
      2009-11 Main digitisation project
      • Funding: Cluster of Excellence "Asia and Europe" and Institute of Chinese Studies
      • Scanning: MediaLab at Cluster
      • First database: eXist-db, metadata in MODS XML
      • Presentation at Cartoon Museum Basel, "Visual Words - Comics from China" (2010/11)
      • Content expansion and separation of books and stories
      • Image analysis project with Computer Vision (radiances)
      2018 Data migration (ongoing)
      • MongoDB, ingest in XML, image service (IIIF), browse - search - filter, user annotation through Mirador viewer
  • 14. Approaching the material
  • 15. 趙百萬 Zhao Baiwan (1951)
  • 16. 红灯记 Hong deng ji (1970)
  • 17. 鲁迅和青年的故事 Lu Xun he qingnian de gushi (1976)
  • 18. Stages of the system
      1. Spreadsheets
      2. XML database (MODS records) and frontend
      3. Refined metadata schema (books & stories, agents), SQL database
      4. MongoDB database, IIIF image service, user annotations in Mirador
  • 21. Special cases
  • 22. 生死緣 Sheng si yuan (1953)
  • 23. 高歌猛进 Gaoge mengjin (1952)
  • 24. Project Achievements
      1. Scans from 5 collections (1250/1031 books)
      2. Digitize material (greyscale .tif @ 600 dpi, ca. 4 TB)
      3. Record all metadata (as provided on books)
      4. Provide online access to metadata
      5. Provide online access to full books (read books)
      6. Open data for research annotations (Mirador)
      7. Explore automatic content analysis (computer vision)
      8. *Link data to authorities (agents)
      9. **Generate fulltext
      10. **Explore generic content description (XML)
  • 25. Automatic object detection
  • 26. The corpus
      • Chinese comics from the second half of the Cultural Revolution
      • Over 1200 books (~120,000 pages)
      • Greyscale .tif images @ 600 dpi, ~4.5 TB of data
      • Focus: comic book production of the late 1960s and 1970s in China
      • Shows the diversity of Chinese comic production in general
      • Special type of emphasis (heroes, symbolic objects or idols)
  • 27. Specific type of emphasis: radiance
  • 28. Monroy et al. (2011)
  • 31. Outcome of experiment
      • Automatic detection of objects using radiances
      • No training data, multiple categories, multiple scales, intense clutter, high object variability
      • However: no tool for re-use
      • …for object detection, text-image separation, auto-fulltext
  • 32. New system and user annotation
  • 34. Use of Mirador for (manual) annotation
  • 35. Arnold/Decker - Annotationssysteme für Bild- und Videomedien, 2016-01-26
  • 36. Standards - Interoperability
      1. Image service: IIIF Image API using standardized IIIF image calls, http://iiif.io
      2. Image annotation: Web Annotation using the Mirador IIIF viewer, http://w3c.github.io/web-annotation/
      3. Bibliographic metadata: MODS XML for library catalogs
      4. Agents' data: link to authorities (e.g. GND or VIAF)
      5. Textual data (ideas):
         • Publish metadata on research data platform (HeiDATA)
         • Texts (stories, pre-/postface, paratext) in TEI XML
         • Structure in CBML?
      6. Re-use and enhancements:
         • Include samples in Graphic Narrative or Vis. Lang. Res. Corpus?
         • Find partners for automatic (Chinese) comics analysis
  • 37. References
      • Seifert, Andreas. Bildgeschichten für Chinas Massen: Comic und Comicproduktion im 20. Jahrhundert. Köln: Böhlau, 2008.
      • Monroy, Antonio, Tobias Kröger, Matthias Arnold, and Björn Ommer. "Parametric Object Detection for Iconographic Analysis." In SCCH11, 1-8. Heidelberg, 2011.
      • "Reddition : Reddition 63." Edition Alfons, Dec. 2015. https://www.reddition.de/index.php/shop/reddition/reddition-63-detail.
      • Database: http://comics.freizo.org
  • 38. Contact
      Matthias Arnold
      Heidelberg Research Architecture
      Cluster of Excellence "Asia and Europe in a Global Context"
      Heidelberg Centre for Transcultural Studies | HCTS
      Karl Jaspers Centre
      Voßstr. 2 | Building 4400 | Room 005b
      69115 Heidelberg, Germany
      arnold@asia-europe.uni-heidelberg.de
      http://www.asia-europe.uni-heidelberg.de
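
The first two interoperability standards on slide 36 (a standardized IIIF image call, and Web Annotation for image regions) can be illustrated with a short sketch. This is not the project's code. The server base URL, the image identifier, the canvas and annotation IRIs and both helper function names are hypothetical placeholders. The URL pattern follows the IIIF Image API, and the annotation follows the W3C Web Annotation Data Model. The size keyword `full` is the Image API 2.x spelling; version 3 uses `max` instead.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded so that any "/" inside it does not
    get mistaken for a path separator.
    """
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


def region_annotation(anno_id, canvas_id, x, y, w, h, text):
    """Return a W3C Web Annotation (as a dict, ready for JSON-LD
    serialization) that comments on a rectangular region of a canvas."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": {
            "source": canvas_id,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


# Hypothetical values, for illustration only.
BASE = "https://example.org/iiif"
PAGE = "book-0001-page-0001"

# Request a pixel region of the page, scaled to a width of 400 pixels.
detail_url = iiif_image_url(BASE, PAGE, region="100,200,800,600", size="400,")

# Describe the same region with an annotation.
note = region_annotation(
    "https://example.org/annotations/1",
    "https://example.org/canvas/book-0001/p1",
    100, 200, 800, 600,
    "Radiance around the central figure",
)
print(detail_url)
```

Because the region in the image request and the `xywh` fragment in the annotation use the same pixel coordinates, a viewer can fetch exactly the image detail that an annotation describes.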