From Early Modern Printing to Post-Modern Indie Publishing
Using eMOP on AFP
Jennifer Hecker [@lasuprema]
austinfanzineproject.org/
&
Matthew Christy [@matt_christy]
emop.tamu.edu/
Fanzine? Zine?
From eMOP to AFP - Jennifer Hecker & Matt Christy - 4/10/15 3
“A magazine produced
for love, not money.”
- I didn’t make this up, but I have no idea who said it first
Background
Original Concept
Austin Fanzine Digitization, Transcription
& Indexing Project
 Access-focused
 DIY Digitization & online submissions
 Creator/community-sourced
transcription
Evolution into DH Sandbox
Kevin Powell
Spring 2013
Kristin Bongiovanni
Spring 2014
Kate Neptune
Summer 2014
Transcription Issues
Inconsistent layout (columns, offset
text, text-wrapped around other text)
Inconsistent humans (style-guides and
subject knowledge help)
Images
eMOP – Intro
The Early Modern OCR Project (eMOP) is an Andrew W. Mellon Foundation-funded grant project run out of the Initiative for Digital Humanities, Media, and Culture (IDHMC) at Texas A&M University. It develops and tests tools and techniques for applying Optical Character Recognition (OCR) to early modern English documents from the hand-press period, roughly 1475-1800.
 eMOP aims to improve the visibility of early modern texts by
making their contents fully searchable. The current
paradigm of searching special collections for early modern
materials by either metadata alone or “dirty” OCR is
insufficient for scholarly research.
eMOP – The Numbers
Page Images
 Early English Books online
(ProQuest) EEBO: ~125,000
documents, ~13 million
page images (1475-1700)
 Eighteenth Century
Collections Online (Gale
Cengage) ECCO: ~182,000
documents, ~32 million page
images (1700-1800)
 Total: >300,000 documents &
45 million page images.
GroundTruth
 Text Creation Partnership
TCP: ~46,000 double-keyed,
hand-transcribed
documents
 44,000 EEBO
 2,200 ECCO
eMOP – The Data
eMOP – The Problems
 Early Modern Printing
 Individual, hand-made typefaces
 Worn and broken type
 Poor quality equipment/paper
 Inconsistent line bases
 Unusual page layouts, decorative
page elements
 Special characters & ligatures
 Spelling variations
 Mixed typefaces and languages
 over/under-inking
 Digitization
 Old, low-quality, small tiff files
 Noise, skew, warp, bleedthrough
Page Images
eMOP – Workflow
Page image pre-processing
Tesseract Training
De-noising
eMOP – Pre-processing
Original Binarized De-noised
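The slide shows only the three image stages and does not describe the algorithms behind them. As a rough illustration of what "binarized" and "de-noised" can mean, here is a minimal sketch that assumes a global brightness threshold and removal of small connected ink specks. Both choices, along with the names `binarize`, `denoise`, `threshold`, and `min_size`, are assumptions made for this example and are not taken from eMOP.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map grayscale pixels (0-255) to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def denoise(binary, min_size=3):
    """Erase 4-connected ink components smaller than min_size pixels."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Flood-fill to collect one connected ink component.
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small components are treated as noise and erased.
            if len(component) < min_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned


# Hypothetical 4x5 grayscale "page": a 3-pixel glyph plus one isolated speck.
page = [
    [250, 250, 250, 250, 250],
    [250,  20,  30, 250,  40],
    [250,  25, 250, 250, 250],
    [250, 250, 250, 250, 250],
]
binary = binarize(page)
clean = denoise(binary, min_size=2)
```

In this toy page the three connected dark pixels survive and the isolated speck is erased. Real scans would need an image library and tuned parameters.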
AFP - Results
 Geek Weekly #3
 9 pages of GroundTruth for typed pages
 63.9% correct on all 9 pages
 94.2% correct on 6 pages
 Analysis of what didn’t work
 Handwriting
 Page 10 was printed in an unusual italic typeface
 could create training – eMOP
 Pages 24 & 25 had good text recognition, but wrong reading
order
 Can put in FromThePage
Page 10
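The slide reports "% correct" against GroundTruth but does not say how the figure was computed. One common way to score OCR output is character-level accuracy based on edit distance. The sketch below assumes that metric, and the function names and sample strings are hypothetical. It also shows how a single hard page can pull down a pooled score, which is one plausible reading of the gap between the 9-page and 6-page figures.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_accuracy(ground_truth, ocr_text):
    """Fraction of ground-truth characters recovered, floored at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ground_truth, ocr_text)
    return max(0.0, 1.0 - errors / len(ground_truth))


def corpus_accuracy(pages):
    """Pooled accuracy over (ground_truth, ocr_text) pairs."""
    total_chars = sum(len(gt) for gt, _ in pages)
    total_errors = sum(edit_distance(gt, ocr) for gt, ocr in pages)
    return max(0.0, 1.0 - total_errors / total_chars)


# Hypothetical sample pages, not data from the project.
pages = [
    ("geek weekly", "geek weekly"),  # clean typed page
    ("zine review", "z1ne rev1ew"),  # two substituted characters
]
clean_only = corpus_accuracy(pages[:1])
all_pages = corpus_accuracy(pages)
```

Scoring only the clean page gives a higher figure than pooling both pages, so reporting which pages were included matters as much as the percentage itself.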
eMOP – De-noising
Before: 35% After: 58%
eMOP – De-noising
Before After
Integrating eMOP
 From the Page: new status designation will be added
 Launch refocused transcription effort this summer
Possible Applications
 other collections of print ephemera with
messy layout like posters, flyers, handbills,
ticket stubs, track listings, liner notes, other
publications
 DH coursework, public engagement
More information:
 eMOP
 emop.tamu.edu/
 Austin Fanzine Project
 www.AustinFanzineProject.org
 www.facebook.com/AustinFanzineProject
 @ATXFanzineProj
 AFDTIP@gmail.com
 “Why We’re Not Digitizing Zines,” Kelly Wooten, 2009,
http://blogs.library.duke.edu/digital-collections/2009/09/21/why-were-not-digitizing-zines/

Editor's Notes

  1. Some were great; most were not: noisy, skewed, or warped. Others posed challenges for OCR engines: multiple pages per image, multiple columns, images and decorative elements, marginalia, and missing margins. Many were terrible.
  2. Before: 55% After: 73%