5. What it isn't
We'll concentrate on web-based tools for
extracting text from images. We won't address:
● Oral History
● Video
● Audio Transcription
● Image Manipulation
● Transcription/Facsimile Display
Tools do exist for these tasks, though.
12. Online Tools
● Recent (none older than 2005)
● Shaped by the projects they originated in
● Still pretty raw
● Most require tech expertise for set-up and
customization
● All require making trade-offs
13. Lab Session 1: Breadth
● NYPL What's on the Menu: Indexing
● Wikisource: Editing
14. Selection Factors
● Source Material
● Transcript Purpose
● Organizational/Project Management Fit
● Financial and Technical Resources
15. Source Material
Evaluating your source material:
● Is it of interest to anyone else?
● Is it under copyright?
● Does it need restricted access?
● Is it composed of documents or records?
● Is it non-textual?
● How complex is the layout? How important
is that layout?
16. Purpose
How will you be using the transcribed data?
● Traditional print editions
● Searchable online editions
● Do you want to use the system to analyze
the text?
● How do you want to analyze the text?
● Is public engagement a goal?
● Should the transcripts be open?
17. Organizational/Project Management Fit
● How important is traditional editorial
workflow?
● Will you rely on volunteers? How will you
motivate them?
● What is the duration of the project?
● Is there a "final version"?
● Is TEI a mandate?
18. Financial and Technical Resources
Do you have or need:
● System administrators to install non-hosted
software?
● Money to pay hosting costs?
● Programming skills to customize a tool?
● Money to pay programmers for
customization?
● Support for ongoing costs to keep the site
running, however small?
20. Technical Questions to Answer
● Where are the images now?
● How do images get into the system?
● How do transcripts get out of the system?
● How mature is the underlying technology?
● How configurable is the technology?
● How does the system work with the public
face of your project?
● Where does the metadata live?
● Who will maintain this? For how long?
● How many sites are using this system?
21. Wikisource
Pro:
● MediaWiki plus its add-on modules (e.g.
print-on-demand, export).
● Wikimedia community.
● Incredibly mature.
Con:
● Wikimedia policy.
● Public editing.
● Limited mark-up.
22. Bentham Transcription Desk
Pro:
● MediaWiki is very mature.
● TEI Toolbar (can also be used on other
systems)
● Deployed outside original project.
Con:
● Development efforts halted.
23. Scripto
Pro:
● Team at CHNM has a great track record.
● Your CMS is your public face.
● MediaWiki is very mature.
● Deployed and under active development.
Con:
● Your CMS handles all metadata.
● Mark-up is extremely limited.
24. FromThePage
Pro:
● Designed for intensive editing and indexing.
● Semantic mark-up and analysis.
● Hosting available.
Con:
● Single developer (me).
● No TEI mark-up.
25. Islandora TEI Editor
Caveat: I don't know much about this tool or
this team.
● Based on Drupal and Fedora
● Supports TEI via friendly interface
● Many Drupal-based projects considering it.
26. T-PEN
Caveat: I don't know much about this tool.
● Designed for medieval manuscripts.
● Supports TEI natively.
● Line-by-line interface.
● Hosted version available.
27. Scribe
Pro:
● Excellent for complex layout or
non-documentary transcription.
● Zooniverse team is large, well-funded,
experienced.
● Configurable.
Con:
● No automated tool for loading images or
viewing transcript database (yet!)
● No concept of image-as-a-text.
28. Pybossa
Caveat: I don't know much about this tool or
this team.
● Open Knowledge Foundation's
crowdsourcing task management tool.
● Designed for tabular data.
● Google Spreadsheet data entry.
● Extremely young.
29. TextLab
Caveat: I don't know much about this tool or
this team.
● Melville Electronic Library.
● Direct addition of TEI tags to image.
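
If TEI is a mandate, the TEI notes on the tool slides above can be collected into a quick filter. This is only a sketch: the names `TEI_SUPPORT` and `candidates` are made up for illustration, the values are one reading of the slides, and `None` marks tools whose slides say nothing about TEI either way.

```python
# TEI support as noted on the tool slides.
# True = slide mentions TEI support, False = slide says no TEI,
# None = slide does not mention TEI (an assumption to verify, not a "no").
TEI_SUPPORT = {
    "Wikisource": None,
    "Bentham Transcription Desk": True,   # TEI Toolbar
    "Scripto": None,
    "FromThePage": False,                 # "No TEI mark-up"
    "Islandora TEI Editor": True,
    "T-PEN": True,
    "Scribe": None,
    "Pybossa": None,
    "TextLab": True,
}


def candidates(tei_mandated):
    """Split tools into (worth considering, needs checking first)."""
    if not tei_mandated:
        # Without a TEI mandate, no tool is ruled out on this factor.
        return sorted(TEI_SUPPORT), []
    confirmed = sorted(t for t, s in TEI_SUPPORT.items() if s is True)
    unknown = sorted(t for t, s in TEI_SUPPORT.items() if s is None)
    return confirmed, unknown


confirmed, unknown = candidates(tei_mandated=True)
print("TEI noted on slides:", ", ".join(confirmed))
print("Check before choosing:", ", ".join(unknown))
```

The same pattern could be repeated for the other selection factors (hosting, maturity, mark-up), but each value would need checking against the tool's current documentation.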
30. Lab Session 3: Configuration
Scribe: Old Weather, What's the Score,
development deployments
31. Find me
Ben Brumfield
benwbrum@gmail.com
http://manuscripttranscription.blogspot.com/
@benwbrum