Digital publishing has changed. Understand the base components that allow modern publishers to more easily publish content in multiple formats across multiple platforms.
Presentation originally developed by Apex VP and Principal Consultant Bill Kasdorf for a university press in June 2016, based on presentations on this subject that he has given to many organizations over the past ten years. Learn more at www.apexcovantage.com.
7. Content
Think first about the content,
not about the publication.
That helps you focus on
what things are,
not what they look like.
That leads to adaptable markup
that you can optimize for
print, online, ebooks, or apps.
11. Content Analysis
What kind of content is this?
Who needs it? Why? (Later, ask “how?”)
What pieces are meaningful?
What chunks are needed for rendering?
What chunks will people want to point to?
How does one chunk relate to other
chunks . . . across all your publications?
The Goal:
THOUGHTFUL CHUNKING
16. Vocabulary and Markup:
What to name the components
and how to tag them
for editing,
typesetting,
and digital publishing.
It works best if the same vocabulary
(but not necessarily the same markup syntax)
can be used for all of these
phases of your workflow.
17. Design: Typography and Layout
Typography is really implied “markup.”
Typography distinguishes the components.
Layout is a navigation guide.
This is a centuries-in-the-making
collection of design conventions.
Design is based on semantic distinctions:
What is this thing? How important is it?
How does it relate to
the other things around it?
19. What do you see on this page?
“Huge numeral?”
“24 pt Meta, fl rr?”
“11 pt Charter, letterspaced?”
“Rag right para indented on left?”
“12 pt Meta Black all caps, & sm caps?”
“Bold term?”
I don’t think so. . . .
20. Here’s what we “see” on this page:
“Chapter number”
“Chapter title”
“Author’s name”
“Introductory paragraph”
“Level 1 subhead”
“Level 2 subhead”
“Glossary term”
We see structure and semantics, not specs.
23. Here’s one possible markup scheme:
<CN> </CN>: “Chapter number”
<CT> </CT>: “Chapter title”
<AU> </AU>: “Author’s name”
<INTRO> </INTRO>: “Introductory paragraph”
<H1> </H1>: “Level 1 subhead”
<H2> </H2>: “Level 2 subhead”
<GLOSS> </GLOSS>: “Glossary term”
That’s XML markup. Those are “tags.”
You don’t have to use XML.
You do need some form of markup, even if in the form of styles,
to distinguish the components.
XML is the most powerful, future-proof markup.
28. XML
XML liberates your content
from any particular page design,
any particular reading system,
any particular workflow.
Print, app, ebook, and online:
all from the same XML document!
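As a rough illustration of "all from the same XML document," here is a minimal Python sketch that renders one small source into two outputs. The tag names come from the sample markup scheme in this deck. The `<CHAPTER>` wrapper, the sample text, and the two mapping tables are assumptions made up for the example. A real workflow would more likely use XSLT or a composition system.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; <CHAPTER> and the sample text are invented.
SOURCE = """<CHAPTER>
  <CN>1</CN>
  <CT>Content First</CT>
  <AU>A. N. Author</AU>
  <INTRO>Think first about the content, not about the publication.</INTRO>
  <H1>Content Analysis</H1>
</CHAPTER>"""

# One mapping per channel: the semantic tag decides the presentation.
HTML_TAGS = {"CN": "p", "CT": "h1", "AU": "p", "INTRO": "p", "H1": "h2"}
TEXT_PREFIXES = {"CN": "Chapter ", "CT": "", "AU": "by ", "INTRO": "", "H1": "== "}

def to_html(root):
    """Render each component as an HTML element, keeping its name as a class."""
    lines = []
    for child in root:
        tag = HTML_TAGS[child.tag]
        text = escape(child.text or "")
        lines.append(f'<{tag} class="{child.tag.lower()}">{text}</{tag}>')
    return "\n".join(lines)

def to_plain_text(root):
    """Render the same components as plain text lines."""
    return "\n".join(TEXT_PREFIXES[child.tag] + (child.text or "") for child in root)

root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_plain_text(root))
```

Neither output format is stored in the source. Changing a mapping table changes the presentation without touching the content.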
29. XML is not a set of tags.
It is a LANGUAGE for expressing:
• Semantic information: what the pieces are
• Structural information:
how the pieces fit together
• Metadata: information about the content
• Presentation information, but only where
semantics and structure don’t apply
. . . creating an unlimited number
of presentations from
a single XML document.
35. So where do the tags
come from?
Surely you don’t
just make them up.
Wasn’t the whole point
to make the tagging
clear, consistent,
and non-proprietary?
36. Well, technically,
you can just make them up.
But then only you know what they mean.
As long as you follow the XML rules,
it’s called “well-formed” XML.
It’s better to have a formal specification
(a DTD or other schema), and if your XML
also conforms to that, it’s called
“valid” XML (which is also well-formed).
That lets any XML-based system
interpret and use your markup.
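The difference between "well-formed" and "valid" can be sketched in a few lines of Python. The standard-library parser used here checks well-formedness only, so the `ALLOWED_CHILDREN` table is a hypothetical stand-in for a real DTD or schema, and the element names in it are invented for the example. A production workflow would validate against the actual model with a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: each element and the children it may contain.
ALLOWED_CHILDREN = {
    "chapter": {"title", "section"},
    "section": {"title", "p"},
    "title": set(),
    "p": set(),
}

def is_well_formed(xml_text):
    """True if the text follows the XML rules (tags closed and nested properly)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def is_valid(xml_text):
    """True if the text is well-formed AND conforms to the toy model above."""
    if not is_well_formed(xml_text):
        return False
    root = ET.fromstring(xml_text)
    if root.tag != "chapter":
        return False
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            return False
        if any(child.tag not in allowed for child in parent):
            return False
    return True

conforming = "<chapter><title>One</title><section><p>Text</p></section></chapter>"
made_up = "<chapter><blurb>Only I know what this means</blurb></chapter>"
broken = "<chapter><title>One</chapter></title>"

for sample in (conforming, made_up, broken):
    print(is_well_formed(sample), is_valid(sample))
```

The made-up tag passes the first check and fails the second. The improperly nested sample fails both.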
38. DTD
Document Type Definition
A special formal syntax
used to define a particular
type of document
or set of related documents.
It defines a tag set:
the specific tags and how they’re used.
39. DTD
Elements are the nouns:
e.g., <title> or <blockquote>.
A chunk of content is surrounded by
a “start tag” and an “end tag”:
e.g., <title>This Publication</title>;
and elements must “nest” properly.
Now systems can tell the chunks apart
and process them appropriately.
40. DTD
Attributes are the adjectives
that describe the elements:
e.g., <title class="title-page">
vs. <title class="chapter">.
Now they can be distinguished,
processed, and rendered differently.
Unique IDs identify “this specific one,” e.g.,
<section class="chapter" id="ch001">.
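Here is a small Python sketch of how a system might use elements, attributes, and IDs to tell chunks apart. The element and attribute names follow the examples on these slides. The `<book>` wrapper, the second chapter, and the chapter titles are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document built from the slide examples.
DOC = """<book>
  <title class="title-page">This Publication</title>
  <section class="chapter" id="ch001">
    <title class="chapter">Content First</title>
  </section>
  <section class="chapter" id="ch002">
    <title class="chapter">Markup Second</title>
  </section>
</book>"""

root = ET.fromstring(DOC)

# Same element name, different attribute value: the two kinds of title
# can be processed and rendered differently.
book_title = root.find("title[@class='title-page']").text
chapter_titles = [t.text for t in root.iter("title") if t.get("class") == "chapter"]

# A unique ID picks out "this specific one."
second_chapter = root.find(".//section[@id='ch002']")

print(book_title)
print(chapter_titles)
print(second_chapter.find("title").text)
```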
41. DTD
DTDs can also define metadata:
information about the content.
For example:
• Bibliographic information
• Subject codes
• Author and publisher information
• Technical information
• Rights and usage information
42. DTD
DTDs (or other types of schemas)
are often called “models.”
Most publishers’ models today
are based on one of a number of
standard models that are
widely used and well known
in a certain “community.”
43. Some Standard Models
DocBook
A generic book model, initially developed
for technical books and documentation
TEI, the Text Encoding Initiative
Mainly used for textual research
NLM/JATS/BITS
The model for scholarly journals and books
XHTML
The language of the Web and EPUB,
expressed as XML
These each provide a standard, widely used framework
to which a publisher’s specific vocabulary
can be added to address their needs.
46. We all know what the stages of the
editorial and production workflow are . . .
Design.
Copyediting.
Typesetting.
Artwork.
Indexing.
Quality Control.
Online/Ebook Creation.
. . . but we need to look deeper
to optimize how they work
in any given organization.
47. They’re usually done in silos.
Which are hard to see into,
and are starting to break down.
48. Thinking of these stages
in the traditional way
leads to suboptimization.
In today’s digital ecosystem
we need to deconstruct them
in order to optimize:
Who does what?
At what stage(s) of the workflow?
How to best manage the process?
49. Who Does What?
Do it in-house?
Outsource it?
Automate it?
You can’t answer these questions properly
without deconstructing the categories.
And the answers differ
from publisher to publisher.
50. At What Stage(s) of the Workflow?
How do these aspects intersect?
How do you avoid duplication and rework?
How do you get out of “loopy QC”?
Getting the right things right upstream
eliminates a lot of headaches downstream.
51. How Best to Manage the Process?
Balancing predictability and creativity:
where to be strict, and where to be flexible?
How can systems and standards help?
Buy vs. build vs. wing it?
Your systems, partners, and processes
should make it easy for you to do the right work
and keep you from doing the wrong work.
53. Copyediting
Editing in Word?
Who cleans up the author’s messy MS files?
Who “normalizes” the styling?
Who designs those styles in the first place?
Who checks all the links to figures,
tables, cross references, notes?
Who actually does the intellectual work?
How do the files get trafficked?
What about version control?
Who cleans up the author’s messy MS files?
Who “normalizes” the styling?
The copyeditor?
The project or production editor?
Dedicated in-house file prep team?
Outsourced to vendor?
“Normalizing? What’s that?”
Who designs those styles in the first place?
They need to be
aligned with your XML markup
and easy to use by the copyeditor.
Who checks all the links to figures,
tables, cross references, notes?
The copyeditor?
An editorial assistant?
The editorial vendor?
The typesetter?
Software?
Who actually does the intellectual work?
In-house copyeditor?
Freelance copyeditor?
An editorial service?
Full-service comp vendor?
How do the files get trafficked?
What about version control?
Email files, named whatever. . . .
Consistent file naming, FTP, transmittals.
Digital Asset Management System (DAM).
Content Management System (CMS).
59. Typesetting
Who determines the tags or style names?
How do the editing styles translate to comp?
Who does the artwork?
How are figures, tables, etc. placed?
Are links preserved or implemented?
How do the files get trafficked?
What about version control?
Who determines the tags or style names?
Freelance designer, ad hoc?
Compositor’s own system?
Publisher’s system?
XML?
How do the editing styles translate to comp?
“They don’t.”
“The typesetter does it,
we don’t know what they do.”
Word styles imported into InDesign.
Programmatic transforms to XML.
Who does the artwork?
“Then we fix it in-house.”
“We send it to an art studio.”
“The typesetter fixes it.”
“We make the author fix it.”
“It depends. . . .”
“The author. Sorta.”
How are figures, tables, etc. placed?
Manually based on callouts
marked by copyeditor.
Automatically from XML in
Typefi, 3B2.
Are links preserved or implemented?
“Nope.”
“The typesetter adds them.”
“We put them in when we make the ebook.”
“Yes, they’re in the XML.”
How do the files get trafficked?
What about version control?
Email files, named whatever. . . .
Consistent file naming, FTP, transmittals.
Digital Asset Management System (DAM).
Content Management System (CMS).
Sound familiar?
66. Workflow
Workflow is where it all comes together:
A vocabulary that fits your publications.
Markup that makes your content agile.
Metadata that makes it meaningful.
The standards that make it interoperable.
The technologies that fit your capabilities.
68. Publications today are composed of
a multitude of files and formats.
Text Files
Metadata
Image Files
Video and Audio Files
Scripts
Fonts
Stylesheets
Deliverable Products
XML is not the whole story!
69. Some Common Text File Formats
Microsoft Word
Used for most authoring and editing
TeX/LaTeX
Common for math, statistics, engineering
InDesign
The leading design/page layout format
XML
The foundation of most modern publishing
HTML5
The format of the World Wide Web
Microsoft Word
Ubiquitous but typically undisciplined
Authors do lots of inconsistent, messy things
Style templates work well for editing
Visually distinct styles for elements,
names align with terms in rest of workflow
Old .doc is “binary”; new .docx is XML
Don’t get excited; this “WordML” is full of messy stuff,
but at least it can be worked with
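To show what "it can be worked with" might look like in practice, the Python sketch below opens a .docx as a zip archive, reads the main document part, and lists each paragraph's style name next to its text. The namespace URI and the `word/document.xml` path are the real WordprocessingML ones. The style names, the `STYLE_TO_TAG` mapping, and the stripped-down demo file are hypothetical. The demo holds only the main document part, so Word itself would not open it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical mapping from Word paragraph style names to workflow tags.
STYLE_TO_TAG = {"ChapterTitle": "CT", "Author": "AU", "Heading1": "H1"}

def paragraph_styles(docx_bytes):
    """Return (style name, text) for each paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    result = []
    for para in root.iter(f"{{{W}}}p"):
        style = para.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        # Assumption: a paragraph with no explicit style is reported as "Normal".
        name = style.get(f"{{{W}}}val") if style is not None else "Normal"
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        result.append((name, text))
    return result

def make_demo_docx():
    """Build an in-memory stand-in for a .docx with two paragraphs."""
    body = (
        f'<w:document xmlns:w="{W}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="ChapterTitle"/></w:pPr>'
        '<w:r><w:t>Content First</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>Body text with no named style.</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()

for style, text in paragraph_styles(make_demo_docx()):
    print(STYLE_TO_TAG.get(style, "P"), text)
```

When the style names align with the vocabulary used in the rest of the workflow, a lookup table like this is most of the work of getting from Word styles to tags.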
TeX/LaTeX
Very specialized
Encountered only in specific disciplines
Often used for authoring + typesetting
Difficult to convert, so publishers often
treat TeX as an outlier and skip XML
InDesign
Ideal for design-intensive publications
Integrated with Adobe’s full toolset, now cloud-based
Structure: paragraph & character styles
Align vocabulary with rest of workflow
Can import and export XML
This is how Typefi and PShift work;
IDML and EPUB export can be problematic
XML
Most flexible, future-proof format
Adapts as technologies change
and new products are developed
Optimal for multi-channel delivery
Same XML file for print, ebook, app, & online,
either directly or with automated transformation
HTML5
Can be expressed as XML: XHTML5
The HTML “tag set” following XML syntax and rules
HTML5 is structure + semantics
Presentation is via CSS (Cascading Style Sheets)
Basis of Open Web Platform and EPUB 3
OWP is a huge collection of standards
that form the Web ecosystem:
HTML5, CSS3, JavaScript, and many more
75. Some Common Image Formats
TIFF (.tif or .tiff)
“Tagged Image File Format”
JPEG (.jpg or .jpeg)
“Joint Photographic Experts Group”
GIF (.gif)
“Graphics Interchange Format”
PNG (.png)
“Portable Network Graphics”
SVG (.svg)
“Scalable Vector Graphics”
TIFF (.tif or .tiff)
Mainly used for photos (continuous tone)
“Raster” or “bitmap” (grid of pixels)
Typically “lossless”: keeps all the image data
Primarily for print
Grayscale or CMYK high-resolution images
File sizes are usually quite large, esp. color images
JPEG (.jpg or .jpeg)
Also mainly for continuous tone images
“Lossy” compression: can adjust balance of
quality and file size
Primarily for online, ebooks, etc.
Time to “load” is a factor (plus device capacity)
Preserve more data when zooming is needed
GIF (.gif)
Mainly for line art (diagrams, flat color)
Small file size: designed for online/digital
Lossless compression
Can be animated: “Animated GIF”
[Editorial comment:
also can be annoying. ;-) ]
PNG (.png)
Created as open-source successor to GIF
Small file size for line art, flat color; offers excellent
quality, good transparency, lossless compression
Can be used for photos or line art.
Better than JPEG at flat color areas,
but PNG photos are larger files than JPEGs
SVG (.svg)
W3C standard XML-based vector format
Vector math based on Adobe’s PDF/PostScript
Searchable, accessible text
No loss of quality when resized
Sharp on laptop, tablet, phone, and when zoomed, like PDF
Not widely or consistently implemented yet,
but should become a dominant image format
81. . . . and Some Common Proprietary Formats
AI (.ai)
Adobe Illustrator
PSD (.psd)
Photoshop
EPS (.eps)
Encapsulated PostScript
PPT (.ppt)
PowerPoint
WMF/EMF
Windows Metafile / Enhanced Metafile
These are used
in production
but don’t belong
in deliverable
products.
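A check like this is easy to automate. The Python sketch below sorts file names into the two groups from these slides by extension. The function names and the sample file names are hypothetical. Checking only the extension is a simplification, since it trusts the file name rather than inspecting the file.

```python
from pathlib import PurePosixPath

# Extensions taken from the "Common Image Formats" slides.
STANDARD = {".tif", ".tiff", ".jpg", ".jpeg", ".gif", ".png", ".svg"}
# Extensions taken from the "Common Proprietary Formats" slide.
PRODUCTION_ONLY = {".ai", ".psd", ".eps", ".ppt", ".wmf", ".emf"}

def classify(filename):
    """Label a file by its extension, ignoring upper/lower case."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext in STANDARD:
        return "standard"
    if ext in PRODUCTION_ONLY:
        return "production-only"
    return "unknown"

def production_files(filenames):
    """List the files that should not ship in a deliverable product."""
    return [name for name in filenames if classify(name) == "production-only"]

package = ["fig01.png", "cover.psd", "chart.EPS", "map.svg"]
print(production_files(package))
```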
82. Audio and Video Formats
HTML5 vs. Proprietary
Best: open formats permitted by HTML5
in the <audio> and <video> elements:
they work natively in browsers & e-readers
Proprietary formats like Flash (.swf) and
QuickTime (.mov, .qt) require plug-ins
Ideal: Formats Recommended by EPUB 3
Audio: MP3 and MP4 AAC LC
Video: H.264 and VP8/WebM
(often both, due to browser/reading system inconsistency)
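Supplying both video formats usually means listing two sources inside one `<video>` element and letting the browser or reading system pick the first it can play. The Python sketch below generates that markup. The `video_element` helper and the file names are hypothetical. The `video/mp4` and `video/webm` MIME types are the standard ones for the two formats named above.

```python
import xml.etree.ElementTree as ET

def video_element(sources, fallback_text):
    """Build a <video> element with one <source> child per format."""
    video = ET.Element("video", {"controls": "controls"})
    for src, mime_type in sources:
        ET.SubElement(video, "source", {"src": src, "type": mime_type})
    # Shown only by systems that cannot play video at all.
    fallback = ET.SubElement(video, "p")
    fallback.text = fallback_text
    return video

element = video_element(
    [("clip.mp4", "video/mp4"), ("clip.webm", "video/webm")],
    "Your reading system does not support embedded video.",
)
markup = ET.tostring(element, encoding="unicode")
print(markup)
```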
83. Scripts
JavaScript
Fundamental to the Open Web Platform
JavaScript Libraries
“Pre-written” scripts to adapt as needed
Most popular: open-source jQuery
Widgets
Interactive features like quizzes, sliders,
“assessments” in educational content,
graphing data from a table, etc.
84. Fonts
OpenType
Primary font format for print
WOFF
Primary font format for web
Licensing
Know what rights you’ve got!
Obfuscating and Embedding
Enable ebook to contain the fonts it needs
Unicode Fonts
Character encoding of the Web & XML
OpenType and WOFF
The “legal” fonts in EPUB 3
Reading systems are required to handle both,
but many systems just use their own default fonts now
Many fonts available in both formats
WOFF is a “wrapper” for underlying font data/metrics
Licensing
Obfuscating and Embedding
Need license to embed font in ebook
Beware “free” fonts! “Open License Fonts” are safe
Need “fallbacks” for embedded fonts
“System fonts” are built into a reading system
“Web fonts” require you to be online; not for ebooks
The CSS lets you default to “serif” or “sans-serif”
Embedded fonts for “special characters”
Math, linguistics, quotes from non-Latin languages
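A fallback chain in CSS is just an ordered `font-family` list: the embedded font first, then system fonts, then a generic family as the last resort. The Python sketch below assembles such a rule. The helper names and the particular fonts chosen are illustrative assumptions, not recommendations. The sketch accepts only the two generic families named above, although CSS defines others.

```python
def font_stack(embedded, system_fonts, generic):
    """Order fonts from most to least preferred, ending with a generic family."""
    if generic not in {"serif", "sans-serif"}:
        raise ValueError("generic must be 'serif' or 'sans-serif'")
    names = [embedded, *system_fonts]
    # Font names containing spaces are quoted in CSS.
    quoted = [f'"{name}"' if " " in name else name for name in names]
    return ", ".join(quoted + [generic])

def css_rule(selector, stack):
    """Wrap a font stack in a CSS rule for the given selector."""
    return f"{selector} {{ font-family: {stack}; }}"

stack = font_stack("Charter", ["Georgia", "Times New Roman"], "serif")
print(css_rule("body", stack))
```

If the embedded font is missing or the reading system ignores it, rendering falls through the list and never fails outright.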
Unicode Fonts
All the characters in XML
are Unicode by definition
This enables unambiguous character specification
Word, InDesign, and XML-based systems
all understand and use Unicode
Use Unicode fonts throughout your workflow!
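"Unambiguous character specification" means every character has a code point and an official name that any Unicode-aware system can look up. The Python sketch below does that lookup with the standard `unicodedata` module. The `describe` helper and the sample characters are assumptions for illustration. One caveat is added here: an accented letter can be stored either as one code point or as a base letter plus a combining mark, so a workflow may also want to normalize text to a single form.

```python
import unicodedata

def describe(text):
    """Return (code point, official Unicode name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")) for ch in text]

# Special characters from math and non-Latin text are identified exactly.
for code_point, name in describe("\u00e9\u03b1\u2211"):
    print(code_point, name)

# The same visible letter stored two ways: precomposed vs. base + combining accent.
composed = "\u00e9"
decomposed = "e\u0301"
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```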
88. Stylesheets
Word
A good “styles library” helps add
structure and semantics
InDesign/Quark
Paragraph styles and character styles
ensure consistency, efficiency
Browsers/Ebooks
CSS (Cascading Style Sheets)
Adapts rendering for context/device
Enables “responsive design”
89. Deliverable Products
PDF
Preserves look of typeset page
Used for printing, online delivery
Doesn’t “reflow” for different screen sizes
EPUB
International standard format
Non-proprietary, works almost everywhere
Reflowable or fixed layout
KF8
Amazon’s proprietary ebook format