Engineering Next-Generation Publishing Workflows
IDPF Digital Book 2013
May 30, 2013
Sanders Kleinfeld
O’Reilly Media, Inc.
How do you
write a book?
How do you
write a “book”?
How do you
write an (e)book?
How do you
“write” an (e)book?
Anatomy of an ebook: EPUB
What you see
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Chapter 1. A Python Q&amp;A Session</title>
<link rel="stylesheet" href="core.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.74.0" />
</head>
<body>
<div class="chapter" title="Chapter 1. A Python Q&amp;A Session">
<div class="titlepage">
<div>
<div>
<h1 class="title">
<a id="a_python_q_ampersand_a_session"></a>
Chapter 1. A Python Q&amp;A Session
</h1>
</div>
</div>
</div>
<p>If you’ve bought this book, you may already know what Python is
and why it’s an important tool to learn. If you don’t, you probably won’t be sold
on Python until you’ve learned the language by reading the rest of this book and
have done a project or two. But before we jump into details, the first few pages
of this book will briefly introduce some of the main reasons behind Python’s
popularity. To begin sculpting a definition of Python, this chapter takes the form
of a question-and-answer session, which poses some of the most common
questions asked by beginners.</p>
What’s inside
An Inconvenient Truth:
Ebooks are made of code. If you are an ebook publisher, you are in the software-development business.
How do you
“write” an (e)book?
How do you
develop an (e)book?
Five Key Principles of a
Modern (e)Book Workflow
#1. Semantic Markup Matters
#2. Single Source, Multiple Outputs
#3. Automate Your Headaches Away
#4. Versioning is the New Spell-Check
#5. Always think “Digital First”
#1 Semantic Markup
Matters
First Chapter of My Memoirs
Microsoft
Word
Underlying Representation of Content
(Word XML)
<w:body><w:p w:rsidR="0073527D" w:rsidRDefault="007F1550" w:rsidP="007F1550"><w:pPr><w:jc w:val="right"/>
<w:rPr><w:sz w:val="96"/><w:szCs w:val="96"/></w:rPr></w:pPr><w:r w:rsidRPr="007F1550"><w:rPr><w:sz w:val="96"/><w:szCs w:val="96"/></w:rPr>

<w:t>1</w:t>

</w:r></w:p><w:p w:rsidR="007F1550" w:rsidRDefault="007F1550" w:rsidP="007F1550"><w:pPr><w:jc w:val="right"/>
<w:rPr><w:sz w:val="72"/><w:szCs w:val="72"/></w:rPr></w:pPr><w:r w:rsidRPr="007F1550"><w:rPr><w:sz w:val="72"/><w:szCs w:val="72"/></w:rPr>

<w:t>Autobiography of Me</w:t>

</w:r></w:p><w:p w:rsidR="007F1550" w:rsidRPr="007F1550" w:rsidRDefault="007F1550" w:rsidP="007F1550">
<w:pPr><w:jc w:val="right"/><w:rPr><w:sz w:val="72"/><w:szCs w:val="72"/></w:rPr></w:pPr></w:p>
<w:p w:rsidR="007F1550" w:rsidRPr="00032659" w:rsidRDefault="007F1550" w:rsidP="007F1550"><w:pPr><w:rPr>
<w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr></w:pPr><w:r w:rsidRPr="00032659"><w:rPr><w:sz w:val="48"/>
<w:szCs w:val="48"/></w:rPr>

<w:t xml:space="preserve">I was born in 1980, I love chocolate ice cream, and I am a </w:t>

</w:r><w:r w:rsidRPr="00032659"><w:rPr><w:i/><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>

<w:t>wicked awesome</w:t>

</w:r><w:r w:rsidRPr="00032659"><w:rPr><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>

<w:t xml:space="preserve"> writer, </w:t></w:r>

<w:proofErr w:type="spellStart"/><w:r w:rsidRPr="00032659"><w:rPr><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>

<w:t>yo</w:t>

</w:r><w:proofErr w:type="spellEnd"/>
…
Three Problems with this XML
•  Markup is not semantic
•  It conflates content and presentation
•  Um, yuck
Semantic Markup in a Nutshell
Semantic markup describes the function
of your content, not its formatting
SEMANTIC MARKUP SAYS:
“This is a section heading”
NOT:
“This text is in Garamond, 36 pt, bold,
center-aligned”
Semantic Markup Option #1:
DocBook
•  DocBook is a semantic XML markup
vocabulary introduced in 1991
•  It was primarily designed for
representing technical
documentation, but is well-suited for
representing any prose content
•  DocBook DTDs are available here:
http://www.oasis-open.org/docbook/xml/
DocBook Representation of
Book Content
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<chapter>
<title>Autobiography of Me</title>
<para>I was born in 1980, I love
chocolate ice cream, and I am a
<emphasis>wicked awesome</emphasis>
writer, yo!</para>
</chapter>
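Because the DocBook markup names what each piece of content is, a program can ask for it by role. The following is a minimal sketch in Python using only the standard library; the `describe` function and the `DOCBOOK` constant are hypothetical names introduced for illustration, and the sample omits the DOCTYPE for brevity.

```python
import xml.etree.ElementTree as ET

# The chapter from the slide, without the DOCTYPE declaration (assumption: not needed here).
DOCBOOK = """<chapter>
<title>Autobiography of Me</title>
<para>I was born in 1980, I love chocolate ice cream, and I am a
<emphasis>wicked awesome</emphasis> writer, yo!</para>
</chapter>"""


def describe(xml_text):
    """Pull content out of a chapter by its semantic role, not its formatting."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "emphasized": [e.text for e in root.iter("emphasis")],
        "paragraphs": len(root.findall("para")),
    }


summary = describe(DOCBOOK)
print(summary)
```

Doing the same against the Word XML would mean guessing that "size 72, right-aligned" means "title", which is the problem semantic markup avoids.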
Text Editors with GUI
DocBook Support
XMLmind XML Editor
(http://www.xmlmind.com/xmleditor/)
Oxygen XML Editor
(http://www.oxygenxml.com/)
Semantic Markup Option #2:
AsciiDoc
•  AsciiDoc is a lightweight, wiki-like
markup language for prose content
•  It was created by Stuart Rackham in
2002.
•  The AsciiDoc toolchain is written in
Python, and relies heavily on text
processing with regular expressions.
AsciiDoc Representation of
Book Content
== Autobiography of Me

I was born in 1980, I love
chocolate ice cream, and I
am a _wicked awesome_
writer, yo!
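To give a feel for what "text processing with regular expressions" means here, the sketch below turns the snippet above into DocBook. It is a toy under stated assumptions (one `==` heading on the first line, one paragraph, `_..._` for emphasis) and is not how asciidoc.py itself is implemented; `asciidoc_to_docbook` is a hypothetical name.

```python
import re
from xml.sax.saxutils import escape


def asciidoc_to_docbook(text):
    """Toy converter: first line is a '== Title' heading, the rest is one paragraph."""
    lines = text.strip().splitlines()
    title = re.sub(r"^==\s+", "", lines[0])
    # Join the wrapped paragraph lines into a single run of text.
    body = " ".join(line.strip() for line in lines[1:] if line.strip())
    # Escape XML special characters first, then mark up _emphasis_.
    body = re.sub(r"_(.+?)_", r"<emphasis>\1</emphasis>", escape(body))
    return "<chapter><title>%s</title><para>%s</para></chapter>" % (escape(title), body)


SAMPLE = """== Autobiography of Me

I was born in 1980, I love
chocolate ice cream, and I
am a _wicked awesome_
writer, yo!"""

docbook = asciidoc_to_docbook(SAMPLE)
print(docbook)
```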
Text Editor with
AsciiDoc Support
O’Reilly Atlas
Semantic Markup Option #3:
HTML
“Say what? HTML?”
Ebooks are composed of
HTML…
So, why not write them in
HTML?
HTML5 = New Structural
Semantics
•  <article>
•  <aside>
•  <header>
•  <figure>
•  <footer>
•  <nav>
•  <section>
But eBooks require a richer
content model!!!
•  More robust semantics for book-specific
elements, e.g., chapter, appendix, glossary
•  Explicit, enforceable rules for
structure, e.g., no <h1>s lower in the
hierarchy than <h2>s
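An "enforceable rule" is one a program can check. The sketch below, in Python with the standard library only, flags a heading that sits in a nested <section> but is not lower-ranked than the heading of an enclosing section. The rule as coded and the `HeadingChecker` class are assumptions made for illustration; they are not taken from the HTMLBook specification.

```python
from html.parser import HTMLParser


class HeadingChecker(HTMLParser):
    """Report headings nested under a heading of the same or lower rank."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one entry per open <section>: its first heading level, or None
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.stack.append(None)
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            # Heading levels already claimed by the enclosing sections.
            outer = [lvl for lvl in self.stack[:-1] if lvl is not None]
            if outer and level <= max(outer):
                self.errors.append(f"<{tag}> nested under <h{max(outer)}>")
            if self.stack and self.stack[-1] is None:
                self.stack[-1] = level

    def handle_endtag(self, tag):
        if tag == "section" and self.stack:
            self.stack.pop()


def check(html):
    checker = HeadingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


good = "<section><h1>A</h1><section><h2>B</h2></section></section>"
bad = "<section><h2>A</h2><section><h1>B</h1></section></section>"
print(check(good), check(bad))
```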
Introducing the HTMLBook Project:
http://github.com/oreillymedia/HTMLBook
“That’s nice, but
what’s in it for me if
I develop my (e)book
in DocBook or
AsciiDoc or HTML?”
#2 Single Source,
Multiple Outputs
Welcome to Conversion City
Enjoy Your Stay!
Conversion! Conversion!
Conversion!
The Single-Source Model
XML or HTML
Advantages of the Single-Source Model:
•  All authoring/edits are made to just one
set of files. No need to maintain multiple
sets of files.
•  Outputs are produced by transforms, not
conversions.
•  Transforms are automated, fast,
infinitely repeatable, and do not require
cleanup afterward.
•  The model is extensible. Add new output
formats by adding a new transform.
Workflow doesn’t need to be reinvented.
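The extensibility point can be sketched in a few lines of Python: one source document, a table of transforms, and a build step that runs whichever outputs are requested. The function names (`to_html`, `to_text`, `build`) and the `TRANSFORMS` table are hypothetical; the workflows described in this talk use XSL stylesheets rather than Python functions, and the sketch skips escaping for brevity.

```python
import xml.etree.ElementTree as ET

SOURCE = "<chapter><title>Autobiography of Me</title><para>I love ice cream.</para></chapter>"


def to_html(root):
    # Assumes plain-text paragraphs; no escaping or inline markup handled.
    paras = "".join(f"<p>{p.text}</p>" for p in root.findall("para"))
    return f"<section><h1>{root.findtext('title')}</h1>{paras}</section>"


def to_text(root):
    lines = [root.findtext("title").upper()]
    lines += [p.text for p in root.findall("para")]
    return "\n".join(lines)


# Adding an output format means adding one entry here; the source is untouched.
TRANSFORMS = {"html": to_html, "txt": to_text}


def build(source, formats):
    root = ET.fromstring(source)
    return {fmt: TRANSFORMS[fmt](root) for fmt in formats}


outputs = build(SOURCE, ["html", "txt"])
print(outputs["html"])
print(outputs["txt"])
```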
ASC/DB Single-Source Workflow:
AsciiDoc (optional; can start with DocBook) -> asciidoc.py -> DocBook XML
DocBook XML -> DocBook XSL EPUB Stylesheets + Custom CSS -> EPUB
DocBook XML -> DocBook XSL HTML5 Stylesheets -> HTML5
HTML5 -> AntennaHouse + Print CSS3 -> Print PDF
HTML5 -> AntennaHouse + Web CSS3 -> Web PDF
DocBook XML -> DocBook XSL EPUB Stylesheets -> EPUB -> Custom XSL for EPUB postprocessing + KF8/Mobi7 CSS -> Mobi-ready EPUB -> Kindlegen -> Mobi (KF8)
Legend: Source Content, Intermediate Output, Final Output For Sale
HTML5 Single-Source Workflow:
HTML5 -> Packaging XSL + CSS -> EPUB
HTML5 -> AntennaHouse + Print CSS3 -> Print PDF
HTML5 -> AntennaHouse + Web CSS3 -> Web PDF
HTML5 -> Packaging XSL + CSS -> EPUB -> Custom XSL for EPUB postprocessing + KF8/Mobi7 CSS -> Mobi-ready EPUB -> Kindlegen -> Mobi (KF8)
Legend: Source Content, Intermediate Output, Final Output For Sale
O’Reilly Atlas Ebook Build UI
#1. Pick ebook
formats to build
#2. Pick content
files to build
#3. Click “Build”
#3 Automate Your
Headaches Away
1776:
http://commons.wikimedia.org/wiki/File:Quill_(PSF).svg
2012:
Manuscript edits
cannot be automated
Manuscript edits
can be automated
http://www.flickr.com/photos/asurroca/3699873444/
Some rights reserved by ASurroca
Tools for Scripting
Word Documents
•  Macros
•  Visual Basic for Applications (VBA)
•  PowerShell
Tools for Scripting
Plaintext (AsciiDoc/XML) Documents
•  Ruby
•  Python
•  Perl
•  Java
•  XPath/XSLT/XQuery
•  JavaScript
•  Regex
•  Emacs/vi
•  sed
•  And many more…
Fix My Manuscript with One Line of Code!
Request #1:
“In the important scientific article below, please change all
superscripts to subscripts, except in informal equation
elements”
DocBook XML Manuscript:
<chapter id="chap1">

<title>Makin’ Water and Energy</title>

<para>Makin’ water is really easy. The formula is
H<superscript>2</superscript>O, so you just take
some H<superscript>2</superscript>, and add some
O.</para>

<para>Also, here’s how you make energy (per
Einstein):</para>

<informalequation>
<mathphrase>
E = mc<superscript>2</superscript>
</mathphrase>
</informalequation>
</chapter>
[Image: PDF Output]
Fix My Manuscript with One Line of Code!
Solution #1: XPath to the rescue!
Revised DocBook Manuscript:
<chapter id="chap1">

<title>Makin’ Water and Energy</title>

<para>Makin’ water is really easy. The formula is
H<subscript>2</subscript>O, so you just take some
H<subscript>2</subscript>, and add some O.</para>

<para>Also, here’s how you make energy (per
Einstein):</para>

<informalequation>
<mathphrase>
E = mc<superscript>2</superscript>
</mathphrase>
</informalequation>
</chapter>
[Image: PDF Output]
$ xmlstarlet ed -r "//superscript[not(ancestor::informalequation)]" -v "subscript" book.xml

xmlstarlet: XML command
ed: Make an edit
-r: r = rename
//superscript[not(ancestor::informalequation)]: Select superscripts that are not inside informal equations.
-v: v = replacement value
subscript: Replace with subscripts.
book.xml: Do all this on book.xml
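The same edit can be scripted in any of the languages listed earlier. Here is a rough Python equivalent using the standard library's ElementTree, which has no `ancestor::` axis, so the sketch builds its own child-to-parent map. `superscripts_to_subscripts` is a hypothetical name, and the sample input is a shortened version of the manuscript.

```python
import xml.etree.ElementTree as ET


def superscripts_to_subscripts(xml_text):
    """Rename <superscript> to <subscript> unless it sits inside <informalequation>."""
    root = ET.fromstring(xml_text)
    # ElementTree elements do not know their parent, so record it up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def inside_equation(elem):
        while elem in parent_of:
            elem = parent_of[elem]
            if elem.tag == "informalequation":
                return True
        return False

    # Materialize the list first so renaming does not disturb the iteration.
    for elem in list(root.iter("superscript")):
        if not inside_equation(elem):
            elem.tag = "subscript"
    return ET.tostring(root, encoding="unicode")


SAMPLE = ("<chapter><para>H<superscript>2</superscript>O</para>"
          "<informalequation><mathphrase>E = mc<superscript>2</superscript>"
          "</mathphrase></informalequation></chapter>")

fixed = superscripts_to_subscripts(SAMPLE)
print(fixed)
```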
Fix My Manuscript with One Line of Code!
Request #2:
“House style for dates is YYYY-MM-DD. Can you please fix in
manuscript below?”
AsciiDoc Manuscript:
== Kindergarten Lemonade Sales

.Lemonade sales by Kindergarten Lemonade, LLC
[options="header"]
|================
|Date|Lemonade Sold|
|3/15/12|6 glasses|
|4/22/10|10 glasses|
|5/31/12|2 glasses|
|7/14/11|4 glasses|
|8/19/12|1 glass|
|9/24/12|432 glasses|
|================
[Image: PDF Output]
Fix My Manuscript with One Line of Code!
Solution #2: Regex FTW!
AsciiDoc Manuscript:
== Kindergarten Lemonade Sales

.Lemonade sales by Kindergarten Lemonade, LLC
[options="header"]
|================
|Date|Lemonade Sold|
|2012-03-15|6 glasses|
|2010-04-22|10 glasses|
|2012-05-31|2 glasses|
|2011-07-14|4 glasses|
|2012-08-19|1 glass|
|2012-09-24|432 glasses|
|================
[Image: PDF Output]
$ perl -p -e 's#^(.*)([1-9])/([0-9]{2})/([0-9]{2})(.*)$#${1}20$4-0$2-$3$5#g' book.asc

perl: Perl script
-p: Print each line…
-e: Run the following regex
Capture the following pattern:
^(.*): Chars before date
([1-9]): Digit in month (single-digit months only, as in the manuscript)
([0-9]{2}): Digits in day
([0-9]{2}): Digits in year (two digits, as in the manuscript)
(.*)$: Chars after date
Specify replacement pattern:
${1}: Chars before date
20$4: Year (assumes every year is 20xx)
0$2: Month
$3: Day
$5: Chars after date
book.asc: Perform on this file
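For comparison, the same fix sketched in Python. This version also accepts two-digit months and single-digit days, which goes slightly beyond the manuscript's data, and it assumes every two-digit year belongs to the 2000s. `fix_dates` and `to_iso` are hypothetical names.

```python
import re

# M/D/YY or MM/DD/YY, delimited by word boundaries (e.g. the table pipes).
DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{2})\b")


def to_iso(match):
    month, day, year = (int(group) for group in match.groups())
    # Assumption: all years are 20xx, as in the lemonade table.
    return f"20{year:02d}-{month:02d}-{day:02d}"


def fix_dates(text):
    return DATE.sub(to_iso, text)


TABLE = "|Date|Lemonade Sold|\n|3/15/12|6 glasses|\n|4/22/10|10 glasses|"
fixed_table = fix_dates(TABLE)
print(fixed_table)
```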
#4 Versioning is the
New Spell-Check
Two Questions About Your (e)Book’s
Editorial Lifecycle
1. Will more than one person be
working on the manuscript files?
2. Will there be more than one draft of
the manuscript?
If you answered yes
to either question,
you need a version-
control system.
Key Feature #1 of Version Control:
Revision Snapshots
Key Feature #2 of Version Control:
Diffing
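Diffing works on plain-text manuscripts because drafts can be compared line by line. As a small illustration, Python's standard `difflib` module can produce the kind of unified diff a version-control system shows; the draft text and the `manuscript_diff` name below are made up for the example.

```python
import difflib


def manuscript_diff(old, new):
    """Return a unified diff between two drafts as a list of lines."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="draft-1", tofile="draft-2", lineterm=""))


DRAFT_1 = "I love chocolate ice cream.\nThe end."
DRAFT_2 = "I love vanilla ice cream.\nThe end."

diff = manuscript_diff(DRAFT_1, DRAFT_2)
print("\n".join(diff))
```

Removed lines are prefixed with `-`, added lines with `+`, and unchanged context with a space.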
What if we
versioned
manuscripts like
software developers
version code?
Revision snapshots in GitHub
Pro Git: https://github.com/progit/progit
Diffing in GitHub
(English to Portuguese translation)
#5 Always Think
“Digital First”
There is a difference
between a digitized
text and a digital
text
Digitized Text = Digital Last
“Let’s make a print book and
then get it converted to an
ebook.”
Digital Text = Digital First
“Let’s make an ebook.”
What Does Digital First
Look Like?
Welcome to Atlas [Beta]
http://atlas.oreilly.com/
Interactive examples!
Welcome to Atlas [Beta]
http://atlas.oreilly.com/
Inline Commenting!
Welcome to Atlas [Beta]
http://atlas.oreilly.com/
Integrated Multimedia!
Contact Me!
Email: sanders@oreilly.com
Twitter: @sandersk

Editor's Notes

  1. Note that there is no markup for the #1 in chapter heading