2. Overview
• What is pdf.js
• How PDF is structured
• Processing in pdf.js
• Images & Fonts
• Problems
• Todo
• Demo
3. What is pdf.js
• building a faithful & efficient PDF renderer
• HTML5 technology experiment
• no native code
• secure (web sandbox)
• Mozilla Labs Project - Open Source
4. How PDF is structured
• Header: PDF version
• Body: sequence of objects
  [Objects] fonts, drawing cmds, images, words, bookmarks, form fields
• xRef Table: mapping objID ➟ byte offset
• Trailer: root objID, xRef byte offset
• PDF file root obj = ref to pages catalog
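The layout above can be tried out on a tiny hand-built file. This is a minimal sketch, assuming an uncompressed classic xRef table and ASCII-only content; `buildTinyPdf` and `readStructure` are hypothetical helper names, not part of pdf.js.

```javascript
// Build a tiny PDF-like file in memory so every byte offset is known exactly.
function buildTinyPdf() {
  const objects = [
    "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
    "2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
  ];
  let text = "%PDF-1.4\n";              // header: PDF version
  const offsets = [];
  for (const obj of objects) {          // body: sequence of objects
    offsets.push(text.length);
    text += obj;
  }
  const xrefOffset = text.length;       // xRef table: objID -> byte offset
  text += "xref\n0 " + (objects.length + 1) + "\n";
  text += "0000000000 65535 f \n";      // entry 0 is always the free-list head
  for (const off of offsets) {
    text += String(off).padStart(10, "0") + " 00000 n \n";
  }
  // trailer: root objID, followed by the xRef byte offset
  text += "trailer\n<< /Size " + (objects.length + 1) + " /Root 1 0 R >>\n";
  text += "startxref\n" + xrefOffset + "\n%%EOF\n";
  return text;
}

// Read the file the way a viewer does: start at the end, then follow offsets.
function readStructure(text) {
  const version = /^%PDF-(\d\.\d)/.exec(text)[1];
  const xrefOffset = Number(/startxref\s+(\d+)\s+%%EOF/.exec(text)[1]);
  const lines = text.slice(xrefOffset).split("\n");
  // lines[0] is "xref", lines[1] is "<firstId> <count>"
  const [firstId, count] = lines[1].split(" ").map(Number);
  const xref = new Map();
  for (let i = 0; i < count; i++) {
    const [offset, , kind] = lines[2 + i].split(" ");
    if (kind === "n") xref.set(firstId + i, Number(offset)); // "n" = in use
  }
  const rootId = Number(/\/Root (\d+) \d+ R/.exec(text)[1]);
  return { version, xrefOffset, xref, rootId };
}

const pdfText = buildTinyPdf();
const info = readStructure(pdfText);
const rootOffset = info.xref.get(info.rootId);
console.log(info.version, info.rootId, pdfText.slice(rootOffset, rootOffset + 7));
```

Reading from the end of the file is the point of the design: the trailer and xRef table let a reader jump straight to any object without scanning the whole body.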
5. Processing in pdf.js
• get plain Uint8Array via XHR2, build Stream
• new PDFDoc(stream): read xRef, root object
• page = PDFDoc.getPage(N)
• page.startRendering(graphics)
• read & convert all PDF cmds ➟ IR (Internal Representation), done by PartialEvaluator
• load required objects (fonts, images)
• graphics.executeIR(IR), done by CanvasGraphics
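The first step, wrapping downloaded bytes in a stream, can be sketched as follows. `ByteStream` is a simplified, hypothetical stand-in and not the actual pdf.js Stream class; a hand-made buffer replaces the XHR2 download so the sketch runs outside a browser.

```javascript
// Hypothetical, simplified byte stream over an ArrayBuffer.
class ByteStream {
  constructor(arrayBuffer) {
    this.bytes = new Uint8Array(arrayBuffer); // plain byte view of the data
    this.pos = 0;
  }
  // Returns the next byte, or -1 at end of data.
  getByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos++] : -1;
  }
  // Reads up to n bytes and decodes them one byte per character.
  getString(n) {
    const end = Math.min(this.pos + n, this.bytes.length);
    let s = "";
    while (this.pos < end) s += String.fromCharCode(this.bytes[this.pos++]);
    return s;
  }
}

// In a browser the buffer would come from XHR2, roughly:
//   xhr.responseType = "arraybuffer";  ...  new ByteStream(xhr.response)
// Here eight hand-written bytes ("%PDF-1.4") stand in for the download.
const demoBuffer = new Uint8Array(
  [0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34]).buffer;
const demoStream = new ByteStream(demoBuffer);
const magic = demoStream.getString(5);       // header marker
const versionText = demoStream.getString(3); // version digits
console.log(magic, versionText);
```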
6. Processing in pdf.js (example)
1. page=PDFDoc.getPage(2) ➟ obj#3
2. page.startRendering(...) ➟ obj#4, obj#5

    3 0 obj
    <<
    /Type /Page
    /MediaBox [0 0 612 792]
    /Resources 4 0 R
    /Parent 2 0 R
    /Contents 5 0 R
    >>
    endobj

The stream of obj#5 may be encoded!

    5 0 obj
    <<
    /Length 8 0 R
    >>
    stream
    /GS1 gs
    /F0 12 Tf
    BT
    100 700 Td
    (Hello World!) Tj
    ET
    50 600 m
    400 600 l
    S
    endstream
    endobj
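How a page object leads to obj#4 and obj#5 can be sketched by following its indirect references. The object table below is a hypothetical in-memory stand-in (the content of obj#4 is a placeholder); a real reader would locate each object through the xRef table.

```javascript
// Hypothetical object table mirroring the example; values are simplified.
const objectTable = new Map([
  [3, { Type: "/Page", MediaBox: [0, 0, 612, 792],
        Resources: "4 0 R", Parent: "2 0 R", Contents: "5 0 R" }],
  [4, { note: "placeholder for the resources (fonts, graphics states)" }],
  [5, { stream: "/GS1 gs /F0 12 Tf BT 100 700 Td (Hello World!) Tj ET" }],
]);

// Parses an indirect reference such as "5 0 R" into object and generation numbers.
function parseRef(text) {
  const m = /^(\d+) (\d+) R$/.exec(text);
  if (!m) throw new Error("not an indirect reference: " + text);
  return { num: Number(m[1]), gen: Number(m[2]) };
}

// Step 1 finds the page object; step 2 pulls in the objects it refers to.
function collectPageObjects(pageObjNum) {
  const page = objectTable.get(pageObjNum);
  const resources = objectTable.get(parseRef(page.Resources).num);
  const contents = objectTable.get(parseRef(page.Contents).num);
  return { page, resources, contents };
}

const pageObjects = collectPageObjects(3);
console.log(pageObjects.contents.stream);
```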
7. xRef, catalog, IR
5 0 obj + resources ➟ PartialEvaluator ➟ IR Form ➟ CanvasGraphics

    5 0 obj                IR Form
    <<
    /Length 8 0 R
    >>
    stream
    /GS1 gs                setGState: [ LW: 10 ]
    /F0 12 Tf              dependency: [ font0 ]
                           setFont: font0, 12
    BT                     beginText
    100 700 Td             moveText: 100, 700
    (Hello World!) Tj      showText: “Hello World!”
    ET                     endText
    50 600 m               moveTo: 50, 600
    400 600 l              lineTo: 400, 600
    S                      stroke
    endstream
    endobj
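The translation from PDF commands to IR, and the replay of that IR, can be sketched as below. The IR function names follow the slide; the tokenizer, the `[name, args]` pair layout, and the shape of the resources object are simplifying assumptions, not pdf.js internals.

```javascript
// PDF operator -> IR function name, as listed on the slide.
const OPERATORS = new Map([
  ["gs", "setGState"], ["Tf", "setFont"], ["BT", "beginText"],
  ["Td", "moveText"], ["Tj", "showText"], ["ET", "endText"],
  ["m", "moveTo"], ["l", "lineTo"], ["S", "stroke"],
]);

// Splits a content stream into (strings), /names, numbers and operators.
function tokenize(src) {
  return src.match(/\((?:[^()\\]|\\.)*\)|\/[^\s()\/]+|[^\s()\/]+/g) || [];
}

// Converts PDF commands into a list of [irName, args] pairs.
function toIR(src, resources) {
  const ir = [];
  let args = [];
  for (const tok of tokenize(src)) {
    const fn = OPERATORS.get(tok);
    if (fn === undefined) {               // operand: collect until an operator arrives
      if (tok[0] === "(") args.push(tok.slice(1, -1));
      else if (tok[0] === "/") args.push(tok.slice(1));
      else args.push(Number(tok));        // anything else is treated as a number
      continue;
    }
    if (fn === "setGState") {
      args = [resources.ExtGState[args[0]]];   // /GS1 -> its settings
    } else if (fn === "setFont") {
      const fontId = resources.Font[args[0]];  // /F0 -> font0
      ir.push(["dependency", [fontId]]);       // font must be loaded before drawing
      args = [fontId, args[1]];
    }
    ir.push([fn, args]);
    args = [];
  }
  return ir;
}

// Replays the IR on any object with one method per IR function name.
function executeIR(ir, graphics) {
  for (const [fn, args] of ir) {
    if (fn !== "dependency") graphics[fn](...args);
  }
}

// A recording stand-in for CanvasGraphics: every method call is logged as text.
const calls = [];
const recorder = new Proxy({}, {
  get: (_, name) => (...args) =>
    calls.push(name + "(" + JSON.stringify(args).slice(1, -1) + ")"),
});

const contentStream =
  "/GS1 gs /F0 12 Tf BT 100 700 Td (Hello World!) Tj ET 50 600 m 400 600 l S";
const demoResources = { ExtGState: { GS1: [["LW", 10]] }, Font: { F0: "font0" } };
const demoIR = toIR(contentStream, demoResources);
executeIR(demoIR, recorder);
console.log(calls.join("\n"));
```

Keeping the IR as plain arrays of names and arguments means it can be produced in one place and executed in another, which is what makes the split between evaluator and graphics backend possible.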
9. Fonts
• There are lots of different font formats!
• fonts are converted to OpenType
• use CSS:
@font-face { font-family:'font0'; src:url(data:font/opentype;base64, ...) }
• some fonts can’t be converted :(
• use drawing commands?
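Building such a rule from converted font bytes might look like the sketch below. `fontFaceRule` is a hypothetical helper, and the four bytes used here are only the "OTTO" signature as a placeholder, not a usable font.

```javascript
// Builds an @font-face rule that embeds font bytes as a base64 data URL.
function fontFaceRule(family, bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  const base64 = btoa(binary); // btoa is global in browsers and in recent Node.js
  return "@font-face { font-family:'" + family + "'; " +
         "src:url(data:font/opentype;base64," + base64 + ") }";
}

// Placeholder bytes: just the "OTTO" tag, standing in for a converted font.
const placeholderBytes = new Uint8Array([0x4f, 0x54, 0x54, 0x4f]);
const rule = fontFaceRule("font0", placeholderBytes);
console.log(rule);

// In a page, the rule could then be added to a stylesheet, for example with
//   document.styleSheets[0].insertRule(rule, 0);
```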
10. Problems
platform = browser + OS
• No way to detect when a font is loaded (hacks)
• Font width (wrong on some platforms)
• Subpixel font size depending on platform
• Text selection
• Printing
• Speed
• use workers (objects sent via postMessage lose their shape)
• partial rendering
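One reading of the worker problem, offered here as an assumption, is that postMessage copies data with the structured clone algorithm, which keeps plain fields but drops the prototype and its methods. The sketch uses `structuredClone`, which exposes the same algorithm; `Glyph` is a hypothetical class, and the JSON round trip is only a rough fallback for runtimes without `structuredClone`.

```javascript
// A stand-in for an object that carries behaviour, not just data.
class Glyph {
  constructor(code, width) { this.code = code; this.width = width; }
  scaledWidth(size) { return this.width * size / 1000; }
}

// Copies a value the way postMessage would (or approximately, via JSON).
function cloneLikePostMessage(value) {
  return typeof structuredClone === "function"
    ? structuredClone(value)
    : JSON.parse(JSON.stringify(value));
}

const original = new Glyph(72, 722);
const received = cloneLikePostMessage(original);

const keptData = received.code === 72 && received.width === 722;
const keptClass = received instanceof Glyph;
const keptMethod = typeof received.scaledWidth === "function";
console.log(keptData, keptClass, keptMethod);

// One way around it: rebuild the object on the receiving side.
const revived = new Glyph(received.code, received.width);
console.log(revived.scaledWidth(12));
```

Sending only plain data across the worker boundary and rebuilding objects on arrival keeps the copy cheap and makes the loss explicit.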
11. Todo
• more font work, printing, speed
• support more of the rendering spec
• explore using SVG
• PDF forms, “advanced PDF features”
• infrastructure: automated testing, RequireJS
• test more PDFs (need your help!)