3. An Introduction to Open XML
Housekeeping
Mobile ‘phones
Fire Exits
Toilets
Session Overview
This session will provide an explanation and
demonstration of how we can programmatically create
and use WordML and ExcelML documents
I will be using the Open XML SDK to make life easier
No manual creation and management of .zip files / containers
Let System.IO.Packaging, etc. take care of that
Avoids a discussion about code bloat, XML bloat and
performance (which is actually very good)
It won’t take a political view of the “document wars” debate
There will be no XPS vs PDF vs Open XML vs ODF / OpenDocument
content!
If you learn one thing from my session…
On this day…June 8th…
1978: Woman takes world sailing record
Yachtswoman Naomi James breaks the solo round-the-world
sailing record by two days.
Disclaimer
This session includes some content from Microsoft slide
decks
Not going to be an in-depth look at the Open XML API
Code and demonstrations to get you started
Simplified version of the methods I use to generate custom
reports in a non-production version of a production application!
I’m a developer, not a designer!
No flashy graphics or fancy documents
Let’s ignore the injunction that a judge in Texas imposed on
Microsoft Word in the i4i case!
About Me
60+ presentations delivered:
IMTC 2008, epicenter 2009
NRW06, NRW07
DeveloperDeveloperDeveloper (UK / Ireland Community Events)
Scottish Developers
Agile Scotland
British Computer Society (BCS)
UK Borland User Group (DDG)
Visual Basic User Group (VBUG)
VBUG .net Winter 2001 conference
XML One 2001
60+ articles/book reviews published:
The Delphi Magazine
developers’ magazine (Dotnet Developers’ Group - DDG)
ASPToday.com (now Wiley, previously Wrox)
ASP.NET Pro, International Developer
CSharpCorner, DeveloperFusion
Open XML
XML
XSLT
XQuery
XML Schema
SOAP
WML
IntraWeb
Web Services
C# InterOp with Delphi
RUP
UML
TDD in C#, VB.net and Delphi 8
Scrum
Agenda
Motivation
The Tools
What: Open XML SDK 2, API Design
How: Demos, Code Generation, Injection, Content Controls
Why: Summary
Resources
Motivation
There are times when we are too focused on application
development
New or useful tools and techniques pass us by
Personally, 60-90 minute sessions like these help me save time by:
Identifying new/useful tools and techniques
Demonstrating new/useful tools and techniques
Your takeaway: whether Open XML is something you should be investigating
further, or not, as the case may be
I have been using Excel automation (COM type libraries) for
report creation…since 1999
Gone through the “macro dilemma” – to use macros or not?
For Win32 Borland Delphi applications
For Win32 .NET C# applications
The Tools
Visual Studio 2010 Professional
Open XML SDK 2 RTM (March 2010)
Sits inside the .NET Framework 3.5 SP1 space (more about this later on)
SDK makes use of LINQ
Office 2010 Standard
Only required for viewing documents
Unlike COM-based automation, an Office client is not required
A boon if you are preparing reports server-side
Previously used
Visual Studio 2008 Professional
Office 2007
Open XML SDK CTPs
Agenda
Motivation
The Tools
What: Open XML SDK 2, API Design
How: Demos - Manual, Code Generation, Injection
Why: Summary
Resources
Open XML SDK 2
Productivity Tool
DocumentReflector: code generation
OpenXMLClassExplorer: explore the Open XML markup and the ECMA 376 specification
OpenXMLDiff: graphically compare Open XML files
OpenXMLValidator: validate entire documents or “document parts” against Office 2007 or Office 2010 file formats
What is Open XML?
…an open standard for word-processing documents,
presentations, and spreadsheets that can be freely
implemented by multiple applications on different
platforms
…faithful representation of existing word-processing
documents, presentations, and spreadsheets that are
encoded in binary formats defined by Microsoft® Office
applications, i.e. tightly coupled
…purpose of the Open XML standard is to de-couple
documents created by Microsoft Office applications so
that they can be manipulated by other applications
independent of proprietary formats and without the loss
of data
https://connect.microsoft.com/content/content.aspx?ContentID=9521&SiteID=589&wa=wsignin1.0
Before…Open XML SDK V2
Namespaces, element names and attributes were irksome
to remember and to get right
Generally, constants were used to make managing
namespaces, etc. that bit easier
Lack of strong typing
Code would compile
May produce incorrect results at run-time
<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
<w:body><w:p><w:r><w:t>some text</w:t></w:r></w:p>
</w:body>
</w:document>
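Here is a minimal sketch of the “before” style. It uses plain LINQ to XML rather than the Open XML SDK and runs against the fragment above. The `BeforeSdk` class and its members are hypothetical names for illustration. The sketch shows the habit of keeping the namespace in a constant. It also shows that a misspelled element name (`"pp"`) still compiles but quietly matches nothing at run-time.

```csharp
using System.Linq;
using System.Xml.Linq;

// Illustrative "before" style: plain LINQ to XML, no Open XML SDK.
// BeforeSdk and its members are hypothetical names for this sketch.
public static class BeforeSdk
{
    // A shared constant keeps the WordprocessingML namespace in one place.
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Text of each w:p under w:body, built by concatenating its w:t descendants.
    public static string[] ParagraphTexts(XDocument doc) =>
        doc.Root
           .Element(W + "body")
           .Elements(W + "p")
           .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
           .ToArray();

    // The typo "pp" compiles happily; at run-time it simply matches nothing.
    public static int CountWithTypo(XDocument doc) =>
        doc.Root.Element(W + "body").Elements(W + "pp").Count();
}
```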
Now…Open XML SDK V2
Strongly Typed Object Model
Node identification using strings is a thing of the past
Loosely typed System.Xml.Linq.XElement usage can be replaced
e.g. DocumentFormat.OpenXml.Wordprocessing.Paragraph
Spelling mistakes are caught by compile-time type checking
Obviously strong typing is preferable
AFTER
var paragraphs = doc.MainDocumentPart
.Document.Body.Elements<Paragraph>()
.Select
BEFORE
var paragraphs =
doc.MainDocumentPart
.GetXDocument()
.Element(w + "document")
.Element(w + "body")
.Elements(w + "p")
.Select
API Design
System Support
.NET Framework 3.5 – The Open XML SDK leverages the advanced technology provided
by .NET Framework 3.5, especially LINQ to XML, which makes manipulating XML much
easier and more intuitive
System.IO.Packaging – The Open XML SDK needs to be able to add/remove parts
contained within the Open XML Format packages. .NET Framework 3.0 included a set of
generic packaging APIs capable of adding and removing parts of OPC (Open Packaging
Conventions) conforming packages. Given that Open XML Formats are based on OPC, the
SDK uses System.IO.Packaging APIs to open, edit and save Open XML Packages
Open XML Schemas – The Open XML SDK is based on Open XML Formats, which are
represented and described as schemas. These schemas make up the foundation of the
Open XML SDK, since the SDK enables Open XML developers to build solutions on top
of Open XML Formats
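Here is a hedged sketch that shows what System.IO.Packaging and the SDK spare us. It builds a minimal .docx by hand, using only `System.IO.Compression.ZipArchive` and LINQ to XML rather than System.IO.Packaging itself. It writes three entries: `[Content_Types].xml`, the package relationships in `_rels/.rels`, and the main part `word/document.xml`. `ManualDocx` is a hypothetical name. The sketch assumes .NET Core 3.0 or later, since it uses C# 8 `using` declarations.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

// Hand-rolled minimal .docx: the bookkeeping that System.IO.Packaging and the
// Open XML SDK normally do for us. ManualDocx is a hypothetical name.
public static class ManualDocx
{
    // Declares a content type for every part in the package.
    const string ContentTypesXml =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
        "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
        "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
        "<Default Extension=\"xml\" ContentType=\"application/xml\"/>" +
        "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>" +
        "</Types>";

    // Package-level relationship pointing at the main document part.
    const string PackageRelsXml =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
        "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
        "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>" +
        "</Relationships>";

    public static byte[] Create(string text)
    {
        XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        // XElement escapes the text for us (e.g. "&" becomes "&amp;").
        var document = new XElement(w + "document",
            new XAttribute(XNamespace.Xmlns + "w", w.NamespaceName),
            new XElement(w + "body",
                new XElement(w + "p",
                    new XElement(w + "r",
                        new XElement(w + "t", text)))));

        using var buffer = new MemoryStream();
        using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            AddPart(zip, "[Content_Types].xml", ContentTypesXml);
            AddPart(zip, "_rels/.rels", PackageRelsXml);
            AddPart(zip, "word/document.xml", document.ToString(SaveOptions.DisableFormatting));
        }
        return buffer.ToArray();
    }

    // Writes one zip entry as UTF-8 without a byte-order mark.
    static void AddPart(ZipArchive zip, string entryName, string xml)
    {
        ZipArchiveEntry entry = zip.CreateEntry(entryName);
        using var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false));
        writer.Write(xml);
    }
}
```

Usage (the file name is hypothetical): `File.WriteAllBytes("hello.docx", ManualDocx.Create("Hello, Open XML"));`. Every content type, relationship and entry name here has to be typed correctly by hand, which is exactly the burden the packaging APIs remove.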
API Design
Open XML File Format Base Level
Stream Reading/Writing
Includes stream reader and writer interfaces targeting Open XML elements and attributes
Similar to XmlReader/XmlWriter, but easier to use because the interfaces are Open XML aware
Open XML Low Level DOM
Manipulate the Open XML tree directly by working with strongly typed objects and classes
instead of traditional XML nodes
Awareness of namespaces as well as element/attribute names is reduced
IntelliSense for properties, etc.
Leverages LINQ
Open XML Packaging API
Sits above System.IO.Packaging (.NET 3.0)
Allows developers to manipulate Open XML parts with strongly typed classes and objects
Shipped in Open XML SDK v1.0
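The stream-reading layer is described as similar to XmlReader/XmlWriter. As a rough analogue, the sketch below uses plain `System.Xml.XmlReader`, not the SDK's own Open XML-aware reader. It makes a single forward-only pass over a document.xml stream, counting paragraphs and collecting text without loading a tree into memory. `StreamingScan` is a hypothetical name.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Forward-only pass over document.xml with plain XmlReader (NOT the SDK's reader).
// StreamingScan is a hypothetical name for this sketch.
public static class StreamingScan
{
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts w:p elements and concatenates w:t text without building a tree.
    public static (int Paragraphs, string Text) Scan(Stream documentXml)
    {
        int paragraphs = 0;
        var text = new StringBuilder();
        using var reader = XmlReader.Create(documentXml);
        reader.MoveToContent();
        while (!reader.EOF)
        {
            bool isW = reader.NodeType == XmlNodeType.Element && reader.NamespaceURI == W;
            if (isW && reader.LocalName == "t")
            {
                // ReadElementContentAsString already moves past </w:t>, so we skip
                // the Read() below to avoid jumping over an adjacent sibling.
                text.Append(reader.ReadElementContentAsString());
                continue;
            }
            if (isW && reader.LocalName == "p")
                paragraphs++;
            reader.Read();
        }
        return (paragraphs, text.ToString());
    }
}
```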
API Design
Validation & Helpers
Validation Layer
Open XML base layer does not guarantee creation of valid Open
XML documents!
Our reliance on XML Schema (XSD) files is reduced, if not removed
The SDK takes care of it on our behalf
Helper Functions
Work directly on the XML elements and are functionally limited
by the file format standard
e.g. deletion of a WordML paragraph – a helper function may
ensure that all additional steps are taken to leave the document in
a valid state…
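Here is a hedged sketch of what such a helper might look like. It is written against LINQ to XML rather than the SDK's strongly typed classes, and it handles just one of the “additional steps”. A table cell (`w:tc`) must keep at least one block-level element, and Word expects a cell's last element to be a paragraph. The helper therefore puts an empty `w:p` back when needed. `ParagraphHelper` is hypothetical, and real cleanup (bookmarks, comment ranges and so on) is out of scope.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper in the spirit of the SDK's "helper functions".
// Handles only the table-cell case; bookmarks, comments, etc. are ignored.
public static class ParagraphHelper
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static void DeleteParagraph(XElement paragraph)
    {
        XElement parent = paragraph.Parent;
        paragraph.Remove();

        // A cell must not be left without a trailing paragraph: add an empty one.
        if (parent != null && parent.Name == W + "tc")
        {
            XElement last = parent.Elements().LastOrDefault();
            if (last == null || last.Name != W + "p")
                parent.Add(new XElement(W + "p"));
        }
    }
}
```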
The Importance of Validation
http://blogs.msdn.com/brian_jones/archive/2009/04/08/announcing-the-release-of-the-open-xml-sdk-version-2-april-2009-ctp.aspx
Valid: the text (w:t) sits inside a run (w:r)
<w:body>
<w:p>
<w:r>
<w:t>hello world</w:t>
</w:r>
</w:p>
...
</w:body>
Invalid: w:t placed directly inside w:p, with no enclosing run
<w:body>
<w:p>
<w:t>hello world</w:t>
</w:p>
...
</w:body>
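The difference between the two fragments above comes down to one structural rule: `w:t` belongs inside a run. The sketch below checks only that rule and repairs it by wrapping stray text in a run. `RunCheck` is a hypothetical stand-in, not the SDK's validation layer, which checks far more.

```csharp
using System.Linq;
using System.Xml.Linq;

// Checks ONE WordprocessingML rule (w:t must be a child of w:r).
// This is not the SDK validator; RunCheck is a hypothetical name.
public static class RunCheck
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static bool IsMisplaced(XElement t) => t.Parent == null || t.Parent.Name != W + "r";

    // Number of w:t elements that are not direct children of a w:r.
    public static int CountMisplacedText(XContainer root) =>
        root.Descendants(W + "t").Count(IsMisplaced);

    // Wraps each stray w:t in its own run.
    public static void WrapMisplacedText(XContainer root)
    {
        // ToList() first: the tree must not be modified while it is being enumerated.
        foreach (XElement t in root.Descendants(W + "t").Where(IsMisplaced).ToList())
            t.ReplaceWith(new XElement(W + "r", t)); // t still has a parent, so it is copied
    }
}
```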
Agenda
Motivation
The Tools
What: Open XML SDK 2, API Design
How: Demos, Code Generation, Injection, Content Controls
Why: Summary
Resources
WordML
Document Structure
Take a .docx, an .xlsx or a .pptx file, rename it as a .zip file
Open using Compressed Folders or your favourite zip utility
Very readable, but without the SDK, difficult to manage, especially in code
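The “rename it as a .zip” trick has a direct code equivalent. This is a minimal sketch that assumes `System.IO.Compression` is available: it ships with .NET Core, while .NET Framework 4.5 and later needs a reference to System.IO.Compression.dll. `PackageLister` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Lists the entries of an Open XML package (.docx/.xlsx/.pptx), sorted ordinally.
// PackageLister is a hypothetical name for this sketch.
public static class PackageLister
{
    public static string[] ListParts(Stream package)
    {
        // leaveOpen: the caller owns the stream.
        using var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true);
        return zip.Entries
                  .Select(e => e.FullName)
                  .OrderBy(name => name, StringComparer.Ordinal)
                  .ToArray();
    }
}
```

Usage, with a hypothetical file name: `using var file = File.OpenRead("report.docx"); foreach (var part in PackageLister.ListParts(file)) Console.WriteLine(part);`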
Document Parts
A document part is…
analogous to a file on the file system
stored inside the package in a specific location reachable via a URI
stored with a specific content type
mainly XML but other native types as well
Images, sounds, video, OLE objects
Content type is enforced
Example: cannot tag JPEG part as GIF
[Open Excel - sample file – look for the image]
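Content types are declared in the package's `[Content_Types].xml`. This is a minimal sketch of the lookup. It assumes the Open Packaging Conventions rule that an `Override` for the exact part name takes precedence over a `Default` for the part's extension, and that names compare case-insensitively. `ContentTypeResolver` is a hypothetical name. The sketch shows where a part's declared type, for example `image/jpeg`, comes from.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// OPC-style content-type lookup: Override (exact part name) wins,
// otherwise the Default for the extension. Hypothetical helper name.
public static class ContentTypeResolver
{
    static readonly XNamespace Ct =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    public static string Resolve(XDocument types, string partName)
    {
        XElement match = types.Root.Elements(Ct + "Override")
            .FirstOrDefault(o => Same((string)o.Attribute("PartName"), partName));
        if (match != null)
            return (string)match.Attribute("ContentType");

        // Extension of the last path segment ("" when there is none).
        string fileName = partName.Substring(partName.LastIndexOf('/') + 1);
        int dot = fileName.LastIndexOf('.');
        string extension = dot < 0 ? "" : fileName.Substring(dot + 1);

        match = types.Root.Elements(Ct + "Default")
            .FirstOrDefault(d => Same((string)d.Attribute("Extension"), extension));
        return (string)match?.Attribute("ContentType"); // null: nothing declared
    }

    static bool Same(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
}
```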
ExcelML
Document Structure
Relationships are stored in XML streams in the package
They tie parts inside the package to each other
They allow navigation of the document without parsing parts
Package relationships stream URI: /_rels/.rels
Part relationships stream URI: [folder]/_rels/[partname].rels, e.g. /word/_rels/document.xml.rels
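Here is a hedged sketch of the two naming rules above, plus following the package-level relationship to the main document part. `RelationshipPaths` is a hypothetical name. The relationship type URI is the standard `officeDocument` one, and the sketch uses `"/"` to stand for the package root.

```csharp
using System.Linq;
using System.Xml.Linq;

// Relationship-part naming and main-document lookup. Hypothetical helper name.
public static class RelationshipPaths
{
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/package/2006/relationships";
    const string OfficeDocumentType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // "/word/document.xml" -> "/word/_rels/document.xml.rels"; "/" -> "/_rels/.rels"
    public static string RelsPartFor(string partName)
    {
        int slash = partName.LastIndexOf('/');
        string folder = partName.Substring(0, slash + 1);
        string file = partName.Substring(slash + 1);
        return folder + "_rels/" + file + ".rels";
    }

    // Target of the officeDocument relationship in /_rels/.rels (null if absent).
    public static string MainDocumentTarget(XDocument packageRels) =>
        packageRels.Root.Elements(R + "Relationship")
            .Where(r => (string)r.Attribute("Type") == OfficeDocumentType)
            .Select(r => (string)r.Attribute("Target"))
            .FirstOrDefault();
}
```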
Content Controls
New in Word 2007
Manageable via the Word Content Control Toolkit
Programmatic access to specific “fields” within a document
“Bindable”
Can be bound to XML nodes
Makes use of the customXML folder
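This is a minimal sketch of programmatic access to content controls using LINQ to XML. It assumes each control of interest carries a tag (`w:sdtPr/w:tag/@w:val`), for example one set from Word's Developer ribbon. `ContentControlReader` is a hypothetical name. It reads the control's displayed text, not any bound customXML data. Text from a nested control is also counted towards its parent.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Maps each tagged content control (w:sdt) to its text. Hypothetical helper name.
public static class ContentControlReader
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static Dictionary<string, string> ReadByTag(XDocument document)
    {
        var values = new Dictionary<string, string>();
        foreach (XElement sdt in document.Descendants(W + "sdt"))
        {
            // The tag lives at w:sdtPr/w:tag/@w:val; untagged controls are skipped.
            string tag = (string)sdt.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val");
            if (tag == null)
                continue;

            // Concatenate every w:t inside the control's content.
            XElement content = sdt.Element(W + "sdtContent");
            values[tag] = content == null
                ? ""
                : string.Concat(content.Descendants(W + "t").Select(t => t.Value));
        }
        return values;
    }
}
```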
Enabling the Developer ribbon – Word 2007
Enabling the Developer ribbon – Word 2010
Why Use Content Controls?
In situations where small amounts of information are collected
from many users:
How often have you seen a spreadsheet being e-mailed to
hundreds of users, asking them to fill in “some” cells?
Give them a Word document with Content Controls
Use a custom-written .NET application that aggregates the
information in the Content Controls into an Excel spreadsheet
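The aggregation step could be sketched as follows, assuming each returned document's tag/value pairs have already been extracted. To keep the example short, it writes CSV (which Excel opens) rather than a real .xlsx produced through the SDK. `ResponseAggregator` is a hypothetical name. The columns are the union of all tags seen, so a missing answer becomes an empty cell.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// One row per returned document, one column per content-control tag.
// CSV stands in for the Excel workbook; ResponseAggregator is a hypothetical name.
public static class ResponseAggregator
{
    public static string ToCsv(IReadOnlyList<IReadOnlyDictionary<string, string>> responses)
    {
        // Union of all tags seen, in a stable (ordinal) order.
        List<string> columns = responses.SelectMany(r => r.Keys)
                                        .Distinct()
                                        .OrderBy(c => c, StringComparer.Ordinal)
                                        .ToList();
        var csv = new StringBuilder();
        csv.Append(string.Join(",", columns.Select(c => Escape(c)))).Append("\r\n");
        foreach (var response in responses)
        {
            // Missing answers become empty cells.
            var cells = columns.Select(c => Escape(response.TryGetValue(c, out var v) ? v : ""));
            csv.Append(string.Join(",", cells)).Append("\r\n");
        }
        return csv.ToString();
    }

    // Quotes a field containing a comma, quote or line break (RFC 4180 style).
    static string Escape(string value) =>
        value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0
            ? "\"" + value.Replace("\"", "\"\"") + "\""
            : value;
}
```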
demo
Content Controls
CustomXML
in Word 2007 / Word 2010
Deployment
All that you need to deploy are:
Your OpenXML-enabled application
DocumentFormat.OpenXml.dll
WindowsBase.dll
.NET (VPC test…)
C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0\WindowsBase.dll
http://blogs.msdn.com/dmahugh/archive/2006/12/14/finding-windowsbase-dll.aspx
Agenda
Motivation
The Tools
What: Open XML SDK 2, API Design
How: Demos - Manual, Code Generation, Injection
Why: Summary
Resources
Summary
Open XML is little more than a zip package of moderately complex XML
documents
XML is readily accessible
in the .NET framework
in VB6
in Java
in Python, etc.
An Office installation is not required
Office client not required on the server
Enables Office document creation from non-Microsoft platforms
“…it’s just zip, it’s just XML…” - Doug Mahugh
http://channel9.msdn.com/posts/AdamKinney/Open-XML-File-Formats
Summary
Start from a template document
Easy replication of existing [client] documents
Use the DocumentReflector to generate Open XML code
Refactor your report data into the generated code
Learn from the reflected / generated code
Open XML code is cleaner, more readable and more
maintainable than its COM counterpart
Open XML documents can be consumed using
applications and platforms from vendors other than
Microsoft
Resources (web-sites & blogs)
Open XML Format SDK 2.0
http://url.ie/tik
Microsoft’s Open XML portal
http://www.openxmldeveloper.org/
If you are interested in Open XML / ODF conversion
http://sourceforge.net/projects/odf-converter
http://www.twitter.com/openxml
Microsoft folks:
Brian Jones http://blogs.msdn.com/brian_jones/
Doug Mahugh http://blogs.msdn.com/dmahugh/
Kevin Boske http://blogs.msdn.com/kevinboske/
Erika Ehrli http://blogs.msdn.com/erikaehrli/
Eric White http://blogs.msdn.com/ericwhite/
Resources (web-sites & blogs)
Word 2007 Content Control Toolkit on CodePlex
http://www.codeplex.com/dbe
Matthew Scott’s Content Controls and CustomXML
Channel 9 video
http://url.ie/u05
Wouter van Vugt
http://blogs.code-counsel.net/Wouter/default.aspx
A collection of Open XML resources:
http://www.craigmurphy.com/blog/?p=871
Including these slides and C# source code
Resources (Books)
Open XML Explained
Wouter van Vugt
http://openxmldeveloper.org/articles/1970.aspx
Contact Information
Craig Murphy
http://www.twitter.com/CAMURPHY
Updated slides, notes and source code:
http://www.CraigMurphy.com
http://www.CraigMurphy.com/blog
Using the Productivity Tool, you can:
Generate Open XML SDK source code based on document content. The source code could be used to regenerate all or part of the document.
Compare source and target Open XML documents to highlight the differences. You can reveal the differences in the document part structure as well as the content differences. Based on those differences, you can generate source code that employs the Open XML SDK 2.0 to create the target document from the source.
Validate documents. You can validate an entire document, specific document parts or a segment of content against Office 2007 or Office 2010 file formats.
Display documentation for the Open XML SDK 2.0, the ISO/IEC 29500 Open XML File Formats standard, and the Microsoft Office implementer notes.
This content set provides documentation and guidance for the strongly-typed classes in the Open XML SDK 2.0 for Microsoft Office.
Welcome to the Open XML SDK 2.0 for Microsoft Office. The SDK is built on the System.IO.Packaging API and provides strongly-typed classes to manipulate documents that adhere to the Office Open XML File Formats Specification. The Office Open XML File Formats specification is an open international standard: ECMA-376, Second Edition, and ISO/IEC 29500. The Open XML file formats are useful for developers because they are an open standard and are based on well-known technologies: ZIP and XML. The Open XML SDK 2.0 simplifies the task of manipulating Open XML packages and the underlying Open XML schema elements within a package. The Open XML SDK 2.0 encapsulates many common tasks that developers perform on Open XML packages, so that you can perform complex operations with just a few lines of code.
Stream Reading/Writing – This component includes stream reader and writer interfaces specifically targeting Open XML elements and attributes. The readers and writers behave similarly to XmlReader/XmlWriter, but are easier to use since the interfaces are Open XML aware.
Open XML Low Level DOM – This component represents the XML wrapper of the Open XML schemas. Developers are able to use this component to manipulate the Open XML tree directly by working with strongly typed objects and classes instead of traditional XML nodes that require developers to be aware of namespaces as well as element/attribute names. The major advantage of having strongly typed classes and objects is that developers can easily see what properties are defined on a given class through IntelliSense. For example, a developer will know exactly what properties and children can exist off of a Paragraph object. In addition, working with objects removes the need to remember namespaces and element/attribute/value names, since these concepts are implicitly defined by classes. This component leverages many of the designs of LINQ in order to further improve the ease of use of this SDK.
Open XML Packaging API – This component is built on top of the .NET Framework 3.0 System.IO.Packaging component. Instead of providing generic access to the parts contained in the Open XML Package, this component allows developers to manipulate Open XML parts with strongly typed classes and objects. This component has already shipped as the Open XML SDK v1.0.
The Validation layer provides validation support when developing Open XML documents. Manipulating Open XML Formats by using the Open XML Base layer makes it much easier for developers to work on the Open XML tree, but doing so does not guarantee the production of valid Open XML documents. This validation layer assists developers by allowing them to validate created Open XML documents against the Open XML schemas and additional syntax constraints as defined in the standard. Instead of relying on XSD files, prose within the standard and observed application behaviors, developers are able to leverage the SDK to cover much of this manual work. Manipulating the Open XML files requires that developers be familiar with the standard so that they won’t corrupt the files by breaking certain constraints in the standard. This difficulty becomes apparent when working with the Open XML base layer and is evidenced in the results of the validation layer. For example, deleting a paragraph in a WordprocessingML document is not simply a matter of deleting the paragraph node. There are a variety of extra steps required to delete a paragraph and maintain the integrity of a valid Open XML document.
The SDK provides higher level helper functions or code snippets that can deal with common complex file format operations. These helper functions or snippets make the appropriate XML and part/relationship modifications when performing complex tasks. They don’t abstract away from the actual XML itself, but rather perform operations on the XML elements by taking advantage of the validation awareness. For example, deleting a paragraph element in a WordprocessingML document may result in corruption. A potential helper function would perform this delete operation and do the necessary extra steps to clean the resulting XML to ensure validity. These delete helper functions or snippets can be applied to other elements that are hard to delete, like tables and comments.
In other words, these higher level functions or snippets perform directly on the xml elements and are constrained, in terms of functionality, by the file format standard itself.