This document is a presentation about Apache POI given by Paolo Mottadelli at ApacheCon US 2009 in Oakland. It discusses POI modules for reading and writing Office file formats like Excel, Word and PowerPoint. It also covers topics like common HSSF and XSSF access in POI, OpenXML package concepts, text extraction, simple tasks in Excel and Word, formula evaluation, and use cases for content management systems, financial forecasting and document import.
Apache POI Recipes for Excel, Word and PowerPoint Documents
1. Apache POI
Recipes
Paolo Mottadelli - ApacheCon Oakland 2009
http://chromasia.com
Thursday, November 5, 2009
2. paolo@apache.org
my to-do list
- ApacheCon US 2009, Oakland - Apache POI Recipes -
3. paolo@apache.org
POI @ Content Tech
✴ Document to application (and back)
✴ Publish data
✴ Build a doc from your content
✴ Know your documents
✴ Extract text
✴ Extract content
5. paolo@apache.org
POI modules (1): OLE2
✴ POIFS: r/w Office Documents
✴ HSSF: r/w Excel Spreadsheets
✴ HWPF: r/w Word Docs
✴ HSLF: r/w PowerPoint Docs
✴ HPSF: r/w property sets
6. paolo@apache.org
POI modules (2): OOXML
✴ XSSF: r/w OXML Excel
✴ XWPF: r/w OXML Word
✴ XSLF: r/w OXML PowerPoint
7. POI 3.5
http://chromasia.com
8. paolo@apache.org
OOXML dev status
✴ XSSF: Final in POI-3.5
✴ XWPF: Draft (basic features)
✴ XSLF: Not covered (only text ext.)
9. paolo@apache.org
HSSF & XSSF
✴ Common user model interface
✴ User model based on existing HSSF
✴ Using OpenXML4J and SAX
11. paolo@apache.org
Common H/XSSF access
✴ org.apache.poi.ss.usermodel
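A minimal sketch of what the common user model allows, assuming a POI release with both the HSSF and XSSF (poi-ooxml) jars on the classpath. The class and method names (CommonAccessSketch, create, fill, readFirstCell, demo) are made up for illustration; only the org.apache.poi types come from the library.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class CommonAccessSketch {

    // The only format-specific line: pick the implementation once.
    static Workbook create(boolean ooxml) {
        return ooxml ? new XSSFWorkbook() : new HSSFWorkbook();
    }

    // Everything below talks to the shared ss.usermodel interfaces only.
    static void fill(Workbook wb) {
        Sheet sheet = wb.createSheet("data");
        Row row = sheet.createRow(0);
        Cell cell = row.createCell(0);
        cell.setCellValue("hello");
    }

    static String readFirstCell(Workbook wb) {
        return wb.getSheetAt(0).getRow(0).getCell(0).getStringCellValue();
    }

    // Creates an in-memory workbook, writes one cell and reads it back.
    static String demo(boolean ooxml) {
        Workbook wb = create(ooxml);
        fill(wb);
        return readFirstCell(wb);
    }
}
```

The same fill and readFirstCell code runs unchanged against .xls and .xlsx workbooks, which is the point of coding against org.apache.poi.ss.usermodel.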
12. paolo@apache.org
Upgrading to POI-3.5
✴ HSSFFormulaEvaluator.CellValue
✴ convert from .hssf. to .ss.
✴ HSSFRow.MissingCellPolicy
✴ convert from .hssf. to .ss.
✴ RecordFormatException in DDF
✴ convert from .hssf. to .util. (DDF: Dreadful Drawing Format)
14. paolo@apache.org
Open XML made (very) simple
✴ XML based
✴ WordprocessingML
✴ SpreadsheetML
✴ PresentationML
✴ Stored as a package
✴ Open Packaging Conventions
15. paolo@apache.org
Package concepts
✴ Package (the container)
✴ Part (xml file)
✴ Relationship
✴ package-relationship
✴ part-relationship
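Because an Open Packaging Conventions package is physically a ZIP archive, the concepts above can be explored with nothing but the JDK. The following is a sketch, not POI's own API: PackagePartsSketch, listParts and kindOf are hypothetical names, and the classification relies on the OPC convention that package-level relationships live in _rels/.rels while part-level relationships live in a _rels folder next to the part.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class PackagePartsSketch {

    // Reads the package as a plain ZIP stream and returns the entry names.
    static List<String> listParts(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    // Rough classification of an entry by its name.
    static String kindOf(String name) {
        if (name.equals("_rels/.rels")) {
            return "package-relationship";
        }
        if (name.endsWith(".rels") && name.contains("_rels/")) {
            return "part-relationship";
        }
        return "part";
    }
}
```

Calling listParts on a FileInputStream for any .xlsx, .docx or .pptx file prints the same kind of tree shown on the expanded-package slide.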
16. paolo@apache.org
Expanded package, Excel
17. paolo@apache.org
WordprocessingML
✴ body
✴ paragraphs
✴ runs
✴ properties (for runs and pars)
✴ styles
✴ headers/footers ...
23. paolo@apache.org
Extractors
✴ POITextExtractor
✴ POIOLE2TextExtractor
getText()
✴ POIXMLTextExtractor
✴ XSSFExcelExtractor
✴ XWPFWordExtractor
✴ XSLFPowerPointExtractor
✴ If text is all you need
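As an illustration of the extractor idea, here is a small sketch using XSSFExcelExtractor on a workbook built in memory. It assumes the poi-ooxml jar is available; ExtractorSketch, sampleWorkbook and textOf are hypothetical names, and a real application would normally open an existing file instead of building one.

```java
import org.apache.poi.xssf.extractor.XSSFExcelExtractor;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class ExtractorSketch {

    // Builds a one-cell workbook so the example needs no input file.
    static XSSFWorkbook sampleWorkbook() {
        XSSFWorkbook wb = new XSSFWorkbook();
        XSSFSheet sheet = wb.createSheet("notes");
        sheet.createRow(0).createCell(0).setCellValue("hello");
        return wb;
    }

    // One call returns the plain text of the whole workbook.
    static String textOf(XSSFWorkbook wb) {
        XSSFExcelExtractor extractor = new XSSFExcelExtractor(wb);
        return extractor.getText();
    }
}
```

The Word and PowerPoint extractors listed above follow the same pattern: construct the extractor for the document, then call getText().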
24. paolo@apache.org
Text extraction
✴ made simple
39. paolo@apache.org
Use Case
✴ Upload a document
✴ Detect document mimetype
✴ Extract text and metadata
✴ Create search index
✴ Search (and find) the document
40. paolo@apache.org
Without Tika
✴ Detect the document mimetype
✴ (source/target mimetype)
✴ Get the proper ContentTransformer
✴ (ContentTransformerRegistry)
✴ Transform Doc Content to Text
✴ (PoiHssfContentTransformer) POI here
✴ Create Lucene index
41. paolo@apache.org
With Tika
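The code shown on this slide is not preserved in the transcript. As a rough sketch of how the same steps can look with Tika, the example below uses the org.apache.tika.Tika facade class; it assumes a Tika release that ships that facade, with tika-core on the classpath for detection and tika-parsers for text extraction. TikaSketch, detectByName and extractText are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

class TikaSketch {

    private static final Tika TIKA = new Tika();

    // Mimetype detection from the file name alone.
    static String detectByName(String fileName) {
        return TIKA.detect(fileName);
    }

    // Picks a parser for the detected type and returns the plain text.
    static String extractText(InputStream in) throws IOException, TikaException {
        return TIKA.parseToString(in);
    }
}
```

The returned text can then be fed to the search index, replacing the explicit registry lookup and transformer selection of the "Without Tika" steps.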
42. paolo@apache.org
Extension use case
✴ Adding support for Office Open XML documents (Office 2007+)
✴ Word 2007+
✴ Excel 2007+
✴ PowerPoint 2007+
43. paolo@apache.org
POI text extractors
✴ Remember?
44. paolo@apache.org
Apache Tika (Excel)
45. paolo@apache.org
Apache Tika
46. paolo@apache.org
Apache Tika (Word)
47. paolo@apache.org
Apache Tika (Word)
49. paolo@apache.org
Make your wb look pro-
✴ Rich text
✴ Graphics
✴ Formulas & Named Ranges
✴ Data validations
✴ Conditional formatting
✴ Cell comments
52. paolo@apache.org
Formula evaluation
✴ The evaluation engine enables you to calculate formula results from within a POI application
✴ Formulas may be added to your workbook by POI
✴ Evaluation is available for .xls and .xlsx
53. paolo@apache.org
Formula evaluation (continued)
✴ All arithmetic operators are implemented
✴ Over 280 Excel built-in functions are supported
54. paolo@apache.org
Formula evaluation (code)
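The original code on this slide did not survive in the transcript. The following is a minimal sketch of formula evaluation through the common user model, assuming POI with both HSSF and XSSF available; FormulaSketch and doubledSum are hypothetical names.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellValue;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class FormulaSketch {

    // Writes a and b into A1 and B1, adds a formula in C1 and evaluates it.
    static double doubledSum(boolean ooxml, double a, double b) {
        Workbook wb = ooxml ? new XSSFWorkbook() : new HSSFWorkbook();
        Sheet sheet = wb.createSheet("calc");
        Row row = sheet.createRow(0);
        row.createCell(0).setCellValue(a);
        row.createCell(1).setCellValue(b);

        // The formula text is given without the leading '='.
        Cell formulaCell = row.createCell(2);
        formulaCell.setCellFormula("SUM(A1:B1)*2");

        // The evaluator comes from the workbook, so the code stays format-neutral.
        FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
        CellValue result = evaluator.evaluate(formulaCell);
        return result.getNumberValue();
    }
}
```

With a = 2 and b = 3 the evaluated value is 10.0 for both the .xls and the .xlsx implementation.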
58. paolo@apache.org
importDocument()
59. paolo@apache.org
getParagraphs(...)
✴ Makes use of
✴ org.apache.poi.hwpf.usermodel.Range
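The speaker's getParagraphs(...) implementation is not included in the transcript. As a sketch of how Range is typically used to walk the paragraphs of a .doc file, assuming the HWPF (poi-scratchpad) jar is available: WordParagraphsSketch, paragraphTexts and stripParagraphMark are hypothetical names, and the trimming step assumes that HWPF paragraph text ends with a control character such as the paragraph mark.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

class WordParagraphsSketch {

    // Collects the text of every paragraph in the document's overall range.
    static List<String> paragraphTexts(HWPFDocument doc) {
        Range range = doc.getRange();
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < range.numParagraphs(); i++) {
            Paragraph paragraph = range.getParagraph(i);
            texts.add(stripParagraphMark(paragraph.text()));
        }
        return texts;
    }

    // Drops trailing control characters (for example '\r') from the raw text.
    static String stripParagraphMark(String raw) {
        int end = raw.length();
        while (end > 0 && raw.charAt(end - 1) < ' ') {
            end--;
        }
        return raw.substring(0, end);
    }
}
```

A caller would pass in a document opened with new HWPFDocument(inputStream), where the stream points at an existing .doc file.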
60. paolo@apache.org
importDocument()
61. paolo@apache.org
getTitle(...)
✴ Gets the first paragraph’s text
62. paolo@apache.org
importDocument()
63. paolo@apache.org
67. paolo@apache.org
More Examples
✴ http://poi.apache.org/spreadsheet/examples.html
68. paolo@apache.org
Even more
✴ Get in touch
✴ http://poi.apache.org/
✴ Get informed
✴ dev@poi.apache.org
✴ Get involved
✴ http://svn.apache.org/repos/asf/poi/trunk/
69. paolo@apache.org
✴ Get slides
✴ http://www.slideshare.net/paolomoz/apache-poi-recipes
Thanks