Calling all adventurers who want to extract geospatial data from PDFs, including scanned documents and CAD floorplans, and convert it into useful information and insights. Get tips for fixing bad data, and on expediting workflows with bulk processing.
Extracting Geospatial Data from PDFs
1. – FME Summer Camp –
Extracting Geospatial Data from PDFs
2. Who We Are
Tiana Warner
Product Marketing Manager
Dmitri Bagh
Scenario Creation Analyst
Jake Molnar
Software Developer
3.
4. Agenda
● How reading PDFs with FME works
● Extracting complex map frames
● CAD floor plans
● Scanned maps and text
● Poorly formed data
● Enterprise PDFing
Ask questions and
watch here for links
5.
6. WORKFLOW: EXTRACTING DATA FROM PDFs
● First: Read source PDF files
○ Maps, images, geometry, text, tables, attributes, metadata ...
● Second: Validate and repair
○ Check geometry and attributes, and fix errors in extracted data.
● Third: Transform
○ Calculate, measure, integrate, apply labels, regular expressions ...
● Last: Write to output
○ Convert to a useful format like GIS, CAD, Excel, database, Word, BI software ...
16. Demo 2b. Why rasterize?
Because it’s not always easy to make sense of this "word":
3-M16October2000457MMature16April2001182–156.8020.67ToHawaii
● Text: all words merged into a single spaceless line.
● Raster: everything stays where it belongs.
17. Demo 2b: Why rasterize?
Yes, I know the trick for using custom icons in the FME Data Inspector
18. Demo 3. Words seem to be together… But they are not!
20. Enterprise PDFing
Use FME Server to run PDF workflows on a schedule or in response to an event.
● Every time a new PDF arrives.
● Enable end users to upload PDFs for processing.
● Take care of bulk processing a lot of PDFs.
FME Cloud is the cloud-hosted version of FME Server.
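
As a rough illustration of the bulk-processing idea, the sketch below builds one command line per PDF in a folder so that a local script could run the same workspace over many files. It is only a sketch under assumptions: the workspace name `pdf_to_gis.fmw` and the published parameter names `SourceDataset` and `DestDataset` are hypothetical, and a real workspace will have its own parameter names. FME Server schedules and triggers jobs through its own interface rather than through a script like this.

```python
from pathlib import Path


def build_fme_commands(pdf_dir, workspace="pdf_to_gis.fmw", out_dir="output"):
    """Return one command (as an argument list) per PDF found in pdf_dir.

    The workspace name and the parameter names below are hypothetical;
    substitute the published parameters of your own workspace.
    """
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # One output location per input file, named after the PDF
        dest = Path(out_dir) / pdf.stem
        commands.append([
            "fme", workspace,
            "--SourceDataset", str(pdf),   # hypothetical parameter name
            "--DestDataset", str(dest),    # hypothetical parameter name
        ])
    return commands


# Possible usage (not executed here):
#   import subprocess
#   for cmd in build_fme_commands("incoming"):
#       subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to log or review the batch before anything is executed.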
22. Demo 5: Scanned PDFs
Old maps and hand-drawn plans are
not straightforward to read.
● They are not georeferenced
● They are not properly oriented
● And they are PDFs!
23. Demo 5: Scanned PDFs
FME Server lets you build web services that process user input and return the results
24. Demo 5: Scanned PDFs
Homework (challenge) - try to extract the
buildings from this map.
25. Helpful Transformers for Scanned PDFs
● TesseractCaller parses text. Calls a third-party OCR library from within FME.
● RasterConvolver performs raster operations. Edge detection helps extract geometry.
● RasterExpressionEvaluator does pixel-by-pixel calculations to prep for geometry extraction.
● RasterToPolygonCoercer automatically converts raster to vector.
● PotraceCaller turns bitmaps into vector graphics. Calls a third-party library.
Performing OCR with TesseractCaller
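
To make the edge-detection idea behind the RasterConvolver bullet more concrete, here is a small pure-Python sketch of a 3x3 kernel applied to a grid of pixel values, followed by a per-pixel threshold of the kind the RasterExpressionEvaluator bullet describes. This is not FME's implementation; the kernel, the function names, and the threshold are illustrative assumptions.

```python
def convolve(image, kernel):
    """Apply a 3x3 kernel to a 2D list of numbers.

    Border pixels are left at 0. The kernel is applied without flipping,
    which gives the same result as true convolution for symmetric kernels.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3)
                for j in range(3)
            )
    return out


# Laplacian-style kernel: responds where a pixel differs from its neighbours
EDGE_KERNEL = [[-1, -1, -1],
               [-1,  8, -1],
               [-1, -1, -1]]


def edge_mask(image, threshold=1):
    """Return a 0/1 mask marking pixels whose edge response reaches threshold."""
    response = convolve(image, EDGE_KERNEL)
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in response]
```

Flat regions produce a response of zero, so only the boundary between dark and light areas survives the threshold. That boundary is what a raster-to-vector step can then trace into geometry.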
26. Helpful Transformers for Fixing Bad Data
● GeometryValidator to detect and repair bad geometry.
● Snapper to snap broken lines together.
● AreaBuilder to assemble polygons from lines.
● AttributeManager to add and update the attributes associated with a data feature.
● MapTextLabeller to annotate the geometry with high-quality cartographic labels.
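
The snapping idea in the list above can be sketched in a few lines of Python: line end points that fall within a tolerance of an already accepted end point are moved onto it, which closes small gaps left by extraction. This is a simplified illustration, not the Snapper transformer's algorithm; the function name and the tolerance parameter are assumptions.

```python
import math


def snap_endpoints(lines, tolerance):
    """Snap line end points that lie within `tolerance` of an earlier end point.

    `lines` is a list of lines, each a list of (x, y) tuples.
    Interior vertices are left untouched.
    """
    anchors = []   # end points accepted so far
    snapped = []
    for line in lines:
        pts = list(line)
        for idx in (0, len(pts) - 1):
            for anchor in anchors:
                if math.dist(pts[idx], anchor) <= tolerance:
                    pts[idx] = anchor      # close the gap
                    break
            else:
                anchors.append(pts[idx])   # nothing nearby: keep as new anchor
        snapped.append(pts)
    return snapped


broken = [[(0, 0), (1, 0)], [(1.02, 0.01), (2, 0)]]
joined = snap_endpoints(broken, tolerance=0.05)
# The second line now starts exactly at (1, 0), the end of the first line.
```

Once end points coincide exactly, a polygon-building step (the role AreaBuilder plays in the list above) has closed rings to work with.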
27. Key Takeaways
● Use FME’s PDF Reader to automatically extract text, maps, and other
geospatial data from PDFs.
● Read badly behaved PDFs with the help of transformers.
○ Parse scanned text and geometry using OCR and feature extraction.
○ Repair geometry and attributes with *Validator transformers.
● For enterprise/automation scenarios, set up FME Server to run PDF
conversion workflows automatically.
28. Resources
● Extracting Geospatial Data from PDFs: Top 3 Challenges
● 3 Ways to Convert Raster Images to Vector for CAD/GIS
● OCR for FME
● Using the RasterConvolver
● Tutorial: Getting Started with PDF Reading in FME
● Try FME Cloud: safe.com/fmecloud
○ Tutorial: batch processing with FME Desktop: fme.ly/batch