TUgis2010 Conference Presentation
1. UNDERSTANDING THE EFFECTS OF MAP SCALE:
OPTIMIZATION THROUGH GENERALIZATION
Jason M. Wheatley, GISP
Senior GIS Analyst – Water Resources Services
2. GENERALIZATION
• “…is designed to reduce the complexities of the
real world by strategically reducing ancillary and
unnecessary details.”
• “…is perhaps the most intellectually challenging
task for cartographers.” (analyst, etc.)
• Difficult to automate
– No “one size fits all”
– Subjective
3. WHY GENERALIZE?
• Everything we work with is a generalization
• Data availability
• Processing power
• Time/Funding
• Project/Mapping Specs
– Target Scale! – Make more meaningful/useful mapped
products
– Plethora of high-resolution information available
– Does not require the level of detail present in source
information
4. WAYS WE GENERALIZE
• Tabular
– Remove unnecessary
information
– Aggregate data based on
similar values (geometry)
• Geometry
– Point, line, polygon
density
– Cell size, re-sampling
– Thinning source
information
6. OPTIMIZING
• “…Doing More With Less”
– “Less With More”
• Improve performance/efficiency of existing and
future processes through generalization of
mapping inputs/outputs
• Proactive approach to project
planning/methodology development
• “Sweetspot” source data to produce most effective
information with least effort
7. GOAL (LESS IS MORE)
• Improve speed
– Draw time, processing time, overall workflow
• Storage savings
– Eliminate unnecessary information
• Network performance
– Smaller files = less bandwidth = increased speed
• Time savings
– Dedicated to QA/QC
• Effective products
– Information that is optimized for target scale
– More representative of real-world features
8. SWEETSPOTTING
• Know project/mapping specs
• Analyze inputs and identify locations where
increased efficiency may be realized
• Test multiple methods
• Analyze results to find best formula to efficient, yet
effective product
• Implement
9. G&O NFIP BACKGROUND
• Over 20 years as a FEMA Production and Technical
Services (PTS) contractor
• Currently PTS contractor and QC reviewer for FEMA
Regions I, V, VII, and X as part of Strategic Alliance for
Risk Reduction (STARR) Joint Venture
• Performed thousands of Hydrologic and Hydraulic
(H&H) analyses
• Produced more than 33,000 FIRM/DFIRM panel maps
• Revised more than 4,500 Flood Insurance Studies
(FIS)
10. INSPIRATION/
PROBLEM IDENTIFICATION
• Map Production
– Panning/Zooming draw times
• Several second refresh rates
• Large vector datasets with excessive detail
– Mapping “noise”
– Geoprocessing
– Printing
• Larger files with longer print
processing
• Storage/Serving
– Large vector datasets
– OrthoImagery
– LiDAR
11. INSPIRATION/
PROBLEM IDENTIFICATION
• Surface processing
– Buffer waterways to generate “domain”
– Extract LiDAR groundshot from domain
• May not have enough coverage
– Construct TIN ground surface for flood extraction
• Studies in FEMA Region 7 – Iowa
– High-resolution LiDAR point files
(LAS and XYZI) available from the
GeoTREE Iowa LiDAR Mapping
Project
http://geotree2.geog.uni.edu/lidar/
12. BOONE COUNTY, IOWA
• Voluminous data
– 1.4m avg. point
spacing
– 2.5 Mil groundshot
points per 4.0 Mil m2
– 400 tiles in county
– Approx. 1 Bil
groundshot points
in county
• Extremely difficult
to process seamless
TIN surfaces for
larger domains
13. PROPOSED SOLUTIONS
• Generalize Product (vector)
– Smoothing/Simplifying lines
• Must meet FEMA DFIRM mapping standards (FBS Audit)
• Still requires TIN generation
• TIN extraction not uniform so process is more difficult
• LiDAR Thinning
– Iowa possesses little relief
• Still requires more processing/storage to generate TIN
• Eliminating detail from Raw data
• Raster Elevation Surface
– Generate GRID(s) (2m cellsize) from raw groundshot
• Applies point mean to each cell
• Easy to control generalization
• Smaller file size
• County-wide surface
14. TIN VS. GRID
• Difference in level of detail, or just a difference in
interpolation?
• TIN
– Elevation of each point is preserved
• Vertical error (+/-7”) also preserved
• Eliminates area from laser pulse
(0.5m – 1m)
– Slope/Aspect determined by
triangulating three adjacent points
– Vertices of extraction non-uniform
due to varying triangulations
• Harder to select generalization
tolerances
– Greater uncertainty in sample
voids
15. TIN VS. GRID
• GRID
– Elevation points are “leveled” through cell averaging
• Vertical error also leveled
– Applies elevation values to an
area rather than specific
x/y coordinate
– Vertices of extraction are more
uniform due to equal cell size
• Easier to select generalization
tolerance
– Interpolation considers more
information in void areas
24. LINE GENERALIZATION
• Raw: 800 vertices over 5,827 meters of line
• Simplified: 326 vertices over 5,808 meters of line
25. • TINs used for re-delineated flooding
• Flooding produced 1,280,003
vertices
• Simplified by 1m = 177,311 vertices
• Poly size 39.1MB vs. 5.5MB
1:6,000
1:500
27. ORTHOIMAGERY
• Compress/Mosaic image tiles into single file
• Smaller file sizes (TIFF vs. ECW or JP2)
• Less data to manage
• Improved interaction
• No loss in image integrity at max. mapping scale
• Inexpensive options to handle large image
processing
28. CONCLUSIONS
• LiDAR elevation data for Iowa could afford
generalization
• Time savings allows for more time dedicated to
QA/QC
• Produce quality product more efficiently
• TINs are not necessarily more accurate than
Rasters when interpolating surfaces
29. RASTER (GRID) - PROS
• Surface generation was completed in 1/10th of the
time it takes to produce TIN surfaces
• Estimated 80% storage savings
• Flooding extraction was also completed in about
½ the time it takes for TIN extraction
• Linework more smooth, representative of real
world phenomena, and streamlined map
production
• Capable of meeting FEMA DFIRM mapping
specifications
30. CONSIDERATIONS
• Benefits realized more with production work
• Time spent experimenting with processing
• Somewhat subjective “optimized” choice
• Results will vary for locations of greater relief
• Further tests should be performed in order to
confirm 3cell results
31. COMMENTS/QUESTIONS?
• Acknowledgements
– Aurore Larson, P.E., CFM – Greenhorne & O’Mara – Water Resources Services
– Carmen Burducea, CFM – Greenhorne & O’Mara – Water Resources Services
– Zachary J. Baccala, Senior GIS Analyst – PBS&J – Floodplain Management
Division
• References
– Cohen, Chelsea, The Impact of Surface Data Accuracy on Floodplain Mapping,
University of Texas, 2007.
– Foote, K.E., Huebner, D.J., Error, Accuracy, and Precision, The Geographer’s Craft
Project, Dept. of Geography, The University of Colorado at Boulder, 1995.
– Galanda, Martin, Optimization Techniques for Polygon Generalization, Dept. of
Geography, University of Zurich, 2001.
– Lagrange, Muller, and Weibel, GIS and Generalization: Methodology and Practice,
1995.
– Li, B., Wilkinson, G. G., Khaddaj, S., Cell-based Model For GIS Generalization,
Kingston University, 2001.
– North Carolina Floodplain Mapping Program, LiDAR and Digital Elevation Data
(Factsheet), 2003.
Editor's notes
In terms of GIS and Mapping…
Generalization is designed to reduce the complexities of the real world by strategically reducing ancillary and unnecessary details.
Generalization has been described as perhaps the most intellectually challenging task for cartographers, analysts, etc.
It’s the task of determining what to show vs. what not to show, and at what detail to show it. Can the information used in a study afford less detail? In what ways can it be generalized, and how? And will it still meet specifications?
There is no one-size-fits-all approach to generalization. In most cases it is going to be different between studies, and sometimes within the same study. So what worked best for one study may not necessarily work best for another. Source information may be different, or specs may be different.
And finding “acceptable” generalization is very subjective. What one person perceives as optimized, another may perceive as too general.
So why do we generalize?
Well…everything we work with within the GIS environment is a generalization of real world features or phenomena.
Sometimes we have no choice but to produce generalized information because available source data may not be as detailed as we prefer. Perhaps the hardware we possess cannot handle large detailed datasets, or we may just not have the time or funding to complete a study at a desired detail.
The focus of this discussion however is voluntary generalization. This comes from understanding your target scale and working within the specs of that scale to produce high quality information, efficiently.
Today we have access to an abundance of high-resolution information, be it digital elevation data, high-resolution orthoimagery, or finely detailed vector datasets. In some cases, our work may not require the level of detail present in some of this information.
I’m not stating that using information in its RAW form is incorrect, but in many cases it is not “optimized” for the task at hand.
Ways we generalize…
We may generalize Tabular data in order to remove unnecessary information. It’s good to get rid of the “clutter”. Often new problems are solved using data that was originally developed for another purpose, so it may contain information that is not pertinent to the target study.
Related to geometry, we may aggregate information based on similar values. For example, I might need a hydric soils overlay but not need information on all of the different hydric soil types. I would dissolve the table based on a hydric value in order to generalize the data to give me only what I need.
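The hydric soils example can be sketched on the table side alone. This is a minimal illustration, assuming made-up soil records with a hypothetical `is_hydric` flag and area field; a real dissolve in a GIS would also merge the polygon geometries, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical soil polygons: (soil_type, is_hydric, area_m2).
# The names and values are illustrative only, not from the study.
soil_polygons = [
    ("Type A", True, 1200.0),
    ("Type B", True, 800.0),
    ("Type C", False, 1500.0),
    ("Type D", True, 500.0),
]

def dissolve_by_hydric(polygons):
    """Collapse many soil types into one record per hydric flag, summing area."""
    totals = defaultdict(float)
    for _soil_type, is_hydric, area in polygons:
        totals[is_hydric] += area
    return dict(totals)

summary = dissolve_by_hydric(soil_polygons)
print(summary)
```

The individual soil types disappear from the result, which is the point: only the hydric/non-hydric distinction needed for the overlay is kept.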
To generalize geometry we may simplify point, line or polygon features in order to remove detail.
In terms of raster data, cell size can be made larger, or values can be re-sampled using a number of different techniques in order to “smooth” a surface.
And with TIN generation, we might extract key points before building a surface.
When mapping we have to decide what level of detail to use based on the scale of the map.
With interactive maps you have the option of using information of varying detail based on the zoom level of the map. With static maps you have to decide what level of detail you are going to use based on the mapping scale.
So you might ask yourself… Do I really need the 1:4,800 scale water polygon layer for my 1:250,000 scale map? OR would it be best to represent these features with lines?
Another decision is what to include, and what not to include in a map. If you are mapping at 1:250,000 scale, perhaps you don’t need a street-level roads layer and it would be better to just show Highways.
When talking about data development/creation (which is the generalization focus of this discussion), we can choose to generalize source information, the results, or both.
So the idea here is to use generalization to optimize products, and the processes that go into their production.
The theme of this year’s conference is “Doing More with Less”. Depending on how you want to look at it, you might say I’m doing “Less with More”. Though I’m coming at this from the perspective of improving the performance and efficiency of existing and/or future processes through optimized generalization of inputs/outputs.
And improving efficiency is certainly something that needs to be looked at in economically constrained times.
I want to promote a proactive approach to project planning and methodology development and to really take a look at not only what the project goals are, but look at the information that goes into reaching those goals.
The methods I’m discussing with you today will illustrate some ways G&O has attempted to “optimize” production in order to improve our product and the efficiency in which we produce it.
Our goals were to improve speeds in terms of our interactions with the information, as well as overall workflows.
We want to eliminate unnecessary information in order to save storage space, as well as improve the speed at which this information is delivered to the staff working on production. This saves time on the actual production and lets us dedicate more time to other tasks like QA/QC.
While time and space savings are important, most important of course is creating effective products that meet specifications.
The idea of “sweetspotting” in this case is to test several generalization methods and identify which method is the optimal method for the product.
In order to sweetspot generalization you have to know what your project specifications are. You have to analyze all inputs and identify places in these inputs that may provide means to more efficient production. Test several methods that can exploit these means. Analyze the results to find the best formula.
Once they’re tested and proven effective you can finally implement at the production level.
The examples I’m using today are taken from G&O’s National Flood Insurance Program work.
G&O has over 20 years of experience as a FEMA PTS contractor.
We are currently a PTS contractor and QC reviewer for 4 FEMA regions as part of the Strategic Alliance for Risk Reduction Joint Venture.
We have performed thousands of Hydrologic and Hydraulic analyses, produced more than 33,000 FIRM and Digital FIRM panel maps, and revised more than 4,500 FISs.
This research came to be as a result of several identified issues encountered during floodplain delineation and DFIRM production.
During the mapping phase we were realizing tremendous draw times with each interaction. The vector flooding datasets were filled with so much detail (vertices) that we often had to turn the flooding polygon off in order to efficiently map panels. At the mapping scale of DFIRM panels (1:6,000) this excessive detail becomes what I would consider “mapping noise”. Any geoprocessing or editing would take considerable amounts of time to complete.
These large files also proved to be an issue when it came to printing as jobs would back up as a result of the large file sizes.
As for Storage and serving issues…
Most DFIRM studies utilize orthoimagery as a base for the maps. Often this imagery is available only as large TIFF image tile collections that take up tremendous storage space. It also made mapping difficult without some means to serve the imagery in a cataloged fashion.
And of course anyone who has worked with LiDAR data knows how voluminous it can be, even in its RAW compressed formats. Add to that the processed surfaces and you can imagine how some of our study directories could reach into several hundred GBs.
Ground and water surface elevations are generated as TINs.
Most high-definition LiDAR data prevents us from generating a single seamless TIN for an entire county so the ground surfaces had to be created using buffers or domains of water features. Even some domains proved to be too large for TIN generation and had to be scaled down or divided into separate TINs.
Groundshots are extracted from these domains and then converted to TIN surfaces. If a TIN was found to not be large enough to encompass the potential flooding it had to be edited or recreated using a larger domain.
This process worked and still works fine but as you can imagine it takes substantial time and effort to complete.
Then we started flood mapping in FEMA Region 7 – Iowa. The Iowa LiDAR Mapping Project makes high-resolution LiDAR information available for free through an online viewer. The data is available for download in both LAS and XYZI formats.
Being that this was the highest quality data available, it was chosen for the surface modeling for studies in Iowa.
Some statistics on the resolution of this information…
The Iowa LiDAR flights resulted in a defined 1.4 meter average point spacing. I determined that the average 4 million sq meter tile contained about 2.5 million groundshot points. There were 400 tiles in Boone County resulting in approximately 1 billion ground shot points in the county.
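The county-wide figure follows directly from the per-tile numbers quoted above. The short check below uses only those stated values; the implied spacing is a rough estimate that assumes the groundshot is spread evenly over a tile, which is an assumption of this sketch and not something the study states.

```python
import math

points_per_tile = 2_500_000   # groundshot points per tile (from the notes)
tile_area_m2 = 4_000_000      # tile area in square meters (from the notes)
tiles_in_county = 400         # tiles covering Boone County (from the notes)

# Total groundshot points in the county
total_points = points_per_tile * tiles_in_county

# Average density, and the spacing it would imply if points were evenly spread
density_per_m2 = points_per_tile / tile_area_m2
implied_spacing_m = 1 / math.sqrt(density_per_m2)

print(total_points)        # 1000000000
print(density_per_m2)      # 0.625
print(implied_spacing_m)   # roughly 1.26
```

The implied spacing is of the same order as the 1.4 meter average point spacing defined for the flights.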
Aside from small streams and tributaries it was evident that it was going to be almost impossible to generate seamless TIN surfaces for medium to large water domains.
It was identified that most of the issues we encountered resulted from the production of high resolution surfaces and the detailed information extracted from them. We assumed that looking at these processes might be the best place to start in order to optimize our production.
One option would be to continue with the existing surface processing, and implement a generalization on the extraction (flooding) only. However, the lines must pass the Flooding Boundary Standards Audit (FBS Audit) and generalizing may cause too many errors. It still requires the laborious task of TIN creation. And due to the “irregular” nature of a TIN, coming up with a simplification/smoothing offset value that will address all locations would be difficult.
Another option would be to thin the RAW LiDAR data prior to building surfaces. Iowa is a location with little relief so this certainly seemed valid. However, it would require defining a method to extract only key points. It would still require processing and storing the TIN files, and would essentially remove samples from the raw data.
Lastly, we could drop the TIN surface process and move the elevation into a Raster format. Point values would be averaged and applied to whatever cell size is specified. Since a raster is structured as equal-sized cells, it is easier to control generalization. The file size would be smaller than a TIN surface, and it would be fairly easy to generate a seamless, county-wide surface.
So then the discussion becomes TIN vs. GRID
There are several white papers and discussions about this very topic, even specific to DFIRM mapping. Most of them lean towards TINs holding more accuracy. My professional opinion is that this is not an issue of detail, but a difference in the method of interpolation.
With a TIN surface it is true that the elevation of each surface point is preserved. However, so is any vertical error the point may contain. In the case of Iowa it is +/-7 inches. Also, by applying a value to a “point” you ignore the fact that most LiDAR laser pulses have a diameter, and that they are not collected as the “spot” you see when you process mass points. In some cases these can be 0.5 meter to 1.0 meter in diameter.
With TINs, slope and aspect are determined by the triangulation of three adjacent points, which assumes there is no variance in the plane between the 3 points regardless of how far apart they are.
As mentioned before, point spacing, and hence the triangulation, is non-uniform, making it harder to generalize effectively.
And finally there is greater uncertainty in sample voids that occur during collection and/or processing due to this 3-point adjacency interpolation.
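The three-point interpolation described for TINs can be sketched as a plane fitted through a triangle's vertices. This is a generic illustration using barycentric weights, with a hypothetical function name and made-up coordinates; it is not taken from the software used in the study.

```python
def tin_interpolate(p1, p2, p3, x, y):
    """Elevation at (x, y) on the plane through three (x, y, z) vertices."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    # Barycentric weights of (x, y) with respect to the three vertices
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * z1 + w2 * z2 + w3 * z3

# Illustrative triangle: elevations 0, 10 and 20 at three corners
a, b, c = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (0.0, 10.0, 20.0)
z_mid_edge = tin_interpolate(a, b, c, 5.0, 0.0)
z_inside = tin_interpolate(a, b, c, 2.0, 2.0)
print(z_mid_edge, z_inside)
```

Every location inside the triangle gets a value from the same flat plane, however far apart the three vertices are, so a large triangle spanning a sample void is interpolated from only those three samples.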
With GRIDs the elevation samples are “leveled” through locational averaging. This in turn also levels out vertical error found in those samples.
The mean elevation value is then applied to an area, rather than a specific horizontal position.
Extractions made from GRIDs tend to have a more uniform vertex distribution due to the equal cell size, so this makes it easier to come up with a generalization method that will resolve a majority of these artifacts.
And interpolation in sample void areas considers more information by using all surrounding cell values.
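The cell averaging described for GRIDs can be sketched as binning points into square cells and taking the mean elevation per cell. This is a minimal pure-Python illustration with a hypothetical function name and made-up points; the 2 m default mirrors the cell size mentioned for the study, but the code is not the tool that was actually used.

```python
from collections import defaultdict

def grid_mean(points, cell_size=2.0):
    """Return {(col, row): mean z} for (x, y, z) points binned into square cells."""
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [sum of z, point count]
    for x, y, z in points:
        key = (int(x // cell_size), int(y // cell_size))
        sums[key][0] += z
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Illustrative points: two fall in cell (0, 0), one in cell (1, 0)
points = [(0.5, 0.5, 10.0), (1.5, 1.0, 12.0), (2.5, 0.5, 20.0)]
grid = grid_mean(points)
print(grid)
```

Each cell carries one averaged value for its whole area, which is why individual point errors are leveled rather than preserved.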
I’ve had substantial experience working with LiDAR for the purposes of generating contours. I feel like Contours serve as a good baseline when assessing the generalization capacity of LiDAR data.
Here are 4 contour layers created from varying LiDAR-generated surfaces.
You’ll notice the lessening of detail in the linework with more generalization. Personally I feel like the TIN contours are too jagged, while the 6cell resampled grid produced contours that are perhaps too generalized, though they do look nice from a cartographic perspective.
This is a comparison of the TIN contours with the 3cell resampled GRID contours. Positionally, I feel as though there is not enough difference to say the GRID contours are inaccurate when compared to the TIN contours.
There is definitely less detail, and some noticeable variances in locations of greater slope. However, the GRID contours are definitely more pleasing to the eye.
This illustrates, briefly, the general processes involved in flood delineation.
A ground elevation surface is generated from LiDAR or another source.
Cross-sections (XS) are created at a specified interval along the floodway.
A water elevation surface is generated based on channel width, stream flow, and terrain.
These two elevation surfaces are overlaid and the intersection of the two surfaces is extracted as the floodplain delineation.
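The overlay step can be sketched on rasters as a cell-by-cell comparison of the two surfaces. This is a simplified illustration with made-up elevations and a hypothetical function name; treating a cell as flooded when the water surface is at or above the ground is an assumption of this sketch, and the actual delineation traces the boundary between the two surfaces as linework.

```python
def flooded_cells(ground, water):
    """Mark cells where the water surface is at or above the ground surface."""
    return [
        [w >= g for g, w in zip(ground_row, water_row)]
        for ground_row, water_row in zip(ground, water)
    ]

# Illustrative 2 x 5 surfaces: a small channel under a flat water surface
ground = [
    [5.0, 4.0, 3.0, 4.0, 6.0],
    [5.0, 3.0, 2.0, 3.0, 5.0],
]
water = [
    [3.5, 3.5, 3.5, 3.5, 3.5],
    [3.5, 3.5, 3.5, 3.5, 3.5],
]
mask = flooded_cells(ground, water)
print(mask)
```

The floodplain boundary sits on the edge between flooded and dry cells.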
So for Boone, IA we tested TIN surface types vs. Raster or GRID surface types. We also created a few generalized GRID surfaces to test.
The same process as illustrated in the previous slide was performed for all 5 surfaces.
A TIN and a GRID were created with the RAW LAS groundshot. The GRID was built with 2m cells using the mean of contained points.
And 3 re-sampled grids were produced from the 2m grid: 3 cells, 6 cells, and 9 cells.
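One way to picture re-sampling by a cell factor is block averaging, sketched below. The notes do not say which re-sampling method was used, so averaging non-overlapping blocks is an assumption here, and the function name and grid values are made up for illustration.

```python
def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2D list of elevations."""
    rows, cols = len(grid), len(grid[0])
    out = []
    # Any partial blocks at the right or bottom edge are dropped
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

# Illustrative grid with a single 10.0 "spike" in otherwise flat terrain
grid_2m = [
    [1.0, 1.0, 1.0, 4.0, 4.0, 4.0],
    [1.0, 1.0, 1.0, 4.0, 4.0, 4.0],
    [1.0, 1.0, 10.0, 4.0, 4.0, 4.0],
]
coarse = block_mean(grid_2m, 3)
print(coarse)  # [[2.0, 4.0]]
```

The spike is absorbed into the block average, which is the kind of smoothing that larger cell factors produce.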
So here are some visual comparisons of the extracted flooding from the varying surfaces. Moving from left to right you can see the level of detail diminishes with greater generalization factors. From a mapping perspective I may consider the extractions on the right more suitable, however they may not meet FEMA flooding specifications.
Our generated flooding is tested for its accuracy using the FBS Audit. This audit is a 2-pass test, and the Risk Class of a study determines the percentage of points that must pass. Risk Class A is the most strictly tested, requiring that at least 95% of the flooding sample locations pass the audit, so this is what we used for the tests. There are 5 defined Risk Classes, based on population, population density, and/or anticipated growth in the floodplain area.
This generally illustrates the two phases of the test.
First, points are generated every 100 ft along the flooding line. The elevations of both the water surface and ground surface are extracted and applied to the sample points. The values are then compared. Any point with an elevation difference of 1’ or less passes pass 1 of the FBS Audit.
The second phase extracts the “failed” sample points and buffers them by 38’. The min and max Z values of the ground surface are determined within each buffer. The water surface elevation at each point must fall within the ground elevation range found in that 38’ buffer around the point.
A combined percentage of 95% of the FBS points must pass in order for the flooding to be deemed acceptable.
Now 38’ seems like a large buffer, especially when dealing with high-resolution LiDAR elevation. Though the concept is that a shift of 38’ is un-noticeable at the mapping scale of 1:6,000 or less.
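The two passes can be sketched as follows. This is a simplified reading of the audit as described in these notes, not FEMA's official procedure: the field names are hypothetical, and it assumes the ground minimum and maximum inside each 38’ buffer have already been extracted for every sample point.

```python
def fbs_audit(samples, pass1_tolerance_ft=1.0, required_fraction=0.95):
    """Score sample points with a two-pass check and report pass fractions."""
    pass1_count = 0
    pass2_count = 0
    for s in samples:
        # Pass 1: water and ground elevations agree within the tolerance
        if abs(s["wsel"] - s["ground"]) <= pass1_tolerance_ft:
            pass1_count += 1
        # Pass 2: water elevation lies within the ground range inside the buffer
        elif s["buffer_min"] <= s["wsel"] <= s["buffer_max"]:
            pass2_count += 1
    total = len(samples)
    combined = (pass1_count + pass2_count) / total
    return {
        "pass1": pass1_count / total,
        "combined": combined,
        "acceptable": combined >= required_fraction,
    }

# Illustrative sample points (elevations in feet, values made up)
samples = [
    {"wsel": 100.0, "ground": 100.4, "buffer_min": 99.0, "buffer_max": 101.0},
    {"wsel": 100.0, "ground": 99.2, "buffer_min": 98.5, "buffer_max": 100.5},
    {"wsel": 100.0, "ground": 102.5, "buffer_min": 99.5, "buffer_max": 103.0},
    {"wsel": 100.0, "ground": 104.0, "buffer_min": 101.0, "buffer_max": 105.0},
]
result = fbs_audit(samples)
print(result)
```

Tightening the buffer (for example to 25’, 5’ or 1’) only changes the `buffer_min` and `buffer_max` values fed in, so the same function can score each test.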
After performing the Audit for all of the surfaces, these were the results. I found it interesting that the 2m GRID proved to be more accurate in this test than the TIN. The GRID in pass 1 had almost 5% more passing points. Because all of the surfaces passed at 100% at 38’, I added more tightly defined tests with 25’, 5’, and 1’ buffers in order to better identify which method is the most accurate. I won’t read off all of the results, but you can see for the most part they seem to follow the pattern you may expect. With increased generalization you can see a decrease in the pass percentage.
However, the 3cell resampled GRID didn’t follow this pattern. It produced several higher marks than either the TIN or RAW GRID surface. This is normally the point in which I would explain these results… But I’m finding it difficult to do so.
I would like to theorize that by resampling the GRID we were able to eliminate some of the elevation spikes (errors) and produce a more smoothed line from it. However, this doesn’t really make sense because all of the linework was tested against the RAW forms (2m GRID). I even compared the 3cell resampled line with the TIN surface and it produced greater numbers down to 5 ft.
I retested these several times to ensure there was no error on my part with the tests.
We ended up selecting the 3cell resampled flooding line. We felt that the softening of the line made for smoother flooding and we were confident that the accuracy would hold up to study-wide FBS tests. While all of the generalized test surfaces passed FEMA standards, we felt comfortable with the least generalized version. The 6 and 9 cell grids produced flooding that was only slightly smoother than the 3cell flooding and had more influence on horizontal position especially in areas of greater relief.
This is a view of a cross-section comparing the TIN surface to the 3cell resampled GRID surface.
With the bottom zoomed in view you can see the elevation differences between the two… The pink line represents the TIN surface, where the black line represents the GRID surface. This also illustrates the irregular spacing between elevation points that I mentioned earlier.
So now we’ve potentially optimized surface generation and flooding creation, but I still wasn’t satisfied.
Part of my concern was the level of detail our information contained. While the 3cell flooding produced the smoother line we were looking for, it still contained extensive detail in terms of vertices.
The line generated from the 3cell GRID had 800 vertices over a total length of 5,827 meters.
So we tried several simplification tolerances in order to remove “unnecessary” vertices. After several attempts we concluded that using a tolerance of one unit (meter) resulted in a much simpler line without compromising the integrity of the raw line.
The line now contains only 326 vertices over 5,808 meters, and there is no real loss in shape.
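Tolerance-based simplification can be illustrated with the Douglas-Peucker algorithm. The notes do not name the algorithm or tool that was used, so this is a stand-in technique chosen for illustration, with made-up coordinates in meters.

```python
import math

def perpendicular_distance(pt, start, end):
    """Distance from pt to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)

def simplify(points, tolerance):
    """Drop vertices that lie within tolerance of the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord joining the end points
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Keep that vertex and simplify each side separately
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

line = [(0.0, 0.0), (5.0, 0.2), (10.0, 0.0), (15.0, 8.0), (20.0, 0.0)]
simplified = simplify(line, tolerance=1.0)
print(simplified)
```

With a 1 meter tolerance the small 0.2 meter wiggle is removed while the 8 meter bend is kept. The recursive form is for readability only; a production tool would be a better fit for lines with hundreds of thousands of vertices.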
For an example of the difference a simplified line makes, here is a different study that used TIN surfaces for re-delineated flooding. The flooding produced a polyline layer consisting of almost 1.3 million vertices. I simplified the line by 1m and produced the same layer but with only 177,311 vertices.
At 1:6,000 you can only barely see a difference in the line work. The red line represents the raw, and yellow the simplified. Even at a scale far beyond DFIRM mapping scale (1:500) you can only see very small shifts between the two layers.
The simplified line and poly layer draw almost instantaneously at any scale, where the raw line/poly can take several seconds depending on the zoom level. When you’re constantly zooming and panning while mapping and annotating DFIRM panels this can add up quickly.
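To put the reductions in proportion, the quoted vertex counts and file sizes can be turned into percentages. The figures below are the ones stated on the slides and in these notes; only the arithmetic is added.

```python
def reduction(before, after):
    """Fraction removed when going from before to after."""
    return 1 - after / before

# 3cell GRID flooding line: 800 vertices simplified to 326
line_vertex_reduction = reduction(800, 326)

# TIN-based study: 1,280,003 vertices simplified to 177,311
study_vertex_reduction = reduction(1_280_003, 177_311)

# Polygon file size for the same study: 39.1 MB vs. 5.5 MB
study_size_reduction = reduction(39.1, 5.5)

print(f"{line_vertex_reduction:.1%}")
print(f"{study_vertex_reduction:.1%}")
print(f"{study_size_reduction:.1%}")
```

In the TIN-based study the vertex count and the file size both drop by roughly 86%, which suggests the file size is driven almost entirely by the number of vertices.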
That line was then run through the FBS Audit to make sure we were still within passing and these were the results.
The simplified line actually passed higher at Pass 1, with 63.4%.
It was almost unaffected within the 5ft buffer, and within 1ft the generalized and simplified flooding line still had a higher pass percentage than the TIN.
Just to briefly cover how we addressed the orthoimagery issue…
We would mosaic/compress image tile collections into a single seamless file, removing the need of managing a multitude of image tiles, and offering a tremendous space saving when comparing TIFFs to the compressed versions.
Interaction with these mosaics was improved with almost instantaneous draw times.
There was no loss of image integrity as a result of compression at the maximum mapping scale of 1:6,000.
And there are a number of inexpensive options to handle large image processing.
Through these tests we found out that the Iowa LiDAR elevation data could afford generalization and still maintain necessary accuracy.
We were able to save time in surface generation, flooding delineation, and map production. This time could now be dedicated to more in-depth QA/QC.
While we gained efficiency we didn’t lose accuracy or product quality. In fact our product was still accurate far beyond specifications.
We also learned that TINs are not necessarily more accurate than Rasters when interpolating surfaces.
Some specifics that using a Raster brought to the process…
We were able to generate a complete county-wide elevation model in about one tenth of the time it takes to generate sectional TIN surfaces.
When comparing GRID vs. TIN for the sample location, it was estimated that we would use about 80% less storage space with Rasters.
The process of flood extraction was also completed in about ½ the time it takes with TINs.
The linework produced was smoother, and its performance streamlined map production.
Raster surfaces produce flooding that is capable of meeting and exceeding FEMA DFIRM mapping specifications.
Some considerations of our tests are that…
The benefits of “sweetspotting” are certainly going to be more realized with production work. If you are just composing a map or two then perhaps the effort is not worth it. But I would certainly recommend these techniques for any production-level tasks, especially if they are all based in the same region or utilizing the same source data.
This experimentation does take time to complete so it would require consideration during the scoping/budgeting phase of project development. The good thing is once you have a testing method in place you can just replicate that method for other datasets or locations.
Again, the optimized choice may always be a subjective decision, but with this method you can at least have a better idea of the available options.
The results here would probably not work well in a location like western CO where surface elevation is much more diverse.
I would like to continue to test different locations within Boone, IA to see if I can replicate the 3cell re-sampled GRID results. In fact I had tried to demonstrate a larger section of river consisting of 4 tiles; however, I couldn’t get TIN surfaces generated. I kept running out of memory and the process would fail.
I can say, however, that Boone County flooding did pass at about 99% with the 3cell GRID flooding generation.