NACIS Annual Meeting
Oct 9-11, 2013 | Greenville, South Carolina
Workshop

More than just colouring in: building
maps with a solid analytical foundation
Linda Beale PhD
Visualization process

• Based on clear need and purpose
  - Who is the audience?
  - What is the intended purpose?
  - What medium is to be used?
• These goals cannot all be accomplished by visual tricks
What does GIS offer cartography?

• Is it a hammer to crack a walnut?
  - Combining different data to get new information
  - Bringing data together from disparate sources
  - Part of the process of making informative and different maps
• Is statistical analysis really needed?
  - The development and application of methods to collect, analyze and interpret data
  - The science of learning from data
Spatial analysis is about solving problems
• What is inside an area?
• What is nearby?
• Where are the events concentrated?
• Where do things move over time?
• Why do things occur where they do?
• How can we estimate values for a whole area?
• What is a suitable location for …?

• Maps are needed to communicate the result
Getting clarity

• Descriptive statistics
  - Help understand the data as part of analysis or to quantify data
  - Commonly aspatial, i.e. the result is not dependent on location
  - Some spatial methods
Basic descriptors
Method | Use | ArcGIS tools
Total | Count or sum of values | Summary Statistics / Spatial Join / Frequency / Tabulate Intersection; Neighborhood & Zonal Statistics (Spatial Analyst)
Minimum, Maximum | Smallest and largest values | Statistics / Summary Statistics / Spatial Join; Histogram (Geostatistics); Neighborhood & Zonal Statistics / Get & set raster properties (Spatial Analyst)
Mode | Most commonly occurring | Spatial Join; Neighborhood & Zonal Statistics (Spatial Analyst)
Median | Central value | Spatial Join; Histogram (Geostatistics); Neighborhood & Zonal Statistics (Spatial Analyst)
Mean | Average value | Statistics / Summary Statistics / Spatial Join; Neighborhood & Zonal Statistics (Spatial Analyst)
Data distributions

Method | Use | ArcGIS tools
Range | Max - min | Summary Statistics / Spatial Join; Neighborhood & Zonal Statistics (Spatial Analyst)
Standard deviation | Average deviation about the mean | Summary Statistics; Neighborhood & Zonal Statistics (Spatial Analyst)
nth Quantile | Value that is nth way through a sorted list | Display Properties; Histogram (Geostatistics)
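The basic descriptors and distribution measures listed above can also be computed outside ArcGIS. The following is a minimal sketch using Python's standard statistics module; the values list is made-up illustrative data, not taken from the workshop.

```python
import statistics

# Hypothetical attribute values, e.g. a count recorded for eight areas.
values = [12, 7, 3, 7, 21, 15, 7, 10]

summary = {
    "total": sum(values),
    "count": len(values),
    "minimum": min(values),
    "maximum": max(values),
    "mode": statistics.mode(values),      # most commonly occurring value
    "median": statistics.median(values),  # central value of the sorted list
    "mean": statistics.mean(values),      # average value
    "range": max(values) - min(values),
    # Population standard deviation: spread of the values about the mean.
    "std_dev": statistics.pstdev(values),
    # Quartiles: cut points a quarter, half and three quarters of the
    # way through the sorted values.
    "quartiles": statistics.quantiles(values, n=4),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

These are aspatial summaries: shuffling which area holds which value would not change any of them.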
Demo

Finding quantity by area
Summarizing by area
Demo

Finding percentage area
Tabulate Intersection
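The idea behind finding percentage area can be sketched without ArcGIS. The example below is a simplified illustration, assuming every zone and class is an axis-aligned rectangle given as (xmin, ymin, xmax, ymax); the function names and the sample rectangles are hypothetical, and real polygons would need a geometry library.

```python
def overlap_area(a, b):
    """Area shared by two rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def tabulate_intersection(zones, classes):
    """Return (zone, class, area, percentage of zone) for each overlap."""
    rows = []
    for zone_name, zone in zones.items():
        zone_area = (zone[2] - zone[0]) * (zone[3] - zone[1])
        for class_name, cls in classes.items():
            area = overlap_area(zone, cls)
            if area > 0:
                rows.append((zone_name, class_name, area, 100.0 * area / zone_area))
    return rows


# Hypothetical data: one 10 x 10 zone and two land-cover rectangles.
zones = {"A": (0, 0, 10, 10)}
classes = {"forest": (5, 0, 15, 10), "lake": (0, 0, 2, 5)}

for row in tabulate_intersection(zones, classes):
    print(row)
```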
Spatial descriptors

Method | ArcGIS tools
Mean | Mean Center; Linear Directional Mean
Central value | Central Feature; Median Center
Distribution | Standard Distance; Directional Distribution
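Three of these spatial descriptors are simple enough to sketch in plain Python. The functions below are illustrative, unweighted versions written for this example (they are not the ArcGIS implementations), and the point coordinates are hypothetical.

```python
import math


def mean_center(points):
    """Average x and average y of a set of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def central_feature(points):
    """The point with the smallest total distance to all other points."""
    def total_distance(p):
        return sum(math.hypot(p[0] - q[0], p[1] - q[1]) for q in points)
    return min(points, key=total_distance)


def standard_distance(points):
    """Spread of the points around the mean center (a spatial standard deviation)."""
    cx, cy = mean_center(points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))


# Hypothetical point locations.
points = [(0, 0), (2, 0), (1, 1), (1, 3), (6, 1)]
print(mean_center(points))
print(central_feature(points))
print(standard_distance(points))
```

The mean center need not coincide with any feature, whereas the central feature is always one of the input points.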
Demo

Finding direction
Linear directional mean
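Directions cannot be averaged arithmetically: the average of 350° and 10° should be 0°, not 180°. A minimal sketch of a directional mean, assuming each line is reduced to a single direction in degrees (the function name and sample angles are hypothetical):

```python
import math


def linear_directional_mean(angles_deg):
    """Return (mean direction in degrees, circular variance between 0 and 1)."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    # The mean direction is the direction of the summed unit vectors.
    mean = math.degrees(math.atan2(sin_sum, cos_sum)) % 360
    # Resultant length near 1 means the lines point the same way.
    resultant = math.hypot(sin_sum, cos_sum) / len(angles_deg)
    return mean, 1 - resultant


mean, variance = linear_directional_mean([80, 100, 90])
print(round(mean, 1), round(variance, 3))
```

A circular variance near 0 indicates strongly aligned lines; a value near 1 indicates no preferred direction.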
Normalization

• Aspatially:
  - Normalization is to transform a set of measurements so that they may be compared in a meaningful way
  - Examples: Standard score (z values), coefficient of variation
• Spatially:
  - Normalization transforms measures of magnitude (counts or weights) into measures of intensity
  - Using normalization we can take into account the differences between the areas (e.g. size of area, population size etc.)
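Both kinds of normalization reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and the case and population figures are invented.

```python
import statistics


def z_scores(values):
    """Standard scores: how many standard deviations each value is from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def coefficient_of_variation(values):
    """Standard deviation expressed as a fraction of the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


def intensity(counts, denominators, per=1):
    """Turn magnitudes (counts) into intensities (count per unit of the denominator)."""
    return [per * c / d for c, d in zip(counts, denominators)]


# Hypothetical data for three areas.
cases = [50, 50, 200]
population = [1_000, 10_000, 40_000]
rates = intensity(cases, population, per=1_000)  # cases per 1,000 people
print(rates)
```

In this invented example the third area has the most cases, but once population is taken into account the first area has the highest rate.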
Understanding quantity

• We see quantity related to size
As we often see it…
Or…
• So, we must map ‘like’ with ‘like’
Demo

Normalization
Distributions and patterns

• Density surfaces of count per unit area
  - Looking at concentrations of features
  - Seeing patterns of features
  - Hotspots, Heat maps
• A density surface reflects the likelihood of an event occurring in each cell (bivariate probability density function)
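A density surface of this kind can be sketched as a kernel density estimate on a grid. The example below assumes a quartic kernel and a fixed search radius; both are choices made for this illustration, and the function name, grid parameters and event location are hypothetical.

```python
import math


def kernel_density(points, xmin, ymin, ncols, nrows, cell_size, radius):
    """Return a grid (list of rows) of density values in events per unit area."""
    grid = []
    for row in range(nrows):
        cy = ymin + (row + 0.5) * cell_size      # cell centre y
        values = []
        for col in range(ncols):
            cx = xmin + (col + 0.5) * cell_size  # cell centre x
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < radius ** 2:
                    # Quartic kernel scaled so each event contributes a
                    # total volume of 1 over its search circle.
                    total += 3 / (math.pi * radius ** 2) * (1 - d2 / radius ** 2) ** 2
            values.append(total)
        grid.append(values)
    return grid


# One hypothetical event in the middle of a 10 x 10 study area.
grid = kernel_density([(5.0, 5.0)], 0.0, 0.0, 20, 20, 0.5, 2.0)
volume = sum(sum(row) for row in grid) * 0.5 * 0.5
print(round(volume, 2))  # close to the number of events
```

Because each event contributes a volume of about 1, summing density times cell area over the grid approximately recovers the number of events, which is what makes the surface a count per unit area.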
Demo

Showing distribution
Density analysis
Maps can lie…so can statistics

• Assumptions must be met; for example, statistical tests are either:
  - Parametric: Data distribution assumptions must be met
  - Non-parametric: Distribution-free
• Analysis is often concerned with explaining differences
  - Hypothesis testing
  - Statistical significance does not mean ‘important’
Demo

Identifying clusters
Hotspot analysis
Demo

Temporal patterns
Coxcombs or Rose diagrams
Demo

Comparisons
Percentage Difference
Spatial autocorrelation

• “Everything is related to everything else, but near things are more related than distant things.” Tobler (1970)
• Spatial autocorrelation statistics evaluate the degree of spatial dependency among observations
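One widely used statistic of this kind is Moran's I, chosen here as an example; the slide itself does not name a specific statistic. A minimal sketch, assuming a binary neighbour matrix for four cells in a row (the data and the weights are hypothetical):

```python
def morans_i(values, weights):
    """Global Moran's I for values with an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighbouring pairs.
    numerator = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denominator = sum(d * d for d in dev)
    return (n / weight_sum) * (numerator / denominator)


# Four cells in a line; each cell neighbours the cells next to it.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(morans_i([1, 2, 3, 4], weights))  # positive: similar values are adjacent
print(morans_i([1, 4, 1, 4], weights))  # negative: neighbours are dissimilar
```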
Interpolation
• From Latin interpolatus
  - Meaning: to estimate a value that lies between two other values
• Interpolation is required when:
  - We have samples from something that is continuous
  - A discrete surface has a different resolution (or cell size) to that required
Spatial interpolation
• Spatial interpolation is based on the notion that points which are close together in space tend to have similar attributes (Tobler’s First Law of Geography)
• If the relationship between points and their values is determined by:
  - distance between points = isotropy
  - distance and direction = anisotropy
• Interpolated values are reliable only to the extent that the spatial dependence of the phenomenon can be assumed
Interpolation in ArcGIS
• IDW
• Kriging
• Natural Neighbor
• Spline
• Spline with Barriers
• Topo to Raster
• Topo to Raster by File
• Trend
• Global polynomial
• Local polynomial
• Inverse distance weighted
• Radial basis functions
• Diffusion kernel
• Kernel smoothing
• Ordinary kriging
• Simple kriging
• Universal kriging
• Indicator kriging
• Probability kriging
• Disjunctive kriging
• Gaussian geostatistical simulation
• Areal interpolation
• Empirical Bayesian kriging
Deterministic methods

• The data contains the full range of possible values
• Things close to one another are more alike than those farther apart
• The outcome is exactly known and based on the input
Natural neighbor

• Weighted average technique based on the Voronoi diagram
[Figure: Delaunay triangulation (dotted lines) drawn on top of the Voronoi diagram]
• Delaunay triangulation: the geometric dual of the Voronoi diagram, i.e. the natural bisection between Voronoi cells, which reverses the face inclusion
IDW (inverse distance weighting)

• Output is limited to the range of the values used to interpolate
• Based on the assumption that the interpolating surface should be influenced most by the nearby points and less by the more distant points
  - Assumes the surface is driven by local variation
• Weights assigned diminish with distance from the interpolation point
  - Sample points should have an even distribution
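The weighting described above can be sketched in a few lines. This is a minimal illustration that uses every sample point and a power of 2; both are assumptions for the example, and the function name and sample values are hypothetical.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples is a list of (x, y, value) tuples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0:
            return value  # exactly on a sample point: return its value
        weight = 1.0 / distance ** power  # weight diminishes with distance
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical samples 10 units apart.
samples = [(0, 0, 10.0), (10, 0, 20.0)]
print(idw(samples, 5, 0))  # midway between the samples
print(idw(samples, 2, 0))  # pulled towards the nearer sample
```

Because the estimate is a weighted average with positive weights, it can never fall outside the minimum and maximum of the sample values, which is the range limitation noted in the first bullet.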
Demo

Interpolation to area
Natural Neighbor and IDW
Geostatistical methods

• Uses the relationships between your data locations and their values, assuming:
  - Data is normally distributed
  - Data exhibits stationarity (no local variation)
  - Data has spatial autocorrelation
  - Data is not clustered
    - Simple kriging has declustering options
  - Data has no local trends
    - Local trends can be removed during interpolation (and these trends are accounted for in the prediction calculations)
Kriging

Assumes that spatial variation can be decomposed into 3 main components:
1. Deterministic variation or trend/drift
   - Trend analysed by trend surface analysis techniques
2. Spatially correlated, random variation
   - Spatially correlated variation analysed by computing the semivariance
3. Spatially uncorrelated variation (noise)
Provides measures of the certainty or accuracy of the predictions
Normal distribution

- Histogram
- A normal QQ plot (probability plot)
- Bell-shaped
- No outliers
- Mean ≈ Median
- Skewness ≈ 0
- Kurtosis ≈ 3
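The numeric checks in this list are easy to compute directly. A minimal sketch using population moments, where kurtosis is the non-excess form (about 3 for a normal distribution); the function names and the simulated sample are assumptions made for illustration.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment: about 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def kurtosis(values):
    """Fourth standardized moment: about 3 for normally distributed data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values)


# Simulated, normally distributed sample (seeded so the run is repeatable).
random.seed(42)
sample = [random.gauss(50, 10) for _ in range(10_000)]

print("mean vs median:", statistics.fmean(sample), statistics.median(sample))
print("skewness:", skewness(sample))
print("kurtosis:", kurtosis(sample))
```

These numbers complement the histogram and QQ plot rather than replace them; a single outlier can shift skewness and kurtosis substantially.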
Transformations

• Transformations can be used to bring data to a normal distribution
  - e.g. logarithms, Box-Cox, square root
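The three transformations named above are closely related: the Box-Cox family includes the logarithm as its limiting case and is a rescaled square root when its parameter is 0.5. A minimal sketch, with a hypothetical function name and made-up skewed values:

```python
import math


def box_cox(values, lam):
    """Box-Cox transform; lam = 0 gives the natural logarithm."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical right-skewed data spanning several orders of magnitude.
values = [1, 10, 100, 1000]

print(box_cox(values, 0))    # log transform: evenly spaced values
print(box_cox(values, 0.5))  # square-root-like transform
```

Predictions made on transformed data have to be back-transformed before they are mapped in the original units.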
Data stationarity

• Statistical properties of data (e.g. mean, variance) are independent of absolute location
• Covariance depends only on the relative locations of the sites (e.g. the distance and direction between them) and not their exact location
• Create a Voronoi map symbolized by:
  • Entropy
  • Standard Deviation
Trend

• Systematic changes in the mean of the data values across the area of interest
  - Can be difficult to distinguish from autocorrelation and anisotropy
  - Trend removal options
Dealing with outliers

Outliers statistically affect your data
• They may be real and important or may be errors (such as input errors)

Possible solution:
• Remove outliers from the modeling step (semivariogram)
• Use the full dataset for prediction
The semivariogram

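• Shows the spatial autocorrelation of the measured sample points
[Figure: semivariogram plotting semivariance against lag, labelled with the nugget, partial sill, sill and range]
• $\text{Semivariogram}(\text{distance } h) = 0.5 \times \text{average}\left[(\text{value}_i - \text{value}_j)^2\right]$, taken over all pairs of locations $i$, $j$ separated by distance $h$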
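The semivariogram formula can be turned into an empirical calculation by grouping point pairs into distance bins (lags). The sketch below is an illustration only: the binning scheme, function name and sample points are assumptions, and no model is fitted to the result.

```python
import math
from collections import defaultdict


def empirical_semivariogram(samples, lag_size, n_lags):
    """Return (lag centre, semivariance, pair count) for each non-empty lag bin.

    samples is a list of (x, y, value) tuples.
    """
    squared_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            xi, yi, vi = samples[i]
            xj, yj, vj = samples[j]
            distance = math.hypot(xi - xj, yi - yj)
            lag_bin = int(distance // lag_size)
            if lag_bin < n_lags:
                squared_sums[lag_bin] += (vi - vj) ** 2
                pair_counts[lag_bin] += 1
    return [
        # 0.5 * average squared difference of the pairs in this bin
        ((b + 0.5) * lag_size, 0.5 * squared_sums[b] / pair_counts[b], pair_counts[b])
        for b in sorted(pair_counts)
    ]


# Three hypothetical samples along a line.
samples = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0)]
for lag, gamma, pairs in empirical_semivariogram(samples, lag_size=1.0, n_lags=3):
    print(lag, gamma, pairs)
```

With spatially autocorrelated data the semivariance is expected to rise with lag and level off near the sill at about the range.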
Empirical Bayesian Kriging

• Spatial relationships are modeled automatically
• Results often better than interactive modeling
• Uses local models to capture small scale effects
  - Doesn’t assume one model fits the entire data
Using EBK

• Advantages
  - Requires minimal interactive modeling
  - Standard errors of prediction are more accurate than other kriging methods
  - More accurate than other kriging methods for small or nonstationary datasets
• Disadvantages
  - Processing is slower than other kriging methods
  - Limited customization
Selecting the best model

• Predictions should be unbiased
  - Mean prediction error should be near zero (this depends on the scale of the data), so
  - Standardised mean nearest to 0
• Predictions should be close to known values
  - Small root-mean-square prediction errors
• Correctly assessing the variability:
  - Average standard error nearest the root-mean-square prediction error
  - Standardised root-mean-square prediction error nearest to 1
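These criteria can be computed from cross-validation output: for each sample point, a measured value, a predicted value and a prediction standard error. The sketch below is a hedged illustration; the function and key names are hypothetical, the numbers are invented, and the average standard error is assumed to be the root of the mean squared standard errors.

```python
import math


def cross_validation_summary(measured, predicted, standard_errors):
    """Summary statistics for comparing interpolation models."""
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, standard_errors)]
    return {
        # Unbiased predictions: both should be near 0.
        "mean_error": sum(errors) / n,
        "mean_standardized_error": sum(standardized) / n,
        # Predictions close to known values: should be small.
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # Variability assessed correctly: close to rmse ...
        "average_standard_error": math.sqrt(sum(s * s for s in standard_errors) / n),
        # ... and close to 1.
        "rms_standardized_error": math.sqrt(sum(z * z for z in standardized) / n),
    }


# Invented cross-validation results for four sample points.
summary = cross_validation_summary(
    measured=[10, 12, 14, 16],
    predicted=[11, 11, 15, 15],
    standard_errors=[1, 1, 1, 1],
)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A root-mean-square standardized error above 1 suggests the model underestimates the variability of its predictions; a value below 1 suggests it overestimates it.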
Demo

Interpolation to area
Empirical Bayes Kriging
Take away points…

• Good analysis is an important part of cartography
• Even basic statistics can be powerful
• Spatial data is more complex… but often reveals so much more
Demo

Think before you map…
Building Maps with Solid Analytical Foundations

  • 1. NACIS Annual Meeting Oct 9-11, 2013 | Greenville, South Carolina Workshop More than just colouring in: building maps with a solid analytical foundation Linda Beale PhD
  • 2. Visualization process • Based on clear need and purpose - Who is the audience? - • What is the intended purpose? What medium is to be used? These goals can not all accomplished by visual tricks
  • 3. What does GIS offer cartography? • Is it a hammer to crack a walnut? - Combining different data to get new information - • Bringing data together from disparate sources Part of the process of making informative and different maps Is statistical analysis really needed? - The development and application of methods to collect, analyze and interpret data - The science of learning from data
  • 4. Spatial analysis is about solving problems • • • • • • • What is inside an area? What is nearby? Where are the events concentrated? Where do things move over time? Why things occur where they do? How can we estimate values for a whole area? What is a suitable location for …? • Maps are needed to communicate the result
  • 5. Getting clarity • Descriptive statistics - Help understand the data as part of analysis or to quantify data - Commonly aspatial i.e. result is not dependant on location - Some spatial methods
  • 6. Basic descriptors Method Use ArcGIS tools Total Count or sum of values  Smallest and largest values  Minimum, Maximum    Mode Most commonly occurring  Median Central value     Mean Average value   Summary Statistics / Spatial Join / Frequency / Tabulate Intersection Neighborhood & Zonal Statistics (Spatial Analyst) Statistics / Summary Statistics / Spatial Join Histogram (Geostatistics) Neighborhood & Zonal Statistics / Get & set raster properties (Spatial Analyst) Spatial Join Neighborhood & Zonal Statistics (Spatial Analyst) Spatial Join Histogram (Geostatistics) Neighborhood & Zonal Statistics (Spatial Analyst) Statistics / Summary Statistics/ Spatial Join Neighborhood & Zonal Statistics (Spatial Analyst)
  • 7. Data distributions Method Use ArcGIS Range Max-min   Standard deviation Average deviation about the mean  nth Quantile Value that is nth way through a sorted list    Summary Statistics / Spatial Join Neighborhood & Zonal Statistics (Spatial Analyst) Summary Statistics Neighborhood & Zonal Statistics (Spatial Analyst) Display Properties Histogram (Geostatistics)
  • 10. Spatial descriptors Method ArcGIS tools Mean  Central value  Distribution     Mean Center Linear Directional Mean Central Feature Median Center Standard Distance Directional Distribution
  • 12. Normalization • Aspatially: - Normalization is to transform a set of measurements so that they may be compared in a meaningful way - • Examples: Standard score (z values), coefficient of variation Spatially: - Normalization transforms measures of magnitude (counts or weights) into measures of intensity - Using normalization we can take into account the differences between the areas (e.g. size of area, population size etc)
  • 13. Understanding quantity • We see quantity related to size
  • 14. As we often see it….
  • 15. Or… • So, we must map ‘like’ with ‘like’
  • 17. Distributions and patterns • Density surfaces of count per unit area - Looking at concentrations of features - Seeing patterns of features - • Hotspots, Heat maps A density surface reflects the likelihood of an event occurring in each cell (bivariate probability density function)
  • 19. Maps can lie…so can statistics • Assumptions must be met for example, statistical tests are either: - • Parametric: Data distribution assumptions must be met Non-parametric: Distribution-free Analysis often concerned with explaining differences - Hypothesis testing - Statistical significance does not mean ‘important’
  • 23. Spatial autocorrelation • “Everything is related to everything else, but near things are more related than distant things." Tobler (1970) • Spatial autocorrelation statistics evaluate the degree of spatial dependency among observations
  • 24. Interpolation
      - From Latin interpolatus
          - Meaning: to estimate a value that lies between two other values
      - Interpolation is required when:
          - We have samples from something that is continuous
          - A discrete surface has a different resolution (or cell size) to that required
  • 25. Spatial interpolation
      - Spatial interpolation is based on the notion that points which are close together in space tend to have similar attributes (Tobler’s First Law of Geography)
      - If the relationship between points and their values is determined by:
          - distance between points = isotropy
          - distance and direction = anisotropy
      - Interpolated values are reliable only to the extent that the spatial dependence of the phenomenon can be assumed
  • 26. Interpolation in ArcGIS
      - Spatial Analyst: IDW, Kriging, Natural Neighbor, Spline, Spline with Barriers, Topo to Raster, Topo to Raster by File, Trend
      - Geostatistical Analyst: Global polynomial, Local polynomial, Inverse distance weighted, Radial basis functions, Diffusion kernel, Kernel smoothing, Ordinary kriging, Simple kriging, Universal kriging, Indicator kriging, Probability kriging, Disjunctive kriging, Gaussian geostatistical simulation, Areal interpolation, Empirical Bayesian kriging
  • 27. Deterministic methods • The data contains the full range of possible values • Things close to one another are more alike than those farther apart • The outcome is exactly known and based on the input
  • 28. Natural neighbor
      - Weighted-average technique based on Voronoi polygons
      - [Figure: Delaunay triangulation (dotted lines) on top of Voronoi polygons]
      - Delaunay triangulation: the geometric dual of Voronoi, i.e. the natural bisection between Voronoi polygons, which reverses the face inclusion
  • 29. IDW (inverse distance weighting)
      - Output is limited to the range of the values used to interpolate
      - Based on the assumption that the interpolating surface should be influenced most by the nearby points and less by the more distant points
          - Assumes the surface is driven by local variation
      - Weights assigned diminish with distance from the interpolation point
          - Sample points should have an even distribution
  • 31. Geostatistical methods
      - Use the relationships between your data locations and their values, assuming:
          - Data is normally distributed
          - Data exhibits stationarity (no local variation)
          - Data has spatial autocorrelation
          - Data is not clustered (simple kriging has declustering options)
          - Data has no local trends (local trends can be removed during interpolation, and these trends are accounted for in the prediction calculations)
  • 32. Kriging
      - Assumes that spatial variation can be decomposed into 3 main components:
          1. Deterministic variation or trend/drift: trend analysed by trend surface analysis techniques
          2. Spatially correlated, random variation: analysed by computing the semivariance
          3. Spatially uncorrelated variation (noise)
      - Provides measures of the certainty or accuracy of the predictions
  • 33. Normal distribution - Histogram - A normal QQ plot (probability plot) - Bell-shaped - No outliers - Mean ≈ Median - Skewness ≈ 0 - Kurtosis ≈ 3
  • 34. Transformations • Transformations can be used to bring data closer to a normal distribution, e.g. logarithms, Box-Cox, square root
  • 35. Data stationarity
      - Statistical properties of data (e.g. mean, variance) are independent of absolute location
      - Covariance depends only on the relative locations of the sites (e.g. the distance and direction between them) and not their exact location
      - Create a Voronoi map symbolized by:
          - Entropy
          - Standard Deviation
  • 36. Trend • Systematic changes in the mean of the data values across the area of interest - Can be difficult to distinguish from autocorrelation and anisotropy - Trend removal options
  • 37. Dealing with outliers
      - Outliers statistically affect your data
          - They may be real and important or may be errors (such as input errors)
      - Possible solution:
          - Remove outliers from the modeling step (semivariogram)
          - Use the full dataset for prediction
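  • 38. The semivariogram
      - Shows the spatial autocorrelation of the measured sample points
      - [Figure: semivariance plotted against lag, labelled with nugget, partial sill, sill and range]
      - $\text{Semivariogram}(\text{distance}_h) = 0.5 \times \text{average}\left[(\text{value}_i - \text{value}_j)^2\right]$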
  • 39. Empirical Bayesian Kriging • Spatial relationships are modeled automatically • Results often better than interactive modeling • Uses local models to capture small scale effects - Doesn’t assume one model fits the entire data
  • 40. Using EBK
      - Advantages
          - Requires minimal interactive modeling
          - Standard errors of prediction are more accurate than other kriging methods
          - More accurate than other kriging methods for small or nonstationary datasets
      - Disadvantages
          - Processing is slower than other kriging methods
          - Limited customization
  • 41. Selecting the best model
      - Predictions should be unbiased
          - Mean prediction error should be near zero (depends on the scale of the data), so:
          - standardised mean nearest to 0
      - Predictions should be close to known values
          - Small root-mean-square prediction errors
      - Correctly assessing the variability:
          - average standard error nearest the root-mean-square prediction error
          - standardised root-mean-square prediction error nearest to 1
  • 43. Take away points… • Good analysis is an important part of cartography • Even basic statistics can be powerful • Spatial data is more complex… but often reveals so much more
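
The semivariogram formula on slide 38 can be sketched in a few lines of plain Python. This is a minimal illustration, not the ArcGIS implementation: the function name `empirical_semivariogram`, the `(x, y, value)` tuple layout and the fixed-width lag bins are all assumptions made for the example. It pairs every point with every other point, bins the pairs by separation distance, and takes half the average squared difference in each bin.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, lag_size, n_lags):
    """Bin point pairs by distance and compute the semivariance per bin.

    points   -- list of (x, y, value) tuples (hypothetical layout)
    lag_size -- width of each distance bin, in the units of x and y
    n_lags   -- number of bins; pairs farther apart are ignored
    Returns a list of (lag_centre, semivariance or None, pair_count).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, v1), (x2, y2, v2) in combinations(points, 2):
        dist = math.hypot(x2 - x1, y2 - y1)
        b = int(dist // lag_size)          # index of the lag bin
        if b < n_lags:
            sums[b] += (v1 - v2) ** 2      # squared value difference
            counts[b] += 1
    result = []
    for b in range(n_lags):
        centre = (b + 0.5) * lag_size
        # 0.5 * average squared difference; None when the bin is empty
        gamma = 0.5 * sums[b] / counts[b] if counts[b] else None
        result.append((centre, gamma, counts[b]))
    return result


if __name__ == "__main__":
    samples = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0)]
    for centre, gamma, n in empirical_semivariogram(samples, 1.0, 3):
        print(centre, gamma, n)
```

Plotting the semivariance against the lag centre gives the cloud of binned points to which a model (and hence nugget, sill and range) would be fitted; bins with very few pairs are usually unreliable.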

Editor's Notes

  1. This template is based on Esri Corporate Template v1.1, March 7, 2013
  2. Rarely do we have prepared data. It often requires some processing/analysis.
  3. Is it a hammer to crack a walnut?
  4. Many traditionally form approaches known as EDA methods, but when does exploration turn into discovery? Perhaps better termed exploratory and discovery analysis. Easy to understand and explain. Tend to represent central values rather than extremes. Tend to be aspatial, but there are also a number of spatial approaches that have valid uses.
  5. There are a number of methods we can use to describe our data and a number of different tools we can use in ArcGIS. We can also calculate statistical values on an attribute table for total (or sum), min, max and mean. It also gives us further valuable information, such as the number of NULL values in our data. Remember that zero values are included in numerical calculations, so zero shouldn’t be used to stand in for, for example, missing data. The last three methods are measures of central tendency, but these are often not adequate to fully describe data. Two data sets can have the same mean but be entirely different. We can better understand data by also looking at the extent of variability. This is given by the measures of dispersion. Range, interquartile range, and standard deviation are the three commonly used measures of dispersion. There is also the harmonic mean for averaging rates such as speed: if you have x units per y and the x’s are known, the harmonic mean is appropriate. For example, if we want to find the average speed in kilometres per hour and we know the kilometres (distance travelled, the x), use the harmonic mean. If the y’s, e.g. hours (time of journey), are given, use the arithmetic mean.
  6. Range: the difference between the largest and smallest values. Simplest of all dispersion measures. Standard deviation: the most commonly used measure of dispersion, the average deviation from the mean. If the observations are from a normal distribution it is much simpler to understand, as 68% of observations lie between mean ± 1 SD, 95% of observations lie between mean ± 2 SD and 99.7% of observations lie between mean ± 3 SD. If your data is skewed or you have ordinal data, then the median and interquartile range should be used to measure dispersion. Another way of looking at standard deviation is by plotting the distribution as a histogram of responses. A distribution with a low SD would display as a tall narrow shape, while a large SD would be indicated by a wider shape. Quantiles: divide the sample data into equal-sized subgroups of adjacent values, or a probability distribution into distributions of equal probability. Dividing the data into five parts gives quintiles: 20% of the values lie below the 1st quintile, 40% below the 2nd quintile, and so on. So we can see that the median, as the central value, is the 50th percentile. Quartiles divide the data into 4 parts, often shown as the interquartile range. So, for example, the 75th percentile is the upper quartile. This measure gets around issues of having one or two really high/low values that distort the range; however, both range and quartiles only take into account a few values in the dataset.
  7. Sometimes we want to describe the spatial distribution as opposed to the values at locations. And there are a number of tools in ArcGIS that allow you to describe your data spatially. The median center in this case represents the centre of minimum travel, as opposed to the median in the aspatial sense. The name for this spatial statistic differs by country, as some countries, e.g. the UK, define the median center as being analogous to the median of a set of data. So, let’s have a look at some examples of basic descriptions before we move on…
  8. In aspatial terms: adjusting values measured on different scales to a notionally common scale allows you to draw comparisons between variables that are not the same, e.g. different units. This can be a valuable approach in geographical analysis, when we often use proxy or indicator variables to effectively ‘fill in gaps’ when we do not have data. The standard score is the number of standard deviations an observation is above or below the mean. A positive standard score represents a value above the mean, while a negative standard score represents a value below the mean. It is found by subtracting the population mean from each value and then dividing the difference by the standard deviation of the whole dataset. Not to be confused with the concept of the ‘normal distribution’. We must also consider these same statistical principles when we work spatially and visualise our results: we commonly use rates and ratios, and choropleth maps should show normalized values, not counts collected over unequal areas or populations.
  9. We can instinctively convert visual images into information about quantity and intensity. We can instinctively see that the glass on the right has half the amount of drink… we can make comparisons and gain information. What if the glass is different? It is no longer easy to judge.
  10. Summary statistics, such as averages, medians, or percentages are already measures of intensity and should not be normalized.
  11. Measures the intensity of a spatial point pattern based on sample observations. It creates a continuous surface showing the density of features or values irrespective of arbitrary administrative boundaries. Useful for estimating the intensity of one type of event relative to another e.g. disease cases compared to a control group.
  12. Using test statistics we can show the strength of relationships between samples. Parametric tests: observations should be independent and drawn from a population with a normal distribution, and the population should be homoscedastic, i.e. have equal variances. Results of all inferential statistics are only valid if they are applied to random samples. From your data it may seem, or be apparent, that there are differences, but when analyzing data we must be objective. Part of this approach means investigating a theory, which is termed hypothesis testing. Statistical tests can’t show us what is true but they can show us what is not true. So, when defining a hypothesis we are trying to disprove it. If I take the example of the heights of men and women, my (null) hypothesis would be that men and women are, on average, the same height. I would then use a statistical test to try to reject that hypothesis. If I sampled 3 men and 3 women I could find that women are taller than men, but if I increased my sample size to a more representative number I would likely find the opposite and so be able to reject my hypothesis. The implication is that any relationship seen in a sample dataset could be a result of chance, and a more complete set of samples may show a different relationship. Statisticians always start from the assumption that sample results are not representative of the whole. The probability of rejecting a hypothesis that is in fact true is called the significance level. This should be defined before the test is carried out. The smaller the sample size, the harder it is to know if it is representative of the whole population (i.e. the real situation). Degrees of freedom, in some way, represent the size of the sample. Significance is a statistical term that tells how sure you are that a difference or relationship exists.
  “Statistical significance” does not mean “significant” in the sense of “important”. Statistical significance tells you if the relationship you observed in your sample is likely to hold up in the population. It therefore tells you if you can generalize your finding from your sample to your population. How much taller do men have to be than women to say that they are taller? This depends on the average heights and the number of samples. This is what the p-value helps quantify: how likely a difference this large would be by chance alone if there were no real difference.
  13. Demos: Export to Excel and resources center? Test statistics give a single value, so they are not something we would want to map, but they may be something that you can use to support your analysis. A number of statistical tests are available in Excel and we can use the new export to Excel tool (at 10.2). There are also tools available for download on the resources center, written by the analysis team. A more extensive suite of tests is available in the Python library SciPy, which you can then use from the Python command line or in your scripts. Land cover by watershed: change in forested area between 2001 and 2006.
  14. The problem with the presence of spatial autocorrelation is that it corrupts standard statistical tests. So, there is a real need for true spatial statistical tests. Spatial autocorrelation is determined both by similarities in position and by similarities in attributes. Spatial autocorrelation that is more positive than expected at random indicates the clustering of similar values across geographic space, while significant negative spatial autocorrelation indicates that neighboring values are more dissimilar than expected by chance. In ArcGIS, for statistical hypothesis testing, Moran's I values are transformed to Z-scores, in which values greater than 1.96 or smaller than −1.96 indicate spatial autocorrelation that is significant at the 5% level.
  15. So what is spatial interpolation: Closer points should have less difference in value than points farther apart
  16. 8 SA tools and 15 in GA. We won’t be covering them all but hopefully I will cover enough of the key points that you can explore others on your own.
  17. Input must represent the highs and lows of values
  18. Finds the closest subset of input samples to a query point and applies weights to them based on proportionate areas (from Voronoi/Thiessen polygons) to interpolate a value. Local interpolator: uses a subset of samples that surround a query point, and interpolated values are always within the range of the samples used. The surface passes through the input samples and is smooth everywhere except at locations of the input samples. The proportion of overlap between Thiessen polygons (proximal solution) and an overlaid Voronoi polygon defines the weight. Not affected by data distribution, unlike distance-based methods such as IDW.
  19. Combines the ideas of proximity in Thiessen polygons and gradual change in trend surface. Exact interpolator: input must represent the highs and lows of values. Assumes spatial autocorrelation in the data. Power parameter: controls the weighting by distance. A higher value gives the nearest points more emphasis (the surface will be less smooth). The optimal value is where the mean absolute error is at its lowest. Same result from each extension given the same inputs.
  20. Placing the model through the points (i.e. finding the curve of best fit) gives us a measure of the spatially correlated random component. Semivariance is the measure of interdependence between the values, based on how close they are to each other.
  21. With kriging in Spatial Analyst the data cannot be transformed, so it must be normally distributed. In Geostatistical Analyst it can be transformed. Calculations use transformed data and then back transformation is done automatically. Normal Score transformation: fits a mixture of normal distributions to the data.
  22. Another assumption of many geostatistical techniques is that your data is stationary: its statistical properties are independent of absolute location, i.e. the mean and variance do not depend upon location. Covariance depends only on the relative locations of the sites (the distance and direction between them) and not their exact location. In a spatial or temporal context, such dependence is called autocorrelation. The statistical parameters (mean and standard deviation) of the process do not change over space. A stationary process has the property that the mean, variance and autocorrelation structure do not change over space. Stationarity can be defined in precise mathematical terms, but for our purpose we mean a flat looking series, without trend, with constant variance over time, a constant autocorrelation structure over time and no periodic fluctuations. EBK can be effectively used with non-stationary data.
  23. Histogram: values in far removed bars to the left or right may indicate outliers. QQ plot: values at the tails of a normal QQ plot can also be outliers. Semivariogram cloud: shows the relationship between pairs of points. Points that are close together but have high differences in their values may be outliers.
  24. To reduce the number of points in the empirical semivariogram, the pairs of locations are grouped based on their distance from one another. This grouping process is known as binning. The lag size is the distance between the points in the bins. The default is selected using a reasonable rule of thumb based on the data extent, but a more robust method is to use the Average Nearest Neighbor tool (in the Spatial Statistics toolbox). This will help you find the average distance between points and their nearest neighbors. If your data is clustered, you might need to use a smaller lag size than the average nearest neighbor distance to obtain a more accurate measure for the nugget in the semivariogram. The nugget represents the smallest distance between points in the data and the shortest distance for which you can understand a relationship with distance and value. An easier approach is to use the optimise button. It will help fit the semivariogram model, primarily focussing on the range parameter, and is based on minimising the mean square error.
  25. EBK makes multiple simulations of the semivariogram and we are looking for the median to fall within the 25th and 75th percentiles. Each location uses a weighted sum of the distributions. It creates different, local models across the area. You can overlap these models to create a smooth surface. Advantages: requires minimal interactive modeling; standard errors of prediction are more accurate than other kriging methods; more accurate than other kriging methods for small or nonstationary datasets. Disadvantages: processing is slower than other kriging methods; limited customization.
  26. Predictions should be unbiased and centered on the true values. If the prediction errors are unbiased, the MEAN PREDICTION ERROR should be near zero. But this value depends on the scale of your data, so the standardised mean (MEAN STANDARDIZED) is also reported. This should also be near zero. If the root mean squared standardized errors are > 1 you are underestimating variability in your predictions and, if the root mean squared standardized errors are < 1, you are overestimating it.
  27. Demos: When does it make sense: think about what the data shows (totals with sample data, interpolation with emissions). How can you use descriptive statistics: comparisons with other (reference) areas. Spatial linear mean > anisotropy
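
The model-selection checks described in note 26 (and on slide 41) can be sketched as a small Python helper. This is an illustration under stated assumptions rather than the Geostatistical Analyst code: the function name `cross_validation_summary` and its arguments are hypothetical, and the average standard error is taken here as the root of the mean squared standard error, which is an assumption about the definition. Each input list holds one entry per cross-validated sample point.

```python
import math


def cross_validation_summary(measured, predicted, std_errors):
    """Summarise cross-validation results for an interpolation model.

    measured   -- known values at the sample points
    predicted  -- values predicted at those points with each one left out
    std_errors -- prediction standard error reported for each point
    (All three names are hypothetical; any equal-length sequences work.)
    """
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        # Bias check: should be near zero
        "mean_error": sum(errors) / n,
        # Closeness to known values: smaller is better
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # Scale-free bias check: should be near zero
        "mean_standardized": sum(standardized) / n,
        # Variability check: near 1; > 1 underestimates, < 1 overestimates
        "rms_standardized": math.sqrt(sum(z * z for z in standardized) / n),
        # Should be close to rmse (definition assumed, see lead-in)
        "average_standard_error": math.sqrt(
            sum(s * s for s in std_errors) / n
        ),
    }


if __name__ == "__main__":
    summary = cross_validation_summary(
        measured=[10.0, 20.0, 30.0],
        predicted=[12.0, 18.0, 30.0],
        std_errors=[2.0, 2.0, 2.0],
    )
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

In this toy example the mean error is zero, so the predictions are unbiased, but the root-mean-square standardized error is below 1, which by the rule in note 26 would suggest the model is overestimating the variability of its predictions.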