Adamas university2018 f

Prof Ashis Sarkar, W.B.S.E.S.
Formerly, HOD: Geography
Presidency College / University and Chandernagore College
Managing Editor & Publisher: Indian Journal of Spatial Science
www.indiansss.org
Statistical Analysis in Geography
3-Day National Workshop
on
Geospatial Data Analysis using Open Source Software
Organised by
Department of Geography (School of Science)
Adamas University, Barasat, West Bengal - 700 126
www.adamasuniversity.ac.in
(17 – 19 April, 2018)

The term “Geography” means to "describe or picture or write
about the Earth": Eratosthenes (276–194 BC).
But mere names of places ... are not geography ... It has higher
aims than this: it seeks
(1) to classify phenomena,
(2) to compare,
(3) to generalize,
(4) to ascend from effects to causes, and in doing so,
(5) to trace out the laws of nature and to mark their
influences upon man.
This is simply 'a description of the world'. Hence, Geography is a
Science:
- a thing not of mere names but
- of argument and reason.

Geography is important in the real world:
1. to avoid the errors of other people and countries and to profit from
their ideas (Aristotle, 384–322 BC).
2. to keep our principles of living with nature alive and to compare them with
those in other countries (Aurelius, 121–180 AD).
3. as it teaches us to put things in perspective amid earthquakes,
volcanism, floods and wars (Montaigne, 1533–1592).
4. it’s the job of a geographer to be exact, truthful and unbiased about
time, witness for the past, examples and counsel for the present and
warning for the future (Cervantes, 1547–1616).
5. to judge our own customs and actions better by examining those of
others living in different situations (Descartes, 1596–1650).
6. to learn the universal principles of human nature by seeing man in
different kinds of situations and circumstances (Pascal, 1623–1662).
7. to understand, live and exist peacefully in our current world
scenario (Hume, 1711–1776).
8. to ensure order in society by having an informed and educated
people and to increase understanding (Adam Smith, 1723–1790).

Geography has 5 fundamental themes —
Location: position on the earth’s surface
Place: physical and human characteristics
Relationships within places: man – environment
Movement: human interaction on the earth
Regions: how they form and change
Earth – in relation to – Man: Physical Geography
Man – in relation to – Earth: Human Geography
The Choice is Yours!!
Physical Geography
Geotectonics, Geomorphology, Climatology, Soil Geography, Plant Geography,
Environmental Geography embrace the domain of Earth Science, Atmospheric Science
and Life Science:
Hypotheses, Theories and Universal Laws of Physics, Chemistry, and Mathematics
Human Geography
Principles of Economic Geography (including Agricultural, Industrial, Commercial, and
Transport Geography), Social / Cultural Geography, Population Geography, and Schools of
Thought embrace the domain of Economics, Sociology, Demography, Psychology,
Earth Science, Atmospheric Science and Philosophy:
Hypotheses and Theories: mechanistically applied from other disciplines

Geography is a very ancient discipline but suffers from the absence of
‘Universal Laws’. Eventually, Ivy League universities shut down
their Geography Departments in the late 1950s. No Law, No Science,
No Research, No Funding: this was the raison d’être of the Quantitative
Revolution (“QR”), an all-out effort to build Theories and Laws.
GeoCube
The scope and content of “Modern Geography” can be viewed as a
GeoCube, each of whose 6 planes represents a geographer’s
perspective:
1. Exploring our World (Global Issues: geographers are now
exploring)
2. Fascinating Earth (Physical Earth: geographers’ main
concern)
3. Living Together (Human Society, Economy and Development
that geographers normally address)
4. Shrinking Planet (Human Exploitation of Earth: geographer’s
major concern)
5. Useful Geography (Man’s Activity Space: measured,
monitored, mapped and modeled by GIS & RS)
6. Earth from all Angles (Regional Complexities: geographers
focus on).

A. Exploring our World (Global Issues)
1. Species Extinction 2. Deforestation 3. Ozone Hole Formation
4. Climate Change 5. Weather Forecasting 6. Population Growth and Distribution
7. Aging Population 8. Tourism 9. War

B. Fascinating Earth (Physical Earth)
1. Earthquakes 2. Tsunami 3. Volcanoes
4. Storms 5. Hurricanes 6. Floods
7. Drought 8. Forest Fires 9. Conflicts in Earth Systems

C. Living Together (Human Society, Economy and Development)
1. Ethnicity 2. Language 3. Literacy and Religion
4. Health 5. Migration 6. Mobility
7. Poverty 8. Economic Development 9. Pollution in Cities

D. Shrinking Planet (Human Exploitation of our Earth)
1. Nature 2. Minerals 3. Water Resources
4. Land Resources 5. Energy Resources 6. Agriculture
7. Food Resources Distribution 8. Housing 9. Waste and Pollution

E. Useful Geographies (Man’s Activity Space)
1. Risk 2. Transport 3. Planning
4. Education 5. GIS 6. Geographic Skills
7. Using Maps 8. Facilities Management 9. Resources Management

F. Earth from all Angles (Regional Complexities)
1. Mountains 2. Deserts 3. Wetlands
4. Rivers 5. Oceans and Seas 6. Karst Landscape
7. Polar Region 8. Rural Landscape 9. Urban Landscape

Why “Statistics” in Geography?

The measurement, mapping, monitoring and modelling
of any of these 54 elements, individually, in
combination, or in terms of their interrelations, forms the core of
Geography.
What we need:
1. A real-time, well-designed depiction of our habitat, economy
and society, along with their interrelations, i.e., maps.
2. Naturally, a huge Geographical Data Matrix (GDM) needs to be prepared, stored,
integrated, and analysed for information building.
Hence, come into play the fundamental pillars of modern
‘scientific geography’:
1. Statistical Techniques / Methods (for RDBMS, analysis and
Interpretation)
2. Cartography: The Art and Science of Map Making (for
visualization)

“Geography ... deals with the description and explanation of the areal
differentiation of the earth's surface”.
The statement has two parts:
1. The first part concerns how one should study phenomena. It includes the
cognitive description and explanation (cognitive = coherent, rational and
realistic; description = analysis of inter-connections within the geographical
objects / events; explanation = necessity of such inter-connections). Thus, it
refers to the methods of geography.
2. The second part concerns what one should study. It identifies a domain of
objects and events defining the operations of description and explanation.
Thus, it concerns the goals or substantive objectives of geographical
studies.
Hence, lies the foundation of the Scientific Geographical Analysis:
“goal - oriented and object - specific descriptive, numerical (statistical,
morphometric, cause-and-effect, evolutionary, functional-and-ecological and
system), spatial and cartographic analysis of geographical data, its
presentation and geographical explanation”.
It can be viewed multilaterally — as an activity, a process and an organized
attempt at communicable understanding.

The Quantitative Revolution (QR) led to a ‘deep-rooted redirection of the discipline’ of
geography (Billinge, 1984). The effects are manifested in three
major ways:
1. The teaching of statistical techniques remains a key and
virtually universal element in the training of new geographers.
2. The growing interest in geographical information systems
(GIS) and remote sensing (RS) and a resurgence of interest in
spatial statistics seem to buttress the quantitative approach in
geography for the foreseeable future (Campbell, 1991).
3. Statistical techniques have survived a counter-reaction to the
QR because at a practical level they have something to offer
the Marxists, Structuralists, Political, Economic and even
Behaviouralist geographers.
Modern geography is really less concerned with establishing its
scientific credentials than with promoting its usefulness to society.

Geographers study:
1. how and why elements differ from place to place, as well as
2. how spatial pattern changes through time.
Thus, geographers begin with the question ‘where?', exploring
how features are distributed on a physical or cultural landscape,
and observing spatial patterns and their trends.
Contemporary analysis has shifted from ‘where’ to ‘why’:
1. why does a specific spatial pattern exist?
2. what spatial processes may have affected the said pattern?
3. why do such processes operate?
Only via these 'why' questions can geographers try to understand the
mechanisms of change, which are infinite in their complexity, and to
which “Spatial Statistics” is the key.

Thus, geographers have in their closet a huge database
that needs to be processed for information by:
1. describing and summarizing spatial data,
2. making generalizations about complex spatial patterns,
3. estimating the probability of outcomes for an event at a given
location,
4. using sample data to infer characteristics for a larger set of
geographic data,
5. determining if the magnitude or frequency of some
phenomenon differs from one location to another, and
6. learning whether an actual spatial pattern matches some
expected pattern.

Data analysis in Geography concerns the methodology for
collecting, analyzing, and presenting data. It involves the
application of statistical techniques in the following ways ―
1. first, these help summarize the findings of studies (example:
total rainfall during a period in a state),
2. second, these help in understanding the phenomenon under
study (example: rainfall is higher in the southern districts),
3. third, these help forecast the state of variables (example:
drought is likely during the next year),
4. fourth, these help evaluate the performance of a certain activity
(example: more rainfall means more rice production),
5. fifth, these help decision making (example: finding out the
best location for a Polio Booth),

6. sixth, they also help to establish whether relationships
between the characteristics of a set of observations are
genuine or not, and
7. finally, certainly the results of the analysis make a valuable
contribution to the body of geographical knowledge.

Based on the nature of the variables, the analytical techniques in
geography may be classified as below:
A. Univariate Analysis
It concerns methods with one variable only. The data can be
portrayed as a series of points along an appropriately scaled line.
Thus, these allow the distribution of points along the line to be
described and statistically stated.
Data Description:
1. Frequency Distribution (absolute, relative, cumulative),
2. Central Tendency (mean, median, mode),
3. Dispersion (range, mean deviation, standard deviation, quartile
deviation, coefficient of variation, standard score), and
4. Shape of the Frequency Distribution (skewness, kurtosis),
5. The approximation as to the nature of probability distribution in
the population.

Descriptive Statistics are used to:
1. Quantitatively describe a data set characterizing a variable for a
particular region, e.g., elevation, relief, slope, temperature,
rainfall, soil properties, irrigation intensity, cropping intensity,
agricultural production, industrial production, population, rural
percent, population density, urban percent, etc.
2. Compare data sets between different geographical regions.
3. Test the difference of a Sample Mean from the Regional Mean.
4. Test the difference between two different Sample Means.
5. Construct Distribution Maps based on measures of Central
Tendency and Dispersion.
6. Construct Probability Map based on Expected Value, Mean and
Standard deviation.

1) Description:
Minimum, Maximum, Range, Mean, Median, Mode, Standard Deviation,
Quartile Deviation, Variability
2) Comparison of Data Sets:
Minimum, Maximum, Range, Mean, Median, Mode, Standard Deviation,
Quartile Deviation, Variability
3) Test the difference of a Sample Mean from the Regional Mean:
Student's t-test
4) Test the difference between two different Sample Means:
Student's t-test
5) Construct Distribution Maps based on measures of Central
Tendency and Dispersion:
Choropleth Map (class: mean and sd), Isopleth Map (variability, standard
score), Diagrammatic Map (Dispersion).
6) Construct Probability Map (based on Expected Value, Mean and
Standard deviation): Isopleth / Choropleth Map

Raw Data Matrix
Block x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
B1 11.82 21.69 5.36 16.94 45.51 15.86 9.95 55 2.84 13465 100 91.67
B2 11.91 31.71 5.25 14.58 51.6 19.01 10.44 52 4.86 10579 100 100
B3 10.36 25.11 4.93 16.22 47.9 16.91 11.43 46 4.57 7892 96.48 88.73
B4 17.16 8.37 5.36 14.54 55.35 20.9 11.31 44 13.1 10506 92.17 88.7
B5 28.5 24.76 4.92 11.86 62.63 25.55 13.59 31 13.64 11617 94.79 94.31
B6 33.27 5.15 5.49 14.64 56.01 20.97 9.87 39 2.79 7750 89.63 86.67
B7 6.51 48.97 4.83 12.18 52.93 20 16.57 38 5.15 7204 100 88.24
B8 7.39 19.4 5.25 13.98 54.43 20.57 13.66 42 2.93 13699 92.09 88.84
B9 5.93 51.86 4.67 12.8 53.52 20.85 16.12 37 5.06 9213 100 94.81
B10 22.44 22.03 4.82 12.84 55.59 21.3 14.54 39 3.5 18067 88.93 87.3
B11 19.52 25.46 5.06 12.86 59.95 23.75 15.95 34 3.34 11418 100 94.83
B12 10.57 12.77 5.3 16.76 45.58 14.81 9.04 55 2.97 13234 100 90.08
B13 12.39 11.38 4.95 15.19 56.13 21.42 11.52 49 3.72 13616 91.61 91.61
B14 14.37 9.8 5.5 15.81 48.78 17.39 9.97 51 3.3 34267 97.35 92.92
B15 36.44 6.42 5.46 13.89 57.95 22.05 10.46 45 3.87 9158 84.91 84.91
B16 35.37 10.7 5.19 12.5 58.94 23.25 11.12 36 1.36 7602 100 78.43
B17 28.86 22.3 5.42 13.47 56.37 21.77 13.11 41 4.73 7261 92.74 87.9
B18 25.59 31.95 5.19 13.43 55.54 21.79 14.26 34 37.7 7212 89.42 89.42
B19 27.16 4.85 5.34 15.01 53.88 20.61 10.56 53 2.48 6674 88.79 87.07
B20 14.54 24.74 4.9 12.93 59.33 23.59 15.34 41 4.2 12230 100 88.07

Descriptive Statistics
(Note: the printed sums, means and extremes agree with blocks B2 to B20, i.e., n = 19; block B1 is not included in them.)
Statistic x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
Mean 19.38 20.93 5.15 13.97 54.86 20.87 12.57 42.47 6.49 11536.79 94.68 89.62
Standard Error 2.31 3.11 0.06 0.32 0.98 0.58 0.55 1.61 1.88 1438.04 1.15 1.05
Median 17.16 22.03 5.19 13.89 55.54 20.97 11.52 41.00 3.87 10506.00 94.79 88.73
Standard Deviation 10.08 13.57 0.26 1.40 4.27 2.54 2.39 7.03 8.21 6268.25 5.01 4.57
Sample Variance 101.59 184.19 0.07 1.96 18.21 6.43 5.70 49.49 67.34 39290962.40 25.06 20.91
Kurtosis -1.26 0.54 -1.20 -0.70 0.26 0.92 -1.30 -0.93 12.78 10.06 -1.26 1.76
Skewness 0.32 0.93 -0.25 0.42 -0.49 -0.65 0.31 0.29 3.43 2.90 -0.29 -0.02
Range 30.51 47.01 0.83 4.90 17.05 10.74 7.53 24.00 36.34 27593.00 15.09 21.57
Minimum 5.93 4.85 4.67 11.86 45.58 14.81 9.04 31.00 1.36 6674.00 84.91 78.43
Maximum 36.44 51.86 5.50 16.76 62.63 25.55 16.57 55.00 37.70 34267.00 100.00 100.00
Sum 368.28 397.73 97.83 265.49 1042.41 396.49 238.86 807.00 123.27 219199.00 1798.91 1702.84
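The summary measures in the table can be reproduced with a few lines of code. The sketch below is illustrative only: it uses Python's standard library, takes the x1 column for blocks B2 to B20 (the 19 values with which the printed x1 statistics agree), and the helper name `describe` is an assumption, not part of the workshop material.

```python
import math
import statistics

# x1 values for blocks B2..B20, copied from the raw data matrix.
x1 = [11.91, 10.36, 17.16, 28.5, 33.27, 6.51, 7.39, 5.93, 22.44, 19.52,
      10.57, 12.39, 14.37, 36.44, 35.37, 28.86, 25.59, 27.16, 14.54]


def describe(values):
    """Return basic descriptive statistics (hypothetical helper)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, divisor n - 1
    return {
        "n": n,
        "sum": sum(values),
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "variance": sd ** 2,
        "std_error": sd / math.sqrt(n),
        "cv_percent": 100 * sd / mean,  # coefficient of variation
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


summary = describe(x1)
for name, value in summary.items():
    print(f"{name:>10}: {value:.2f}")
```

Running the same helper on any other column of the matrix gives the corresponding row entries; kurtosis and skewness are left out because their definitions differ between packages.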

Test: Whether the sample mean is significantly different from the population mean?
Student’s t is the test statistic. Let x be a random variable distributed according to a
normal distribution; then the random variable t follows a Student distribution with (n – 1)
degrees of freedom:
Student’s t: $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
Sample standard deviation: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
Example:
Let the rainfall figures of 10 stations in a region be {47, 51, 48, 49, 48, 52, 47, 49, 46, 47} and
the regional mean = 50.
The hypotheses are:
Null Hypothesis: H0: μ = 50 (sample comes from a population with a mean = 50).
Alternative Hypothesis: H1: μ ≠ 50 (sample does not come from a population with a mean = 50).
Computation
Sample Mean = 484/10 = 48.4 s = √(32.4/(10 – 1)) = 1.90
Student’s t = (48.4 – 50) / (1.90/√10) = –2.67 df = (10 – 1) = 9
At α = 5%, t critical = 2.262 (from Table). Therefore, |t| > t critical at 5%.
Hence, the Null Hypothesis is rejected in favour of the Alternative Hypothesis.
Inference
At 5% significance level, the sample mean is significantly different from population mean, i.e., the
sample of rainfall figures does not come from a region (population) with a mean = 50.
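The rainfall example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `one_sample_t` is an assumption made for illustration, and the critical value is simply the table value quoted in the example.

```python
import math
import statistics

rainfall = [47, 51, 48, 49, 48, 52, 47, 49, 46, 47]  # station values from the example
regional_mean = 50                                     # hypothesised population mean


def one_sample_t(sample, mu):
    """Return (t, degrees of freedom) for H0: population mean == mu."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    t = (mean - mu) / (s / math.sqrt(n))
    return t, n - 1


t_value, df = one_sample_t(rainfall, regional_mean)
T_CRITICAL = 2.262  # two-tailed, alpha = 5%, df = 9 (table value quoted in the example)
reject_h0 = abs(t_value) > T_CRITICAL
print(f"t = {t_value:.2f}, df = {df}, reject H0: {reject_h0}")
```

The comparison uses the absolute value of t because the alternative hypothesis is two-sided.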

Test: Whether two sample means are significantly different from each other or
not.
For comparing two sample means, the two-sample Student's t-test is used.
The descriptive statistics of altitudinal figures of two physiographic units X and Y are given as:
Unit X (n = 40): Mean = 600 m, SD = 90 m; Unit Y (n = 50): Mean = 650 m, SD = 70 m
The hypotheses to be tested are:
Null Hypothesis: H0: the two units are not different from each other.
Alternative Hypothesis: H1: the two units are different from each other.
Sp = √[{(40 – 1)·90² + (50 – 1)·70²} / (40 + 50 – 2)] = 745.65 / 9.38 = 79.49
Student’s t = (650 – 600) / [79.49 √{(1/40) + (1/50)}] = 50 / 16.86 = 2.966
df = (50 + 40 – 2) = 88
At α = 1%, t critical = 2.634 (from Table). Therefore, t > t critical at 1%.
Hence, the Null Hypothesis is rejected in favour of the Alternative Hypothesis.
Inference
At 1% significance level, the two physiographic units are significantly different from each other.
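The two-sample computation above works from summary statistics only, so it is easy to script. This is a sketch under the equal-variance (pooled) assumption that the worked example itself uses; the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample Student's t from summary statistics (equal variances assumed)."""
    df = n1 + n2 - 2
    # Pooled standard deviation of the two samples.
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    t = (mean2 - mean1) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t, df


# Unit X: n = 40, mean = 600 m, SD = 90 m; Unit Y: n = 50, mean = 650 m, SD = 70 m.
sp, t_value, df = pooled_t(40, 600, 90, 50, 650, 70)
T_CRITICAL = 2.634  # alpha = 1%, df = 88 (table value quoted in the example)
print(f"Sp = {sp:.2f}, t = {t_value:.3f}, df = {df}, reject H0: {abs(t_value) > T_CRITICAL}")
```

If the two standard deviations differed strongly, a version of the test that does not pool the variances would be the safer choice.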

[Figures: Choropleth Map; Probability Map]

B. Bivariate Analysis
In this, two variables are analysed together. The two measurements made on
one object give co-ordinates of a point in a 2D - space and the data set can be
plotted as a 2D-Scatter. It describes and analyses the shape of the scatter for
investigating the relationship between the data points and / or between the
variables.
1. Bivariate Correlation (nature of association between variables)
a) Pearson’s Product Moment Method (Ratio Data)
b) Spearman’s Rank Correlation Method (Ordinal Data)
c) Test of Significance (Student’s t)
2. Bivariate Regression (estimation of best-fit line followed by interpolation /
extrapolation)
a) Eye Estimation
b) Semi-average Method
c) Least Squares Method
d) Standard Error of Estimate and Confidence Limits
e) Test of Significance: Goodness-of-fit, Coefficient of Determination
3. Bivariate Classification

Scatter Plot and Relationship
Block x y
B1 96.88 0.676
B2 94.47 0.619
B3 92.83 0.631
B4 94.04 0.597
B5 96.34 0.565
B6 96.41 0.632
B7 97.01 0.63
B8 93.04 0.655
B9 91.3 0.598
B10 93.8 0.592
B11 92.56 0.619
B12 95.03 0.684
B13 93.82 0.641
B14 97.74 0.644
B15 83.77 0.55
B16 100 0.606
B17 92.71 0.542
B18 92.66 0.574
B19 95.5 0.632
B20 95.65 0.648
Fitted line: Y = 0.0056 x + 0.0884 (R² = 0.22, r = 0.47)
[Figure: scatter plot of y against x with the fitted line]
Bivariate Classification
[Figure: the same scatter divided into four classes: Low - High, High - High, Low - Low, High - Low; blocks B15, B5 and B16 are labelled]
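The least-squares line and correlation coefficient for the 20 blocks can be recomputed directly. The sketch below uses plain Python; the helper names are illustrative assumptions, and the four-way classification splits each variable at its mean, which is an assumed rule since the slide does not state the thresholds used.

```python
import math

# (block, x, y) triples copied from the scatter-plot data table.
blocks = [
    ("B1", 96.88, 0.676), ("B2", 94.47, 0.619), ("B3", 92.83, 0.631),
    ("B4", 94.04, 0.597), ("B5", 96.34, 0.565), ("B6", 96.41, 0.632),
    ("B7", 97.01, 0.630), ("B8", 93.04, 0.655), ("B9", 91.30, 0.598),
    ("B10", 93.80, 0.592), ("B11", 92.56, 0.619), ("B12", 95.03, 0.684),
    ("B13", 93.82, 0.641), ("B14", 97.74, 0.644), ("B15", 83.77, 0.550),
    ("B16", 100.0, 0.606), ("B17", 92.71, 0.542), ("B18", 92.66, 0.574),
    ("B19", 95.50, 0.632), ("B20", 95.65, 0.648),
]
xs = [x for _, x, _ in blocks]
ys = [y for _, _, y in blocks]


def least_squares(xs, ys):
    """Return slope, intercept and Pearson's r of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


slope, intercept, r = least_squares(xs, ys)
print(f"Y = {slope:.4f} x + {intercept:.4f}, r = {r:.2f}, R2 = {r * r:.2f}")

# Bivariate classification: split each variable at its mean (assumed rule).
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)


def quadrant(x, y):
    """Label a point as 'High'/'Low' on x, then on y."""
    return f"{'High' if x >= mean_x else 'Low'} - {'High' if y >= mean_y else 'Low'}"


classes = {name: quadrant(x, y) for name, x, y in blocks}
print(classes["B15"], "|", classes["B5"], "|", classes["B16"])
```

Other split points, such as medians or policy thresholds, would change the class of points lying close to the dividing lines.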

C. Time-series Analysis
It concerns methods with two variables, one of which is time. In
this, sequence of data in time is explored for order/randomness.
Hence, the situation is conceived as a continuously varying curve
in which search for an ordered regularity is made and tested.
Trend Analysis (Regression and Curve-fitting)
1. Linear
2. Moving Average
3. Polynomial
4. Power
5. Logarithmic and
6. Exponential
Goodness-of-fit, Confidence Limits

Time-series data
Year y
1998 11.82
1999 11.91
2000 10.36
2001 17.16
2002 28.5
2003 33.27
2004 6.51
2005 7.39
2006 5.93
2007 22.44
2008 19.52
2009 10.57
2010 12.39
2011 14.37
2012 36.44
2013 35.37
2014 28.86
2015 25.59
2016 27.16
2017 14.54
[Figure: y plotted against year, with a 3-year moving average (3-year MA)]

Exponential Regression Model
x y
1 1211
2 540
3 225
4 95
5 4
6 1
Fitted curve: y = 10218 e^(-1.4593x) (R² = 0.93)
[Figure: y against x, with a logarithmic y-axis]

Power Regression Model
x y
1 55555555
2 4432891
3 342515
4 221222
5 218765
6 217543
7 212897
8 211897
9 200786
10 190887
11 188765
12 178654
13 156543
14 111997
15 99675
16 77654
17 44567
18 11234
19 7699
20 5000
Fitted curve: Y = 9000000 x^(-1.9243) (R² = 0.77)
[Figure: Y against x, with both axes logarithmic]
D. Directional Analysis
It concerns data, measured in terms of angles (azimuth). Directions can be
easily ordinated on a circle: hence, the term, circular statistics. Such data
are of two types ―
1. Directional (sensu stricto) data and
2. Oriented data.
These can be analyzed for order and randomness, because orientations of
geographical features on the earth’s surface are believed to be the result of
specific geographical processes.
A circular distribution is one whose total probability is concentrated on the
circumference of a unit circle. Two frequently used families of distributions for
circular data are:
a) von Mises Distribution and
b) Uniform Distribution
• Mean Direction
• Median Angle
• Circular Variance
• Test of Mean Resultant Length
• Test of Uniformity (χ2 - test and Rayleigh’s test)
• Test of Randomness (Rayleigh’s test and V- test)

Directional Analysis is used to:
1. Describe the nature of distribution of the pattern of orientation of a line
feature, e.g., ridges, mountain ranges, fault traces, strikes, fold traces,
valleys / streams, dunes, eskers, drumlins, etc with the help of Vector
Graphs.
2. Quantitatively describe a directional data set in terms of mean, and circular
variance.
3. Compare directional data sets between different geographical regions.
4. Test the uniformity of distribution in a directional data set.
5. Test the randomness of distribution in a directional data set

Example-1
xR = Σ sin θi = 5.28665
yR = Σ cos θi = -16.59260
θM = tan⁻¹ (xR / yR) = -17°40' (i.e., 162°20' once the signs of xR and yR are taken into account)
LR = √(xR² + yR²) = 17.41440
LRM = LR / n = 0.45827
σθ (circular variance) = (1 – 0.45827) = 0.54173
n = 38 df = (38 – 2) = 36
LRM (Critical) = 1.01022
Thus, LRM (Computed) < LRM (Critical)
Hence, we fail to reject the null hypothesis of a single
preferred direction; however, there may be more than one
modal direction.
Example-2
n = 38
R = 38 × 0.45828 = 17.41464
Therefore, Z = (17.41464)² / 38 = 7.98078
From Table, p < 0.001. Therefore, H0 is rejected in favour of
H1, i.e., the bearings are non-randomly distributed and are
significantly clumped.
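The quantities in the two examples follow from the sums of sines and cosines alone. The sketch below assumes azimuths measured clockwise from north (so x = Σ sin θ and y = Σ cos θ, as in Example-1), uses `atan2` so that the mean direction falls in the correct quadrant, and approximates the Rayleigh p-value by exp(-Z), a first-order approximation rather than the table value. The function name is hypothetical.

```python
import math


def resultant_statistics(sum_sin, sum_cos, n):
    """Mean direction, resultant lengths, circular variance and Rayleigh's Z."""
    mean_direction = math.degrees(math.atan2(sum_sin, sum_cos)) % 360
    resultant = math.hypot(sum_sin, sum_cos)   # LR
    mean_resultant = resultant / n             # LRM, between 0 and 1
    rayleigh_z = resultant ** 2 / n
    return {
        "mean_direction_deg": mean_direction,
        "resultant_length": resultant,
        "mean_resultant_length": mean_resultant,
        "circular_variance": 1 - mean_resultant,
        "rayleigh_z": rayleigh_z,
        "p_approx": math.exp(-rayleigh_z),     # first-order approximation only
    }


# Sums and sample size taken from Example-1.
stats = resultant_statistics(5.28665, -16.59260, 38)
for name, value in stats.items():
    print(f"{name:>22}: {value:.5f}")
```

For oriented (axial) data, where a line has no sense of direction, a common convention is to double the angles before summing and to halve the resulting mean direction.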

E. Network Analysis
It concerns methods with data attributed to certain geographical
features, viz., drainage and transport, both of which are linear
features and can be easily transformed topologically.
There are specific methods of drainage network analysis usually
performed to describe and explain its geomorphological
characteristics in terms of composition and structure.
1. Composition of Drainage Networks
2. Drainage Network Parameters: Bifurcation, Area and
Length Ratios
3. Building Laws of Drainage Network Composition
The transport networks are analysed for their structure, connectivity,
and accessibility.
1. Structure of Transport Network
2. Connectivity of Transport Network
3. Accessibility
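A few of the network measures named above can be illustrated in code. The bifurcation ratio follows the drainage parameters listed on the slide; the beta, gamma and alpha indices are widely used graph-theoretic connectivity measures for planar networks, included here as an assumption because the slide does not name specific indices. All input numbers are hypothetical.

```python
def bifurcation_ratios(stream_counts):
    """Ratio of stream numbers of each order to the next higher order."""
    return [low / high for low, high in zip(stream_counts, stream_counts[1:])]


def connectivity_indices(vertices, edges):
    """Beta, gamma and alpha indices for a planar network (assumed measures)."""
    beta = edges / vertices                            # edges per node
    gamma = edges / (3 * (vertices - 2))               # share of the maximum possible edges
    alpha = (edges - vertices + 1) / (2 * vertices - 5)  # share of the maximum possible circuits
    return beta, gamma, alpha


# Hypothetical basin: number of streams of order 1, 2, 3 and 4.
ratios = bifurcation_ratios([64, 16, 4, 1])
# Hypothetical road network: 6 nodes joined by 7 links.
beta, gamma, alpha = connectivity_indices(vertices=6, edges=7)
print("Bifurcation ratios:", ratios)
print(f"beta = {beta:.2f}, gamma = {gamma:.2f}, alpha = {alpha:.2f}")
```

Length and area ratios for drainage networks can be computed with the same pattern, dividing successive mean lengths or mean basin areas instead of stream counts.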

F. Spatial Analysis
1. It concerns methods with geographically referenced spatial
data.
2. Three (or four) variables are analysed together, 2 (or 3) of
which are spatial co-ordinates (latitude, longitude, with or
without altitude).
3. The remaining variable is a measurement of geographical
interest and is regarded as varying continuously over the
space.
a) Point Pattern
b) Line Pattern
c) Spatial Pattern
d) Regional Pattern
e) Spatial Relationships
f) Spatial Regression

1) Point Pattern: Direct on a 2D Map or Transformed on a 2D Map
a) Spatial Mean (Weighted and Non-weighted)
b) Spatial Median (Grades of Intensity / Density)
c) Spatial Mode (Minimum Aggregate Travel point)
d) Spatial Dispersion (Standard Distance, Standard Deviational Ellipse)
2) Line Pattern: Drainage and Transport
3) Spatial Pattern (Point Pattern)
1. Test of Uniformity: Chi Square Test, Quadrat Analysis,
2. Test of Randomness: Nearest Neighbour Analysis, Entropy Analysis
4) Regional Pattern:
1. Quantitative Mapping and Analysis (with Mean & Standard
Deviation, Variability, Standard Score),
2. Dominant & Distinctive Analysis,
3. Location Quotient, and
4. Inequality Analysis
5) Spatial Relationship: Join Count Statistics, Spatial Autocorrelation
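Three of the point-pattern measures listed above (spatial mean, standard distance and the nearest neighbour index) are sketched below with the standard library only. The points, weights and study area are hypothetical, the function names are assumptions, and the nearest neighbour index uses the usual expectation 0.5/√(n/A) for a random pattern without any edge correction.

```python
import math


def mean_centre(points, weights=None):
    """Spatial mean; pass weights for the weighted version."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return x, y


def standard_distance(points):
    """Root mean squared distance of the points from their mean centre."""
    cx, cy = mean_centre(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))


def nearest_neighbour_index(points, area):
    """Observed mean nearest-neighbour distance divided by the random expectation."""
    n = len(points)
    nearest = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical settlements (km) inside a hypothetical 16 sq km study area.
points = [(0, 0), (0, 2), (2, 0), (2, 2)]
print("Mean centre:", mean_centre(points))
print("Weighted mean centre:", mean_centre(points, weights=[3, 1, 1, 1]))
print("Standard distance:", round(standard_distance(points), 4))
print("Nearest neighbour index:", nearest_neighbour_index(points, area=16.0))
```

An index near 1 suggests a random pattern, values below 1 suggest clustering, and values above 1 suggest a regular (dispersed) arrangement.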

G. Multivariate analysis:
It concerns general methods applicable to any number of
variables analysed simultaneously and is usually applied to more
(often many more) than three variables. If there are m variables,
the data may be imagined as points in m-dimensional space.
The objective is to reduce the dimensionality so that the shape of
the data scatter can be better explored for relationships between
and among the variables:
Analysis of Dependence: Correlation Matrix, Multivariate Linear Regression
Analysis of Interdependence: Principal Component Analysis, Factor Analysis
Classification: Discriminant Analysis, Cluster Analysis

Correlation Matrix
1. It is drawn to identify the significant relations between and
among variables.
2. It is used to identify significant variables in a huge GDM.
3. It can be used to build linkage diagrams among places.
Multivariate Linear Regression
1. It is solved to quantitatively determine the dependence of a
single dependent variable on a number of independent
variables.
2. The coefficients help to compare between and among
places.
3. The R² value is tested for goodness-of-fit.
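A correlation matrix is simply Pearson's r computed for every pair of variables. The sketch below builds one for a small subset of the raw data matrix (columns x1, x5 and x8 for blocks B1 to B6); the subset is chosen only to keep the example short, and the helper name `pearson_r` is an assumption.

```python
import math

# Three columns for blocks B1..B6 of the raw data matrix (illustrative subset).
columns = {
    "x1": [11.82, 11.91, 10.36, 17.16, 28.5, 33.27],
    "x5": [45.51, 51.6, 47.9, 55.35, 62.63, 56.01],
    "x8": [55, 52, 46, 44, 31, 39],
}


def pearson_r(a, b):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((v - ma) ** 2 for v in a)
    sbb = sum((v - mb) ** 2 for v in b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)


names = list(columns)
matrix = [[pearson_r(columns[p], columns[q]) for q in names] for p in names]

print("     " + " ".join(f"{name:>6}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:>4} " + " ".join(f"{value:6.2f}" for value in row))
```

With only six observations the coefficients are unstable; in practice all blocks would be used and each coefficient tested for significance before it is read as a genuine relationship.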

Principal Component Analysis
1. It is performed to identify the Principal Components along which the
scatter of points in an m-dimensional space is oriented.
2. Component Scores are then found from the ‘variance-covariance’ matrix.
3. The Eigen Values help to identify the significant components, and the
variables that mostly explain the total variance in the data set.
4. Interpretation is based on finding which variables are most strongly
correlated with each component (purely a subjective process).
7. Scatter plots of component scores help to locate
each point in the plot, to see if places with high levels of a given
component tend to be clustered in a particular region of the country.
8. Principal components are often treated as dependent variables for
regression and analysis of variance.
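The first principal component is the eigenvector of the correlation (or variance-covariance) matrix with the largest eigenvalue. The sketch below finds it by power iteration in plain Python; the 3 × 3 correlation matrix is hypothetical, the function name is an assumption, and a statistical package would normally extract all components at once.

```python
import math


def dominant_component(matrix, iterations=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    m = len(matrix)
    # Uneven starting vector, so it is unlikely to be orthogonal to the answer.
    vec = [1.0 / (i + 1) for i in range(m)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    product = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec


# Hypothetical correlation matrix for three variables.
corr = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.5],
    [0.6, 0.5, 1.0],
]
eigenvalue, loadings = dominant_component(corr)
share = eigenvalue / len(corr)  # for a correlation matrix the total variance equals m
print(f"Eigenvalue of PC1 = {eigenvalue:.3f} ({share:.0%} of total variance)")
print("PC1 coefficients:", [round(v, 3) for v in loadings])
```

Components with an eigenvalue greater than 1 are commonly retained; the remaining components can be found by removing the first one from the matrix and repeating the iteration.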

Factor Analysis
1. It is better performed by using the Principal Component Method.
2. The Eigen Values of the Components help to identify the
significant Components, Factors and finally Variables that
significantly explain the total Variance in the data set (hence,
factor reduction: the other name).
3. Interpreting factor loadings is similar to that of the PC coefficients.
4. Communalities help to assess the adequacy of Factor Analysis.
5. The Initial Solution is fine tuned with Varimax Rotation for Final
Solution that optimizes the Eigen Values.
6. The variables whose variance is mostly explained can be
identified from the Factors – 1, 2, 3 or 4 (with Eigen Value > 1).
7. The Scree plot also helps to identify the Factors.
8. Factor Scores are computed by Regression Analysis and saved
as Variables.
9. Factor Score – 1 can be used for Univariate Classification /
Regionalisation.
10.Factor Score - 1 and 2 can be used for Bivariate Classification /
Regionalisation.

Cluster Analysis
1. It is a data exploration (mining) tool for dividing a multivariate
dataset into “natural” clusters (groups).
2. Methods to explore whether previously undefined clusters (groups)
exist in the dataset are used.
3. It is used when the sample units come from an unknown number of
distinct populations or sub-populations.
4. The main objective is to describe those populations with the
observed data.
5. Methods for measuring distances or similarities between objects
should be carefully chosen.
6. Linkage methods for measuring the distances between clusters
should be carefully chosen.
7. Based on the agglomeration schedule, dendrograms are prepared
and interpreted in terms of cluster composition.
8. It has become popular in geoinformatics, bioinformatics and genome
research.
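The agglomeration schedule mentioned in point 7 can be produced by a short hierarchical clustering routine. The sketch below uses Euclidean distance and single linkage (nearest neighbour), both chosen only for illustration; the points are hypothetical and the function name is an assumption.

```python
import math


def single_linkage(points):
    """Agglomerative clustering; returns the schedule as (merged members, distance)."""
    clusters = {i: [i] for i in range(len(points))}
    schedule = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # Single linkage: distance between the two closest members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        schedule.append((sorted(clusters[a]), d))
    return schedule


# Hypothetical places described by two standardised variables.
places = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
schedule = single_linkage(places)
for step, (members, distance) in enumerate(schedule, start=1):
    print(f"Step {step}: cluster {members} formed at distance {distance:.3f}")
```

Reading the schedule from top to bottom gives the order in which a dendrogram joins its branches; complete or average linkage would change only the line that computes `d`.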

Discriminant Analysis
1. It is a classification problem, where two or more groups or clusters or
populations are known a priori and one or more new observations are
classified into one of the known populations based on the measured
characteristics.
2. The aim is to maximally separate the groups, to determine the most
parsimonious way to separate groups, and to discard variables which are
little related to group distinctions
3. The model is composed of a discriminant function based on linear
combinations of predictor variables that provide the best discrimination
between groups.
4. It is similar to regression analysis. A discriminant score can be calculated
based on the weighted combination of the independent variables:
$D_i = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
where $D_i$ is the predicted score (discriminant score), $x$ is a predictor and $b$ is a
discriminant coefficient.
5. Maximum likelihood technique is used to assign a case to a group from a
specified cut-off score. If group size is equal, the cut-off is mean score. If
group size is not equal, the cut-off is calculated from weighted means.
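The discriminant score and the cut-off rule in points 4 and 5 can be sketched as below. The intercept, coefficients, group centroids and group sizes are all hypothetical, and the size-weighted cut-off follows one common textbook convention (each centroid weighted by the size of the other group); it is an assumption about what "weighted means" refers to here.

```python
def discriminant_score(x, intercept, coefficients):
    """D = a + b1*x1 + b2*x2 + ... for one observation."""
    return intercept + sum(b * value for b, value in zip(coefficients, x))


def cutoff_score(centroid_a, n_a, centroid_b, n_b):
    """Cut-off between two group centroids; equals their simple mean when n_a == n_b."""
    return (n_a * centroid_b + n_b * centroid_a) / (n_a + n_b)


def classify(score, centroid_a, centroid_b, cutoff):
    """Assign the case to the group whose centroid lies on its side of the cut-off."""
    on_b_side = (score > cutoff) == (centroid_b > cutoff)
    return "B" if on_b_side else "A"


# Hypothetical two-variable discriminant function and group summaries.
score = discriminant_score([2.0, 3.0], intercept=-2.0, coefficients=[0.5, 1.0])
equal_cut = cutoff_score(-1.0, 10, 1.0, 10)
weighted_cut = cutoff_score(-1.0, 30, 1.0, 10)
print(f"score = {score}, equal-size cut-off = {equal_cut}, weighted cut-off = {weighted_cut}")
print("assigned group:", classify(score, -1.0, 1.0, weighted_cut))
```

In practice the coefficients come from fitting the discriminant function to cases whose group membership is already known.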

Scale of Measurement and Statistical Measures and Tests

Scale System: Nominal
Defining Relation: Equivalence
Possible Transformations: y = f(x), where f(x) means any one-to-one substitution
Central Tendency: Mode
Dispersion: % in the Mode
Tests: Non-parametric: Chi-square, Contingency coefficient, Goodman-Kruskal's Lambda, Phi-coefficient

Scale System: Ordinal
Defining Relation: Equivalence; Greater than
Possible Transformations: y = f(x), where f(x) means any increasing monotonic transformation
Central Tendency: Median
Dispersion: Percentiles
Tests: Non-parametric: Spearman's Rho, Kendall's Tau, Kolmogorov-Smirnov, Goodman-Kruskal's Gamma, Phi-coefficient

Scale System: Interval
Defining Relation: Equivalence; Greater than; Known ratio of any two intervals
Possible Transformations: any linear transformation: y = a·x + b (a > 0)
Central Tendency: Mean
Dispersion: Standard Deviation
Tests: Parametric and Non-parametric: t-test, F-test, Pearson's r, Point Biserial, etc.

Scale System: Ratio
Defining Relation: Equivalence; Greater than; Known ratio of any two intervals; Known ratio of any two scale values
Possible Transformations: y = c·x (c > 0)
Central Tendency: all
Dispersion: all
Tests: Parametric and Non-parametric: t-test, F-test, Pearson's r, Point Biserial, etc.

Statistical Tests / Methods and Measurement Scale

1-Component Case (by Measurement Scale)
Ratio, Interval, Ordinal, Nominal Data: f, Mo
Ratio, Interval, Ordinal Data: Me, Px
Ratio, Interval Data: Mean, Variance
Ratio Data: GM, HM, CV

2-Component Case
Nominal Scale Data: Chi Square, Contingency Coefficient
Ordinal Scale: U Test, Spearman’s rs, Kendall’s τ
Interval + Ratio Scale: Comparison of Mean, Comparison of Variance, Pearson’s r, Linear and Non-Linear Regression

Multi-Component Case
Interval + Ratio Scale: Multiple Variance Analysis, Co-Variance Analysis, Multiple Correlation, Multiple Regression

The presentation so far helps us to conclude that:
1. Statistics (descriptive, numerical and inferential) has a great role to play
in “Analysis in Geographical Investigation”.
2. As Geography is a Spatial Science, techniques of “spatial analysis” have
become of utmost importance in the contemporary period.
3. With the advent of GIS and RS software, it has become easier to
perform the most laborious and complicated manipulations with just a
click of the mouse.
4. As geographical events have multivariate relations, multivariate analyses
are often applied for exploratory studies and are increasingly used
with the growing availability of statistical packages.
1. QGIS, uDIG, ILWIS, GeoDa, gvSIG, SAGA GIS, MapWindow, Diva
GIS, OrbisGIS, +
2. Landserf, TAS, TerraView, CatchmentSim, JT Maps, +
3. R, Statistica, SPSS, +
4. Thus, there’s a bucket load of free GIS software that can:
a) perform hundreds of advanced GIS processing tasks,
b) generate stunning cartography and mapping products,
c) manage your company’s geospatial assets efficiently.

A Word of Caution:
1. Collect data strictly as per the need to test your hypothesis.
2. Store your data in the form of a GDM.
3. Choose the statistical technique carefully,
considering the sample size, data type and data
quality.
4. Use the most suited technique to test the significance of the
results.
5. Represent the results using the most suited techniques of
cartographic visualization.
6. Apply your geographical mind for the scientific explanation of
your endeavour.
Therefore, Statistical Techniques make sense and
enhance the quality of geographical investigation
in building sound information base, formulating
and testing hypotheses, and extracting universal

1. Statistical analysis in geography is unique as it concerns
‘spatial’ or ‘geographically referenced’ data (with co-ordinates).
2. The variety of techniques being almost infinite, the GDA has to
pick and choose the best suited one for the specific job.
3. Today, there is software for almost every single purpose of a
geographer, and any enterprising researcher can explore the
internet and download it.
4. Its presentation requires the ‘application of the most suited’
cartographic techniques.
5. Its interpretation needs the ‘wisest use of geographical
principles’ leading to the scientific geographical explanation.


Thank You
Ethics in Statistical Geography
1. Be Honest with Data
Enumeration, Measurement and
Collection
2. Be Wise while selecting the
Statistical Technique(s) for your
intended Purpose
3. Explore the Results, observe the
Geographical Associations and
go for the Statistical Inferences
4. Be Precise and very Simple while
translating the “Language of
Statistics” into the “Language of
Geography”
