Dr. Mohan Kumar, T. L.
Chapter 1: INTRODUCTION
1.1. Introduction:
In the modern world of computer and information technology, the importance of
statistics is very well recognized by all the disciplines. Statistics has originated as a
science of statehood and found applications slowly and steadily in Agriculture,
Economics, Commerce, Biology, Medicine, Industry, Planning, Education and so on.
The word statistics in our everyday life means different things to different people.
For a layman, ‘Statistics’ means numerical information expressed in quantitative terms.
A student knows statistics more intimately as a subject of study like economics,
mathematics, chemistry, physics and others. It is a discipline, which scientifically deals
with data, and is often described as the science of data. For football fans, statistics are
the information about rushing yardage, passing yardage, and first downs, given at
halftime. To the manager of a power generating station, statistics may be information
about the quantity of pollutants being released into the atmosphere and the power
generated. For a school principal, statistics are information on absenteeism, test
scores and teacher salaries. For medical researchers, statistics are the data gathered
while investigating the effects of a new drug, along with patient diaries. For college
students, statistics are the grade lists of different courses, OGPA, CGPA, etc. Each of
these people is using the word statistics correctly, yet each uses it in a slightly
different way and for a somewhat different purpose.
The term statistics is ultimately derived from the Latin word Status or
Statisticum Collegium (“council of state”), the Italian word Statista (“statesman”), and
the German word Statistik, which means political state.
Sir R. A. Fisher (Ronald Aylmer Fisher) is regarded as the father of statistics.
P. C. Mahalanobis (Prasanta Chandra Mahalanobis) is regarded as the father of Indian
statistics.
1.2 Meaning of Statistics:
The word statistics is used in two senses: one is singular and the other is
plural.
a) When it is used in the singular: It means the ‘subject’ or branch of science which deals with
the scientific methods of collection, classification, presentation, analysis and interpretation
of data obtained by sample surveys or experimental studies; these are known as the
statistical methods.
When we say ‘apply statistics’, we mean applying the statistical methods to analyze
and interpret data.
b) When it is used in the plural: Statistics is a systematic presentation of facts and figures.
The majority of people use the word statistics in this sense. They simply mean
facts and figures. These figures may be with regard to production of food grains in
different years, area under cereal crops in different years, per capita income in a
particular state at different times etc., and these are generally published in trade
journals, economics and statistics bulletins, annual report, technical report, news
papers, etc.
1.3 Definition of Statistics:
Statistics has been defined differently by different authors from time to time. One
can find more than a hundred definitions in the literature of statistics.
“Statistics may be defined as the science of collection, presentation, analysis and
interpretation of numerical data from the logical analysis”.
-Croxton and Cowden
“The science of statistics is essentially a branch of applied mathematics and
may be regarded as mathematics applied to observational data”.
-R. A. Fisher
“Statistics is the branch of science which deals with the collection, classification
and tabulation of numerical facts as the basis for explanation, description and
comparison of phenomena”.
-Lovitt
A.L. Bowley has defined statistics as: (i) Statistics is the science of counting, (ii)
Statistics may rightly be called the Science of averages, and (iii) Statistics is the science
of measurement of social organism regarded as a whole in all its manifestations.
“Statistics is a science of estimates and probabilities”
-Boddington
In general:
Statistics is the science which deals with the,
(i) Collection of data
(ii) Organization of data
(iii) Presentation of data
(iv) Analysis of data &
(v) Interpretation of data.
1.4 Types of Statistics:
There are two major divisions of statistics: descriptive statistics and
inferential statistics.
i) Descriptive statistics is the branch of statistics that involves the collection,
organization, summarization, and display of data.
ii) Inferential statistics is the branch of statistics that involves drawing conclusions
about the population using sample data. A basic tool in the study of inferential statistics
is probability.
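The difference between the two divisions can be sketched in a few lines of Python. This is only an illustration: the plot yields are hypothetical values, not data from these notes, and the standard error is used here as one simple inferential quantity among many.

```python
import math
import statistics

# Hypothetical sample: yields (kg) from 8 plots drawn from a larger population of plots.
sample_yields = [30.0, 28.5, 31.2, 29.8, 27.4, 32.1, 30.6, 29.0]

# Descriptive statistics: summarize the data that were actually observed.
sample_mean = statistics.mean(sample_yields)
sample_sd = statistics.stdev(sample_yields)  # sample standard deviation (n - 1 divisor)

# Inferential statistics: use the sample to say something about the population.
# The standard error indicates how far the sample mean is likely to lie
# from the unknown population mean.
standard_error = sample_sd / math.sqrt(len(sample_yields))

print(f"mean = {sample_mean:.2f}, sd = {sample_sd:.2f}, se = {standard_error:.2f}")
```

The mean and standard deviation only describe the eight plots; the standard error is a statement about the population they came from, which is why it rests on probability.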
1.5 Nature of Statistics:
Statistics is a science as well as an art.
Statistics as a Science: Statistics is classified as a science because of the following
characteristics:
1. It is a systematic body of knowledge.
2. Its methods and procedures are definite and well organized.
3. It analyzes the cause-and-effect relationship among variables.
4. Its study follows definite rules and is dynamic.
Statistics as an Art: Statistics is considered an art because it provides methods to
use statistical laws in solving problems. Also, the application of statistical methods
requires skill and experience on the part of the investigator.
1.6 Aims of statistics: The objectives of statistics are
1. To study the population.
2. To study the variation and its causes.
3. To study the methods for reducing data/ summarization of data.
1.7 Functions of statistics:
The important functions of statistics are given as follows:
1) To express facts and statements numerically or quantitatively.
2) To condense and simplify complex facts.
3) To serve as a technique for making comparisons.
4) To establish the association and relationship between different groups.
5) To estimate present facts and forecast the future.
6) To test hypotheses.
7) To formulate policies and measure their impact.
1.8 Scope/ Application of Statistics
In modern times, the importance of statistics has increased and it is applied in every sphere
of human activity. Statistics plays an important role in our daily life; it is useful in
almost all sciences, such as the social and biological sciences, psychology, education, economics,
business management, agricultural sciences, information technology, etc. The
statistical methods can be and are being used by both educated and uneducated
people. In many instances we use sample data to make inferences about the entire
population.
1) Statistics is used in administration by the Government for solving various problems.
Ex: price control, birth and death rate estimation, framing policies related to import,
export and industries, assessment of pay and D.A., preparation of the budget, etc.
2) Statistics are indispensable in planning and in making decisions regarding export,
import, production, etc. Statistics serves as the foundation of the superstructure of
planning.
3) Statistics helps the businessman in the formulation of policies with regard to business.
Statistical methods are applied in market research to analyze the demand for and
supply of manufactured products and to fix their prices.
4) Bankers, stock exchange brokers, insurance companies, etc. make extensive use of
statistical data. Insurance companies make use of statistics on mortality, life
premium rates, etc.; for bankers, statistics help in deciding the amount required to
meet day-to-day demands.
5) Problems relating to poverty, unemployment, food storage, deaths due to diseases or
due to shortage of food, etc. cannot be fully weighed without the statistical balance.
Thus statistics is helpful in promoting human welfare.
6) Statistics is widely used in education. Research has become a common feature in all
branches of activity. Statistics is necessary for the formulation of policies to start
new courses, for the consideration of facilities available for new courses, etc.
7) Statistics are a very important part of political campaigns as they lead up to
elections. Every time a scientific poll is taken, statistics are used to calculate and
illustrate the results in percentages and to calculate the margin of error.
8) In medical sciences, statistical tools are widely used. Ex: to test the
efficacy of a new drug or medicine; to study variable characters like blood
pressure (BP), pulse rate, Hb %, and the action of drugs on individuals; to determine the
association between diseases and different attributes, such as smoking and cancer;
and to compare different drugs or dosages on living beings under different conditions.
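As an illustration of the margin of error mentioned for opinion polls in item 7, the sketch below uses the common normal-approximation formula for a sample proportion at roughly 95% confidence (z = 1.96). The notes do not give a formula, so both the formula choice and the poll figures are assumptions made for this example.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical poll: 520 of 1000 respondents favour a candidate.
moe = margin_of_error(520 / 1000, 1000)
print(f"52.0% +/- {100 * moe:.1f} percentage points")
```

The margin shrinks with the square root of the sample size, so quadrupling the number of respondents only halves it.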
In agricultural research, Statistical tools have played a significant role in the analysis
and interpretation of data.
1) Analysis of variance (ANOVA), one of the statistical tools developed by Professor
R. A. Fisher, plays a prominent role in agricultural experiments.
2) In compiling data on dry and wet lands, lands under tanks, lands under irrigation
projects, rainfed areas, etc.
3) In determining and estimating the irrigation required by a crop per day, per base
period.
4) In determining the required doses of fertilizer for a particular crop and crop land.
5) In soil chemistry, statistics helps in classifying soils based on pH,
texture, structure, etc.
6) In estimating the yield losses caused by a particular pest, insect, bird, rodent, etc.
7) Agricultural economists use forecasting procedures to estimate the demand for and
supply of food, exports and imports, and production.
8) Animal scientists use statistical procedures to aid in analyzing data for decision
purposes.
9) Agricultural engineers use statistical procedures in several areas, such as for
irrigation research, modes of cultivation and design of harvesting and cultivating
machinery and equipment.
1.9 Limitations of Statistics:
1) Statistics does not study qualitative phenomena, i.e. it studies only quantitative
phenomena.
2) Statistics does not study individual or single observations; in fact it deals only with an
aggregate or group of objects/individuals.
3) Statistical laws are not exact laws; they are only approximations.
4) Statistics is liable to be misused.
5) Statistical conclusions are valid only on average, i.e. statistical results are not
100 per cent correct.
6) Statistics does not reveal the entire information. Since statistics are collected for a
particular purpose, such data may not be relevant or useful in other situations or
cases.
Chapter 2: BASIC TERMINOLOGIES
2.1 Data: Numerical observations collected in a systematic manner by assigning numbers
or scores to the outcomes of one or more variables.
2.2 Raw Data: Raw data are the originally collected or observed data, which have not been
modified or transformed in any way. The information collected through censuses,
sample surveys, experiments and other sources is called raw data.
2.3 Types of data according to source:
There are two types of data
1. Primary data
2. Secondary data.
2.3.1 Primary data: The data collected by the investigator himself/herself for a
specific purpose by actual observation, measurement or count are called primary data.
Primary data are those which are collected for the first time, primarily for a particular
study. They are in the form of raw material and are original in character.
Primary data are more reliable than secondary data. These types of data need the
application of statistical methods for the purpose of analysis and interpretation.
Methods of collection of primary data
Primary data is collected in any one of the following methods
1. Direct personal interviews.
2. Indirect oral interviews
3. Information from correspondents.
4. Mailed questionnaire method.
5. Schedules sent through enumerators.
6. Telephonic Interviews, etc...
2.3.2 Secondary data: The data which are compiled from the records of others are
called secondary data. The data collected by an individual or his agents are primary data
for him and secondary data for all others. Secondary data are those which have gone
through statistical treatment. When statistical methods are applied to primary data,
they become secondary data. They are in the shape of finished products.
Secondary data are less expensive, but they may not give all the necessary information.
Secondary data can be compiled either from published sources or unpublished sources.
Sources of published data
1. Official publications of the central, state and local governments.
2. Reports of committees and commissions.
3. Publications brought about by research workers and educational associations.
4. Trade and technical journals.
5. Report and publications of trade associations, chambers of commerce, bank
etc.
6. Official publications of foreign governments or international bodies like U.N.O,
UNESCO etc.
Sources of unpublished data: Not all statistical data are published. For example, village
level officials maintain records regarding area under crop, crop production, etc. They
collect these details for administrative purposes. Similarly, details collected by private
organizations regarding persons, profit, sales, etc. become secondary data and are used
in certain surveys.
Characteristics of secondary data
Secondary data should possess the following characteristics. They should be
reliable, adequate, suitable, accurate, complete and consistent.
2.3.3 Difference between primary and secondary data
Primary data | Secondary data
The data collected by the investigator himself/herself for a specific purpose. | The data which are compiled from the records of others.
Primary data are those data which are collected from the primary sources. | Secondary data are those data which are collected from the secondary sources.
Primary data are original because the investigator himself collects them. | Secondary data are not original, since the investigator makes use of other agencies.
If these data are collected accurately and systematically, their suitability will be very positive. | These might or might not suit the objects of enquiry.
The collection of primary data is more expensive because they are not readily available. | The collection of secondary data is comparatively less expensive because they are readily available.
It takes more time to collect the data. | It takes less time to collect the data.
There is no great need for precaution while using these data. | These should be used with great care and caution.
More reliable & accurate. | Less reliable & accurate.
Primary data are in the shape of raw material. | Secondary data are usually in the shape of readymade/finished products.
Possibility of personal prejudice. | Possibility of a lesser degree of personal prejudice.
Grouped data: When the data values vary widely, they are sorted and grouped
into class intervals in order to reduce the number of scoring categories to a
manageable level. Individual values of the original data are not retained. Ex: 0-10, 11-20,
21-30.
Ungrouped data: The data values are not grouped into class intervals;
they are kept in their original form. Ex: 2, 4, 12, 0, 3, 54,
etc.
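Turning ungrouped values into grouped data can be sketched as follows. The first six values are the ungrouped example from the text; the remaining four are hypothetical additions, and the helper name class_interval is invented for this illustration. Non-negative whole numbers are assumed.

```python
from collections import Counter

# Ungrouped data: the six example values plus a few hypothetical ones.
values = [2, 4, 12, 0, 3, 54, 18, 25, 30, 11]

def class_interval(x: int) -> str:
    """Return the class label (0-10, 11-20, 21-30, ...) that contains x."""
    if x <= 10:
        return "0-10"
    lower = ((x - 1) // 10) * 10 + 1
    return f"{lower}-{lower + 9}"

# Grouped data: only the count per class is kept, not the individual values.
grouped = Counter(class_interval(v) for v in values)
for label in sorted(grouped, key=lambda s: int(s.split("-")[0])):
    print(label, grouped[label])
```

Once the counts are formed, the original values cannot be recovered from them, which is the loss of detail the definition of grouped data refers to.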
2.4 Variable:
A variable is a quantitative or qualitative characteristic that
varies from observation to observation in the same group, so that measuring it
can yield more than one value.
Ex: Daily temperature, yield of a crop, nitrogen in soil, height, color, sex.
2.4.1 Observations (Variate):
The specific numerical values assigned to the variables are called observations.
Ex: The yield of a crop is 30 kg.
2.5 Types of Variables
Variable
    Quantitative Variable (Data)
        Continuous Variable (Data)
        Discrete Variable (Data)
    Qualitative Variable (Data)
2.5.1 Quantitative Variable & Qualitative variable
Quantitative Variable:
A quantitative variable is a variable which is normally expressed numerically
because it differs in degree rather than kind among elementary units.
Ex: Plant height, plant weight, length, number of seeds per pod, leaf dry weight, etc.
Qualitative Variable:
A variable that is normally not expressed numerically because it differs in kind
rather than degree among elementary units. The term is more or less synonymous with
categorical variable. Some examples are hair color, religion, political affiliation,
nationality, and social class.
Ex: Intelligence, beauty, taste, flavor, fragrance, skin colour, honesty, hard work
etc...
Attributes:
Qualitative variables are termed attributes: qualitatively distinct
characteristics such as healthy or diseased, positive or negative. The term is often
applied to designate characteristics that are not easily expressed in numerical terms.
Quantitative data:
Data obtained by using numerical scales of measurement, i.e. on a quantitative
variable. These are data in numerical quantities involving continuous measurements or
counts. In the case of quantitative variables, the observations are made in terms of kg,
quintals, litres, cm, metres, kilometres, etc.
Ex: Weight of seeds, height of plants, yield of a crop, available nitrogen in a soil,
number of leaves per plant.
Qualitative data:
When the observations are made with respect to a qualitative variable, the data are called
qualitative data.
Ex: Crop varieties, shape of seeds, soil type, taste of food, beauty of a person,
intelligence of students, etc.
2.5.2 Continuous variable & Discrete variable (Discontinuous variable)
Continuous variable & Continuous data:
A continuous variable is a variable which can assume any value (integers
as well as fractions) in a given range; it has an infinite number of possible
values within that range.
If the data are measured on a continuous variable, then the data obtained are
continuous data.
Ex: Height of a plant, weight of a seed, rainfall, temperature, humidity, marks of
students, income of an individual, etc.
Discrete (Discontinuous) variable and discrete data:
A discrete variable is a variable which assumes only some specified values, i.e. only whole numbers
(integers), in a given range. A discrete variable can assume only a finite or, at most, a
countable number of possible values. As the old joke goes, you can have 2 children or 3
children, but not 2.37 children, so “number of children” is a discrete variable.
If the data are measured on a discrete variable, then the data obtained are discrete
data.
Ex: Number of leaves on a plant, number of seeds in a pod, number of students,
number of insects or pests.
2.6 Population:
The aggregate or totality of all possible objects possessing specified
characteristics which is under investigation is called population. A population consists
of all the items or individuals about which you want to reach conclusions. A population
is a collection or well defined set of individual/object/items that describes some
phenomenon of study of your interest.
Ex: Total number of students studying in a school or college,
total number of books in a library,
total number of houses in a village or town.
In statistics, the data set for the target group of interest is called a
population. Notice that a statistical population does not refer to people, as in our
everyday usage of the term; it refers to a collection of data.
2.6.1 Census (Complete enumeration):
When each and every unit of the population is investigated for the character
under study, then it is called Census or Complete enumeration.
2.6.2 Parameter:
A parameter is a numerical constant which describes a
characteristic of a population. OR
A parameter is a numerical description of a population characteristic.
Generally, parameters are unknown constants; they are estimated from sample
data.
Ex: Population mean (denoted as μ), population standard deviation (σ),
population ratio, population percentage, population correlation coefficient (ρ),
etc.
2.7 Sample:
A small portion selected from the population under consideration or fraction of
the population is known as sample.
2.7.1 Sample Survey:
When the part of the population is investigated for the characteristics under
study, then it is called sample survey or sample enumeration.
2.7.2 Statistic:
A statistic is a numerical quantity that is measured to describe a characteristic
of a sample. OR
A statistic is a numerical description of a sample characteristic.
Ex: Sample mean (X̄), sample standard deviation (s), sample ratio, sample
proportion, etc.
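The distinction between a parameter and a statistic can be illustrated with a small simulation. The population of plant heights below is generated artificially (hypothetical values, fixed random seed), so that both the parameters and the sample statistics can be computed and compared; in practice the parameters are usually unknown.

```python
import random
import statistics

# Hypothetical population: heights (cm) of all 1000 plants in a field.
random.seed(42)
population = [random.gauss(150, 10) for _ in range(1000)]

# Parameters describe the whole population (as in a census).
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # population SD (divisor N)

# Statistics describe a sample drawn from it (as in a sample survey).
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)            # sample SD (divisor n - 1)

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The sample values x_bar and s serve as estimates of mu and sigma; a different sample would give slightly different estimates, while the parameters stay fixed.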
2.8 Nature of data: It may be noted that different types of data can be collected for
different purposes. The data can be collected in connection with time or geographical
location or in connection with time and location. The following are the three types of
data:
1. Time series data 2. Spatial data 3. Spatio-temporal data.
Time series data: A set of numerical values collected and arranged
over a sequence of time periods. The data might have been collected either at regular
intervals of time or at irregular intervals of time. Ex: Year-wise rainfall in
Karnataka, prices of milk over different months.
Spatial data: If the data collected are connected with a place, they are termed
spatial data. Ex: District-wise rainfall in Karnataka, prices of milk in
four metropolitan cities.
Spatio-temporal data: If the data collected are connected with time as well as place,
they are known as spatio-temporal data. Ex: Data on both year-wise and district-wise rainfall
in Karnataka, monthly prices of milk in different cities.
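The three kinds of data can be pictured as different ways of indexing the same figures. In the sketch below the rainfall values are hypothetical and only two districts and two years are used, purely for illustration.

```python
# Hypothetical rainfall figures (mm); keys are (year, district).
rainfall = {
    (2021, "Mysuru"): 780, (2021, "Hassan"): 1030,
    (2022, "Mysuru"): 910, (2022, "Hassan"): 1210,
}

# Time series data: one place, values ordered by time.
time_series = {
    year: mm
    for (year, district), mm in sorted(rainfall.items())
    if district == "Mysuru"
}

# Spatial data: one time period, values indexed by place.
spatial = {district: mm for (year, district), mm in rainfall.items() if year == 2022}

# Spatio-temporal data: indexed by both time and place (the full dictionary).
print(time_series)
print(spatial)
```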
Chapter 3: CLASSIFICATION
3.1 Introduction
Raw or ungrouped data are always in an unorganized form and need to be
organized and presented in a meaningful and readily comprehensible form in order to
facilitate further statistical analysis. Therefore, it is essential for an investigator to
condense a mass of data into a more comprehensible and digestible form.
3.2 Definition:
Classification is the process by which individual items of data are arranged in
different groups or classes according to common characteristics or resemblance or
similarity possessed by the individual items of variable under study.
Ex: 1) Letters in the post office are classified according to their
destinations, viz., Delhi, Chennai, Bangalore, Mumbai, etc.
2) The human population can be divided into two groups of males and females, or
into two groups of educated and uneducated persons.
3) Plants can be arranged according to their different heights.
Remarks: Classification done on the basis of a single characteristic is called one-way
classification. Classification done on the basis of two characteristics is called
two-way classification. Similarly, classification done on the basis of more than
two characteristics is called multi-way or manifold classification.
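One-way and two-way classification amount to counting items by one characteristic or by a pair of characteristics. A minimal sketch, using eight hypothetical records of sex and education:

```python
from collections import Counter

# Hypothetical records: (sex, education) for eight individuals.
people = [
    ("Male", "Educated"), ("Female", "Educated"), ("Male", "Uneducated"),
    ("Female", "Educated"), ("Male", "Educated"), ("Female", "Uneducated"),
    ("Male", "Educated"), ("Female", "Educated"),
]

# One-way classification: a single characteristic (sex).
one_way = Counter(sex for sex, _ in people)

# Two-way classification: two characteristics at once (sex and education).
two_way = Counter(people)

print(one_way)
print(two_way)
```

Adding a third field to each record (for example income status) and counting the triples would give a manifold classification in the same way.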
3.3 Objectives /Advantages/ Role of Classification:
The following are main objectives of classifying the data:
1. It condenses the mass/bulk data in an easily understandable form.
2. It eliminates unnecessary details.
3. It gives an orderly arrangement of the items of the data.
4. It facilitates comparison and highlights the significant aspects of the data.
5. It enables one to get a mental picture of the information and helps in drawing
inferences.
6. It helps in tabulation and statistical analysis.
3.4 Types of classification:
Statistical data are classified in respect of their characteristics. Broadly there are
four basic types of classification namely
1) Chronological classification or Temporal or Historical Classification
2) Geographical classification (or) Spatial Classification
3) Qualitative classification
4) Quantitative classification
1) Chronological classification:
In chronological classification, the collected data are arranged according to the
order of time, expressed in days, weeks, months, years, etc. The data are generally
classified in ascending order of time.
Ex: Daily temperature records, monthly prices of vegetables, exports and
imports of India for different years.
Total food grain production of India for different time periods:
Year | Production (million tonnes)
2005-06 | 208.60
2006-07 | 217.28
2007-08 | 230.78
2008-09 | 234.47
2) Geographical classification:
In this type of classification, the data are classified according to geographical
region or geographical location (area) such as District, State, Countries, City-Village,
Urban-Rural, etc...
Ex: The production of paddy in different states in India, production of wheat in different
countries etc...
State-wise classification of production of food grains in India:
State | Production (in tonnes)
Orissa | 3,00,000
A.P | 2,50,000
U.P | 22,00,000
Assam | 10,000
3) Qualitative classification:
In this type of classification, data are classified on the basis of attributes or
quality characteristics like sex, literacy, religion, employment, social status, nationality,
occupation, etc. Such attributes cannot be measured on a scale.
Ex: If the population is to be classified with respect to one attribute, say sex, then we can
classify it into males and females. Similarly, it can also be classified into
‘employed’ or ‘unemployed’ on the basis of another attribute, ‘employment’, etc.
Qualitative classification can be of two types as follows
(i) Simple classification (ii) Manifold classification
i) Simple classification or Dichotomous Classification:
When the classification is done with respect to only one attribute, it is called
simple classification. If the attribute is dichotomous (two outcomes) in nature, two
classes are formed, one possessing the attribute and the other not possessing that
attribute. This type of classification is called dichotomous classification.
Ex: Population can be divided in to two classes according to sex (male and female) or
Income (poor and rich).
Population
    Male
    Female
Population
    Rich
    Poor
ii) Manifold classification:
The classification where two or more attributes are considered and several
classes are formed is called a manifold classification.
Ex: If we classify population simultaneously with respect to two attributes, Sex and
Education, then population are first classified into ‘males’ and ‘females’. Each of these
classes may then be further classified into ‘educated’ and ‘uneducated’.
Still the classification may be further extended by considering other attributes
like income status etc. This can be explained by the following chart
Population
    Male
        Educated: Rich, Poor
        Uneducated: Rich, Poor
    Female
        Educated: Rich, Poor
        Uneducated: Rich, Poor
4) Quantitative classification:
In quantitative classification the data are classified according to quantitative
characteristics that can be measured numerically such as height, weight, production,
income, marks secured by the students, age, land holding etc...
Ex: Students of a college may be classified according to their height as given in the
table
Height (in cm) | No. of students
100-125 | 20
125-150 | 25
150-175 | 40
175-200 | 15
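A frequency table like the one above can be produced from individual measurements by counting how many fall into each class. The sketch below assumes that each class includes its lower limit and excludes its upper limit (the notes do not state this), and the heights are hypothetical.

```python
from bisect import bisect_right

# Class boundaries from the height table. Assumption: each class includes
# its lower limit and excludes its upper limit.
boundaries = [100, 125, 150, 175, 200]

def classify_heights(heights):
    """Count how many heights fall in each class interval."""
    counts = [0] * (len(boundaries) - 1)
    for h in heights:
        if not boundaries[0] <= h < boundaries[-1]:
            raise ValueError(f"height {h} is outside the table range")
        counts[bisect_right(boundaries, h) - 1] += 1
    return counts

# Hypothetical heights in cm.
heights = [112, 125, 149.5, 150, 163, 171, 180, 199]
frequencies = classify_heights(heights)
for (lower, upper), n in zip(zip(boundaries, boundaries[1:]), frequencies):
    print(f"{lower}-{upper}: {n}")
```

With this convention a height of exactly 125 cm is counted in the 125-150 class, which removes the ambiguity of a value that sits on a shared class limit.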
Chapter 4: TABULATION
4.1 Meaning & Definition:
A table is a systematic arrangement of data in columns and rows.
Tabulation may be defined as the systematic arrangement of classified
numerical data in rows or/and columns according to certain characteristics. It
expresses the data in concise and attractive form which can be easily understood and
used to compare numerical figures, and an investigator is quickly able to locate the
desired information and chief characteristics.
Thus, a statistical table makes it possible for the investigator to present a huge
mass of data in a detailed and orderly form. It facilitates comparison and often reveals
certain patterns in data which are otherwise not obvious. Before tabulation data are
classified and then displayed under different columns and rows of a table.
4.2 Difference between classification and tabulation:
∙ Classification is the process of grouping raw data according to their
object, behavior, purpose and usage. Tabulation means a logical arrangement of
data into rows and columns.
∙ Classification is the first step in arranging the data, whereas tabulation is the second
step.
∙ The main object of classification is to condense the mass of data in such a way
that similarities and dissimilarities can be readily found, but the main object of
tabulation is to simplify complex data for the purpose of better comparison.
4.3 Objectives /Advantages/ Role of Tabulation:
Statistical data arranged in a tabular form serve following objectives:
1) It simplifies complex data to enable us to understand easily.
2) It facilitates comparison of related facts.
3) It facilitates computation of various statistical measures like averages,
dispersion, correlation etc...
4) It presents facts in minimum possible space, and unnecessary repetitions &
explanations are avoided. Moreover, the needed information can be easily
located.
5) Tabulated data are good for references, and they make it easier to present the
information in the form of graphs and diagrams.
4.4 Disadvantage of Tabulation:
1) The arrangement of data by row and column becomes difficult if the person does
not have the required knowledge.
2) A table lacks a description of the nature of the data, and not all data can be put into a
table.
3) No single section is given special emphasis in a table.
4) Table figures/data can be misinterpreted.
4.5 Ideal Characteristics/ Requirements of a Good Table:
A good statistical table is such that it summarizes the total information in an easily
accessible form in minimum possible space.
1) A table should be formed in keeping with the objects of statistical enquiry.
2) A table should be easily understandable and self explanatory in nature.
3) A table should be formed so as to suit the size of the paper.
4) If the figures in the table are large, they should be suitably rounded or
approximated. The units of measurements too should be specified.
5) The arrangements of rows and columns should be in a logical and systematic
order. This arrangement may be alphabetical, chronological or according to size.
6) The rows and columns are separated by single, double or thick lines to represent
various classes and sub-classes used.
7) The averages or totals of different rows should be given at the right of the table
and that of columns at the bottom of the table. Totals for every sub-class too
should be mentioned.
8) Necessary footnotes and source notes should be given at the bottom of table
9) In case it is not possible to accommodate all the information in a single table, it is
better to have two or more related tables.
4.6 Parts or component of a good Table:
The making of a compact table is itself an art. A table should contain all the
information needed within the smallest possible space.
An ideal Statistical table should consist of the following main parts:
1. Table number
2. Title of the table
3. Head notes
4. Captions or column headings
5. Stubs or row designation
6. Body of the table
7. Footnotes
8. Sources of data
1. Table Number: A table should be numbered for easy reference and identification. The
table number may be given either in the center at the top above the title or just before
the title of the table.
2. Table Title: Every table must be given a suitable title. The title is a description of the
contents of the table. The title should be clear, brief and self explanatory. The title
should explain the nature and period of the data covered in the table. The title should be
placed centrally at the top of the table just below the table number (or just after the table
number in the same line).
Schematic representation of a table
Table No.: Table title
(Head notes)
Stub heading  | Caption                                                 | Row total
              | Sub-head 1                 | Sub-head 2                 |
              | Column head | Column head  | Column head | Column head  |
Stub entries  | Body                                                    |
Column total  |                                                         | Grand total
Foot notes
Source notes
3. Head note: It is used to explain certain points relating to the table that have not been
included in the title nor in the caption or stubs. For example the unit of measurement is
frequently written as head note such as ‘in thousands’ or ‘in million tonnes’ or ‘in crores’
etc...
4. Captions or Column Designations: Captions in a table are brief and self-explanatory
headings of the vertical columns. Captions may involve headings and sub-headings as well.
Usually, a relatively less important and shorter classification is tabulated in the
columns.
5. Stubs or Row Designations: Stubs are brief and self-explanatory headings of
horizontal rows. Normally, a relatively more important classification is given in rows.
Also a variable with a large number of classes is usually represented in rows.
6. Body: The body of the table contains the numerical information. This is the most vital
part of the table. Data presented in the body are arranged according to the description or
classification of the captions and stubs.
7. Footnotes: If any item has not been explained properly, a separate explanatory note
should be added at the bottom of the table. Thus, they are meant for explaining or
providing further details about the data that have not been covered in title, captions and
stubs.
8. Sources of data: At the bottom of the table a note should be added indicating the
primary and secondary sources from which data have been collected. This may
preferably include the name of the author, volume, page and the year of publication.
4.7 Types of Tabulation:
Tables may be broadly classified into three categories.
I On the basis of number of characteristics used/Construction:
1) Simple tables 2) Complex tables
II On the basis of object/purpose:
1) General purpose/Reference tables 2) Special purpose/Summary tables.
III On the basis of originality
1) Primary or original tables 2) Derived tables
I On the basis of number of characteristics used/Construction:
The distinction between simple and complex table is based on the number of
characteristics studied or based on construction.
1) Simple table: In a simple table, data on only one characteristic are tabulated. Hence this
type of table is also known as a one-way or first-order table.
Ex: Population of a country in different states
State    Population
KA       -
AP       -
MP       -
UP       -
Total    -
2) Complex table: If two or more characteristics are tabulated in a table, it is called a
complex table. It is also called a manifold table. When only two characteristics are shown,
such a table is known as a two-way table or double tabulation.
Ex: Two-way table: Population of a country in different states and sex-wise
State    Population           Total
         Males     Females
KA       -         -          -
AP       -         -          -
MP       -         -          -
UP       -         -          -
Total    -         -          -
When three characteristics are represented in the same table, it is called three-way
tabulation. As the number of characteristics increases, the tabulation becomes
complicated and confusing.
Ex: Triple table (three-way table): Population of a country in different states according to
sex and education
State    Population                                            Total
         Males                      Females
         Educated    Uneducated     Educated    Uneducated
KA       -           -              -           -              -
AP       -           -              -           -              -
MP       -           -              -           -              -
UP       -           -              -           -              -
Total    -           -              -           -              -
Ex: Manifold (multi-way) table: When the data are classified according to more than three
characteristics and tabulated.
States  Status      Population
                    Male                                Female                              Total
                    Educated  Uneducated  Sub-total     Educated  Uneducated  Sub-total     Educated  Uneducated  Total
UP      Rich        -         -           -             -         -           -             -         -           -
        Poor        -         -           -             -         -           -             -         -           -
        Sub-total   -         -           -             -         -           -             -         -           -
MP      Rich        -         -           -             -         -           -             -         -           -
        Poor        -         -           -             -         -           -             -         -           -
        Sub-total   -         -           -             -         -           -             -         -           -
Total               -         -           -             -         -           -             -         -           -
II On the basis of object/purpose:
1) General purpose tables: General purpose tables are sometimes termed reference tables or
information tables. These tables provide information for general use or reference. They
usually contain detailed information and are not constructed for a specific discussion.
These tables are also termed master tables.
Ex: The detailed tables prepared in census reports belong to this class.
2) Special purpose tables: Special purpose tables, also known as summary tables, provide
information for a particular discussion. These tables are constructed or derived
from the general purpose tables. They are useful for analytical and comparative
studies involving the study of relationships among variables.
Ex: Tables that incorporate analytical statistics such as ratios, percentages and index
numbers.
III On the basis of originality: According to the nature of originality of the data
1) Primary or original tables: This table contains statistical facts in their original form.
Figures in these types of tables are not rounded, but are original, actual and absolute in
nature.
Ex: Time series data recorded on rainfall, foodgrain production, etc.
2) Derived tables: This table contains totals, ratios, percentages, etc. derived from original
tables. It expresses information derived from original tables.
Ex: Trend values, seasonal values, cyclical variation data.
Chapter: 5 FREQUENCY DISTRIBUTIONS
5.1 Introduction:
Frequency is the number of times a given value of an observation or character or
a particular type of event has appeared/repeated/occurred in the data set.
A frequency distribution is simply a table in which the data are grouped into
different classes on the basis of common characteristics and the number of cases
which fall in each class is counted and recorded. The table shows the frequency of
occurrence of the different values of an observation or character of a single variable.
A frequency distribution is a comprehensive way to classify raw data of a
quantitative or qualitative variable. It shows how the different values of a variable are
distributed in different classes along with their corresponding class frequencies.
In frequency distribution, the organization of classified data in a table is done
using categories for the data in one column and the frequencies for each category in the
second column.
5.2 Types of frequency distribution:
1. Simple frequency distribution:
a) Raw Series/individual series/ungrouped data: Raw data have not been manipulated
or treated in any way beyond their original measurement. As such, they will not be
arranged or organized in any meaningful manner. Series of individual observations is a
simple listing of items of each observation. If marks of 10 students in statistics of a
class are given individually, it will form a series of individual observations. In raw series,
each observation has frequency of one. Ex: Marks of Students: 55, 73, 60, 41, 60, 61, 75,
73, 58, 80.
b) Discrete frequency distribution: In a discrete series, the data are presented in such a
way that exact measurements of units are indicated. There is a definite difference
between the values of different groups of items. Each class is distinct and separate
from the other classes, so there is a discontinuity from one class to the next. In a discrete
frequency distribution, we count the number of times each value of the variable occurs in
the data. This is facilitated through the technique of tally bars. Ex: The number of children
in 15 families is given by 1, 5, 2, 4, 3, 2, 3, 1, 1, 0, 2, 2, 3, 4, 2.
Children (No.s) (x)    Tally    Frequency (f)
0                      |        1
1                      |||      3
2                      ||||/    5
3                      |||      3
4                      ||       2
5                      |        1
Total                           15
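The counting done with tally bars above can be sketched in a few lines of Python. This is only an illustrative sketch: the variable names `children` and `frequency` are our own, and `collections.Counter` from the standard library is used in place of manual tallying.

```python
from collections import Counter

# Number of children in 15 families (data from the example above)
children = [1, 5, 2, 4, 3, 2, 3, 1, 1, 0, 2, 2, 3, 4, 2]

# Counter records how many times each distinct value occurs
frequency = Counter(children)

# Print the values in ascending order with one stroke per observation
print("x  Tally  f")
for x in sorted(frequency):
    f = frequency[x]
    print(f"{x}  {'|' * f:<5}  {f}")
print("Total    ", sum(frequency.values()))
```

The frequencies add up to the number of observations, which is a quick check that no value was missed.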
c) Continuous (grouped) frequency distribution:
When the range of the data is too large, or the data are measured on a continuous
variable which can take any fractional value, the data must be condensed by putting them
into smaller groups or classes called “Class-Intervals”. The number of items which fall in a
class-interval is called its “Class frequency”. The presentation of the data in
continuous classes with the corresponding frequencies is known as a
continuous/grouped frequency distribution.
Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56,
74.
Class-Interval (C.I.)    Tally    Frequency (f)
0-25                     ||       2
25-50                    |||      3
50-75                    ||||/    5
75-100                   ||||/    5
Total                             15
Types of continuous class intervals: There are three methods of class intervals namely
i) Exclusive method (Class-Intervals)
ii) Inclusive method (Class-Intervals)
iii) Open-end classes
i) Exclusive method: In an exclusive method, the class intervals are fixed in such a way
that upper limit of one class becomes the lower limit of the next immediate class.
Moreover, an item equal to the upper limit of a class would be excluded from that class
and included in the next class. Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42,
62, 72, 83, 15, 75, 87, 93, 56, 74.
Class-Interval (C.I.)    Tally    Frequency (f)
0-25                     ||       2
25-50                    |||      3
50-75                    ||||/    5
75-100                   ||||/    5
Total                             15
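The exclusive rule can be checked with a short Python sketch. The function name `exclusive_frequencies` is our own, and the comparison `lower <= x < upper` is one way to express "an item equal to the upper limit goes to the next class". Under this sketch a mark of exactly 100 would fall outside the last class, which is a limitation of the illustration rather than of the method.

```python
# Marks scored by 15 students (data from the example above)
marks = [55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74]
classes = [(0, 25), (25, 50), (50, 75), (75, 100)]

def exclusive_frequencies(data, class_limits):
    """Count the values with lower <= x < upper for each class."""
    return [sum(1 for x in data if lower <= x < upper)
            for lower, upper in class_limits]

freq_exclusive = exclusive_frequencies(marks, classes)
for (lower, upper), f in zip(classes, freq_exclusive):
    print(f"{lower}-{upper}: {f}")
```

Note that the mark 75 is counted in the class 75-100 and not in 50-75.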
ii) Inclusive method: In this method, observations which are equal to the upper or the
lower limit of a class are included in that particular class. It should be clear that the upper
limit of one class and the lower limit of the immediate next class are different.
Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93,
56, 74.
Class-Interval (C.I.)    Tally      Frequency (f)
0-25                     ||         2
26-50                    |||        3
51-75                    ||||/ |    6
76-100                   ||||       4
Total                               15
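For comparison, the inclusive rule can be sketched in the same way. Again the function name `inclusive_frequencies` is our own; the only change from the exclusive rule is that both limits are counted in the class (`lower <= x <= upper`).

```python
# Marks scored by 15 students (data from the example above)
marks = [55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74]
inclusive_classes = [(0, 25), (26, 50), (51, 75), (76, 100)]

def inclusive_frequencies(data, class_limits):
    """Count the values with lower <= x <= upper for each class."""
    return [sum(1 for x in data if lower <= x <= upper)
            for lower, upper in class_limits]

freq_inclusive = inclusive_frequencies(marks, inclusive_classes)
for (lower, upper), f in zip(inclusive_classes, freq_inclusive):
    print(f"{lower}-{upper}: {f}")
```

Here the mark 75 stays in the class 51-75, because the upper limit belongs to the class.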
iii) Open-End classes: In this type of class interval, the lower limit of the first class
interval or the upper limit of the last class interval or both are not specified or not given.
The necessity of open end classes arises in a number of practical situations, particularly
relating to economic, agriculture and medical data when there are few very high values
or few very low values which are far apart from the majority of observations.
The lower limit of first class can be obtained by subtracting magnitude of next
class from the upper limit of the open class. The upper limit of last class can be
obtained by adding magnitude of previous class to the lower limit of the open class.
Ex: Different ways of writing open-end classes
< 20      Below 20        Less than 20    0-20
20-40     20-40           20-40           20-40
40-60     40-60           40-60           40-60
60-80     60-80           60-80           60-80
> 80      80 and above    80-100          80-over
Difference between Exclusive and Inclusive Class-Intervals
1. Exclusive method: Observations equal to the upper limit of a class are excluded from
   that class and are included in the immediate next class.
   Inclusive method: Observations equal to the upper or the lower limit of a particular
   class are counted in the same class.
2. Exclusive method: The upper limit of one class and the lower limit of the immediate
   next class are the same.
   Inclusive method: The upper limit of one class and the lower limit of the immediate
   next class are different.
3. Exclusive method: There is no gap between the upper limit of one class and the lower
   limit of the next class.
   Inclusive method: There is a gap between the upper limit of one class and the lower
   limit of the next class.
4. Exclusive method: This method is useful for both integer and fractional variables
   like age, height, weight etc.
   Inclusive method: This method is useful where the variable may take only integral
   values, like members in a family, number of workers in a factory etc. It cannot be
   used with fractional values like age, height, weight etc.
5. Exclusive method: There is no need to convert it to the inclusive method prior to
   calculation.
   Inclusive method: For simplification in calculation it is necessary to change it to the
   exclusive method.
2. Relative frequency distribution:
It is the fraction or proportion of the total number of items that belongs to each class.
$$\text{Relative frequency of a class} = \frac{\text{Actual frequency of the class}}{\text{Total frequency}}$$
Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56,
74.
Class-Interval (C.I.)    Tally    Frequency (f)    Relative Frequency
0-25                     ||       2                2/15 = 0.1333
25-50                    |||      3                3/15 = 0.2000
50-75                    ||||/    5                5/15 = 0.3333
75-100                   ||||/    5                5/15 = 0.3333
Total                             15               15/15 = 1.0000
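A minimal Python sketch of the relative frequency calculation is given here. It assumes the exclusive class intervals used in the example; the names `freq` and `relative` are our own.

```python
# Marks scored by 15 students, grouped by the exclusive method
marks = [55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74]
classes = [(0, 25), (25, 50), (50, 75), (75, 100)]

# Class frequencies: lower <= x < upper
freq = [sum(1 for x in marks if lo <= x < hi) for lo, hi in classes]
total = sum(freq)

# Relative frequency = class frequency / total frequency
relative = [f / total for f in freq]

for (lo, hi), f, r in zip(classes, freq, relative):
    print(f"{lo}-{hi}: f = {f}, relative frequency = {r:.4f}")
```

The relative frequencies always add up to 1, whatever the class intervals are.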
3. Percentage frequency distribution:
Comparison becomes difficult, or even impossible, when the total numbers of items
are large and differ greatly from one distribution to another. Under these
circumstances the percentage frequency distribution facilitates easy comparison.
The percentage frequency is calculated by multiplying the relative frequency by 100.
In a percentage frequency distribution, the actual frequencies are converted into
percentages.
$$\text{Percentage frequency of a class} = \frac{\text{Actual frequency of the class}}{\text{Total frequency}} \times 100 = \text{Relative frequency} \times 100$$
Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56,
74.
Class-Interval (C.I.)    Tally    Frequency (f)    Percentage Frequency
0-25                     ||       2                (2/15) × 100 = 13.33
25-50                    |||      3                (3/15) × 100 = 20.00
50-75                    ||||/    5                (5/15) × 100 = 33.33
75-100                   ||||/    5                (5/15) × 100 = 33.33
Total                             15               100 %
4. Cumulative Frequency distribution:
A cumulative frequency distribution is a running total of the frequency values. It is
constructed by adding the frequency of the first class interval to the frequency of the
second class interval, then adding that total to the frequency of the third class interval,
and continuing until the final total appears opposite the last class interval, which will be
the total frequency. Cumulative frequency is used to determine the number of
observations that lie above (or below) a particular value in a data set.
xi       fi         Cumulative frequency
x1       f1         f1
x2       f2         f1+f2
.        .          .
.        .          .
xn       fn         f1+f2+...+fn = N
Total    ∑fi = N
Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56,
74.
C.I.      Tally    Frequency (f)    Cumulative Frequency
0-25      ||       2                2
25-50     |||      3                2+3 = 5
50-75     ||||/    5                2+3+5 = 10
75-100    ||||/    5                2+3+5+5 = 15 = N
Total              15
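The running total can be sketched in Python with `itertools.accumulate` from the standard library. The names `less_than` and `more_than` are our own labels for the number of observations below each upper limit and at or above each lower limit; the class intervals are taken by the exclusive method.

```python
from itertools import accumulate

# Marks scored by 15 students, grouped by the exclusive method
marks = [55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74]
classes = [(0, 25), (25, 50), (50, 75), (75, 100)]
freq = [sum(1 for x in marks if lo <= x < hi) for lo, hi in classes]

# Running total from the first class downwards:
# number of observations below each upper class limit
less_than = list(accumulate(freq))

# Running total from the last class upwards:
# number of observations at or above each lower class limit
more_than = list(accumulate(reversed(freq)))[::-1]

for (lo, hi), f, lt, mt in zip(classes, freq, less_than, more_than):
    print(f"{lo}-{hi}: f = {f}, less than {hi}: {lt}, {lo} and above: {mt}")
```

The last entry of `less_than` and the first entry of `more_than` both equal N, the total frequency.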
5. Cumulative percentage frequency distribution:
If cumulative percentages are given instead of cumulative frequencies, the
distribution is called a cumulative percentage frequency distribution. We can form this
table either by converting the frequencies into percentages and then cumulating them, or
by converting the given cumulative frequencies into percentages.
Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93,
56, 74
C.I.      Tally    Frequency (f)    Percentage Frequency     Cumulative Percentage Frequency
0-25      ||       2                (2/15) × 100 = 13.33     13.33
25-50     |||      3                (3/15) × 100 = 20.00     13.33+20.00 = 33.33
50-75     ||||/    5                (5/15) × 100 = 33.33     13.33+20.00+33.33 ≈ 66.67
75-100    ||||/    5                (5/15) × 100 = 33.33     13.33+20.00+33.33+33.33 ≈ 100.00
Total              15               100 %
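The two routes described above (percentages first and then cumulate, or cumulate first and then convert) can be compared with a short Python sketch. The names `route1` and `route2` are our own, and the exclusive class intervals are assumed.

```python
from itertools import accumulate

# Marks scored by 15 students, grouped by the exclusive method
marks = [55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74]
classes = [(0, 25), (25, 50), (50, 75), (75, 100)]
freq = [sum(1 for x in marks if lo <= x < hi) for lo, hi in classes]
n = sum(freq)

# Route 1: convert each frequency to a percentage, then cumulate
route1 = list(accumulate(f / n * 100 for f in freq))

# Route 2: cumulate the frequencies, then convert to percentages
route2 = [cf / n * 100 for cf in accumulate(freq)]

for (lo, hi), a, b in zip(classes, route1, route2):
    print(f"{lo}-{hi}: {a:.2f} (route 1), {b:.2f} (route 2)")
```

Both routes give the same column apart from floating-point rounding, and the last entry is 100.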
6. Univariate frequency distribution:
Frequency distributions which study only one variable at a time are called
univariate frequency distributions.
7. Bivariate and Multivariate frequency distribution:
Frequency distributions which study two variables simultaneously are known as
bivariate frequency distributions. When summarized in the form of a table, the result is
called a bivariate (two-way) frequency table. If the data are classified on the basis of more
than two variables, the distribution is known as a multivariate frequency distribution.
5.3 Construction of frequency distributions:
1) Construction of discrete frequency distribution:
When the given data relate to a discrete variable, first arrange all possible
values of the variable in ascending order in the first column. In the next column, tally
marks (|) are written to count the number of times each value of the variable is repeated.
To facilitate counting, blocks of five tally marks are formed by crossing four vertical
strokes with a fifth (||||/), and some space is left between every pair of blocks. Then count
the number of tally marks corresponding to each value of the variable and write it against
that value in the third column, known as the frequency column. This type of representation
of the data is called a discrete frequency distribution.
2) Construction of Continuous frequency distribution:
In case of continuous data, we make use of class interval method to construct
the frequency distribution.
Nature of Class Interval: The following are some basic technical terms used when a
continuous frequency distribution is formed.
a) Class Interval: A class interval is a grouping of the data defined by a range of values.
For example, 50-75, 75-100, 100-125… are class intervals.
b) Class limits: The two boundaries of a class, i.e. the minimum and maximum values of a
class-interval, are known as the lower limit and the upper limit of the class. In statistical
calculations, the lower class limit is denoted by L and the upper class limit by U. For
example, take the class 50-100. The lowest value of the class is 50 and the highest value is 100.
c) Range: The difference between largest and smallest value of the observation is called
as Range and is denoted by ‘R’. i.e. R = Largest value – Smallest value= L - S
d) Mid-value or mid-point: The central point of a class interval is called the mid value or
mid-point. It is found out by adding the upper and lower limits of a class and dividing the
sum by 2.
i.e. $$\text{Mid-point} = \frac{L + U}{2}$$
e) Frequency of class interval: Number of observations falling within a particular class
interval is called frequency of that class.
f) Number of class intervals: The number of class intervals in a frequency distribution is
a matter of importance. The number of class intervals should not be too many. For an ideal
frequency distribution, the number of class intervals can vary from 5 to 15. The number
of class intervals can be fixed arbitrarily, keeping in view the nature of the problem under
study, or it can be decided with the help of “Sturges Rule” given by:
$$K = 1 + 3.322 \log_{10} n$$
where n = total number of observations,
log10 = logarithm to base 10,
K = number of class intervals.
g) Width or Size of the class interval: The difference between the lower and upper class
limits is called the width or size of the class interval and is denoted by ‘C’. For a given
range, the size of the class interval is inversely proportional to the number of class intervals.
The approximate value of the size (or width or magnitude) of the class interval ‘C’ is
obtained by using “Sturges Rule” as
$$C = \frac{\text{Range}}{\text{Number of class intervals } (K)} = \frac{\text{Largest value} - \text{Smallest value}}{1 + 3.322 \log_{10} n}$$
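As a worked illustration, Sturges' rule and the resulting class width can be evaluated in Python for the 15 marks used in the earlier examples. The function names `sturges_classes` and `class_width` are our own; the values are left unrounded, since the choice of a convenient rounded number is left to the investigator.

```python
import math

def sturges_classes(n):
    """Number of class intervals K = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def class_width(data):
    """Approximate class width C = range / K."""
    k = sturges_classes(len(data))
    return (max(data) - min(data)) / k

marks = [55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74]
k = sturges_classes(len(marks))
c = class_width(marks)
print(f"n = {len(marks)}, K = {k:.3f}, range = {max(marks) - min(marks)}, C = {c:.3f}")
```

For these data K is a little under 5 and C is a little under 16, so five classes of width 16 (or some other convenient width) would be a reasonable choice.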
Steps for construction of Continuous frequency distribution
1. For the given raw data, select a number of class intervals between 5 and 15, or find the
number of classes by “Sturges Rule” given by:
$$K = 1 + 3.322 \log_{10} n$$
where n = total number of observations,
log10 = logarithm to base 10,
K = number of class intervals.
2. Find the width of the class interval:
$$C = \frac{\text{Largest value} - \text{Smallest value}}{1 + 3.322 \log_{10} n}$$
Round this result to get a convenient number. You might need to change the number of
classes, but the priority should be to use values that are easy to understand.
3. Find the class limits: You can use the minimum data entry as the lower limit of the first
class. To find the remaining lower limits, add the class width to the lower limit of the
preceding class (Add the class width to the starting point to get the second lower class
limit. Add the class width to the second lower class limit to get the third, and so on.).
4. Find the upper limit of the first class: List the lower class limits in a vertical column and
proceed to enter the upper class limits, which can be easily identified. Remember that
classes cannot overlap. Find the remaining upper class limits.
5. Go through the data set by putting a tally in the appropriate class for each data value.
Use the tally marks to find the total frequency for each class.
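The five steps above can be put together in a short Python sketch. Several choices here are our own assumptions and not part of the procedure itself: K and the width are rounded up to whole numbers, the first lower limit is the minimum value, the classes follow the exclusive method, and one extra class is added if the largest value falls exactly on the last upper limit. The function name `continuous_distribution` is hypothetical.

```python
import math

def continuous_distribution(data):
    """Return (class limits, frequencies) following steps 1 to 5."""
    n = len(data)
    # Step 1: number of classes by Sturges' rule, rounded up (assumption)
    k = math.ceil(1 + 3.322 * math.log10(n))
    # Step 2: class width, rounded up to a whole number (assumption)
    smallest, largest = min(data), max(data)
    width = max(1, math.ceil((largest - smallest) / k))
    # Steps 3 and 4: lower limits start at the minimum value;
    # each upper limit is the lower limit plus the class width
    limits = [(smallest + i * width, smallest + (i + 1) * width)
              for i in range(k)]
    # Exclusive classes leave out their upper limit, so add one more
    # class if the largest value sits exactly on the last upper limit
    if limits[-1][1] <= largest:
        limits.append((limits[-1][1], limits[-1][1] + width))
    # Step 5: tally each value into its class (lower <= x < upper)
    counts = [sum(1 for x in data if lo <= x < hi) for lo, hi in limits]
    return limits, counts

marks = [55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74]
limits, counts = continuous_distribution(marks)
for (lo, hi), f in zip(limits, counts):
    print(f"{lo}-{hi}: {f}")
```

For the marks data this gives five classes of width 16 starting at 15. In practice one would usually move the limits to rounder numbers such as 0-25, 25-50 and so on, as in the earlier examples.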
Chapter 6: DIAGRAMMATIC REPRESENTATION
6.1 Introduction:
One of the most convincing and appealing ways in which statistical results may
be presented is through diagrams and graphs. A single diagram can represent given
data more effectively than a thousand words. Moreover, even a layman who has
nothing to do with numbers can understand diagrams. Evidence of this can be
found in newspapers, magazines, journals, advertisements, etc.
Diagrams are nothing but geometrical figures like lines, bars, squares, cubes,
rectangles, circles, pictures, maps, etc. A diagrammatic representation of data is a
visual form of presentation of statistical data, highlighting their basic facts and
relationships. If we draw diagrams on the basis of the data collected, they will easily be
understood and appreciated by all. They are readily intelligible and save a considerable
amount of time and energy.
6.2 Advantage/Significance of diagrams:
Diagrams are extremely useful because of the following reasons.
1. They are attractive and impressive.
2. They make data simple and understandable.
3. They make comparison possible.
4. They save time and labour.
5. They have universal utility.
6. They give more information.
7. They have a great memorizing effect.
6.3 Demerits (or) limitations:
1. Diagrams give only an approximate presentation of quantities.
2. Minute differences in values cannot be represented properly in diagrams.
3. Large differences in values spoil the look of the diagram and make it impossible to
show a wide gap.
4. Some diagrams can be drawn by experts only, e.g. pie chart.
5. Different scales portray different pictures to laymen.
6. Similar characteristics are required for comparison.
7. They are of no utility to an expert for further statistical analysis.
6.5 Types of diagrams:
In practice, a very large variety of diagrams are in use and new ones are
constantly being added. For convenience and simplicity, they may be divided under the
following heads:
1. One-dimensional diagrams
2. Two-dimensional diagrams
3. Three-dimensional diagrams
4. Pictograms and Cartograms
6.5.1 One-dimensional diagrams:
In such diagrams, only one-dimensional measurement, i.e height or length is
used and the width is not considered. These diagrams are in the form of bar or line
charts and can be classified as
1. Line diagram
2. Simple bar diagram
3. Sub-divided bar diagram
4. Percentage bar diagram
5. Multiple bar diagram
1. Line diagram:
Line diagram is used in case where there are many items to be shown and there
is not much of difference in their values. Such diagram is prepared by drawing a vertical
line for each item according to the scale.
∙ The distance between lines is kept uniform.
∙ Line diagram makes comparison easy, but it is less attractive.
Ex: The following data show the number of children
No. of children (no.s)    0     1     2    3    4    5
Frequency                 10    14    9    6    4    2
Fig 1: Line diagram showing number of children
2. Simple Bar Diagram:
It is the simplest among the bar diagram and is generally used for comparison of
two or more items of single variable or a simple classification of data. For example data
related to export, import, population, production, profit, sale, etc... for different time
periods or region.
∙ Simple bars can be drawn vertically or horizontally, with equal width.
∙ The heights of bars are proportional to the volume or magnitude of the
characteristics.
∙ All bars stand on the same base line.
∙ The bars are separated from each other by equal interval.
∙ To make the diagram attractive, the bars can be coloured.
Ex: Population in different states
Population (million)
Year    UP       AP       MH
1951    63.22    31.25    29.98
[Figure: bars for UP, AP and MH; Y-axis: Population (m), 1951, scale 0 to 70]
Fig 2: Simple bar diagram showing population in different states
3. Sub-divided bar diagram:
If we have multi-character data for different attributes, we use a sub-divided or
component bar diagram. In a sub-divided bar diagram, the bar is sub-divided into
various parts in proportion to the values given in the data, and the whole bar represents
the total. Such a diagram shows the total as well as the various components of the total.
Such diagrams are also called component bar diagrams.
∙ Here, instead of placing the bars for each component side by side, we place
them one on top of the other.
∙ The sub-divisions are distinguished by different colours, crossings or dottings.
∙ An index or key showing the various components represented by colours, shades,
dots, crossings, etc. should be given.
Ex: The following table gives the expenditure of families A and B on different items.
Item of expenditure    Family A (Rs)    Family B (Rs)
Food                   1400             2400
House rent             1600             2600
Education              1200             1600
Savings                800              1400
TOTAL                  5000             8000
Fig 3: Sub-divided bar diagram indicating expenditure of families A & B
4. Percentage bar diagram or Percentage sub-divided bar diagram:
This is another form of component bar diagram. Sometimes the volumes or
values of the different attributes differ greatly. In such cases a sub-divided bar
diagram cannot be used for making meaningful comparisons, and the components of the
attributes are reduced to percentages. Here the components are not the actual values
but are converted into percentages of the whole. The main difference between the
sub-divided bar diagram and the percentage bar diagram is that in the former
the bars are of different heights, since their totals may be different, whereas in
the latter the bars are of equal height, since each bar represents
100 percent. In the case of data having sub-divisions, a percentage bar diagram will be
more appealing than a sub-divided bar diagram.
Different components are converted to percentages using the following formula:
$$\text{Percentage} = \frac{\text{Actual value}}{\text{Total of actual values}} \times 100$$
Ex: Expenditure of Family A and Family B.
Item of expenditure    Family A (Rs)    %       Family B (Rs)    %
Food                   1400             28      2400             30
House rent             1600             32      2600             32.5
Education              1200             24      1600             20
Savings                800              16      1400             17.5
TOTAL                  5000                     8000
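The percentage columns in this table can be reproduced with a short Python sketch. The dictionary layout and the function name `to_percentages` are our own.

```python
# Expenditure (Rs) of the two families, from the table above
expenditure = {
    "Family A": {"Food": 1400, "House rent": 1600, "Education": 1200, "Savings": 800},
    "Family B": {"Food": 2400, "House rent": 2600, "Education": 1600, "Savings": 1400},
}

def to_percentages(items):
    """Percentage = actual value / total of actual values * 100."""
    total = sum(items.values())
    return {name: value / total * 100 for name, value in items.items()}

percent = {family: to_percentages(items) for family, items in expenditure.items()}

for family, shares in percent.items():
    print(family)
    for name, share in shares.items():
        print(f"  {name}: {share:.1f} %")
```

Since every bar adds up to 100 percent, the two families can be compared item by item even though their total expenditures (Rs 5000 and Rs 8000) differ.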
Fig 3: Percentage bar diagram indicating expenditure of families A & B
5. Multiple or Compound bar diagram:
This type of diagram is used to facilitate the comparison of two or more sets of
inter-related phenomenon over a number of years or regions.
∙ Multiple bar diagram is simply the extension of simple bar diagram.
∙ Bars are constructed side by side to represent the set of values for comparison.
∙ The different bars for period or related phenomenon are placed together.
∙ After providing some space, another set of bars for next time period or phenomenon
are drawn.
∙ In order to distinguish bars, different colour or crossings or dotting, etc... may be
used
∙ Same type of marking or coloring should be done under each attribute.
∙ An index or foot note has to be prepared to identify the meaning of different colours
or dottings or crossing.
Ex: Population under different states. (Double bar diagram)
Fig 4: Multiple bar diagram indicating expenditure of families A & B
6.5.2 Two-dimensional diagrams:
In one-dimensional diagrams, only length is taken into account. But in
two-dimensional diagrams the area represents the data, therefore both length and width
are taken into account. Such diagrams are also called Area diagrams or Surface
diagrams. The important types of area diagrams are: Rectangles, Squares, Circles and
Pie-diagrams.
Pie-Diagram or Angular Diagram:
The pie-diagram is a very popular diagram used to represent both the total
magnitude and its different component parts or sectors. The circle represents the total
magnitude of the variable. The various components of the total are represented
proportionately by the segments of the circle. Addition of these segments gives the
complete circle.
Such a component circular diagram is known as Pie or Angular diagram. While making
comparisons, pie diagrams should be used on a percentage basis and not on an
absolute basis.
Procedure for Construction of Pie Diagram
1) Convert each component of the total into the corresponding angle in degrees. The
angle of any component can be calculated by the following formula.
$$\text{Angle} = \frac{\text{Actual value}}{\text{Total of actual values}} \times 360^\circ$$
Angles are taken to the nearest integral values.
2) Using a compass draw a circle of any convenient radius. (Convenient in the
sense that it looks neither too small nor too big on the paper.)
3) Using a protractor divide the circle in to sectors whose angles have been
calculated in step-1. Sectors are to be in the order of the given items.
4) Various component parts represented by different sector can be distinguished by
using different shades, designs or colours.
5) These sectors can be distinguished by their labels, either inside (if possible) or
just outside the circle with proper identification.
Ex: The cropping pattern in Karnataka in the year 2001-2002 was as follows.
CROPS        AREA (ha)    Angle (degrees)
Cereals      3940         214°
Oil seeds    1165         63°
Pulses       464          25°
Cotton       249          13°
Others       822          45°
Total        6640         360°
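The angles in step 1 can be computed with a few lines of Python. The names `area` and `angles` are our own, and the angles are kept unrounded here. Cotton works out to exactly 13.5°, so it can be rounded either way; the table takes 13°, which keeps the total at 360°.

```python
# Area (ha) under each crop, from the cropping pattern example above
area = {"Cereals": 3940, "Oil seeds": 1165, "Pulses": 464, "Cotton": 249, "Others": 822}
total = sum(area.values())

# Angle = actual value / total of actual values * 360 degrees
angles = {crop: value / total * 360 for crop, value in area.items()}

for crop, angle in angles.items():
    print(f"{crop}: {angle:.2f} degrees")
print(f"Total: {sum(angles.values()):.2f} degrees")
```

Whatever the data, the unrounded angles add up to 360°, which is a useful check before rounding and drawing the sectors with a protractor.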
6.5.3 Three-dimensional diagrams:
Three-dimensional diagrams, also known as volume diagrams, consist of cubes,
cylinders, spheres, etc. In these diagrams three things, namely length, width and height,
have to be taken into account.
Ex: Cubes, cylinders, spheres etc.
6.5.4 Pictogram and Cartogram:
i) Pictogram:
The technique of presenting data through pictures is called a pictogram. In
this method the magnitude of the phenomenon being studied is shown by pictures. The
sizes of the pictures are kept proportional to the values of the different magnitudes to be
presented.
ii) Cartogram:
In this technique, statistical facts are presented through maps accompanied by
various types of diagrammatic presentation. They are generally used to present
facts according to geographical regions. Population and its other constituents like birth,
death, growth and density, as well as production, imports, exports and several other facts,
can be presented on maps with certain colours, dots, crosses, points etc.
Chapter 7: GRAPHICAL REPRESENTATION OF DATA
7.1 Introduction
From the statistical point of view, graphic presentation of data is more
appropriate and accurate than the diagrammatic representation of the data. Diagrams
are limited to visual presentation of categorical and geographical data and fail to
present the data effectively relating to time-series and frequency distribution. In such
cases, graphs prove to be very useful.
A graph is a visual form of presentation of statistical data, which shows the
relationship between two or more sets of figures. A graph is more attractive than a table
of figures. Even a common man can understand the message of the data from a graph.
Comparisons can be made between two or more phenomena very easily with the help
of a graph.
The word graph is associated with the word “graphic”, which means “vivid” or
“springing to life”. Vivid means evoking lifelike images within the mind.
7.2 The difference between diagrams and graphs:
1. Diagrams: Diagrams are represented by figures and pictures, viz. bars, squares, circles,
   cubes etc.
   Graphs: Graphs are represented by points (dots) and lines.
2. Diagrams: Diagrams can be drawn on plain paper or any sort of paper.
   Graphs: Graphs can be drawn only on graph paper.
3. Diagrams: Diagrams cannot be used to find measures of central tendency such as
   median, mode etc.
   Graphs: Graphs can be used to locate measures of central tendency such as median,
   mode etc.
4. Diagrams: Diagrams are used to represent categorical or geographical data.
   Graphs: Graphs are used to represent frequency distributions and time series data.
5. Diagrams: Diagrams give only an approximate idea.
   Graphs: Graphs represent the data as exact information.
6. Diagrams: Diagrams are more effective and impressive.
   Graphs: Graphs are not as effective and impressive.
7. Diagrams: Diagrams have an everlasting effect.
   Graphs: Graphs do not have an everlasting effect.
7.3 Advantage/function of graphical representation
1. It facilitates comparison between different variables.
2. It explains the correlation or relationship between two different variables or
events.
3. It helps in finding out the effect of all other factors on the change of the
main factor under study.
4. It helps in forecasting on the basis of present data or previous data.
5. It helps in planning statistical analysis and general procedures of research study.
6. For representing frequency distribution, diagrams are rarely used when
compared with graphs. For example, for the time series data, graphs are more
appropriate than diagrams.
7.4 Limitations:
1. A graph cannot show all the facts that are there in the tables.
2. A graph can show only approximate values, while a table gives exact values.
3. A graph takes more time to draw than a table.
4. Graphs do not reveal the accuracy of data; they show only the fluctuation of data.
The technique of presenting the statistical data by graphic curve is generally used to
depict two types of statistical series:
I. Time-Series data and
II. Frequency Distribution.
7.5. Time-Series Graph or Historigrams:
Graphical representation of time-series data is known as Historigram. In this
case, time is represented on the X-axis and the magnitude of the variable on the Y-axis.
Taking the time scale as x-coordinate and the corresponding magnitude of variable as
the y-coordinate, points are plotted on the graph paper, and they are joined by lines.
Ex: Time-series graphs on export, import, area under irrigation, sales over years.
1) One Variable Historigram:
In this graph only one variable is represented graphically. Here, the time scale
is plotted on the x-axis and the other variable on the y-axis. The various points thus
obtained are joined by straight lines.
Fig7.1: Cattle sales over different years
2) Historigram of Two or More Than Two Variables (Single Scale):
Time-series data relating to two or more variables measured in the same units
and belonging to the same time period can be plotted together on the same graph, using
the same scale for all the variables along the Y-axis and the same scale for time along
the X-axis. Here we get a number of curves, one for each variable. Hence it is
essential to distinguish the curves by different types of lines, viz. thin lines, thick lines,
dotted lines, dashed lines, dash-dot lines, etc.
Fig 7.2. Historigram of Two or More Than Two Variables
3) Historigram with Two Scales:
Sometimes the variables to be plotted on the Y-axis are expressed in two different units,
viz. Rs., kg, acres, km, etc. In such cases, one variable is plotted with its own scale on the
left Y-axis and the other variable with a different scale on the right Y-axis.
4) Belt Graph or Band Curve:
A band graph is a type of line graph which shows the total for successive time
periods broken up into sub-totals for each of the components of the total. The various
component parts are plotted one over the other. The gaps between the successive
lines are filled with different shades, colours, etc. A belt graph is also known as a constituent
element chart or component part line chart.
5) Range Graph:
It is used to depict and emphasize the range of variation of a phenomenon for
each period. For instance, it may be used to show the maximum and minimum
daily temperature of a place, the price of a commodity over different periods of time, etc.
7.6 Frequency Distribution Graphs:
A frequency distribution may also be presented graphically in any of the following
ways, in which the measurements, class limits or mid-values are taken along the horizontal
axis (X-axis) and the frequencies along the vertical axis (Y-axis).
1. Histogram
2. Frequency Polygon
3. Frequency Curve
4. Ogives or Cumulative frequency curve
1. Histogram:
The histogram is the most popular and widely used graph for the presentation of
frequency distributions. In a histogram, data are plotted as a series of rectangles or bars.
The height of each rectangle represents the frequency of the class interval and its
width represents the size of the class interval. The area covered by the histogram is
proportional to the total frequency represented. Each rectangle is drawn adjacent to
the next so as to give a continuous picture. A histogram is also called a staircase or block
diagram. There are as many rectangles as there are classes. Class intervals are shown on
the X-axis and the frequencies on the Y-axis.
Ex: Systolic Blood Pressure (BP) in mmHg of people
Systolic BP (mmHg) | No. of persons
100-109 | 7
110-119 | 16
120-129 | 19
130-139 | 31
140-149 | 41
150-159 | 23
160-169 | 10
170-179 | 3
Fig 7.3: Systolic Blood Pressure (BP) in mmHg of people
Construction of Histogram:
1) Construction of histogram for frequency distributions having equal class intervals:
i) Convert the data into exclusive class intervals if they are given in inclusive
class intervals.
ii) Each class interval is drawn on the X-axis as a section or base (width of rectangle)
which is equal to the magnitude of the class interval. On the Y-axis, we plot
the corresponding frequencies.
iii) Build the rectangles on each class interval with height proportional to the
corresponding frequency of the class.
iv) It should be kept in mind that the rectangles are drawn adjacent to each other. The
adjacent rectangles thus formed give the histogram of the frequency distribution.
2) Histogram for frequency distributions having un-equal class intervals:
i) In the case of a frequency distribution with un-equal class intervals, it becomes a bit
difficult to construct a histogram.
ii) In such cases, a correction for the un-equal class intervals is essential; it is made by
determining the “frequency density” or “relative frequency”.
iii) Here the height of each bar in the histogram represents the frequency density instead of
the frequency, and the frequency densities are plotted on the Y-axis.
iv) The frequency density is determined using the following formula:
$$\text{Frequency density} = \frac{\text{Frequency of class interval}}{\text{Magnitude (width) of class interval}}$$
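The frequency-density correction can be illustrated with a small sketch. The class limits and frequencies below are made up for illustration only (they do not come from these notes); the function simply applies the formula above.

```python
# Hypothetical distribution with unequal class intervals:
# each entry is (lower limit, upper limit, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]


def frequency_density(lower, upper, frequency):
    """Frequency of the class divided by the width of the class interval."""
    return frequency / (upper - lower)


densities = [frequency_density(lo, up, f) for lo, up, f in classes]

# With density as the bar height, bar area (height x width) gives back the frequency.
areas = [d * (up - lo) for d, (lo, up, _) in zip(densities, classes)]

for (lo, up, f), d in zip(classes, densities):
    print(f"{lo}-{up}: frequency = {f}, width = {up - lo}, density = {d}")
```

Note that the class 20-40 has the largest frequency but not the largest density, which is why plotting raw frequencies against unequal widths would be misleading.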
Drawbacks of Histogram:
Construction of a histogram is not possible for open-end class intervals.
Remarks: 1) A histogram can be drawn only when the frequency distribution is a continuous
frequency distribution.
2) A histogram can be used to locate the mode graphically.
Difference between Histogram and Bar diagrams:
Histogram | Bar diagrams
Histograms are two-dimensional (area) diagrams, which consider both height and width. | Bar diagrams are one-dimensional, which consider only height.
Bars are placed adjacent to each other. | Bars are placed such that there is a uniform distance between two bars.
Class frequencies are shown by the area of the rectangles. | Volumes/magnitudes are shown by the height of the bars.
Histogram is used to represent frequency distribution data. | Bar diagrams are used to represent geographical and categorical data.
2. Frequency Polygon:
A frequency polygon is another way of presenting a frequency distribution
graphically; it can be drawn with the help of a histogram or of the mid-points.
If we mark the mid-points of the top horizontal sides of the rectangles in a
histogram and join them by straight lines using a scale, the figure so formed is called
a frequency polygon (using histogram). This is done under the assumption that the
frequencies in a class interval are evenly distributed throughout the class.
Alternatively, the frequencies of the classes are plotted as dots against the mid-points of
the class intervals. The adjacent dots are then joined by straight lines using a scale.
The resulting graph is known as a frequency polygon (using mid-points or without
histogram).
The area of the polygon is equal to the area of the histogram, because the area
left outside is just equal to the area included in it.
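The equality of the two areas can be checked numerically. The sketch below uses the systolic BP example from the histogram section, converted to exclusive class boundaries, and assumes (this is not stated in the notes) that the polygon is closed by adding one empty class of the same width at each end.

```python
# Exclusive class boundaries and frequencies of the systolic BP example.
boundaries = [99.5, 109.5, 119.5, 129.5, 139.5, 149.5, 159.5, 169.5, 179.5]
frequencies = [7, 16, 19, 31, 41, 23, 10, 3]
width = boundaries[1] - boundaries[0]  # all classes have the same width

# Area of the histogram: sum of (width x height) over all rectangles.
histogram_area = sum(width * f for f in frequencies)

# Heights of the polygon vertices at successive mid-points.
# Assumption: a zero-frequency class is added before the first and after the last class.
heights = [0] + frequencies + [0]

# Area under the polygon: trapeziums between adjacent mid-points, each one class-width wide.
polygon_area = sum(width * (a + b) / 2 for a, b in zip(heights, heights[1:]))

print(histogram_area, polygon_area)
```

The equality relies on equal class widths and on closing the polygon at both ends; without the two end points the polygon would enclose a smaller area.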
Fig 7.4 :Frequency Polygon
Difference between Histogram and Frequency Polygon:
Histogram | Frequency Polygon
Histogram is two-dimensional. | Frequency polygon is multi-dimensional.
Histogram is a bar graph. | Frequency polygon is a line graph.
Only one histogram can be plotted on the same axes. | Several frequency polygons can be plotted on the same axes.
Histogram is drawn only for a continuous frequency distribution. | Frequency polygon can be drawn for both discrete and continuous frequency distributions.
3. Frequency Curve:
Like the frequency polygon, the frequency curve can be drawn with the help of a
histogram or of the mid-points. The frequency curve is obtained by joining the mid-points of the
tops of the rectangles in a histogram by a smooth freehand curve (using
histogram).
Alternatively, the frequencies of the classes are plotted as dots against the mid-points of
each class. The adjacent dots are then joined by a smooth freehand curve.
The resulting graph is known as a frequency curve (using mid-points or without
histogram).
Fig 7.5: Frequency Curve
4. Ogives or Cumulative Frequency Curve:
For a set of observations, we know how to construct a frequency distribution. In
some cases we may require the number of observations less than a given value or more
than a given value. This is obtained by accumulating (adding) the frequencies up to (or
above) the given value. This accumulated frequency is called the cumulative frequency.
A table listing these cumulative frequencies is called a cumulative frequency
table. The curve obtained by plotting the cumulative frequencies is called a cumulative
frequency curve or an ogive.
There are two methods of constructing ogive namely:
i) The ‘less than ogive’ method.
ii) The ‘more than ogive’ method.
i) The ‘Less than Ogive’ method:
In this method, the frequencies of all preceding class-intervals are added to the
frequency of a class. Here we start with the upper limits of the classes and go on
adding the frequencies. After plotting these less-than cumulative frequencies against
the upper class boundaries of the respective classes we get the ‘Less than Ogive’, which is
an increasing curve, sloping upwards from left to right, with an elongated S shape.
ii) The ‘More than Ogive’ method: In this method, the frequencies of all succeeding
class-intervals are added to the frequency of a class. Equivalently, we start with the total
frequency at the lower limit of the first class and go on subtracting the frequency of each
class in turn. After plotting these more-than cumulative frequencies against the lower class
boundaries of the respective classes we get the ‘More than Ogive’, which is a decreasing
curve, sloping downwards from left to right, with an elongated upside-down S shape.
Fig 7.6 : Less than and more than ogive curve
Remarks:
The less than ogive and the more than ogive can be drawn on the same graph. The point of
intersection of the less than ogive and the more than ogive gives the median value.
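As a rough sketch of how the two sets of cumulative frequencies are built, the code below reuses the systolic BP example from the histogram section with exclusive class boundaries. The variable names are illustrative only.

```python
from itertools import accumulate

# Exclusive class boundaries and frequencies of the systolic BP example.
boundaries = [99.5, 109.5, 119.5, 129.5, 139.5, 149.5, 159.5, 169.5, 179.5]
frequencies = [7, 16, 19, 31, 41, 23, 10, 3]
total = sum(frequencies)

# 'Less than' cumulative frequencies: running total of the frequencies.
less_than = list(accumulate(frequencies))

# 'More than' cumulative frequencies: total minus everything below each lower boundary.
more_than = [total - below for below in [0] + less_than[:-1]]

# Points to plot: less-than values against upper boundaries,
# more-than values against lower boundaries.
less_than_points = list(zip(boundaries[1:], less_than))
more_than_points = list(zip(boundaries[:-1], more_than))

for (ub, lt), (lb, mt) in zip(less_than_points, more_than_points):
    print(f"less than {ub}: {lt:3d}    more than {lb}: {mt:3d}")
```

The two curves cross where both cumulative frequencies equal N/2 (here 75), which lies between the boundaries 139.5 and 149.5.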
Advantage of Ogive curve:
1. Ogive curves are useful for graphic computation of partition values like median,
quartiles, deciles, percentiles.
2. They can be used to determine graphically the proportion of observations below or
above a given value, or lying within certain intervals.
3. They can be used as cumulative percentage curve or percentile curves.
4. They are more suitable for comparison of two or more frequency distributions than
simple frequency curve.
Chapter 8: MEASURES OF CENTRAL TENDENCY or AVERAGE
8.1 Introduction
While studying a population with respect to a variable/characteristic of
interest, we may get a large number of raw observations in uncondensed form.
It is not possible to grasp any idea about the characteristic by looking at all the
observations. Therefore, it is better to obtain a single number for each group. That number
must be a good representative of all the observations, so as to give a clear picture of the
characteristic. Such a representative number can be a central value for all these
observations. This central value is called a measure of central tendency, an average,
or a measure of location.
8.2 Definition:
“A measure of central tendency is a typical value around which other figures
congregate.”
8.3 Objective and function of Average
1) To provide a single value that represents and describes the characteristic of
entire group.
2) To facilitate comparison between and within groups.
3) To draw a conclusion about population from sample data.
4) To form a basis for statistical analysis.
8.4 Essential characteristics/Properties/Pre-requisite for a good or an ideal Average:
An ideal average should possess the following characteristics.
1. It should be easy to understand and simple to compute.
2. It should be rigidly defined.
3. Its calculation should be based on all the items/observations in the data set.
4. It should be capable of further algebraic treatment (mathematical
manipulation).
5. It should be least affected by sampling fluctuation.
6. It should not be much affected by extreme values.
7. It should be helpful in further statistical analysis.
8.5 Types of Average
Mathematical Average:
1) Arithmetic Mean or Mean
   i) Simple Arithmetic Mean
   ii) Weighted Arithmetic Mean
   iii) Combined Mean
2) Geometric Mean
3) Harmonic Mean
Positional Average:
1) Median
2) Mode
3) Quantiles
   i) Quartiles
   ii) Deciles
   iii) Percentiles
Commercial Average:
1) Moving Average
2) Progressive Average
3) Composite Average
8.6 Mathematical Average:
The average calculated by well defined mathematical formula is called as
mathematical average. It is calculated by taking into account of all the values in the
series.
Ex: Arithmetic mean, Geometric mean, Harmonic mean
1) Arithmetic Mean (AM) or Mean:
Arithmetic Mean is the most popular and widely used measure of average. It is
defined as the sum of all the individual observations divided by the total number of
observations. Arithmetic Mean is denoted by $\bar{X}$.
$$\bar{X} = \frac{\text{Sum of all the observations}}{\text{Total number of observations}} = \frac{\sum X}{n}$$
where $\sum X$ denotes the sum of all the observations and $n$ is the number of observations.
i) Simple Arithmetic Mean/ Simple Mean:
Simple arithmetic mean is defined as the sum of all the individual observations
divided by the total number of observations. The simple arithmetic mean gives the same
weightage to all the observations in the series, hence it is called simple.
Computation of Simple Arithmetic Mean:
i) For raw data/individual-series/ungrouped data:
If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, then their arithmetic mean ($\bar{X}$) is given by:
a) Direct Method:
$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad i = 1, 2, \ldots, n$$
where, $\sum_{i=1}^{n} x_i$ = sum of the given observations
n = number of observations
b) Assumed Mean/ Short-Cut Method:
$$\bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n}, \quad i = 1, 2, \ldots, n$$
where, A = the assumed mean or any value in x
$d_i = x_i - A$ = deviation of the ith value from the assumed mean
n = number of observations
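The two methods for raw data can be sketched as follows. The observations and the assumed mean A = 15 are hypothetical values chosen for illustration; both methods should give the same result whatever value of A is picked.

```python
def mean_direct(values):
    """Direct method: sum of the observations divided by their number."""
    return sum(values) / len(values)


def mean_assumed(values, assumed):
    """Short-cut method: A plus the mean of the deviations d_i = x_i - A."""
    deviations = [x - assumed for x in values]
    return assumed + sum(deviations) / len(values)


# Hypothetical raw data (not taken from the notes).
observations = [12, 15, 11, 18, 14]

direct = mean_direct(observations)
shortcut = mean_assumed(observations, assumed=15)
print(direct, shortcut)
```

The short-cut method is mainly a hand-calculation aid: choosing A near the centre of the data keeps the deviations small.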
ii) For frequency distribution data:
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
If $x_1, x_2, \ldots, x_k$ are ‘k’ observations with corresponding frequencies $f_1, f_2, \ldots, f_k$, then
their arithmetic mean ($\bar{X}$) is given by:
a) Direct Method:
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{N}, \quad i = 1, 2, \ldots, k$$
where, $\sum_{i=1}^{k} f_i x_i$ = the sum of the products of the ith observation and its frequency
$N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
k = number of classes
b) Assumed Mean/ Short-Cut Method:
$$\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{N}, \quad i = 1, 2, \ldots, k$$
where, A = the assumed mean or any value in x
$N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
$d_i = x_i - A$ = the deviation of the ith value from the assumed mean
$\sum_{i=1}^{k} f_i d_i$ = the sum of the products of each deviation and its frequency
2) Continuous frequency distribution (Grouped frequency distribution) data:
If $m_1, m_2, \ldots, m_k$ represent the mid-points of k class-intervals
$x_0 - x_1, x_1 - x_2, x_2 - x_3, \ldots, x_{k-1} - x_k$ with corresponding frequencies $f_1, f_2, \ldots, f_k$, then their
arithmetic mean ($\bar{X}$) is calculated by:
a) Direct Method:
$$\bar{X} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_k m_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i m_i}{N}, \quad i = 1, 2, \ldots, k$$
where, $m_i$ = mid-points or mid-values of the class-intervals
$\sum_{i=1}^{k} f_i m_i$ = the sum of the products of the ith mid-point and its frequency
$N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
b) Assumed Mean/ Short-Cut Method:
$$\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{N}, \quad i = 1, 2, \ldots, k$$
where, A = the assumed mean or any value in x
$N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
$d_i = m_i - A$ = the deviation of the ith mid-value from the assumed mean
$\sum_{i=1}^{k} f_i d_i$ = the sum of the products of each deviation and its frequency
c) Step-Deviation Method:
$$\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d'_i}{N} \times C, \quad i = 1, 2, \ldots, k$$
where, A = the assumed mean or any value in x
$N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
$d'_i = \frac{m_i - A}{C}$ = the step deviation of the ith mid-value from the assumed mean
C = width of the class interval
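The three methods can be compared on the systolic BP example from the histogram section. This is only a sketch: the choice of A = 144.5 (the mid-point of the class 140-149) is an arbitrary assumption, and C = 10 is the class width once the inclusive classes are read as 99.5-109.5, 109.5-119.5, and so on.

```python
# Class limits (inclusive form) and frequencies of the systolic BP example.
limits = [(100, 109), (110, 119), (120, 129), (130, 139),
          (140, 149), (150, 159), (160, 169), (170, 179)]
frequencies = [7, 16, 19, 31, 41, 23, 10, 3]

mid_points = [(lo + up) / 2 for lo, up in limits]  # m_i
N = sum(frequencies)                               # total frequency

# a) Direct method: sum(f_i * m_i) / N
direct = sum(f * m for f, m in zip(frequencies, mid_points)) / N

# b) Assumed mean method: A + sum(f_i * d_i) / N, with d_i = m_i - A
A = 144.5  # assumed mean (arbitrary choice for this sketch)
shortcut = A + sum(f * (m - A) for f, m in zip(frequencies, mid_points)) / N

# c) Step-deviation method: A + sum(f_i * d'_i) / N * C, with d'_i = (m_i - A) / C
C = 10  # class width
step = A + sum(f * ((m - A) / C) for f, m in zip(frequencies, mid_points)) / N * C

print(round(direct, 2), round(shortcut, 2), round(step, 2))
```

All three values agree (about 138.3 mmHg), as they must, since b) and c) are only algebraic rearrangements of a).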
Merits of Arithmetic Mean:
1. It is the simplest and most widely used average.
2. It is easy to understand and easy to calculate.
3. It is rigidly defined.
4. Its calculation is based on all the observations.
5. It is suitable for further mathematical treatment.
6. It is least affected by fluctuations of sampling.
7. If the number of items is sufficiently large, it is more accurate and more reliable.
8. It is a calculated value and is not based on its position in the series.
9. It provides a good basis for comparison.
Demerits of Arithmetic Mean:
1. It cannot be obtained by inspection, nor can it be located graphically.
2. It cannot be used to study qualitative phenomena such as intelligence, beauty,
honesty, etc.
3. It is very much affected by extreme values.
4. It cannot be calculated for open-end classes.
5. The A.M. computed may not be an actual item in the series.
6. Its value cannot be determined if one or more observations are missing from
the series.
7. Sometimes the A.M. gives absurd results, e.g. the number of children per family cannot
be a fraction.
Uses of Arithmetic Mean
1. Arithmetic Mean is used to compare two or more series with respect to a certain
character.
2. It is the most commonly and widely used average, e.g. in calculating the average cost of
production, average cost of cultivation, average cost of yield per hectare, etc.
3. It is used in calculating the standard deviation and the coefficient of variation.
4. It is used in calculating the correlation coefficient and the regression coefficient.
5. It is also used in testing of hypotheses and in finding confidence limits.
Mathematical Properties of the Arithmetic Mean
1. The sum of the deviations of the individual items from the arithmetic mean is
always zero, i.e. $\sum (x_i - \bar{x}) = 0$
2. The sum of the squared deviations of the individual items from the arithmetic mean
is always a minimum, i.e. $\sum (x_i - \bar{x})^2$ = minimum
3. The Standard Error of the A.M. is less than that of any other measure of central
tendency.
4. If $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are the means of ‘k’ samples of sizes $n_1, n_2, \ldots, n_k$ respectively, then
their combined mean is given by
$$\bar{\bar{X}} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$
5. Arithmetic mean is dependent on change of both Origin and Scale
(i.e. if each value of a variable X is increased or decreased by, or multiplied or divided by, a
constant value k, the arithmetic mean of the new series is likewise increased, decreased,
multiplied or divided by the same constant value k.)
6. If any two of the three values, viz. the A.M. ($\bar{X}$), the total of the items ($\sum X$) and the number of
observations ($n$), are known, then the third value can easily be found.
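Several of these properties are easy to check numerically. The sketch below uses made-up values, sample sizes and sample means (none of them from the notes) to illustrate properties 1, 2, 4 and 5.

```python
# Hypothetical observations (mean is 10).
values = [4, 8, 10, 12, 16]
n = len(values)
mean = sum(values) / n

# Property 1: deviations from the mean sum to zero.
sum_of_deviations = sum(x - mean for x in values)


# Property 2: the sum of squared deviations is smallest when taken about the mean.
def sum_sq_dev(a):
    return sum((x - a) ** 2 for x in values)


# Property 4: combined mean of two hypothetical samples.
sizes = [10, 20]
means = [50.0, 65.0]
combined_mean = sum(s * m for s, m in zip(sizes, means)) / sum(sizes)

# Property 5: change of origin (+5) and change of scale (x3).
shifted_mean = sum(x + 5 for x in values) / n
scaled_mean = sum(3 * x for x in values) / n

print(sum_of_deviations, sum_sq_dev(mean), sum_sq_dev(9), sum_sq_dev(11))
print(combined_mean, shifted_mean, scaled_mean)
```

Note that the combined mean is pulled towards the mean of the larger sample, so it is not the simple average of the two sample means.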
ii) Weighted Arithmetic Mean ($\bar{X}_w$):
The simple arithmetic mean gives equal importance to each item in
the series. But when different observations are to be given different weights, the simple
arithmetic mean does not prove to be a good measure of central tendency. In such
cases the weighted arithmetic mean is to be calculated.
If each value of the variable is multiplied by its weight and the resulting products are
totaled, then this total divided by the total weight gives the weighted arithmetic mean.
If $x_1, x_2, \ldots, x_n$ are ‘n’ values of a variable ‘x’ with respective weights $w_1, w_2, \ldots, w_n$
assigned to them, then the weighted arithmetic mean is given by:
$$\bar{X}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
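A minimal sketch of the weighted mean, with hypothetical marks and weights (for instance, marks in three tests that count 1, 2 and 3 times respectively):

```python
def weighted_mean(values, weights):
    """Sum of (weight x value) divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical data: three values with unequal weights.
values = [60, 70, 80]
weights = [1, 2, 3]

weighted = weighted_mean(values, weights)
simple = sum(values) / len(values)
print(round(weighted, 2), simple)
```

Because the largest value carries the largest weight here, the weighted mean lies above the simple mean. With equal weights the two coincide.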
Uses of the weighted mean:
Weighted arithmetic mean is used in:
1. Construction of index numbers.
2. Comparison of results of two or more groups where number of items differs in
each group.
3. Computation of standardized death and birth rates.
4. When values of items are given in percentage or proportion.
2) Geometric Mean (GM):
The geometric mean is defined as the nth root of the product of all the n
observations.
If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, then the geometric mean is given by
$$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$
where, n = number of observations
Computation of Geometric Mean:
i) For raw data/individual-series/ungrouped data:
If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, then their geometric mean is calculated by:
$$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$
Or
$$GM = \text{antilog}\left(\frac{\sum_{i=1}^{n} \log_{10} x_i}{n}\right)$$
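Both forms of the formula can be sketched in a few lines. The observations are hypothetical, and the antilog is taken as a power of 10 because the formula above uses base-10 logarithms.

```python
import math

# Hypothetical positive observations (not taken from the notes).
values = [2, 4, 8]
n = len(values)

# Root form: nth root of the product of the n observations.
gm_root = math.prod(values) ** (1 / n)

# Log form: antilog of the mean of the base-10 logarithms.
gm_log = 10 ** (sum(math.log10(x) for x in values) / n)

print(gm_root, gm_log)
```

Both forms give 4 up to floating-point rounding. The log form is usually preferred for long series, since the raw product can become very large or very small.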
ii) For frequency distribution data:
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
If $x_1, x_2, \ldots, x_k$ are ‘k’ observations with corresponding frequencies $f_1, f_2, \ldots, f_k$, then
their geometric mean is computed by:
$$GM = \sqrt[N]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}} = \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}\right)^{1/N}$$
Or
$$GM = \text{antilog}\left(\frac{\sum_{i=1}^{k} f_i \log_{10} x_i}{N}\right)$$
where, $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
2) Continuous frequency distribution (Grouped frequency distribution) data:
If $m_1, m_2, \ldots, m_k$ represent the mid-points of k class-intervals
$x_0 - x_1, x_1 - x_2, x_2 - x_3, \ldots, x_{k-1} - x_k$ with their corresponding frequencies $f_1, f_2, \ldots, f_k$, then the
geometric mean (GM) is calculated by:
$$GM = \sqrt[N]{m_1^{f_1} \cdot m_2^{f_2} \cdots m_k^{f_k}} = \left(m_1^{f_1} \cdot m_2^{f_2} \cdots m_k^{f_k}\right)^{1/N}$$
Or
$$GM = \text{antilog}\left(\frac{\sum_{i=1}^{k} f_i \log_{10} m_i}{N}\right)$$
where, $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
$m_i$ = mid-points / mid-values of the class intervals
Merits of Geometric mean:
1. It is rigidly defined.
2. It is based on all observations.
3. It is capable of further mathematical treatment.
4. It is not affected much by the fluctuations of sampling.
5. Unlike AM, it is not affected much by the presence of extreme values.
6. It is very suitable for averaging ratios, rates and percentages.
Demerits of Geometric mean:
1. Its calculation is not as simple as that of the A.M., and it is not easy to understand.
2. The GM may not be an actual value of the series.
3. It cannot be determined graphically or by inspection.
4. It cannot be used when the values are negative because if any one observation is
negative, G.M. becomes meaningless or doesn’t exist.
5. It cannot be used when the values are zero, because if any one observation is
zero, G. M. becomes zero.
6. It cannot be calculated for open-end classes.
Uses of G. M.: The Geometric Mean has certain specific uses, some of them are:
1. It is used in the construction of index numbers.
2. It is also helpful in finding out the compound rates of change such as the rate of
growth of population in a country, average rates of change, average rate of
interest etc..
3. It is suitable where the data are expressed in terms of rates, ratios and
percentage.
4. It is most suitable when the observations of smaller values are given more
weightage or importance.
3) Harmonic Mean (HM):
The harmonic mean of a set of observations is defined as the reciprocal of the
arithmetic mean of the reciprocals of the given observations.
If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, then the harmonic mean is given by
$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
where, n = number of observations
Computation of Harmonic Mean:
i) For raw data/individual-series/ungrouped data:
If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, then their harmonic mean is given by:
$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
ii) For frequency distribution data :
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
If $x_1, x_2, \ldots, x_k$ are ‘k’ observations with corresponding frequencies $f_1, f_2, \ldots, f_k$, then their
harmonic mean is computed by:
$$HM = \frac{\sum f_i}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_k}{x_k}} = \frac{N}{\sum_{i=1}^{k} \frac{f_i}{x_i}}$$
where, $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
2) Continuous frequency distribution (Grouped frequency distribution) data:
If $m_1, m_2, \ldots, m_k$ represent the mid-points of k class-intervals
$x_0 - x_1, x_1 - x_2, x_2 - x_3, \ldots, x_{k-1} - x_k$ with their corresponding frequencies $f_1, f_2, \ldots, f_k$, then the HM
is calculated by:
$$HM = \frac{\sum f_i}{\frac{f_1}{m_1} + \frac{f_2}{m_2} + \cdots + \frac{f_k}{m_k}} = \frac{N}{\sum_{i=1}^{k} \frac{f_i}{m_i}}$$
where, $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
$m_i$ = mid-points / mid-values of the class intervals
Merits of H.M.:
1. It is rigidly defined.
2. It is based on all the items in the series.
3. It is amenable to further algebraic treatment.
4. It is not affected much by the fluctuations of sampling.
5. Unlike AM, it is not affected much by the presence of extreme values.
6. It is the most suitable average when it is desired to give greater weight to
smaller observations and less weight to the larger ones.
Demerits of H.M:
1. It is not easily understood and it is difficult to compute.
2. It is only a summary figure and may not be an actual item in the series.
3. Its calculation is not possible when the value of one or more items is either
missing or zero.
4. Its calculation is not possible when the series contains both negative and positive
observations.
5. It gives greater importance to small items and is, therefore, useful only when
small items have to be given greater weightage.
6. It cannot be determined graphically or by inspection.
7. It cannot be calculated for open-end classes.
Uses of H. M.:
H.M. is of greater significance in cases where prices are expressed in
quantities (units per price). H.M. is also used in averaging time, speed, distance, quantity,
etc., for example, to find the average speed travelled, the average time
taken to travel, the average distance travelled, etc.
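The average-speed case shows why the harmonic mean is the right average for rates. In this sketch the speeds (40 and 60 km/h) and the leg length (120 km) are hypothetical, and the two legs are assumed to cover equal distances.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the observations."""
    return len(values) / sum(1 / x for x in values)


# Hypothetical trip: the same distance is covered at 40 km/h and then at 60 km/h.
speeds = [40, 60]
hm = harmonic_mean(speeds)
am = sum(speeds) / len(speeds)

# Cross-check from first principles: total distance divided by total time.
leg_km = 120
total_time_h = sum(leg_km / s for s in speeds)
average_speed = (leg_km * len(speeds)) / total_time_h

print(round(hm, 2), am, average_speed)
```

The harmonic mean (48 km/h) matches total distance over total time, while the arithmetic mean (50 km/h) overstates the average speed because more time is spent on the slower leg.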
8.7 Positional Averages:
These averages are based on the position of the observations in an arranged (either
ascending or descending order) series. Ex: Median, Mode, Quartiles, Deciles, Percentiles.
1) Median:
Median is the middle most value of the series of the data when the observations
are arranged in ascending or descending order.
The median is that value of the variate which divides the group into two equal
parts, one part comprising all values greater than middle value, and the other all values
less than middle value.
Computation of Median:
i) For raw data/individual-series/ungrouped data:
If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, then arrange the given values in ascending
(increasing) or descending (decreasing) order.
Case I: If the number of observations (n) is odd, the median is the middle
value,
i.e. Median = Md = $\left(\frac{n+1}{2}\right)^{th}$ item of the x variable
Case II: If the number of observations (n) is even, the median is the mean
of the middle two values,
i.e. Median = Md = Average of the $\left(\frac{n}{2}\right)^{th}$ and $\left(\frac{n}{2} + 1\right)^{th}$ items of the x variable
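The two cases translate directly into code. This is a minimal sketch with hypothetical data; note that Python lists are 0-indexed, so the $\left(\frac{n+1}{2}\right)^{th}$ item sits at index n // 2.

```python
def median(values):
    """Median of raw data: middle item (odd n) or mean of the two middle items (even n)."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n + 1) / 2)-th item, counting from 1
    # the (n / 2)-th and (n / 2 + 1)-th items, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 3, 9, 1, 5])    # hypothetical data, n = 5
even_case = median([7, 3, 9, 1])      # hypothetical data, n = 4
print(odd_case, even_case)
```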
ii) For frequency distribution data :
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
If $x_1, x_2, \ldots, x_k$ are ‘k’ observations with corresponding frequencies $f_1, f_2, \ldots, f_k$, then
their median can be found using the following steps:
Step1: Find the cumulative frequencies (CF).
Step2: Obtain the total frequency (N) and find $\frac{N+1}{2}$, where $N = \sum_{i=1}^{k} f_i$ is the total frequency.
Step3: Look in the cumulative frequencies for the value equal to or just greater than $\frac{N+1}{2}$. Then the
corresponding value of x is the median.
2) Continuous frequency distribution (Grouped frequency distribution) data:
If $m_1, m_2, \ldots, m_k$ represent the mid-points of k class-intervals
$x_0 - x_1, x_1 - x_2, x_2 - x_3, \ldots, x_{k-1} - x_k$ with their corresponding frequencies $f_1, f_2, \ldots, f_k$, then the
steps given below are followed for the calculation of the median in a continuous series.
Step1: Find the cumulative frequencies (CF).
Step2: Obtain the total frequency (N) and find $\frac{N}{2}$, where $N = \sum_{i=1}^{k} f_i$ is the total frequency.
Step3: Look in the cumulative frequencies for the first value greater than $\frac{N}{2}$. Then the
corresponding class interval is called the median class.
Then apply the formula given below.
$$\text{Median} = M_d = L + \left[\frac{\frac{N}{2} - c.f.}{f}\right] \times C$$
where, L = lower limit of the median class
N = total frequency
f = frequency of the median class
c.f. = cumulative frequency of the class preceding the median class
C = width of the class interval
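The steps and the formula can be sketched as below, again using the systolic BP example from the histogram section with exclusive class boundaries. The function name and the data layout are illustrative choices, not part of the notes.

```python
from itertools import accumulate

# Exclusive class boundaries and frequencies of the systolic BP example.
boundaries = [99.5, 109.5, 119.5, 129.5, 139.5, 149.5, 159.5, 169.5, 179.5]
frequencies = [7, 16, 19, 31, 41, 23, 10, 3]


def grouped_median(boundaries, frequencies):
    """Median = L + ((N/2 - c.f.) / f) * C for a continuous frequency distribution."""
    total = sum(frequencies)                      # N
    half = total / 2                              # N / 2
    cumulative = list(accumulate(frequencies))    # Step 1: cumulative frequencies
    # Step 3: median class = first class whose cumulative frequency exceeds N/2.
    idx = next(i for i, cf in enumerate(cumulative) if cf > half)
    lower = boundaries[idx]                               # L
    width = boundaries[idx + 1] - boundaries[idx]         # C
    cf_before = cumulative[idx - 1] if idx > 0 else 0     # c.f. of the preceding class
    return lower + (half - cf_before) / frequencies[idx] * width


median_bp = grouped_median(boundaries, frequencies)
print(round(median_bp, 2))
```

Here N/2 = 75, the median class is 139.5-149.5 (c.f. = 73, f = 41), and the median works out to 139.5 + (2/41) × 10.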
Graphic method for Location of median:
Median can be located with the help of the cumulative frequency curve or ‘ogive’ .
The procedure for locating median in a grouped data is as follows:
Step1: The class boundaries (of exclusive classes, i.e. with no gaps between consecutive
classes) are represented on the horizontal axis (x-axis).
Step2: The cumulative frequency corresponding to different classes is plotted on the
vertical axis (y-axis) against the upper limit of the class interval (or against the
variate value in the case of a discrete series.)
Step3: The curve obtained on joining the points by means of freehand drawing is called
the ‘ogive’ . The ogive so drawn may be either a (i) less than ogive or a (ii) more
than ogive.
Step4: The value of N/2 is marked on the y-axis, where N is the total frequency.
Step5: A horizontal straight line is drawn from the point N/2 on the y-axis parallel to
x-axis to meet the ogive.
Step6: A vertical straight line is drawn from the point of intersection perpendicular to the
horizontal axis.
Step7: The point where this perpendicular meets the x-axis gives the value of the
median.
Fig 6.1: Graphic method for location of median
Remarks:
1. From the point of intersection of ‘ less than’ and ‘more than’ ogives, if a perpendicular
is drawn on the x-axis, the point so obtained on the horizontal axis gives the
value of the median.
Fig 6.2: Graphic method for location of median
Merits of Median:
1. It is easily understood and is easy to calculate.
2. It is rigidly defined.
3. It can be located merely by inspection.
4. It is not at all affected by extreme values.
5. It can be calculated for distributions with open-end classes.
6. Median is the only average to be used to study qualitative data where the items
are scored or ranked.
Demerits of Median:
1. In case of even number of observations median cannot be determined exactly.
We merely estimate it by taking the mean of two middle terms.
2. It is not based on all the observations.
3. It is not amenable to algebraic treatment.
4. As compared with the mean, it is affected more by fluctuations of sampling.
5. If importance needs to be given to the small or big items in the series, then the median is
not a suitable average.
Uses of Median
1. Median is the only average to be used while dealing with qualitative data which
cannot be measured quantitatively but can be arranged in ascending or
descending order.
Ex: To find the average honesty, average intelligence, average beauty, etc.
among a group of people.
2. It is used for determining the typical value in problems concerning wages and
the distribution of wealth.
3. Median is useful in distributions where open-end classes are given.
2) Mode:
The mode is the value in a distribution which occurs most frequently or
repeatedly.
It is an actual value which has the highest concentration of items in and around
it, i.e. the value that is predominant in the series.
In the case of a discrete frequency distribution, the mode is the value of x corresponding to
the maximum frequency.
Computation of mode:
i) For raw data/individual-series/ungrouped data:
Mode is the value of the variable (observation) which occurs maximum number
of times.
ii) For frequency distribution data :
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
In case of discrete frequency distribution mode is the value of x variable
corresponding to maximum frequency.
2) Continuous frequency distribution (Grouped frequency distribution) data:
If $m_1, m_2, \ldots, m_k$ represent the mid-points of k class-intervals
$x_0 - x_1, x_1 - x_2, x_2 - x_3, \ldots, x_{k-1} - x_k$ with corresponding frequencies $f_1, f_2, \ldots, f_k$:
Locate the highest frequency; the class-interval corresponding to the
highest frequency is called the modal class.
Then the mode is found by applying the following formula:
$$\text{Mode} = M_o = L + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times C$$
where, L = lower limit of the modal class
C = width of the class interval of the modal class
$f_0$ = frequency of the class preceding the modal class
$f_1$ = frequency of the modal class
$f_2$ = frequency of the class succeeding the modal class
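The formula can be sketched on the systolic BP example from the histogram section, using exclusive class boundaries. Two choices below are assumptions of this sketch rather than rules from the notes: if the modal class is the first or last class, the missing neighbouring frequency is taken as 0, and if several classes tie for the highest frequency, the first is used.

```python
# Exclusive class boundaries and frequencies of the systolic BP example.
boundaries = [99.5, 109.5, 119.5, 129.5, 139.5, 149.5, 159.5, 169.5, 179.5]
frequencies = [7, 16, 19, 31, 41, 23, 10, 3]


def grouped_mode(boundaries, frequencies):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * C for a continuous distribution."""
    idx = frequencies.index(max(frequencies))     # modal class (first one if tied)
    f1 = frequencies[idx]
    # Assumption: a missing neighbour (modal class at either end) counts as 0.
    f0 = frequencies[idx - 1] if idx > 0 else 0
    f2 = frequencies[idx + 1] if idx < len(frequencies) - 1 else 0
    lower = boundaries[idx]                       # L
    width = boundaries[idx + 1] - boundaries[idx] # C
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width


mode_bp = grouped_mode(boundaries, frequencies)
print(round(mode_bp, 2))
```

Here the modal class is 139.5-149.5 with f1 = 41, f0 = 31 and f2 = 23, so the mode is 139.5 + (10/28) × 10. The denominator is zero when the modal class and both neighbours have equal frequencies, in which case the formula does not apply.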
Graphic method for location of mode:
Steps:
1. Draw a histogram of the given distribution.
2. Join the top right corner of the highest rectangle (modal class rectangle) by a straight
line to the top right corner of the preceding rectangle. Similarly, join the top left corner
of the highest rectangle to the top left corner of the succeeding rectangle (the
rectangle on the right).
3. From the point of intersection of these two diagonal lines, draw a perpendicular to the
x-axis.
4. The value read on the x-axis at the foot of this perpendicular gives the mode.
Fig 6.3: Graphic method for Location of mode
Merits of Mode:
1. It is easy to calculate and in some cases it can be located by mere inspection.
2. Mode is not at all affected by extreme values.
3. It can be calculated for open-end classes.
4. It is usually an actual value of an important part of the series.
5. Mode can be conveniently located even if the frequency distribution has class
intervals of unequal magnitude provided the modal class and the classes
preceding and succeeding it are of the same magnitude.
Demerits of mode:
1. Mode is ill-defined: it is not always possible to find a clearly defined mode.
2. It is not based on all observations.
3. It is not capable of further mathematical treatment.
4. Compared with the mean, the mode is affected to a greater extent by fluctuations of
sampling.
5. It is unsuitable in cases where the relative importance of items has to be considered.
Remarks: In some cases, we may come across distributions with two modes. Such
distributions are called bi-modal. If a distribution has more than two modes, it is said to
be multimodal.
Uses of Mode:
Mode is most commonly used in business forecasting, e.g. in manufacturing
units and the garment industry, to find the ideal size. Ex: in the manufacture of
readymade garments, to find the average (most common) size of track suits,
dresses, shoes, etc.
3) Quantiles (or) Partition Values:
Quantiles are the values of the variable which divide the total number of
observations into a number of equal parts when they are arranged in order of magnitude.
Ex: Median, Quartiles, Deciles, Percentiles.
i) Median: Median is only one value, which divides the whole series into two equal parts.
ii) Quartiles: Quartiles are three in number and divide the whole series into four equal
parts. They are represented by Q1, Q2, Q3 respectively.
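First quartile: $Q_1$ = value of the $\frac{(n+1)}{4}$th observation
Second quartile: $Q_2$ = value of the $\frac{2(n+1)}{4}$th observation
Third quartile: $Q_3$ = value of the $\frac{3(n+1)}{4}$th observation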
iii) Deciles: Deciles are nine in number and divide the whole series into ten equal parts.
They are represented by D1, D2 …D9.
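First Decile: $D_1$ = value of the $\frac{(n+1)}{10}$th observation
Second Decile: $D_2$ = value of the $\frac{2(n+1)}{10}$th observation
:
:
Ninth Decile: $D_9$ = value of the $\frac{9(n+1)}{10}$th observation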
iv) Percentiles: Percentiles are 99 in number and divide the whole series into 100 equal
parts. They are represented by P1, P2…P99.
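First Percentile: $P_1$ = value of the $\frac{(n+1)}{100}$th observation
Second Percentile: $P_2$ = value of the $\frac{2(n+1)}{100}$th observation
:
:
Ninety-ninth Percentile: $P_{99}$ = value of the $\frac{99(n+1)}{100}$th observation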
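The quartile, decile and percentile positions all follow one pattern: the j-th partition value out of p equal parts sits at position j(n + 1)/p in the ordered data. A minimal Python sketch of this rule is given below. The function name `partition_value` and the data are hypothetical. The linear interpolation used for fractional positions, and the clamping of positions that fall outside 1..n, are assumptions of this sketch, since the notes give only the position formula.

```python
def partition_value(data, j, parts):
    """j-th partition value when ordered data are split into `parts` equal parts.

    Uses the 1-based position j * (n + 1) / parts.
    """
    xs = sorted(data)
    n = len(xs)
    pos = j * (n + 1) / parts
    whole = int(pos)       # integer part of the position
    frac = pos - whole     # fractional part of the position
    # Assumption: positions outside 1..n are clamped to the extreme values.
    if whole < 1:
        return xs[0]
    if whole >= n:
        return xs[-1]
    # Assumption: interpolate linearly between neighbouring observations.
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


# Hypothetical ordered series with n = 7.
data = [15, 20, 35, 40, 50, 55, 60]

print(partition_value(data, 1, 4))   # Q1: position 2 -> 20.0
print(partition_value(data, 2, 4))   # Q2 (median): position 4 -> 40.0
print(partition_value(data, 3, 4))   # Q3: position 6 -> 55.0
print(partition_value(data, 5, 10))  # D5: position 4 -> 40.0
```

The second quartile, the fifth decile and the fiftieth percentile all fall at the same position as the median.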
8.8 Commercial Averages:
These are the averages which are mainly calculated based on needs in business.
Ex: Moving Average, Composite Average, Progressive Average
i) Moving Average (M.A.):
It is a special type of A.M. calculated to obtain the trend in a time series. We find the
M.A. by discarding one figure and adding the next figure in sequence, and then computing
the A.M. of the values taken by rotation.
If a, b, c, d and e are the values in a series, then the M.A. (of period three) is given by
$$\text{M.A.} = \frac{a+b+c}{3},\ \frac{b+c+d}{3},\ \frac{c+d+e}{3}$$
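The moving-average rule can be sketched in Python as below. The function name `moving_average`, the `window` parameter and the sample series are assumptions made for illustration. A window of three matches the a, b, c, d, e example in the text.

```python
def moving_average(values, window=3):
    """Arithmetic means of successive overlapping groups of `window` values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical series a, b, c, d, e.
series = [10, 20, 30, 40, 50]
print(moving_average(series))  # [20.0, 30.0, 40.0]
```

Each step drops the oldest figure and takes in the next one, so a series of five values yields three moving averages of period three.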
ii) Progressive Average (P.A.):
It is a cumulative average used occasionally during the early years of the life of
a business. It is computed by taking all the figures available in each succeeding
year.
If a, b, c, d and e are the values in a series, then the P.A. is given by
$$\text{P.A.} = \frac{a+b}{2},\ \frac{a+b+c}{3},\ \frac{a+b+c+d}{4},\ \frac{a+b+c+d+e}{5}$$
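A short Python sketch of the progressive average follows. The function name `progressive_average` and the sample series are hypothetical. Following the formula in the text, the first average is taken over the first two values.

```python
from itertools import accumulate


def progressive_average(values):
    """Cumulative means over the first 2, 3, ..., n values of the series."""
    totals = list(accumulate(values))  # running totals a, a+b, a+b+c, ...
    return [totals[i] / (i + 1) for i in range(1, len(values))]


# Hypothetical series a, b, c, d, e.
series = [10, 20, 30, 40, 50]
print(progressive_average(series))  # [15.0, 20.0, 25.0, 30.0]
```

Unlike the moving average, no figure is discarded: each new average takes in all the earlier values.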
iii) Composite Average:
Statistic note

  • 1. Dr. Mohan Kumar, T. L. 1 Chapter: 1 INTRODUCTION 1.1. Introduction: In the modern world of computer and information technology, the importance of statistics is very well recognized by all the disciplines. Statistics has originated as a science of statehood and found applications slowly and steadily in Agriculture, Economics, Commerce, Biology, Medicine, Industry, Planning, Education and so on. The word statistics in our everyday life means different things to different people. For a layman, ‘Statistics’ means numerical information expressed in quantitative terms. A student knows statistics more intimately as a subject of study like economics, mathematics, chemistry, physics and others. It is a discipline, which scientifically deals with data, and is often described as the science of data. For football fans, statistics are the information about rushing yardage, passing yardage, and first downs, given a halftime. To the manager of power generating station, statistics may be information about the quantity of pollutants being released into the atmosphere and power generated. For school principal, statistics are information on the absenteeism, test scores and teacher salaries. For medical researchers, investigating the effects of a new drug and patient dairy. For college students, statistics are the grades list of different courses, OGPA, CGPA etc... Each of these people is using the word statistics correctly, yet each uses it in a slightly different way and somewhat different purpose. The term statistics is ultimately derived from the Latin word Status or Statisticum Collegium (council of state), the Italian word Statista ("statesman”), and The German word Statistik, which means Political state. Father of Statistics is Sir R. A. Fisher (Ronald Aylmer Fisher). Father of Indian Statistics is P.C. Mahalanobis (Prasanth Chandra Mahalanobis) 1.2 Meaning of Statistics: The word statistics used in two senses, one is in Singular and the other is in Plural. 
a) When it is used in singular: It means ‘Subject’ or Branch of Science, which deals with Scientific method of collection, classification, presentation, analysis and interpretation of data obtained by sample survey or experimental studies, which are known as the statistical methods. When we say ‘apply statistics’, it means apply the statistical methods to analyze and interpretation of data. b) When it is used in plural: Statistics is a systematic presentation of facts and figures. The majority of people use the word statistics in this context. They only meant simply
  • 2. Dr. Mohan Kumar, T. L. 2 facts and figures. These figures may be with regard to production of food grains in different years, area under cereal crops in different years, per capita income in a particular state at different times etc., and these are generally published in trade journals, economics and statistics bulletins, annual report, technical report, news papers, etc. 1.3 Definition of Statistics: Statistics has been defined differently by different authors from time to time. One can find more than hundred definitions in the literature of statistics. “Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data from the logical analysis”. -Croxton and Cowden “The science of statistics is essentially a branch of applied mathematics and may be regarded as mathematics applied to observational data”. -R. A. Fisher “Statistics is the branch of science which deals with the collection, classification and tabulation of numerical facts as the basis for explanations, description and comparison of phenomenon” -Lovitt A.L. Bowley has defined statistics as: (i) Statistics is the science of counting, (ii) Statistics may rightly be called the Science of averages, and (iii) Statistics is the science of measurement of social organism regarded as a whole in all its manifestations. “Statistics is a science of estimates and probabilities” -Boddington In general: Statistics is the science which deals with the, (i) Collection of data (ii) Organization of data (iii) Presentation of data (iv) Analysis of data & (v) Interpretation of data. 1.4 Types of Statistics: There are two major divisions of statistics such as descriptive statistics and inferential statistics. i) Descriptive statistics is the branch of statistics that involves the collecting, organization, summarization, and display of data.
  • 3. Dr. Mohan Kumar, T. L. 3 ii) Inferential statistics is the branch of statistics that involves drawing conclusions about the population using sample data. A basic tool in the study of inferential statistics is probability. 1.5 Nature of Statistics: Statistics is Science as well as an Art. Statistics as a Science: Statistics classified as Science because of its characteristics as follows 1. It is systematic body of studying knowledge. 2. Its methods and procedure are definite and well organized. 3. It analyzes the cause and effect relationship among variables. 4. Its study is according to some rules and dynamism. Statistics as an Art: Statistics is considered as an art because it provides methods to use statistical laws in solving problems. Also application of statistical methods requires skill and experience of the investigator. 1.6 Aims of statistics: Objective of statistics is 1. To study the population. 2. To study the variation and its causes. 3. To study the methods for reducing data/ summarization of data. 1.7 Functions of statistics: The important functions of statistics are given as follows: 1) To express the facts and statements numerically or quantitatively. 2) To Condensation/simplify the complex facts. 3) To use it as a technique for making comparisons. 4) To establish the association and relationship between different groups. 5) To Estimate the present facts and forecasting future. 6) To Tests of Hypothesis. 7) To formulate the policies and measures their impacts. 1.8 Scope/ Application of Statistics In modern times, the importance of statistics increased and applied in every sphere of human activities. Statistics plays an important role in our daily life, it is useful in almost all science such as social, biological, psychology, education, economics, business management, agricultural sciences, information technology etc...The statistical methods can be and are being used by both educated and uneducated people. 
In many instances we use sample data to make inferences about the entire
  • 4. Dr. Mohan Kumar, T. L. 4 population. 1) Statistics is used in administration by the Government for solving various problems. Ex: price control, birth-death rate estimation, farming policies related to import, export and industries, assessment of pay and D.A., preparation of budget etc.. 2) Statistics are indispensable in planning and in making decisions regarding export, import, and production etc., Statistics serves as foundation of the super structure of planning. 3) Statistics helps the business man in formulation of polices with regard to business. Statistical methods are applied in market research to analyze the demand and supply of manufactured products and fixing its prices. 4) Bankers, stock exchange brokers, insurance companies etc.. make extensive use of statistical data. Insurance companies make use of statistics of mortality and life premium rates etc., for bankers, statistics help in deciding the amount required to meet day to day demands. 5) Problems relating to poverty, unemployment, food storage, deaths due to diseases, due to shortage of food etc., cannot be fully weighted without the statistical balance. Thus statistics is helpful in promoting human welfare. 6) Statistics is widely used in education. Research has become a common feature in all branches of activities. Statistics is necessary for the formulation of policies to start new course, consideration of facilities available for new courses etc. 7) Statistics are a very important part of political campaigns as they lead up to elections. Every time a scientific poll is taken, statistics are used to calculate and illustrate the results in percentages and to calculate the margin for error. 8) In Medical sciences, statistical tools are widely used. Ex: in order to test the efficiency of a new drug or medicine. To study the variability character like Blood Pressure (BP), pulse rate, Hb %, action of drugs on individuals. 
To determine the association between diseases with different attributes such as smoking and cancer. To compare the different drug or dosage on living beings under different conditions. In agricultural research, Statistical tools have played a significant role in the analysis and interpretation of data. 1) Analysis of variance (ANOVA) is one of the statistical tools developed by Professor R.A. Fisher, plays a prominent role in agriculture experiments. 2) In making data about dry and wet lands, lands under tanks, lands under irrigation projects, rainfed areas etc... 3) In determining and estimating the irrigation required by a crop per day, per base
  • 5. Dr. Mohan Kumar, T. L. 5 period. 4) In determining the required doses of fertilizer for a particular crop and crop land. 5) In soil chemistry, statistics helps in classifying the soils based on Ph content, texture, structures etc... 6) In estimating the yield losses incurred by particular pest, insect, bird, or rodent etc... 7) Agricultural economists use forecasting procedures to estimation and demand and supply of food and export & import, production 8) Animal scientists use statistical procedures to aid in analyzing data for decision purposes. 9) Agricultural engineers use statistical procedures in several areas, such as for irrigation research, modes of cultivation and design of harvesting and cultivating machinery and equipment. 1.9 Limitations of Statistics: 1) Statistics does not study qualitative phenomenon, i.e. it study only quantitative phenomenon. 2) Statistics does not study individual or single observation; in fact it deals with only an aggregate or group of objects/individuals. 3) Statistics laws are not exact laws; they are only approximations. 4) Statistics is liable to be misused. 5) Statistical conclusions are valid only on average base. i.e. Statistics results are not 100 per cent correct. 6) Statistics does not reveal the entire information. Since statistics are collected for a particular purpose, such data may not be relevant or useful in other situations or cases.
  • 6. Dr. Mohan Kumar, T. L. 6 Chapter 2: BASIC TERMINOLOGIES 2.1 Data: Numerical observations collected in systematic manner by assigning numbers or scores to outcomes of a variable(s). 2.2 Raw Data: Raw data is originally collected or observed data, and has not been modified or transformed in any way. The information collected through censuses, sample surveys, experiments and other sources are called a raw data. 2.3 Types of data according to source: There are two types of data 1. Primary data 2. Secondary data. 2.3.1 Primary data: The data collected by the investigator him-self/ her-self for a specific purpose by actual observation or measurement or count is called primary data. Primary data are those which are collected for the first time, primarily for a particular study. They are always given in the form of raw materials and originals in character. Primary data are more reliable than secondary data. These types of data need the application of statistical methods for the purpose of analysis and interpretation. Methods of collection of primary data Primary data is collected in any one of the following methods 1. Direct personal interviews. 2. Indirect oral interviews 3. Information from correspondents. 4. Mailed questionnaire method. 5. Schedules sent through enumerators. 6. Telephonic Interviews, etc... 2.3.2 Secondary data The data which are compiled from the records of others is called secondary data. The data collected by an individual or his agents is primary data for him and secondary data for all others. Secondary data are those which have gone through the statistical treatment. When statistical methods are applied on primary data then they become secondary data. They are in the shape of finished products. The secondary data are less expensive but it may not give all the necessary information. Secondary data can be compiled either from published sources or unpublished sources. Sources of published data 1. 
Official publications of the central, state and local governments. 2. Reports of committees and commissions. 3. Publications brought about by research workers and educational associations.
4. Trade and technical journals.
5. Reports and publications of trade associations, chambers of commerce, banks, etc.
6. Official publications of foreign governments or international bodies such as the U.N.O., UNESCO, etc.

Sources of unpublished data: Not all statistical data are published. For example, village-level officials maintain records regarding area under crop, crop production, etc. They collect these details for administrative purposes. Similarly, details collected by private organizations regarding persons, profit, sales, etc. become secondary data and are used in certain surveys.

Characteristics of secondary data: Secondary data should possess the following characteristics: they should be reliable, adequate, suitable, accurate, complete and consistent.

2.3.3 Difference between primary and secondary data

| Primary data | Secondary data |
|---|---|
| The data collected by the investigator himself/herself for a specific purpose. | The data which are compiled from the records of others. |
| Primary data are collected from primary sources. | Secondary data are collected from secondary sources. |
| Primary data are original, because the investigator himself collects them. | Secondary data are not original, since the investigator makes use of other agencies. |
| If these data are collected accurately and systematically, their suitability will be very positive. | These might or might not suit the objects of the enquiry. |
| The collection of primary data is more expensive because they are not readily available. | The collection of secondary data is comparatively less expensive because they are readily available. |
| It takes more time to collect the data. | It takes less time to collect the data. |
| There is no great need for precaution while using these data. | These should be used with great care and caution. |
| More reliable and accurate. | Less reliable and accurate. |
| Primary data are in the shape of raw material. | Secondary data are usually in the shape of ready-made/finished products. |
| Possibility of personal prejudice. | Possibility of a lesser degree of personal prejudice. |
Grouped data: When the data values vary over a wide range, they are sorted and grouped into class intervals in order to reduce the number of scoring categories to a manageable level. Individual values of the original data are not retained. Ex: 0-10, 11-20, 21-30.

Ungrouped data: Data values that are not grouped into class intervals but are kept in their original form. Ex: 2, 4, 12, 0, 3, 54, etc.

2.4 Variable: A variable is a quantitative or qualitative characteristic that varies from observation to observation in the same group, so that measuring it can give more than one value. Ex: Daily temperature, yield of a crop, nitrogen in soil, height, color, sex.

2.4.1 Observations (Variate): The specific numerical values assigned to the variables are called observations. Ex: yield of a crop is 30 kg.

2.5 Types of Variables

Variable
  Quantitative variable (data)
    Continuous variable (data)
    Discrete variable (data)
  Qualitative variable (data)

2.5.1 Quantitative variable & Qualitative variable

Quantitative variable: A quantitative variable is a variable which is normally expressed numerically because it differs in degree rather than kind among elementary units. Ex: Plant height, plant weight, length, number of seeds per pod, leaf dry weight, etc.

Qualitative variable: A variable that is normally not expressed numerically because it differs in kind rather than degree among elementary units. The term is more or less synonymous with categorical variable. Some examples are hair color, religion, political affiliation, nationality and social class. Ex: Intelligence, beauty, taste, flavor, fragrance, skin colour, honesty, hard work, etc.

Attributes: Qualitative variables are termed attributes. They are qualitatively distinct characteristics such as healthy or diseased, positive or negative. The term is often
applied to designate characteristics that are not easily expressed in numerical terms.

Quantitative data: Data obtained by using numerical scales of measurement, i.e. on a quantitative variable. These are data in numerical quantities involving continuous measurements or counts. In the case of quantitative variables the observations are made in terms of kg, quintals, liters, cm, meters, kilometers, etc. Ex: Weight of seeds, height of plants, yield of a crop, available nitrogen in a soil, number of leaves per plant.

Qualitative data: When the observations are made with respect to a qualitative variable, the data are called qualitative data. Ex: Crop varieties, shape of seeds, soil type, taste of food, beauty of a person, intelligence of students, etc.

2.5.2 Continuous variable & Discrete variable (Discontinuous variable)

Continuous variable & continuous data: A continuous variable is a variable which can assume any value (integers as well as fractions) in a given range, so it has an infinite number of possible values within that range. If the data are measured on a continuous variable, the data obtained are continuous data. Ex: Height of a plant, weight of a seed, rainfall, temperature, humidity, marks of students, income of an individual, etc.

Discrete (Discontinuous) variable and discrete data: A variable which assumes only some specified values, i.e. only whole numbers (integers), in a given range. A discrete variable can assume only a finite or, at most, countable number of possible values. As the old joke goes, you can have 2 children or 3 children, but not 2.37 children, so "number of children" is a discrete variable. If the data are measured on a discrete variable, the data obtained are discrete data.
Ex: Number of leaves on a plant, number of seeds in a pod, number of students, number of insects or pests.

2.6 Population: The aggregate or totality of all possible objects possessing specified characteristics which are under investigation is called a population. A population consists of all the items or individuals about which you want to reach conclusions. A population is a collection or well-defined set of individuals/objects/items that describes some
phenomenon of study of your interest. Ex: Total number of students studying in a school or college, total number of books in a library, total number of houses in a village or town. In statistics, the data set that forms the target group of interest is called a population. Notice that a statistical population does not refer to people as in our everyday usage of the term; it refers to a collection of data.

2.6.1 Census (Complete enumeration): When each and every unit of the population is investigated for the character under study, it is called a census or complete enumeration.

2.6.2 Parameter: A parameter is a numerical constant which is measured to describe a characteristic of a population. OR A parameter is a numerical description of a population characteristic. Generally, parameters are unknown constants; they are estimated from sample data. Ex: Population mean (μ), population standard deviation (σ), population ratio, population percentage, population correlation coefficient (ρ), etc.

2.7 Sample: A small portion selected from the population under consideration, or a fraction of the population, is known as a sample.

2.7.1 Sample Survey: When only a part of the population is investigated for the characteristics under study, it is called a sample survey or sample enumeration.

2.7.2 Statistic: A statistic is a numerical quantity that is measured to describe a characteristic of a sample. OR A statistic is a numerical description of a sample characteristic. Ex: Sample mean (X̄), sample standard deviation (s), sample ratio, sample proportion, etc.

2.8 Nature of data: Different types of data can be collected for different purposes. The data can be collected in connection with time, with geographical location, or with both time and location. The following are the three types of
data:
1. Time series data
2. Spatial data
3. Spatio-temporal data

Time series data: A set of numerical values collected and arranged over a sequence of time periods. The data might have been collected either at regular or at irregular intervals of time. Ex: Year-wise rainfall in Karnataka, prices of milk over different months.

Spatial data: If the data collected are connected with a place, they are termed spatial data. Ex: District-wise rainfall in Karnataka, prices of milk in four metropolitan cities.

Spatio-temporal data: If the data collected are connected with time as well as place, they are known as spatio-temporal data. Ex: Year-wise and district-wise rainfall in Karnataka, monthly prices of milk in different cities.

Chapter 3: CLASSIFICATION

3.1 Introduction: Raw or ungrouped data are always in an unorganized form and need to be organized and presented in a meaningful and readily comprehensible form in order to facilitate further statistical analysis. Therefore, it is essential for an investigator to condense a mass of data into a more comprehensible and digestible form.

3.2 Definition: Classification is the process by which individual items of data are arranged in different groups or classes according to common characteristics, resemblance or similarity possessed by the individual items of the variable under study.
Ex:
1) Letters in the post office are classified according to their destinations, viz. Delhi, Chennai, Bangalore, Mumbai, etc.
2) The human population can be divided into two groups of males and females, or into two groups of educated and uneducated persons.
3) Plants can be arranged according to their different heights.

Remarks: Classification done on the basis of a single characteristic is called one-way classification. Classification done on the basis of two characteristics is called two-way classification. Similarly, classification done on the basis of more than two characteristics is called multi-way or manifold classification.

3.3 Objectives/Advantages/Role of Classification: The following are the main objectives of classifying data:
1. It condenses the mass of data into an easily understandable form.
2. It eliminates unnecessary details.
3. It gives an orderly arrangement of the items of the data.
4. It facilitates comparison and highlights the significant aspects of the data.
5. It enables one to get a mental picture of the information and helps in drawing inferences.
6. It helps in tabulation and statistical analysis.

3.4 Types of classification: Statistical data are classified with respect to their characteristics. Broadly, there are four basic types of classification, namely:
1) Chronological (temporal or historical) classification
2) Geographical (spatial) classification
3) Qualitative classification
4) Quantitative classification

1) Chronological classification: In chronological classification, the collected data are arranged according to the order of time, expressed in days, weeks, months, years, etc. The data are generally classified in ascending order of time. Ex: Daily temperature records, monthly prices of vegetables, exports and imports of India for different years.

Total food grain production of India for different time periods:

Year       Production (million tonnes)
2005-06    208.60
2006-07    217.28
2007-08    230.78
2008-09    234.47

2) Geographical classification: In this type of classification, the data are classified according to geographical region or location (area), such as district, state, country, city-village, urban-rural, etc. Ex: The production of paddy in different states of India, production of wheat in different countries, etc.

State-wise classification of production of food grains in India:

State     Production (in tonnes)
Orissa    3,00,000
A.P.      2,50,000
U.P.      22,00,000
Assam     10,000

3) Qualitative classification: In this type of classification, data are classified on the basis of attributes or quality characteristics such as sex, literacy, religion, employment, social status, nationality, occupation, etc. Such attributes cannot be measured on a scale. Ex: If the population is to be classified with respect to one attribute, say sex, we can classify it into males and females. Similarly, it can also be classified into 'employed' and 'unemployed' on the basis of another attribute, 'employment'.

Qualitative classification can be of two types:
(i) Simple classification
(ii) Manifold classification

i) Simple classification or dichotomous classification: When the classification is done with respect to only one attribute, it is called simple classification. If the attribute is dichotomous (two outcomes) in nature, two classes are formed, one possessing the attribute and the other not possessing it. This type of classification is called dichotomous classification. Ex: A population can be divided into two classes according to sex (male and female) or income (poor and rich).

Population
  Male
  Female

Population
  Rich
  Poor

ii) Manifold classification: A classification in which two or more attributes are considered and several classes are formed is called a manifold classification. Ex: If we classify a population simultaneously with respect to two attributes, sex and education, the population is first classified into 'males' and 'females'. Each of these classes may then be further classified into 'educated' and 'uneducated'. The classification may be extended still further by considering other attributes such as income status. This can be explained by the following chart:

Population
  Male
    Educated
      Rich
      Poor
    Uneducated
      Rich
      Poor
  Female
    Educated
      Rich
      Poor
    Uneducated
      Rich
      Poor

4) Quantitative classification:
In quantitative classification, the data are classified according to quantitative characteristics that can be measured numerically, such as height, weight, production, income, marks secured by students, age, land holding, etc. Ex: Students of a college may be classified according to their height as given in the table:

Height (in cm)   No. of students
100-125          20
125-150          25
150-175          40
175-200          15
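The quantitative classification described above can be sketched in a few lines of Python. This is only an illustration: the function names `classify` and `classify_all` and the list `sample_heights` are hypothetical and not part of the original notes, and the sketch assumes that a height equal to a class boundary is placed in the class that starts at that boundary.

```python
from bisect import bisect_right

# Class boundaries taken from the height classes in the example (in cm).
BOUNDARIES = [100, 125, 150, 175, 200]


def classify(height_cm):
    """Return the class label 'lower-upper' for one height.

    Assumption: a value equal to a boundary goes to the class that
    starts at that boundary (e.g. 125 falls in 125-150).
    """
    if not BOUNDARIES[0] <= height_cm < BOUNDARIES[-1]:
        raise ValueError(f"{height_cm} cm lies outside 100-200 cm")
    i = bisect_right(BOUNDARIES, height_cm) - 1
    return f"{BOUNDARIES[i]}-{BOUNDARIES[i + 1]}"


def classify_all(heights):
    """Count how many heights fall in each class, in class order."""
    counts = {f"{lo}-{hi}": 0 for lo, hi in zip(BOUNDARIES, BOUNDARIES[1:])}
    for h in heights:
        counts[classify(h)] += 1
    return counts


# Hypothetical heights (cm), invented for illustration only.
sample_heights = [112, 125, 149.5, 150, 163, 171, 180, 199]
height_counts = classify_all(sample_heights)
for label, n in height_counts.items():
    print(f"{label:>8}  {n}")
```

The boundary rule is a design choice rather than something fixed by the definition of quantitative classification; a different convention would only change how values such as 125 or 150 are assigned.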
Chapter: 4 TABULATION

4.1 Meaning & Definition: A table is a systematic arrangement of data in columns and rows. Tabulation may be defined as the systematic arrangement of classified numerical data in rows and/or columns according to certain characteristics. It expresses the data in a concise and attractive form which can be easily understood and used to compare numerical figures, so that an investigator is quickly able to locate the desired information and the chief characteristics. Thus, a statistical table makes it possible for the investigator to present a huge mass of data in a detailed and orderly form. It facilitates comparison and often reveals patterns in the data which are otherwise not obvious. Before tabulation, the data are classified; they are then displayed under the different columns and rows of a table.

4.2 Difference between classification and tabulation:
∙ Classification is the process of classifying or grouping raw data according to their object, behavior, purpose and usage. Tabulation means a logical arrangement of data into rows and columns.
∙ Classification is the first step in arranging the data, whereas tabulation is the second step.
∙ The main object of classification is to condense the mass of data in such a way that similarities and dissimilarities can be readily found, whereas the main object of tabulation is to simplify complex data for the purpose of better comparison.

4.3 Objectives/Advantages/Role of Tabulation: Statistical data arranged in tabular form serve the following objectives:
1) It simplifies complex data so that they can be understood easily.
2) It facilitates comparison of related facts.
3) It facilitates computation of various statistical measures such as averages, dispersion, correlation, etc.
4) It presents facts in the minimum possible space, and unnecessary repetitions and explanations are avoided. Moreover, the needed information can be easily located.
5) Tabulated data are good for reference, and they make it easier to present the information in the form of graphs and diagrams.

4.4 Disadvantages of Tabulation:
1) The arrangement of data by row and column becomes difficult if the person does not have the required knowledge.
2) A table lacks a description of the nature of the data, and not every kind of data can be put in a table.
3) No one section is given special emphasis in a table.
4) Table figures/data can be misinterpreted.

4.5 Ideal Characteristics/Requirements of a Good Table: A good statistical table summarizes the total information in an easily accessible form in the minimum possible space.
1) A table should be formed in keeping with the objects of the statistical enquiry.
2) A table should be easily understandable and self-explanatory in nature.
3) A table should be formed so as to suit the size of the paper.
4) If the figures in the table are large, they should be suitably rounded or approximated. The units of measurement should also be specified.
5) The arrangement of rows and columns should be in a logical and systematic order. This arrangement may be alphabetical, chronological or according to size.
6) The rows and columns are separated by single, double or thick lines to represent the various classes and sub-classes used.
7) The averages or totals of different rows should be given at the right of the table and those of columns at the bottom of the table. Totals for every sub-class should also be mentioned.
8) Necessary footnotes and source notes should be given at the bottom of the table.
9) If it is not possible to accommodate all the information in a single table, it is better to have two or more related tables.

4.6 Parts or components of a good table: The making of a compact table is itself an art. A table should contain all the information needed within the smallest possible space. An ideal statistical table should consist of the following main parts:
1. Table number
2. Title of the table
3. Head notes
4. Captions or column headings
5. Stubs or row designations
6. Body of the table
7. Footnotes
8. Sources of data

1. Table Number: A table should be numbered for easy reference and identification.
The table number may be given either in the center at the top above the title or just before the title of the table.

2. Table Title: Every table must be given a suitable title. The title is a description of the contents of the table. The title should be clear, brief and self-explanatory. It should explain the nature of the data and the period covered in the table. The title should be placed centrally at the top of the table, just below the table number (or just after the table number on the same line).

Schematic representation of a table:

Table No. : Table title
                                                          (Head notes)
Stub headings   Caption                                                Row total
                Sub head 1                Sub head 2
                Column head  Column head  Column head  Column head
Stub entries    Body         ...          ...          ...
Column total                                                           Grand total
Foot notes:
Source notes:

3. Head note: It is used to explain certain points relating to the table that have not been included in the title, the captions or the stubs. For example, the unit of measurement is frequently written as a head note, such as 'in thousands', 'in million tonnes' or 'in crores'.

4. Captions or Column Designations: Captions are the brief and self-explanatory headings of the vertical columns. Captions may involve headings and sub-headings as well. Usually, a relatively less important and shorter classification should be tabulated in the columns.

5. Stubs or Row Designations: Stubs are the brief and self-explanatory headings of the horizontal rows. Normally, a relatively more important classification is given in the rows. A variable with a large number of classes is also usually represented in the rows.

6. Body: The body of the table contains the numerical information. This is the most vital part of the table. The data presented in the body are arranged according to the description or classification given by the captions and stubs.

7. Footnotes: If any item has not been explained properly, a separate explanatory note should be added at the bottom of the table. Footnotes are thus meant for explaining or providing further details about the data that have not been covered in the title, captions and stubs.

8. Sources of data: At the bottom of the table a note should be added indicating the primary and secondary sources from which the data have been collected. This should preferably include the name of the author, volume, page and year of publication.
4.7 Types of Tabulation: Tables may be broadly classified into three categories.

I. On the basis of number of characters used/construction:
1) Simple tables
2) Complex tables
II. On the basis of object/purpose:
1) General purpose/reference tables
2) Special purpose/summary tables
III. On the basis of originality:
1) Primary or original tables
2) Derived tables

I. On the basis of number of characters used/construction: The distinction between simple and complex tables is based on the number of characteristics studied, i.e. on construction.

1) Simple table: In a simple table, data on only one character are tabulated. Hence this type of table is also known as a one-way or first-order table.
Ex: Population of a country in different states

State    Population
KA       -
AP       -
MP       -
UP       -
Total    -

2) Complex table: If two or more characteristics are tabulated in a table, it is called a complex table. It is also called a manifold table. When only two characteristics are shown, the table is known as a two-way table or double tabulation.
Ex: Two-way table: Population of a country by state and sex

State    Population            Total
         Males     Females
KA       -         -           -
AP       -         -           -
MP       -         -           -
UP       -         -           -
Total    -         -           -

When three characteristics are represented in the same table, it is called three-way tabulation. As the number of characteristics increases, the tabulation becomes complicated and confusing.
Ex: Triple table (three-way table): Population of a country in different states according to sex and education

State    Population                                        Total
         Males                    Females
         Educated   Uneducated    Educated   Uneducated
KA       -          -             -          -             -
AP       -          -             -          -             -
MP       -          -             -          -             -
UP       -          -             -          -             -
Total    -          -             -          -             -

Ex: Manifold (multi-way) table: When the data are classified according to more than three characters and tabulated.

States   Status      Population
                     Male                              Female                            Total
                     Educated  Uneducated  Sub-total   Educated  Uneducated  Sub-total   Educated  Uneducated  Total
UP       Rich
         Poor
         Sub-total
MP       Rich
         Poor
         Sub-total
Total

II. On the basis of object/purpose:

1) General tables: General purpose tables are sometimes termed reference tables or information tables. These tables provide information for general use or reference. They usually contain detailed information and are not constructed for a specific discussion. These tables are also termed master tables. Ex: The detailed tables prepared in census reports belong to this class.

2) Special purpose tables: Special purpose tables, also known as summary tables, provide information for a particular discussion. These tables are constructed or derived from the general purpose tables. They are useful for analytical and comparative studies involving the study of relationships among variables. Ex: Calculations of analytical statistics such as ratios, percentages, index numbers, etc. are incorporated in these tables.
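As a rough illustration of the two-way table of section 4.7 (state by sex, with row, column and grand totals), the sketch below cross-tabulates a handful of records. The records are invented for illustration only, and the helper name `two_way_table` is hypothetical; nothing here comes from real census figures.

```python
from collections import Counter

# Hypothetical (state, sex) records, invented for illustration only.
records = [
    ("KA", "Male"), ("KA", "Female"), ("KA", "Male"),
    ("AP", "Female"), ("AP", "Female"),
    ("MP", "Male"),
    ("UP", "Male"), ("UP", "Female"), ("UP", "Male"),
]


def two_way_table(pairs):
    """Cross-tabulate (row, column) pairs into counts with totals."""
    cell = Counter(pairs)
    rows = sorted({r for r, _ in pairs})   # stubs (row designations)
    cols = sorted({c for _, c in pairs})   # captions (column headings)
    table = {r: {c: cell[(r, c)] for c in cols} for r in rows}
    row_total = {r: sum(table[r].values()) for r in rows}
    col_total = {c: sum(table[r][c] for r in rows) for c in cols}
    return table, row_total, col_total


table, row_total, col_total = two_way_table(records)

# Print stubs on the left, captions on top, totals at the right and bottom.
cols = list(col_total)
print("State".ljust(8) + "".join(c.rjust(8) for c in cols) + "Total".rjust(8))
for state, counts in table.items():
    body = "".join(str(counts[c]).rjust(8) for c in cols)
    print(state.ljust(8) + body + str(row_total[state]).rjust(8))
footer = "".join(str(col_total[c]).rjust(8) for c in cols)
print("Total".ljust(8) + footer + str(sum(row_total.values())).rjust(8))
```

Adding a third characteristic (for example education) would turn the pairs into triples and the result into a three-way table; the sketch stops at two characteristics to keep the layout readable.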
III. On the basis of originality: According to the originality of the data:

1) Primary or original tables: These tables contain statistical facts in their original form. Figures in these tables are not rounded, but are original, actual and absolute in nature. Ex: Time series data recorded on rainfall, food grain production, etc.

2) Derived tables: These tables contain totals, ratios, percentages, etc. derived from original tables. They express information derived from the original tables. Ex: Trend values, seasonal values, cyclical variation data.

Chapter: 5 FREQUENCY DISTRIBUTIONS

5.1 Introduction: Frequency is the number of times a given value of an observation or character, or a particular type of event, has occurred in the data set. A frequency distribution is simply a table in which the data are grouped into different classes on the basis of common characteristics, and the number of cases falling in each class is counted and recorded. The table shows the frequency of occurrence of the different values of a single variable. A frequency distribution is a comprehensive way to classify raw data of a quantitative or qualitative variable. It shows how the different values of a variable are distributed in different classes, along with their corresponding class frequencies. In a frequency distribution, the classified data are organized in a table with the categories in one column and the frequency of each category in a second column.

5.2 Types of frequency distribution:

1. Simple frequency distribution:

a) Raw series/individual series/ungrouped data: Raw data have not been manipulated or treated in any way beyond their original measurement. As such, they are not arranged or organized in any meaningful manner. A series of individual observations is a simple listing of each observation.
If the marks of 10 students in statistics are given individually, they form a series of individual observations. In a raw series, each observation has a frequency of one. Ex: Marks of students: 55, 73, 60, 41, 60, 61, 75, 73, 58, 80.

b) Discrete frequency distribution: In a discrete series, the data are presented in such a way that exact measurements of units are indicated. There is a definite difference between the values of different groups of items. Each class is distinct and separate from the other classes, and there is discontinuity from one class to the next. In a discrete frequency distribution, we count the number of times each value of the variable occurs in the data. This is facilitated by the technique of tally bars.
Ex: The number of children in 15 families is given by 1, 5, 2, 4, 3, 2, 3, 1, 1, 0, 2, 2, 3, 4, 2.

Children (No.) (x)   Tally    Frequency (f)
0                    |        1
1                    |||      3
2                    ||||/    5
3                    |||      3
4                    ||       2
5                    |        1
Total                         15

c) Continuous (grouped) frequency distribution: When the range of the data is too large, or the data are measured on a continuous variable which can take fractional values, the data must be condensed by putting them into smaller groups or classes called "class intervals". The number of items which fall in a class interval is called its "class frequency". The presentation of the data in continuous classes with the corresponding frequencies is known as a continuous/grouped frequency distribution.
Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74.

Class interval (C.I.)   Tally    Frequency (f)
0-25                    ||       2
25-50                   |||      3
50-75                   ||||/    5
75-100                  ||||/    5
Total                            15

Types of continuous class intervals: There are three methods of forming class intervals, namely
i) Exclusive method
ii) Inclusive method
iii) Open-end classes

i) Exclusive method: In the exclusive method, the class intervals are fixed in such a way
that the upper limit of one class becomes the lower limit of the next class. Moreover, an item equal to the upper limit of a class is excluded from that class and included in the next class.
Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74. Here the mark 75 is counted in the class 75-100.

Class interval (C.I.)   Tally    Frequency (f)
0-25                    ||       2
25-50                   |||      3
50-75                   ||||/    5
75-100                  ||||/    5
Total                            15

ii) Inclusive method: In this method, observations which are equal to the upper as well as the lower limit of a class are included in that class. The upper limit of one class and the lower limit of the next class are different.
Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74. Here the mark 75 is counted in the class 51-75.

Class interval (C.I.)   Tally     Frequency (f)
0-25                    ||        2
26-50                   |||       3
51-75                   ||||/ |   6
76-100                  ||||      4
Total                             15

iii) Open-end classes: In this type of class interval, the lower limit of the first class interval, the upper limit of the last class interval, or both are not specified. The need for open-end classes arises in a number of practical situations, particularly with economic, agricultural and medical data, when there are a few very high or very low values which are far apart from the majority of observations.

The lower limit of the first class can be obtained by subtracting the magnitude of the next class from the upper limit of the open class. The upper limit of the last class can be obtained by adding the magnitude of the previous class to the lower limit of the open class.

Ex: Alternative ways of writing open-end classes:

< 20     Below 20       Less than 20   0-20
20-40    20-40          20-40          20-40
40-60    40-60          40-60          40-60
60-80    60-80          60-80          60-80
> 80     80 and above   80-100         80-over

Difference between Exclusive and Inclusive Class Intervals

| Exclusive method | Inclusive method |
|---|---|
| Observations equal to the upper limit of a class are excluded from that class and included in the next class. | Observations equal to both the upper and the lower limit of a class are counted in that class. |
| The upper limit of one class and the lower limit of the next class are the same. | The upper limit of one class and the lower limit of the next class are different. |
| There is no gap between the upper limit of one class and the lower limit of the next class. | There is a gap between the upper limit of one class and the lower limit of the next class. |
| This method is useful for both integer and fractional variables such as age, height, weight, etc. | This method is useful where the variable takes only integral values, such as members in a family or number of workers in a factory. It cannot be used with fractional values such as age, height, weight, etc. |
| There is no need to convert it to the inclusive method prior to calculation. | For simplification in calculation it is necessary to change it to the exclusive method. |

2. Relative frequency distribution: It is the fraction or proportion of the total number of items belonging to each class.
  • 26. Dr. Mohan Kumar, T. L. 26 Relative frequency of a class = Actual Frequency of the class Total frequency Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74. Class –Interval (C.I.) Tally Frequency (f) Relative Frequency 0-25 || 2 2/15=0.1333 25-50 ||| 3 3/15=0.2000 50-75 |||| || 7 7/15=0.4666 75-100 ||| 3 3/15=0.2000 Total 15 15/15=1.000 3. Percentage frequency distribution: Comparison becomes difficult and impossible when the total numbers of items are too large and highly different from one distribution to other. Under these circumstances percentage frequency distribution facilitates easy comparability. The percentage frequency is calculated on multiplying relative frequency by 100. In percentage frequency distribution, we have to convert the actual frequencies into percentages. Percentage frequency of a class = ( 100 Actual Frequency of the class Total frequency =Relative frequency ( 100 Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74. Class –Interval (C.I.) Tally Frequency (f) Percentage Frequency 0-25 || 2 ×100 =13.33 2 15 25-50 ||| 3 ×100 =20.00 3 15
  • 27. 50-75                   ||||    5               (5/15) × 100 = 33.33
    75-100                  ||||    5               (5/15) × 100 = 33.33
    Total                           15              100 %
    4. Cumulative frequency distribution: A cumulative frequency distribution is a running total of the frequency values. It is constructed by adding the frequency of the first class interval to the frequency of the second class interval, then adding that total to the frequency of the third class interval, and continuing until the final total, which appears opposite the last class interval and equals the total frequency. Cumulative frequency is used to determine the number of observations that lie above (or below) a particular value in a data set.
    General form:
    $x_i$    $f_i$    Cumulative frequency
    $x_1$    $f_1$    $f_1$
    $x_2$    $f_2$    $f_1 + f_2$
    ...      ...      ...
    $x_n$    $f_n$    $f_1 + f_2 + \dots + f_n = N$
    Total    $\sum f_i = N$
    For the marks of the 15 students:
    C.I.     Tally   Frequency (f)   Cumulative Frequency
    0-25     ||      2               2
    25-50    |||     3               2 + 3 = 5
    50-75    ||||    5               2 + 3 + 5 = 10
    75-100   ||||    5               2 + 3 + 5 + 5 = 15 = N
    Total            15
    5. Cumulative percentage frequency distribution: If cumulative percentages are given instead of cumulative frequencies, the distribution is called a cumulative percentage frequency distribution. We can form this table either by converting the frequencies into percentages and then cumulating them, or by converting the given cumulative frequencies into percentages.
    Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74
  • 28. C.I.     Tally   Frequency (f)   Percentage Frequency    Cumulative Percentage Frequency
    0-25     ||      2               (2/15) × 100 = 13.33    13.33
    25-50    |||     3               (3/15) × 100 = 20.00    13.33 + 20.00 = 33.33
    50-75    ||||    5               (5/15) × 100 = 33.33    (10/15) × 100 = 66.67
    75-100   ||||    5               (5/15) × 100 = 33.33    (15/15) × 100 = 100.00
    Total            15              100 %
    6. Univariate frequency distribution: A frequency distribution that studies only one variable at a time is called a univariate frequency distribution.
    7. Bivariate and multivariate frequency distribution: A frequency distribution that studies two variables simultaneously is known as a bivariate frequency distribution; when it is summarized in the form of a table, the table is called a bivariate (two-way) frequency table. If data are classified on the basis of more than two variables, the distribution is known as a multivariate frequency distribution.
    5.3 Construction of frequency distributions:
    1) Construction of discrete frequency distribution: When the given data relate to a discrete variable, first arrange all possible values of the variable in ascending order in the first column. In the next column, tally marks (||||) are written to count the number of times each value of the variable is repeated. To make counting easier, tally marks are grouped in blocks of five, with the fifth mark drawn across the previous four (/), and some space is left between every pair of blocks. Then the tally marks corresponding to each value of the variable are counted and the count is written against it in the third column, known as the frequency column. This type of representation of the data is called a discrete frequency distribution.
    2) Construction of continuous frequency distribution: In the case of continuous data, we make use of the class interval method to construct the frequency distribution.
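    The relative, percentage and cumulative frequency columns described above can be computed directly from the raw marks. The following is a minimal Python sketch, not part of the original notes; it assumes exclusive classes (lower limit included, upper limit excluded), and the names `tally`, `classes` and `cumulative_pct` are illustrative only.

```python
from itertools import accumulate

marks = [55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74]

# Exclusive class intervals: lower limit included, upper limit excluded.
classes = [(0, 25), (25, 50), (50, 75), (75, 100)]


def tally(data, class_limits):
    """Count the observations falling in each class [lower, upper)."""
    return [sum(lower <= x < upper for x in data) for lower, upper in class_limits]


freq = tally(marks, classes)
total = sum(freq)

relative = [f / total for f in freq]              # fraction of all items
percentage = [100 * r for r in relative]          # relative frequency x 100
cumulative = list(accumulate(freq))               # running total of frequencies
cumulative_pct = [100 * c / total for c in cumulative]

for (lower, upper), f, r, p, c, cp in zip(
    classes, freq, relative, percentage, cumulative, cumulative_pct
):
    print(f"{lower}-{upper}: f={f} rel={r:.4f} pct={p:.2f} cum={c} cum_pct={cp:.2f}")
```

    The last cumulative frequency always equals the total frequency N, which is a quick check that no observation was missed while tallying.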
  • 29. Nature of Class Interval: The following are some basic technical terms used when a continuous frequency distribution is formed.
    a) Class interval: The class interval is defined as the size of each grouping of data. For example, 50-75, 75-100, 100-125, ... are class intervals.
    b) Class limits: The two boundaries of a class, i.e. the minimum and maximum values of a class interval, are known as the lower limit and the upper limit of the class. In statistical calculations, the lower class limit is denoted by L and the upper class limit by U. For example, in the class 50-100 the lowest value of the class is 50 and the highest value is 100.
    c) Range: The difference between the largest and smallest values of the observations is called the range and is denoted by 'R', i.e. R = Largest value - Smallest value = L - S.
    d) Mid-value or mid-point: The central point of a class interval is called the mid-value or mid-point. It is found by adding the upper and lower limits of a class and dividing the sum by 2, i.e. $\text{Mid-point} = \dfrac{L + U}{2}$
    e) Frequency of class interval: The number of observations falling within a particular class interval is called the frequency of that class.
    f) Number of class intervals: The number of class intervals in a frequency distribution is a matter of importance. The number of class intervals should not be too many. For an ideal frequency distribution, the number of class intervals can vary from 5 to 15. The number of class intervals can be fixed arbitrarily, keeping in view the nature of the problem under study, or it can be decided with the help of "Sturges Rule", given by:
    $K = 1 + 3.322 \log_{10} n$
    where n = total number of observations, $\log_{10}$ = logarithm to base 10, K = number of class intervals.
    g) Width or size of the class interval: The difference between the lower and upper class limits is called the width or size of the class interval and is denoted by 'C'. The size of the class interval is inversely proportional to the number of class intervals in a given distribution.
    The approximate value of the size (or width or magnitude) of the class interval 'C' is obtained by using "Sturges Rule" as:
    $C = \dfrac{\text{Range}}{\text{Number of class intervals } (K)} = \dfrac{\text{Largest value} - \text{Smallest value}}{1 + 3.322 \log_{10} n}$
  • 30. Steps for construction of continuous frequency distribution
    1. For the given raw data, select a number of class intervals between 5 and 15, or find the number of classes by "Sturges Rule", given by:
    $K = 1 + 3.322 \log_{10} n$
    where n = total number of observations, $\log_{10}$ = logarithm to base 10, K = number of class intervals.
    2. Find the width of the class interval:
    $C = \dfrac{\text{Largest value} - \text{Smallest value}}{1 + 3.322 \log_{10} n}$
    Round this result to get a convenient number. You might need to change the number of classes, but the priority should be to use values that are easy to understand.
    3. Find the class limits: You can use the minimum data entry as the lower limit of the first class. To find the remaining lower limits, add the class width to the lower limit of the preceding class (add the class width to the starting point to get the second lower class limit, add the class width to the second lower class limit to get the third, and so on).
    4. Find the upper limit of the first class: List the lower class limits in a vertical column and then enter the upper class limits, which can be easily identified. Remember that classes cannot overlap. Find the remaining upper class limits.
    5. Go through the data set, putting a tally mark in the appropriate class for each data value. Use the tally marks to find the total frequency for each class.
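    The five steps above can be sketched in Python as follows. This is an illustrative sketch rather than a prescribed procedure: it reuses the marks of the 15 students as sample data, rounds K to the nearest integer and rounds the class width up to a whole number (both are assumptions made here for convenience), and uses exclusive classes. The function names `sturges_classes` and `build_classes` are hypothetical.

```python
import math

marks = [55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74]


def sturges_classes(n):
    """Step 1: K = 1 + 3.322 * log10(n), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))


def build_classes(data, k=None):
    """Steps 2-4: class width from the range, then exclusive class limits."""
    k = k or sturges_classes(len(data))
    smallest, largest = min(data), max(data)
    # Round the width up to a convenient whole number (at least 1).
    width = max(1, math.ceil((largest - smallest) / k))
    classes = []
    lower = smallest              # minimum value is the first lower limit
    while lower <= largest:       # keep adding classes until the maximum is covered
        classes.append((lower, lower + width))
        lower += width
    return classes


def tally(data, classes):
    """Step 5: count the observations in each class [lower, upper)."""
    return [sum(lower <= x < upper for x in data) for lower, upper in classes]


k = sturges_classes(len(marks))
classes = build_classes(marks)
freq = tally(marks, classes)

print("K =", k)
for (lower, upper), f in zip(classes, freq):
    print(f"{lower}-{upper}: {f}")
```

    Starting the first class at the minimum value follows step 3 literally; in practice a rounder starting point (for example 0 or 10) and a rounder width are often easier to read.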
  • 31. Chapter 6: DIAGRAMMATIC REPRESENTATION
    6.1 Introduction: One of the most convincing and appealing ways in which statistical results may be presented is through diagrams and graphs. A single diagram can represent given data more effectively than a thousand words. Moreover, even a layman who has nothing to do with numbers can understand diagrams. Evidence of this can be found in newspapers, magazines, journals, advertisements, etc. Diagrams are nothing but geometrical figures like lines, bars, squares, cubes, rectangles, circles, pictures, maps, etc. A diagrammatic representation of data is a visual form of presentation of statistical data, highlighting their basic facts and relationships. If we draw diagrams on the basis of the data collected, they will easily be understood and appreciated by all. Diagrams are readily intelligible and save a considerable amount of time and energy.
    6.2 Advantage/Significance of diagrams: Diagrams are extremely useful for the following reasons.
    1. They are attractive and impressive.
    2. They make data simple and understandable.
    3. They make comparison possible.
    4. They save time and labour.
    5. They have universal utility.
    6. They give more information.
    7. They have a great memorizing effect.
    6.3 Demerits (or) limitations:
    1. Diagrams give only an approximate presentation of quantities.
    2. Minute differences in values cannot be represented properly in diagrams.
    3. Large differences in values spoil the look of the diagram, and it is impossible to show a wide gap.
    4. Some diagrams can be drawn by experts only, e.g. the pie chart.
    5. Different scales portray different pictures to laymen.
    6. Similar characters are required for comparison.
    7. They are of no utility to the expert for further statistical analysis.
    6.5 Types of diagrams: In practice, a very large variety of diagrams is in use and new ones are constantly being added. For convenience and simplicity, they may be divided under the following heads:
  • 32. 1. One-dimensional diagrams
    2. Two-dimensional diagrams
    3. Three-dimensional diagrams
    4. Pictograms and Cartograms
    6.5.1 One-dimensional diagrams: In such diagrams only a one-dimensional measurement, i.e. height or length, is used and the width is not considered. These diagrams are in the form of bar or line charts and can be classified as:
    1. Line diagram
    2. Simple bar diagram
    3. Sub-divided bar diagram
    4. Percentage bar diagram
    5. Multiple bar diagram
    1. Line diagram: A line diagram is used where there are many items to be shown and there is not much difference in their values. Such a diagram is prepared by drawing a vertical line for each item according to the scale.
    ∙ The distance between lines is kept uniform.
    ∙ A line diagram makes comparison easy, but it is less attractive.
    Ex: The following data show the number of children.
    No. of children   0    1    2   3   4   5
    Frequency         10   14   9   6   4   2
    Fig 1: Line diagram showing number of children
    2. Simple Bar Diagram: It is the simplest of the bar diagrams and is generally used for comparison of two or more items of a single variable or a simple classification of data, for example data related to export, import, population, production, profit, sale, etc. for different time
  • 33. periods or regions.
    ∙ Simple bars can be drawn as vertical or horizontal bars of equal width.
    ∙ The heights of the bars are proportional to the volume or magnitude of the characteristic.
    ∙ All bars stand on the same base line.
    ∙ The bars are separated from each other by equal intervals.
    ∙ To make the diagram attractive, the bars can be coloured.
    Ex: Population in different states
    Population (million)
    Year   UP      AP      MH
    1951   63.22   31.25   29.98
    [Figure: vertical bars for UP, AP and MH in 1951; y-axis: Population (m), 0 to 70]
    Fig 2: Simple bar diagram showing population in different states
    3. Sub-divided bar diagram: If we have multi-character data for different attributes, we use a sub-divided or component bar diagram. In a sub-divided bar diagram, the bar is sub-divided into various parts in proportion to the values given in the data, and the whole bar represents the total. Such a diagram shows the total as well as the various components of the total. Such diagrams are also called component bar diagrams.
    ∙ Here, instead of placing the bars for each component side by side, we place them one on top of the other.
    ∙ The sub-divisions are distinguished by different colours, crossings or dottings.
    ∙ An index or key showing the various components represented by colours, shades, dots, crossings, etc. should be given.
    Ex: The following table gives the expenditure of families A & B on different items.
    Item of expenditure   Family A (Rs)   Family B (Rs)
    Food                  1400            2400
    House rent            1600            2600
  • 34. Education             1200            1600
    Savings               800             1400
    TOTAL                 5000            8000
    Fig 3: Sub-divided bar diagram indicating expenditure of families A & B
    4. Percentage bar diagram or percentage sub-divided bar diagram: This is another form of component bar diagram. Sometimes the volumes or values of the different attributes differ greatly; in such cases a sub-divided bar diagram cannot be used for making meaningful comparisons, and the components of the attributes are reduced to percentages. Here the components are not the actual values but are converted into percentages of the whole. The main difference between the sub-divided bar diagram and the percentage bar diagram is that in the former the bars are of different heights, since their totals may be different, whereas in the latter the bars are of equal height, since each bar represents 100 percent. For data having sub-divisions, a percentage bar diagram will be more appealing than a sub-divided bar diagram. The different components are converted to percentages using the following formula:
    $\text{Percentage} = \dfrac{\text{Actual value}}{\text{Total of actual values}} \times 100$
    Ex: Expenditure of family A and family B.
    Item of expenditure   Family A (Rs)   %     Family B (Rs)   %
    Food                  1400            28    2400            30
    House rent            1600            32    2600            32.5
    Education             1200            24    1600            20
    Savings               800             16    1400            17.5
    TOTAL                 5000                  8000
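    The percentage column of the table above can be reproduced with a few lines of Python. This is only an illustrative sketch; the dictionary layout and the function name `to_percentages` are assumptions made here and are not part of the original notes.

```python
# Expenditure (Rs) of the two families, taken from the table above.
expenditure = {
    "Family A": {"Food": 1400, "House rent": 1600, "Education": 1200, "Savings": 800},
    "Family B": {"Food": 2400, "House rent": 2600, "Education": 1600, "Savings": 1400},
}


def to_percentages(items):
    """Percentage = actual value / total of actual values * 100."""
    total = sum(items.values())
    return {name: 100 * value / total for name, value in items.items()}


percentages = {family: to_percentages(items) for family, items in expenditure.items()}

for family, shares in percentages.items():
    print(family)
    for name, pct in shares.items():
        print(f"  {name}: {pct:.1f} %")
```

    Because every family's percentages add up to 100, the bars in the percentage bar diagram all have the same height, whatever the totals (Rs 5000 and Rs 8000 here).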
  • 35. Fig 3: Percentage bar diagram indicating expenditure of families A & B
    5. Multiple or compound bar diagram: This type of diagram is used to facilitate the comparison of two or more sets of inter-related phenomena over a number of years or regions.
    ∙ A multiple bar diagram is simply an extension of the simple bar diagram.
    ∙ Bars are constructed side by side to represent the set of values for comparison.
    ∙ The different bars for a period or related phenomenon are placed together.
    ∙ After leaving some space, another set of bars for the next time period or phenomenon is drawn.
    ∙ In order to distinguish the bars, different colours, crossings, dottings, etc. may be used.
    ∙ The same type of marking or colouring should be used for each attribute.
    ∙ An index or footnote has to be prepared to identify the meaning of the different colours, dottings or crossings.
    Ex: Population in different states (double bar diagram). (Table columns: Year; population in millions for UP, AP, MH.)
    Fig 4: Multiple bar diagram indicating expenditure of families A & B
    6.5.2 Two-dimensional diagrams: In one-dimensional diagrams only length is taken into account. In two-dimensional diagrams the area represents the data, so both length and width have to be taken into account. Such diagrams are also called area diagrams or surface diagrams. The important types of area diagrams are rectangles, squares, circles and pie-diagrams.
    Pie-Diagram or Angular Diagram: The pie-diagram is a very popular diagram used to represent both the total magnitude and its different components or sectors. The circle represents the total magnitude of the variable. The various components of the total are represented proportionately by the segments (sectors) of the circle. Addition of these segments gives the complete circle.
  • 36. Such a component circular diagram is known as a pie or angular diagram. While making comparisons, pie diagrams should be used on a percentage basis and not on an absolute basis.
    Procedure for Construction of Pie Diagram
    1) Convert each component of the total into the corresponding angle in degrees. The angle of any component can be calculated by the following formula:
    $\text{Angle} = \dfrac{\text{Actual value}}{\text{Total of actual values}} \times 360^\circ$
    Angles are taken to the nearest integral values.
    2) Using a compass, draw a circle of any convenient radius (convenient in the sense that it looks neither too small nor too big on the paper).
    3) Using a protractor, divide the circle into sectors whose angles have been calculated in step 1. Sectors are to be in the order of the given items.
    4) The various component parts represented by different sectors can be distinguished by using different shades, designs or colours.
    5) These sectors can be distinguished by their labels, either inside (if possible) or just outside the circle with proper identification.
    Ex: The cropping pattern in Karnataka in the year 2001-2002 was as follows.
    CROPS       AREA (ha)   Angle (degrees)
    Cereals     3940        214°
    Oil seeds   1165        63°
    Pulses      464         25°
    Cotton      249         13°
    Others      822         45°
    Total       6640        360°
    6.5.3 Three-dimensional diagrams:
  • 37. Three-dimensional diagrams, also known as volume diagrams, consist of cubes, cylinders, spheres, etc. In these diagrams three things, namely length, width and height, have to be taken into account. Ex: cubes, cylinders, spheres, etc.
    6.5.4 Pictogram and Cartogram:
    i) Pictogram: The technique of presenting data through pictures is called a pictogram. In this method a picture of the particular phenomenon being studied is drawn. The sizes of the pictures are kept proportional to the values of the different magnitudes to be presented.
    ii) Cartogram: In this technique, statistical facts are presented through maps accompanied by various types of diagrammatic presentation. Cartograms are generally used to present facts according to geographical regions. Population and its constituents like births, deaths, growth and density, as well as production, imports, exports and several other facts, can be presented on maps with certain colours, dots, crosses, points, etc.
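    Returning to the pie-diagram procedure of section 6.5.2, step 1 (converting each component into an angle) can be sketched in Python using the Karnataka cropping-pattern data. This is an illustrative sketch only; the variable names are assumptions made here.

```python
# Area (ha) under each crop, from the cropping-pattern example in section 6.5.2.
crops = {"Cereals": 3940, "Oil seeds": 1165, "Pulses": 464, "Cotton": 249, "Others": 822}

total = sum(crops.values())

# Angle = actual value / total of actual values * 360 degrees
angles = {crop: area / total * 360 for crop, area in crops.items()}

for crop, angle in angles.items():
    print(f"{crop}: {angle:.2f} degrees")
print(f"Total: {sum(angles.values()):.2f} degrees")
```

    The unrounded angles always add up to 360°. When rounding to whole degrees a borderline case may need adjusting by hand: Cotton works out to exactly 13.5°, and taking it as 13° keeps the rounded angles in the table summing to 360°.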
  • 39. Chapter 7: GRAPHICAL REPRESENTATION OF DATA
    7.1 Introduction: From the statistical point of view, graphic presentation of data is more appropriate and accurate than diagrammatic representation. Diagrams are limited to the visual presentation of categorical and geographical data and fail to present time-series and frequency distribution data effectively. In such cases graphs prove to be very useful. A graph is a visual form of presentation of statistical data which shows the relationship between two or more sets of figures. A graph is more attractive than a table of figures. Even a common man can understand the message of the data from a graph. Comparisons can be made between two or more phenomena very easily with the help of a graph. The word graph is associated with the word "graphic", which means "vivid" or "springing to life". Vivid means evoking lifelike images in the mind.
    7.2 The difference between graph and diagram:
    Sl. No. | Diagrams | Graphs
    1 | Diagrams are represented by figures and pictures, viz. bars, squares, circles, cubes, etc. | Graphs are represented by points (dots) and lines.
    2 | Diagrams can be drawn on plain paper or any sort of paper. | Graphs can be drawn only on graph paper.
    3 | Diagrams cannot be used to find measures of central tendency such as median, mode, etc. | Graphs can be used to locate measures of central tendency such as median, mode, etc.
    4 | Diagrams are used to represent categorical or geographical data. | Graphs are used to represent frequency distribution and time-series data.
    5 | Diagrams give only an approximate idea. | Graphs represent data as exact information.
    6 | Diagrams are more effective and impressive. | Graphs are less effective and impressive.
  • 40. 7 | Diagrams have an everlasting effect. | Graphs do not have an everlasting effect.
    7.3 Advantage/function of graphical representation
    1. It facilitates comparison between different variables.
    2. It explains the correlation or relationship between two different variables or events.
    3. It helps in finding out the effect of all other factors on the change in the main factor under study.
    4. It helps in forecasting on the basis of present or previous data.
    5. It helps in planning statistical analysis and the general procedures of a research study.
    6. For representing frequency distributions, diagrams are rarely used when compared with graphs. For example, for time-series data, graphs are more appropriate than diagrams.
    7.4 Limitations:
    1. A graph cannot show all the facts that are in the tables.
    2. A graph can show approximate values only, while a table gives exact values.
    3. A graph takes more time to draw than a table.
    4. Graphs do not reveal the accuracy of data; they show the fluctuation of data.
    The technique of presenting statistical data by a graphic curve is generally used to depict two types of statistical series: I. time-series data and II. frequency distributions.
    7.5 Time-Series Graph or Historigram: The graphical representation of time-series data is known as a historigram. In this case, time is represented on the X-axis and the magnitude of the variable on the Y-axis. Taking the time scale as the x-coordinate and the corresponding magnitude of the variable as the y-coordinate, points are plotted on the graph paper and joined by lines. Ex: time-series graphs of exports, imports, area under irrigation, sales over years.
    1) One Variable Historigram: In this graph only one variable is represented graphically. Here the time scale is plotted on the X-axis and the other variable on the Y-axis. The various points thus obtained are joined by straight lines.
  • 41. Fig 7.1: Cattle sales over different years
    2) Historigram of Two or More Than Two Variables (Single Scale): Time-series data relating to two or more variables measured in the same units and belonging to the same time period can be plotted together in the same graph, using the same scale for all the variables along the Y-axis and the same time scale along the X-axis. Here we get a number of curves, one for each variable. Hence it is essential to depict each curve by a different type of line, viz. thin and thick lines, dotted lines, dash lines, dash-dot lines, etc.
    Fig 7.2: Historigram of Two or More Than Two Variables
    3) Historigram with Two Scales: Sometimes the variables to be plotted on the Y-axis are expressed in two different units, viz. Rs., kg, acres, km, etc. In such cases, one variable is plotted with its scale on the left Y-axis and the other with its scale on the right Y-axis.
    4) Belt Graph or Band Curve: A band graph is a type of line graph which shows the total for successive time periods broken up into sub-totals for each of the components of the total. The various component parts are plotted one over the other. The gaps between the successive lines are filled with different shades, colours, etc. The belt graph is also known as a constituent element chart or component part line chart.
    5) Range Graph: It is used to depict and emphasize the range of variation of a phenomenon for each period. For instance, it may be used to show the maximum and minimum daily temperatures of a place, the price of a commodity over different periods of time, etc.
  • 42. 7.6 Frequency Distribution Graphs: A frequency distribution may also be presented graphically in any of the following ways, in which the measurements, class limits or mid-values are taken along the horizontal (X) axis and the frequencies along the Y-axis.
    1. Histogram
    2. Frequency Polygon
    3. Frequency Curve
    4. Ogives or Cumulative frequency curve
    1. Histogram: The histogram is the most popular and widely used graph for the presentation of frequency distributions. In a histogram, data are plotted as a series of rectangles or bars. The height of each rectangle represents the frequency of the class interval and its width represents the size of the class interval. The area covered by the histogram is proportional to the total frequency represented. Each rectangle is drawn adjacent to the next so as to give a continuous picture. The histogram is also called a staircase or block diagram. There are as many rectangles as there are classes. Class intervals are shown on the X-axis and the frequencies on the Y-axis.
    Ex: Systolic Blood Pressure (BP) in mmHg of people
    Systolic BP   No. of persons
    100-109       7
    110-119       16
    120-129       19
    130-139       31
    140-149       41
    150-159       23
    160-169       10
    170-179       3
    Fig 7.3: Systolic Blood Pressure (BP) in mmHg of people
  • 43. Construction of Histogram:
    1) Histogram for frequency distributions having equal class intervals:
    i) Convert the data into exclusive class intervals if they are given in inclusive class intervals.
    ii) Each class interval is drawn on the X-axis as a section or base (the width of the rectangle) equal to the magnitude of the class interval. On the Y-axis, plot the corresponding frequencies.
    iii) Build a rectangle on each class interval with height proportional to the corresponding frequency of the class.
    iv) Keep in mind that the rectangles are drawn adjacent to each other. The adjacent rectangles thus formed give the histogram of the frequency distribution.
    2) Histogram for frequency distributions having unequal class intervals:
    i) In the case of a frequency distribution with unequal class intervals, it becomes a bit difficult to construct a histogram.
    ii) In such cases, a correction for the unequal class intervals is essential; it is made by determining the "frequency density" or "relative frequency".
    iii) Here the height of each bar in the histogram is the frequency density instead of the frequency, and the frequency densities are plotted on the Y-axis.
    iv) The frequency density is determined using the following formula:
    $\text{Frequency density} = \dfrac{\text{Frequency of class interval}}{\text{Magnitude (width) of class interval}}$
    Drawbacks of Histogram: Construction of a histogram is not possible for open-end class intervals.
    Remarks:
    1) A histogram can be drawn only when the frequency distribution is a continuous frequency distribution.
    2) A histogram can be used to locate the mode graphically.
    Difference between Histogram and Bar diagrams:
    Histogram | Bar diagrams
    Histograms are two-dimensional (area) diagrams which consider height and width. | Bar diagrams are one-dimensional and consider only height.
    Bars are placed adjacent to each other. | Bars are placed so that there is a uniform distance between two bars.
  • 44. Class frequencies are shown by the area of the rectangles. | Volumes/magnitudes are shown by the height of the bars.
    A histogram is used to represent frequency distribution data. | Bar diagrams are used to represent geographical and categorical data.
    2. Frequency Polygon: The frequency polygon is another way of presenting a frequency distribution graphically; it can be drawn with the help of a histogram or of mid-points.
    Using a histogram: If we mark the mid-points of the top horizontal sides of the rectangles in a histogram and join them with straight lines drawn with a scale (ruler), the figure so formed is called a frequency polygon. This is done under the assumption that the frequencies in a class interval are evenly distributed throughout the class.
    Using mid-points (without a histogram): The frequencies of the classes are marked by dots against the mid-points of the class intervals. The adjacent dots are then joined with straight lines drawn with a scale. The resulting graph is known as a frequency polygon.
    The area of the polygon is equal to the area of the histogram, because the area left outside is just equal to the area included in it.
    Fig 7.4: Frequency Polygon
    Difference between Histogram and Frequency Polygon:
    Histogram | Frequency Polygon
    A histogram is two-dimensional. | A frequency polygon is multi-dimensional.
    A histogram is a bar graph. | A frequency polygon is a line graph.
    Only one histogram can be plotted on the same axes. | Several frequency polygons can be plotted on the same axes.
  • 45. A histogram is drawn only for a continuous frequency distribution. | A frequency polygon can be drawn for both discrete and continuous frequency distributions.
    3. Frequency Curve: Like the frequency polygon, the frequency curve can be drawn with the help of a histogram or of mid-points.
    Using a histogram: The frequency curve is obtained by joining the mid-points of the tops of the rectangles in a histogram by a smooth free-hand curve.
    Using mid-points (without a histogram): The frequencies of the classes are marked by dots against the mid-points of each class. The adjacent dots are then joined by a smooth free-hand curve. The resulting graph is known as a frequency curve.
    Fig 7.5: Frequency Curve
    4. Ogives or Cumulative Frequency Curve: For a set of observations, we know how to construct a frequency distribution. In some cases we may require the number of observations less than a given value or more than a given value. This is obtained by accumulating (adding) the frequencies up to (or above) the given value. This accumulated frequency is called the cumulative frequency. When these cumulative frequencies are listed in a table, it is called a cumulative frequency table. The curve obtained by plotting the cumulative frequencies is called a cumulative frequency curve or an ogive. There are two methods of constructing an ogive, namely:
    i) The 'less than ogive' method.
    ii) The 'more than ogive' method.
    i) The 'Less than Ogive' method: In this method, the frequencies of all preceding class intervals are added to the frequency of a class. Here we start with the upper limits of the classes and go on adding the frequencies. After plotting these less-than cumulative frequencies against
  • 46. the upper class boundaries of the respective classes we get the 'less than ogive', which is an increasing curve, sloping upwards from left to right, with an elongated S shape.
    ii) The 'More than Ogive' method: In this method, the frequencies of all succeeding class intervals are added to the frequency of a class. Equivalently, we start with the total frequency at the lower limit of the first class and go on subtracting the frequencies of the preceding classes. After plotting these more-than cumulative frequencies against the lower class boundaries of the respective classes we get the 'more than ogive', which is a decreasing curve, sloping downwards from left to right, with an elongated upside-down S shape.
    Fig 7.6: Less than and more than ogive curves
    Remarks: The less than ogive and the more than ogive can be drawn on the same graph. The x-coordinate of the point of intersection of the less than ogive and the more than ogive gives the median.
    Advantages of the ogive curve:
    1. Ogive curves are useful for the graphic computation of partition values like the median, quartiles, deciles and percentiles.
    2. They can be used to determine graphically the proportion of observations below or above given values or lying within certain intervals.
    3. They can be used as cumulative percentage curves or percentile curves.
    4. They are more suitable for the comparison of two or more frequency distributions than simple frequency curves.
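    The two kinds of cumulative frequency, and the median read from the intersection of the ogives, can be sketched in Python with the systolic blood pressure data of section 7.6. This is an illustrative sketch under stated assumptions: the inclusive classes 100-109, 110-119, ... are converted to class boundaries 99.5-109.5, 109.5-119.5, ... (a common convention for integer-valued classes), and the ogive is treated as straight-line segments between plotted points. The function name `median_from_ogive` is hypothetical.

```python
from itertools import accumulate

# Class boundaries 99.5, 109.5, ..., 179.5 (inclusive classes widened by 0.5).
boundaries = [99.5 + 10 * i for i in range(9)]
freq = [7, 16, 19, 31, 41, 23, 10, 3]
N = sum(freq)

# Less-than cumulative frequencies, plotted against the upper class boundaries.
less_than = list(accumulate(freq))

# More-than cumulative frequencies, plotted against the lower class boundaries.
more_than = [N - c for c in [0] + less_than[:-1]]


def median_from_ogive(boundaries, less_than):
    """x-value where the less-than ogive reaches N/2 (linear interpolation).

    At every x the less-than and more-than ogives add up to N,
    so they intersect exactly where both equal N/2.
    """
    half = less_than[-1] / 2
    previous = 0
    for i, cum in enumerate(less_than):
        if cum >= half:
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (half - previous) / (cum - previous) * (upper - lower)
        previous = cum


median = median_from_ogive(boundaries, less_than)
print("less than:", less_than)
print("more than:", more_than)
print(f"median = {median:.2f} mmHg")
```

    The same interpolation with N/4 or 3N/4 in place of N/2 gives the quartiles, which is why ogives are convenient for locating partition values.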
  • 47. Chapter 8: MEASURES OF CENTRAL TENDENCY or AVERAGE
    8.1 Introduction: While studying a population with respect to a variable or characteristic of interest, we may get a large number of raw observations in uncondensed form. It is not possible to grasp any idea about the characteristic by looking at all the observations. Therefore it is better to get a single number for each group. That number must be a good representative of all the observations, giving a clear picture of the characteristic. Such a representative number can be a central value for all these observations. This central value is called a measure of central tendency, an average, or a measure of location.
    8.2 Definition: "A measure of central tendency is a typical value around which other figures congregate."
    8.3 Objective and function of Average
    1) To provide a single value that represents and describes the characteristic of the entire group.
    2) To facilitate comparison between and within groups.
    3) To draw conclusions about the population from sample data.
    4) To form a basis for statistical analysis.
    8.4 Essential characteristics/properties/pre-requisites for a good or an ideal average: An ideal average should possess the following characteristics.
    1. It should be easy to understand and simple to compute.
    2. It should be rigidly defined.
    3. Its calculation should be based on all the items/observations in the data set.
    4. It should be capable of further algebraic treatment (mathematical manipulation).
    5. It should be least affected by sampling fluctuation.
    6. It should not be much affected by extreme values.
    7. It should be helpful in further statistical analysis.
    8.5 Types of Average: Mathematical Average, Positional Average, Commercial Average
  • 48. Mathematical Average: 1) Arithmetic Mean or Mean [i) Simple Arithmetic Mean, ii) Weighted Arithmetic Mean, iii) Combined Mean] 2) Geometric Mean 3) Harmonic Mean
    Positional Average: 1) Median 2) Mode 3) Quantiles [i) Quartiles, ii) Deciles, iii) Percentiles]
    Commercial Average: 1) Moving Average 2) Progressive Average 3) Composite Average
    8.6 Mathematical Average: An average calculated by a well-defined mathematical formula is called a mathematical average. It is calculated by taking into account all the values in the series. Ex: arithmetic mean, geometric mean, harmonic mean.
    1) Arithmetic Mean (AM) or Mean: The arithmetic mean is the most popular and widely used measure of average. It is defined as the sum of all the individual observations divided by the total number of observations. The arithmetic mean is denoted by $\bar{X}$.
    $\bar{X} = \dfrac{\text{Sum of all the observations}}{\text{Total number of observations}} = \dfrac{\sum X}{n}$
    where $\sum X$ denotes the sum of all the observations and n is the number of observations.
    i) Simple Arithmetic Mean/Simple Mean: The simple arithmetic mean is defined as the sum of all the individual observations divided by the total number of observations. It gives the same weightage to all the observations in the series, so it is called simple.
    Computation of Simple Arithmetic Mean:
    i) For raw data/individual series/ungrouped data: If $x_1, x_2, \dots, x_n$ are 'n' observations, then their arithmetic mean ($\bar{X}$) is given by:
  • 49. a) Direct Method:
    $\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$, i = 1, 2, ..., n
    where $\sum_{i=1}^{n} x_i$ = sum of the given observations
    n = number of observations
    b) Assumed Mean / Short-Cut Method:
    $\bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n}$, i = 1, 2, ..., n
    where A = the assumed mean or any value in x
    $d_i = x_i - A$ = deviation of the ith value from the assumed mean
    n = number of observations
    ii) For frequency distribution data:
    1) Discrete frequency distribution (Ungrouped frequency distribution) data:
    If $x_1, x_2, \ldots, x_k$ are ‘k’ observations with corresponding frequencies $f_1, f_2, \ldots, f_k$, then their arithmetic mean ($\bar{X}$) is given by:
    a) Direct Method:
    $\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{N}$, i = 1, 2, ..., k
    where $\sum_{i=1}^{k} f_i x_i$ = the sum of the products of the ith observation and its frequency
    $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
    k = number of classes
    b) Assumed Mean / Short-Cut Method:
    $\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{N}$, i = 1, 2, ..., k
    where A = the assumed mean or any value in x
    $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
  • 50. $d_i = x_i - A$ = the deviation of the ith value from the assumed mean
    $\sum_{i=1}^{k} f_i d_i$ = the sum of the products of the deviations and their frequencies
    2) Continuous frequency distribution (Grouped frequency distribution) data:
    If $m_1, m_2, \ldots, m_k$ represent the mid-points of the k class-intervals $x_0$-$x_1$, $x_1$-$x_2$, ..., $x_{k-1}$-$x_k$ with corresponding frequencies $f_1, f_2, \ldots, f_k$, then their arithmetic mean ($\bar{X}$) is calculated by:
    a) Direct Method:
    $\bar{X} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_k m_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i m_i}{N}$, i = 1, 2, ..., k
    where $m_i$ = mid-points or mid values of the class-intervals
    $\sum_{i=1}^{k} f_i m_i$ = the sum of the products of the ith mid-point and its frequency
    $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
    b) Assumed Mean / Short-Cut Method:
    $\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{N}$, i = 1, 2, ..., k
    where A = the assumed mean or any value in x
    $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
    $d_i = m_i - A$ = the deviation of the ith mid-point from the assumed mean
    $\sum_{i=1}^{k} f_i d_i$ = the sum of the products of the deviations and their frequencies
    c) Step-Deviation Method:
    $\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d'_i}{N} \times C$, i = 1, 2, ..., k
    where A = the assumed mean or any value in x
    $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
    $d'_i = \frac{(m_i - A)}{C}$ = the step deviation of the ith mid-point from the assumed mean
  • 51. C = width of the class interval.
    Merits of Arithmetic Mean:
    1. It is the simplest and most widely used average.
    2. It is easy to understand and easy to calculate.
    3. It is rigidly defined.
    4. Its calculation is based on all the observations.
    5. It is suitable for further mathematical treatment.
    6. It is least affected by fluctuations of sampling.
    7. If the number of items is sufficiently large, it is more accurate and more reliable.
    8. It is a calculated value and is not based on its position in the series.
    9. It provides a good basis for comparison.
    Demerits of Arithmetic Mean:
    1. It can neither be obtained by inspection nor located graphically.
    2. It cannot be used to study qualitative phenomena such as intelligence, beauty, honesty etc.
    3. It is very much affected by extreme values.
    4. It cannot be calculated for open-end classes.
    5. The A.M. computed may not be an actual item in the series.
    6. Its value cannot be determined if one or more observations are missing from the series.
    7. Sometimes the A.M. gives absurd results, ex: the average number of children per family may come out as a fraction.
    Uses of Arithmetic Mean
    1. The arithmetic mean is used to compare two or more series with respect to a certain character.
    2. It is the most commonly and widely used average, e.g. in calculating average cost of production, average cost of cultivation, average yield per hectare etc.
    3. It is used in calculating the standard deviation and the coefficient of variation.
    4. It is used in calculating the correlation coefficient and regression coefficients.
    5. It is also used in testing of hypotheses and finding confidence limits.
    Mathematical Properties of the Arithmetic Mean
  • 52. 1. The sum of the deviations of the individual items from the arithmetic mean is always zero, i.e. $\sum (x_i - \bar{x}) = 0$.
    2. The sum of the squared deviations of the individual items from the arithmetic mean is always a minimum, i.e. $\sum (x_i - \bar{x})^2$ = minimum.
    3. The standard error of the A.M. is less than that of any other measure of central tendency.
    4. If $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are the means of ‘k’ samples of sizes $n_1, n_2, \ldots, n_k$ respectively, then their combined mean is given by
    $\bar{\bar{X}} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}$
    5. The arithmetic mean depends on changes of both origin and scale (i.e. if a constant k is added to, subtracted from, multiplied with, or used to divide each value of a variable X, the arithmetic mean of the new series is likewise increased, decreased, multiplied or divided by the same constant k).
    6. If any two of the three values, viz. the A.M. ($\bar{X}$), the total of the items ($\sum X$) and the number of observations ($n$), are known, then the third value can easily be found.
    ii) Weighted Arithmetic Mean ($\bar{X}_w$):
    The simple arithmetic mean gives equal importance to each item in the series. When different observations are to be given different weights, the simple arithmetic mean does not prove to be a good measure of central tendency. In such cases the weighted arithmetic mean is calculated. Each value of the variable is multiplied by its weight, the resulting products are totalled, and the total is divided by the total weight; the result is the weighted arithmetic mean.
    If $x_1, x_2, \ldots, x_n$ are ‘n’ values of a variable ‘x’ with respective weights $w_1, w_2, \ldots, w_n$ assigned to them, then the weighted arithmetic mean is given by:
    $\bar{X}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
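    The formulas above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names (`weighted_mean`, `combined_mean`, `step_deviation_mean`) and the data are made up for illustration. It treats the grouped-data mean as a frequency-weighted mean, compares the direct and step-deviation methods, and verifies that the deviations from the mean sum to zero.

```python
from math import isclose


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


def combined_mean(means, sizes):
    """Combined mean of k samples: each sample mean weighted by its sample size."""
    return weighted_mean(means, sizes)


def step_deviation_mean(midpoints, freqs, assumed_mean, width):
    """Grouped mean as A + (sum f_i * d'_i / N) * C, with d'_i = (m_i - A) / C."""
    n_total = sum(freqs)
    scaled = sum(f * (m - assumed_mean) / width for m, f in zip(midpoints, freqs))
    return assumed_mean + scaled / n_total * width


# Hypothetical grouped data: class mid-points and frequencies (class width C = 10).
midpoints = [5, 15, 25, 35, 45]
freqs = [3, 7, 10, 6, 4]

# Direct method: the frequencies act as weights on the mid-points.
direct = weighted_mean(midpoints, freqs)
# Step-deviation method with assumed mean A = 25.
shortcut = step_deviation_mean(midpoints, freqs, assumed_mean=25, width=10)
print(direct, shortcut)  # both methods give the same mean

# Property 1: frequency-weighted deviations from the mean sum to zero.
deviation_sum = sum(f * (m - direct) for m, f in zip(midpoints, freqs))
print(isclose(deviation_sum, 0.0, abs_tol=1e-9))

# Combined mean of two samples with means 50 and 60 and sizes 20 and 30.
print(combined_mean([50, 60], [20, 30]))
```

    The short-cut and step-deviation methods only shift and rescale the data before averaging, which is why they agree with the direct method for any choice of A.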
  • 53. Uses of the weighted mean:
    The weighted arithmetic mean is used in:
    1. Construction of index numbers.
    2. Comparison of results of two or more groups where the number of items differs in each group.
    3. Computation of standardized death and birth rates.
    4. Cases where the values of items are given in percentages or proportions.
    2) Geometric Mean (GM):
    The geometric mean is defined as the nth root of the product of all the n observations. If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, then the geometric mean is given by
    $GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$, where n = number of observations
    Computation of Geometric Mean:
    i) For raw data/individual-series/ungrouped data:
    If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, then their geometric mean is calculated by:
    $GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = (x_1 \cdot x_2 \cdots x_n)^{1/n}$
    or
    $GM = \text{antilog}\left(\frac{\sum_{i=1}^{n} \log_{10} x_i}{n}\right)$
    ii) For frequency distribution data:
    1) Discrete frequency distribution (Ungrouped frequency distribution) data:
    If $x_1, x_2, \ldots, x_k$ are ‘k’ observations with corresponding frequencies $f_1, f_2, \ldots, f_k$, then their geometric mean is computed by:
    $GM = \sqrt[N]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}} = (x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k})^{1/N}$
    or
    $GM = \text{antilog}\left(\frac{\sum_{i=1}^{k} f_i \log_{10} x_i}{N}\right)$
    where $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
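    As an illustration of the two equivalent forms above (nth root of the product, and antilog of the mean of the logarithms), here is a small Python sketch. The function names and data are hypothetical, and rejecting non-positive values is a design choice of this sketch, since the logarithm is undefined for them.

```python
import math


def geometric_mean_root(values):
    """GM as the nth root of the product of n observations."""
    return math.prod(values) ** (1 / len(values))


def geometric_mean_log(values, freqs=None):
    """GM as antilog(sum(f_i * log10(x_i)) / N); freqs default to 1 (raw data)."""
    if freqs is None:
        freqs = [1] * len(values)
    # Assumption of this sketch: reject zero or negative values up front.
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean requires strictly positive values")
    n_total = sum(freqs)
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n_total
    return 10 ** mean_log  # antilog to base 10


# Raw data: both forms agree.
data = [2, 8]
print(geometric_mean_root(data), geometric_mean_log(data))

# Discrete frequency distribution: value 2 occurs 3 times, value 8 occurs once.
print(geometric_mean_log([2, 8], [3, 1]))
```

    The logarithmic form is usually preferred in practice because the raw product can overflow or underflow when there are many observations.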
  • 54. 2) Continuous frequency distribution (Grouped frequency distribution) data:
    If $m_1, m_2, \ldots, m_k$ represent the mid-points of the k class-intervals $x_0$-$x_1$, $x_1$-$x_2$, ..., $x_{k-1}$-$x_k$ with their corresponding frequencies $f_1, f_2, \ldots, f_k$, then the geometric mean (GM) is calculated by:
    $GM = \sqrt[N]{m_1^{f_1} \cdot m_2^{f_2} \cdots m_k^{f_k}} = (m_1^{f_1} \cdot m_2^{f_2} \cdots m_k^{f_k})^{1/N}$
    or
    $GM = \text{antilog}\left(\frac{\sum_{i=1}^{k} f_i \log_{10} m_i}{N}\right)$
    where $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
    $m_i$ = mid-points / mid values of the class intervals
    Merits of Geometric Mean:
    1. It is rigidly defined.
    2. It is based on all observations.
    3. It is capable of further mathematical treatment.
    4. It is not affected much by fluctuations of sampling.
    5. Unlike the AM, it is not affected much by the presence of extreme values.
    6. It is very suitable for averaging ratios, rates and percentages.
    Demerits of Geometric Mean:
    1. Its calculation is not as simple as that of the A.M., and it is not easy to understand.
    2. The GM may not be an actual value of the series.
    3. It cannot be determined graphically or by inspection.
    4. It cannot be used when the values are negative, because if any one observation is negative, the G.M. becomes meaningless or does not exist.
    5. It cannot be used when the values are zero, because if any one observation is zero, the G.M. becomes zero.
    6. It cannot be calculated for open-end classes.
    Uses of G.M.:
    The geometric mean has certain specific uses, some of which are:
    1. It is used in the construction of index numbers.
    2. It is also helpful in finding compound rates of change, such as the rate of growth of population in a country, average rates of change, average rate of interest etc.
  • 55. 3. It is suitable where the data are expressed in terms of rates, ratios and percentages.
    4. It is most suitable when observations of smaller values are to be given more weightage or importance.
    3) Harmonic Mean (HM):
    The harmonic mean of a set of observations is defined as the reciprocal of the arithmetic mean of the reciprocals of the given observations. If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, then the harmonic mean is given by
    $HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \left(\frac{1}{x_i}\right)}$
    where n = number of observations
    Computation of Harmonic Mean:
    i) For raw data/individual-series/ungrouped data:
    If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, then their harmonic mean is given by:
    $HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \left(\frac{1}{x_i}\right)}$
    ii) For frequency distribution data:
    1) Discrete frequency distribution (Ungrouped frequency distribution) data:
    If $x_1, x_2, \ldots, x_k$ are ‘k’ observations with corresponding frequencies $f_1, f_2, \ldots, f_k$, then their harmonic mean is computed by:
    $HM = \frac{\sum f_i}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_k}{x_k}} = \frac{N}{\sum_{i=1}^{k} \left(\frac{f_i}{x_i}\right)}$
    where $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
    2) Continuous frequency distribution (Grouped frequency distribution) data:
    If $m_1, m_2, \ldots, m_k$ represent the mid-points of the k class-intervals $x_0$-$x_1$, $x_1$-$x_2$, ..., $x_{k-1}$-$x_k$ with their corresponding frequencies $f_1, f_2, \ldots, f_k$, then the HM is calculated by:
  • 56. $HM = \frac{\sum f_i}{\frac{f_1}{m_1} + \frac{f_2}{m_2} + \cdots + \frac{f_k}{m_k}} = \frac{N}{\sum_{i=1}^{k} \left(\frac{f_i}{m_i}\right)}$
    where $N = \sum_{i=1}^{k} f_i$ = the sum of the frequencies or total frequency
    $m_i$ = mid-points / mid values of the class intervals
    Merits of H.M.:
    1. It is rigidly defined.
    2. It is based on all items in the series.
    3. It is amenable to further algebraic treatment.
    4. It is not affected much by fluctuations of sampling.
    5. Unlike the AM, it is not affected much by the presence of extreme values.
    6. It is the most suitable average when it is desired to give greater weight to smaller observations and less weight to larger ones.
    Demerits of H.M.:
    1. It is not easily understood and it is difficult to compute.
    2. It is only a summary figure and may not be an actual item in the series.
    3. Its calculation is not possible if the value of one or more items is either missing or zero.
    4. Its calculation is not possible if the series contains both negative and positive observations.
    5. It gives greater importance to small items and is therefore useful only when small items have to be given greater weightage.
    6. It cannot be determined graphically or by inspection.
    7. It cannot be calculated for open-end classes.
    Uses of H.M.:
    The H.M. is of greater significance in cases where prices are expressed in quantities (units per price). The H.M. is also used in averaging time, speed, distance, quantity etc., for example to find the average speed over a journey, the average time taken to travel, or the average distance travelled.
    8.7 Positional Averages:
    These averages are based on the position of the observations in an arranged (either
  • 57. ascending or descending order) series. Ex: Median, Mode, Quartiles, Deciles, Percentiles.
    1) Median:
    The median is the middle-most value of a series of data when the observations are arranged in ascending or descending order. The median is that value of the variate which divides the group into two equal parts, one part comprising all values greater than the middle value, and the other all values less than the middle value.
    Computation of Median:
    i) For raw data/individual-series/ungrouped data:
    If $x_1, x_2, \ldots, x_n$ are ‘n’ observations, first arrange the given values in ascending (increasing) or descending (decreasing) order.
    Case I: If the number of observations (n) is odd, the median is the middle value, i.e.
    Median = Md = $\left(\frac{n+1}{2}\right)^{th}$ item of the x variable
    Case II: If the number of observations (n) is even, the median is the mean of the two middle values, i.e.
    Median = Md = Average of the $\left(\frac{n}{2}\right)^{th}$ and $\left(\frac{n}{2} + 1\right)^{th}$ items of the x variable
    ii) For frequency distribution data:
    1) Discrete frequency distribution (Ungrouped frequency distribution) data:
    If $x_1, x_2, \ldots, x_k$ are ‘k’ observations with corresponding frequencies $f_1, f_2, \ldots, f_k$, then their median can be found using the following steps:
    Step1: Find the cumulative frequencies (CF).
    Step2: Obtain the total frequency (N) and find $\frac{N+1}{2}$, where $N = \sum_{i=1}^{k} f_i$ is the total frequency.
    Step3: Locate in the cumulative frequencies the value equal to or just greater than $\frac{N+1}{2}$. The corresponding value of x is the median.
    2) Continuous frequency distribution (Grouped frequency distribution) data:
    If $m_1, m_2, \ldots, m_k$ represent the mid-points of the k class-intervals $x_0$-$x_1$, $x_1$-$x_2$, ..., $x_{k-1}$-$x_k$ with their corresponding frequencies $f_1, f_2, \ldots, f_k$, then the steps given below are followed for the calculation of the median in a continuous series.
  • 58. Step1: Find the cumulative frequencies (CF).
    Step2: Obtain the total frequency (N) and find $\frac{N}{2}$, where $N = \sum_{i=1}^{k} f_i$ is the total frequency.
    Step3: Locate the first cumulative frequency that is greater than $\frac{N}{2}$. The corresponding class interval is called the median class. Then apply the formula given below.
    Median = Md = $L + \left[\frac{\frac{N}{2} - c.f.}{f}\right] \times C$
    where L = lower limit of the median class
    N = total frequency
    f = frequency of the median class
    c.f. = cumulative frequency of the class preceding the median class
    C = width of the class interval
    Graphic method for location of median:
    The median can be located with the help of the cumulative frequency curve or ‘ogive’. The procedure for locating the median in grouped data is as follows:
    Step1: The class boundaries, with no gaps between consecutive classes (i.e. exclusive classes), are represented on the horizontal axis (x-axis).
    Step2: The cumulative frequency corresponding to each class is plotted on the vertical axis (y-axis) against the upper limit of the class interval (or against the variate value in the case of a discrete series).
    Step3: The curve obtained by joining the points by freehand drawing is called the ‘ogive’. The ogive so drawn may be either (i) a ‘less than’ ogive or (ii) a ‘more than’ ogive.
    Step4: The value of N/2 is marked on the y-axis, where N is the total frequency.
    Step5: A horizontal straight line is drawn from the point N/2 on the y-axis, parallel to the x-axis, to meet the ogive.
    Step6: A vertical straight line is drawn from the point of intersection, perpendicular to the horizontal axis.
  • 59. Step7: The point where the perpendicular meets the x-axis gives the value of the median.
    Fig 6.1: Graphic method for location of median
    Remarks:
    1. If a perpendicular is drawn to the x-axis from the point of intersection of the ‘less than’ and ‘more than’ ogives, the point so obtained on the horizontal axis gives the value of the median.
    Fig 6.2: Graphic method for location of median
    Merits of Median:
    1. It is easily understood and is easy to calculate.
  • 60. 2. It is rigidly defined.
    3. It can be located merely by inspection.
    4. It is not at all affected by extreme values.
    5. It can be calculated for distributions with open-end classes.
    6. The median is the only average that can be used to study qualitative data where the items are scored or ranked.
    Demerits of Median:
    1. With an even number of observations the median cannot be determined exactly. We merely estimate it by taking the mean of the two middle terms.
    2. It is not based on all the observations.
    3. It is not amenable to algebraic treatment.
    4. Compared with the mean, it is affected more by fluctuations of sampling.
    5. If importance needs to be given to small or big items in the series, the median is not a suitable average.
    Uses of Median
    1. The median is the only average that can be used with qualitative data which cannot be measured quantitatively but can be arranged in ascending or descending order. Ex: To find the average honesty, average intelligence, average beauty etc. among a group of people.
    2. It is used for determining the typical value in problems concerning wages and distribution of wealth.
    3. The median is useful in distributions where open-end classes are given.
    2) Mode:
    The mode is the value in a distribution which occurs most frequently or repeatedly. It is an actual value which has the highest concentration of items in and around it, or which is predominant in the series. In the case of a discrete frequency distribution, the mode is the value of x corresponding to the maximum frequency.
    Computation of mode:
    i) For raw data/individual-series/ungrouped data:
  • 61. The mode is the value of the variable (observation) which occurs the maximum number of times.
    ii) For frequency distribution data:
    1) Discrete frequency distribution (Ungrouped frequency distribution) data:
    In the case of a discrete frequency distribution, the mode is the value of the x variable corresponding to the maximum frequency.
    2) Continuous frequency distribution (Grouped frequency distribution) data:
    If $m_1, m_2, \ldots, m_k$ represent the mid-points of the k class-intervals $x_0$-$x_1$, $x_1$-$x_2$, ..., $x_{k-1}$-$x_k$ with corresponding frequencies $f_1, f_2, \ldots, f_k$, locate the highest frequency; the class-interval corresponding to the highest frequency is called the modal class. The mode is then found by applying the following formula:
    Mode = Mo = $L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times C$
    where L = lower limit of the modal class
    C = width of the class interval of the modal class
    $f_0$ = frequency of the class preceding the modal class
    $f_1$ = frequency of the modal class
    $f_2$ = frequency of the class succeeding the modal class
    Graphic method for location of mode:
    Steps:
    1. Draw a histogram of the given distribution.
    2. Join the top right corner of the highest rectangle (modal class rectangle) by a straight line to the top right corner of the preceding rectangle. Similarly, join the top left corner of the highest rectangle to the top left corner of the rectangle on the right.
    3. From the point of intersection of these two diagonal lines, draw a perpendicular to the x-axis.
    4. The value read on the x-axis at the foot of the perpendicular gives the mode.
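    The interpolation formulas for grouped data (the mode formula above and the analogous median formula, Md = L + [(N/2 - c.f.)/f] × C, given earlier in this section) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names and data are hypothetical, all classes are assumed to have equal width C, and a missing neighbour of the modal class (first or last class) is treated as having frequency 0.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of grouped data: L + ((N/2 - c.f.) / f) * C."""
    n_total = sum(freqs)
    if n_total == 0:
        raise ValueError("total frequency must be positive")
    half = n_total / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= half:  # first class whose cumulative frequency reaches N/2
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("median class not found")


def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data: L + (f1 - f0) / (2*f1 - f0 - f2) * C."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # first class with highest frequency
    f1 = freqs[i]
    # Assumption of this sketch: a missing neighbour counts as frequency 0.
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        raise ValueError("mode is not well defined for this distribution")
    return lower_limits[i] + (f1 - f0) / denom * width


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50 with their frequencies.
lower_limits = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 7, 3]

print(grouped_median(lower_limits, freqs, width=10))  # median class is 20-30
print(grouped_mode(lower_limits, freqs, width=10))    # modal class is 20-30
```

    Both functions interpolate within a single class, so they assume the observations are spread evenly inside that class.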
  • 62. Fig 6.3: Graphic method for location of mode
    Merits of Mode:
    1. It is easy to calculate and in some cases it can be located by mere inspection.
    2. The mode is not at all affected by extreme values.
    3. It can be calculated for open-end classes.
    4. It is usually an actual value of an important part of the series.
    5. The mode can be conveniently located even if the frequency distribution has class intervals of unequal magnitude, provided the modal class and the classes preceding and succeeding it are of the same magnitude.
    Demerits of Mode:
    1. The mode is ill-defined. It is not always possible to find a clearly defined mode.
    2. It is not based on all observations.
    3. It is not capable of further mathematical treatment.
    4. Compared with the mean, the mode is affected to a greater extent by fluctuations of sampling.
    5. It is unsuitable in cases where the relative importance of items has to be considered.
    Remarks: In some cases, we may come across distributions with two modes. Such distributions are called bi-modal. If a distribution has more than two modes, it is said to be multimodal.
    Uses of Mode:
    The mode is most commonly used in business forecasting, for example in manufacturing units and the garment industry, to find the ideal size. Ex: in forecasting for the manufacture of readymade garments, to find the most common size of track suits, dresses, shoes etc.
    3) Quantiles (or) Partition Values:
    Quantiles are the values of the variable which divide the total number of
  • 63. observations into a number of equal parts when they are arranged in order of magnitude. Ex: Median, Quartiles, Deciles, Percentiles.
    i) Median:
    The median is the single value which divides the whole series into two equal parts.
    ii) Quartiles:
    Quartiles are three in number and divide the whole series into four equal parts. They are represented by Q1, Q2, Q3 respectively.
    First quartile: $Q_1$ = value of the $\left(\frac{n+1}{4}\right)^{th}$ item
    Second quartile: $Q_2$ = value of the $2\left(\frac{n+1}{4}\right)^{th}$ item
    Third quartile: $Q_3$ = value of the $3\left(\frac{n+1}{4}\right)^{th}$ item
    iii) Deciles:
    Deciles are nine in number and divide the whole series into ten equal parts. They are represented by D1, D2, ..., D9.
    First decile: $D_1$ = value of the $\left(\frac{n+1}{10}\right)^{th}$ item
    Second decile: $D_2$ = value of the $2\left(\frac{n+1}{10}\right)^{th}$ item
    ...
    Ninth decile: $D_9$ = value of the $9\left(\frac{n+1}{10}\right)^{th}$ item
    iv) Percentiles:
    Percentiles are 99 in number and divide the whole series into 100 equal parts. They are represented by P1, P2, ..., P99.
    First percentile: $P_1$ = value of the $\left(\frac{n+1}{100}\right)^{th}$ item
  • 64. Second percentile: $P_2$ = value of the $2\left(\frac{n+1}{100}\right)^{th}$ item
    ...
    Ninety-ninth percentile: $P_{99}$ = value of the $99\left(\frac{n+1}{100}\right)^{th}$ item
    8.8 Commercial Averages:
    These are averages which are calculated mainly to meet the needs of business. Ex: Moving Average, Composite Average, Progressive Average
    i) Moving Average (M.A.):
    It is a special type of A.M. calculated to obtain the trend in a time-series. The M.A. is found by successively discarding the earliest figure, adding the next figure in sequence, and computing the A.M. of the values taken in rotation. If a, b, c, d, and e are the values in a series, then the three-item M.A. is given by
    M.A. = $\frac{a+b+c}{3}, \frac{b+c+d}{3}, \frac{c+d+e}{3}$
    ii) Progressive Average (P.A.):
    It is a cumulative average used occasionally during the early years of the life of a business. It is computed by taking all the figures available in each succeeding year. If a, b, c, d, and e are the values in a series, then the P.A. is given by
    P.A. = $\frac{a+b}{2}, \frac{a+b+c}{3}, \frac{a+b+c+d}{4}, \frac{a+b+c+d+e}{5}$
    iii) Composite Average: