1 Probability
Probability is a measure or estimation of how likely it is that something will happen or
that a statement is true.
Probabilities are given a value between 0 (0% chance, or will not happen) and 1 (100%
chance, or will certainly happen). The higher the degree of probability, the more likely the
event is to happen, or, in a longer series of samples, the greater the number of times such
event is expected to happen.
These concepts have been given an axiomatic mathematical derivation in probability theory, which is used widely in such areas of study as mathematics, statistics, finance, gambling, science, artificial intelligence/machine learning and philosophy to, for example,
draw inferences about the expected frequency of events. Probability theory is also used to
describe the underlying mechanics and regularities of complex systems.
1.1 Interpretations
When dealing with experiments that are random and well-defined in a purely theoretical
setting (like tossing a fair coin), probabilities describe the number of desired outcomes, divided by the number of all possible outcomes (tossing a fair coin twice will yield head-head with probability 1/4, because the four outcomes head-head, head-tails, tails-head and tails-tails are equally likely to occur).
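The counting argument for the fair coin can be checked by enumerating the sample space directly; a minimal sketch in Python (the variable names are illustrative):

```python
from itertools import product
from fractions import Fraction

# Sample space of two tosses of a fair coin: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Each outcome is equally likely, so P(head-head) is the count of
# favourable outcomes divided by the total number of outcomes.
p_head_head = Fraction(sum(1 for o in outcomes if o == ("H", "H")), len(outcomes))
print(p_head_head)  # 1/4
```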
When it comes to practical application, however, the word probability does not have a
singular direct definition. In fact, there are two major categories of probability interpretations, whose adherents possess conflicting views about the fundamental nature of
probability: objectivists and subjectivists.
Objectivists Objectivists assign numbers to describe some objective or physical state of
affairs. The most popular version of objective probability is frequentist probability, which
claims that the probability of a random event denotes the relative frequency of occurrence
of an experiment’s outcome, when repeating the experiment. This interpretation considers
probability to be the relative frequency “in the long run” of outcomes. A modification of this
is propensity probability, which interprets probability as the tendency of some experiment
to yield a certain outcome, even if it is performed only once.
Subjectivists Subjectivists assign numbers per subjective probability, i.e., as a degree of
belief. The degree of belief has been interpreted as “the price at which you would buy or
sell a bet that pays 1 unit of utility if E, 0 if not E.” The most popular version of subjective
probability is Bayesian probability, which includes expert knowledge as well as experimental data to produce probabilities. The expert knowledge is represented by some (subjective) prior probability distribution. The data is incorporated in a likelihood function.
The product of the prior and the likelihood, normalized, results in a posterior probability
distribution that incorporates all the information known to date. Starting from arbitrary,
subjective probabilities for a group of agents, some Bayesians claim that all agents will eventually have sufficiently similar assessments of probabilities, given enough evidence.
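The prior–likelihood–posterior pipeline described above can be sketched numerically. The two coin hypotheses and their head probabilities below are invented for illustration, not taken from the text:

```python
# A minimal sketch of a Bayesian update over a discrete set of hypotheses.
priors = {"fair": 0.5, "biased": 0.5}   # subjective prior beliefs
p_heads = {"fair": 0.5, "biased": 0.9}  # P(heads | hypothesis), assumed values

data = ["H", "H", "H"]                  # observed flips (illustrative)

# Likelihood of the data under each hypothesis.
likelihood = {h: 1.0 for h in priors}
for flip in data:
    for h in priors:
        likelihood[h] *= p_heads[h] if flip == "H" else 1 - p_heads[h]

# Posterior is proportional to prior times likelihood, then normalized.
unnorm = {h: priors[h] * likelihood[h] for h in priors}
total = sum(unnorm.values())
posterior = {h: v / total for h, v in unnorm.items()}
print(posterior)  # three heads in a row shift belief toward "biased"
```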
1.2 Theory
Like other theories, the theory of probability is a representation of probabilistic concepts
in formal terms—that is, in terms that can be considered separately from their meaning.
These formal terms are manipulated by the rules of mathematics and logic, and any results
are interpreted or translated back into the problem domain.
There have been at least two successful attempts to formalize probability, namely the Kolmogorov formulation and the Cox formulation. In Kolmogorov’s formulation (see probability space), sets are interpreted as events and probability itself as a measure on a class
of sets. In Cox’s theorem, probability is taken as a primitive (that is, not further analyzed) and the emphasis is on constructing a consistent assignment of probability values
to propositions. In both cases, the laws of probability are the same, except for technical
details.
There are other methods for quantifying uncertainty, such as the Dempster-Shafer theory
or possibility theory, but those are essentially different and not compatible with the laws
of probability as usually understood.
1.3 Applications
Probability theory is applied in everyday life in risk assessment and in trade on financial
markets. Governments apply probabilistic methods in environmental regulation, where
it is called pathway analysis. A good example is the effect of the perceived probability
of any widespread Middle East conflict on oil prices—which have ripple effects in the
economy as a whole. An assessment by a commodity trader that a war is more likely vs.
less likely sends prices up or down, and signals other traders of that opinion. Accordingly,
the probabilities are neither assessed independently nor necessarily very rationally. The
theory of behavioral finance emerged to describe the effect of such groupthink on pricing,
on policy, and on peace and conflict.
The discovery of rigorous methods to assess and combine probability assessments has
changed society. It is important for most citizens to understand how probability assessments are made, and how they contribute to decisions.
Another significant application of probability theory in everyday life is reliability. Many
consumer products, such as automobiles and consumer electronics, use reliability theory
in product design to reduce the probability of failure. Failure probability may influence a
manufacturer’s decisions on a product’s warranty.
The cache language model and other statistical language models that are used in natural
language processing are also examples of applications of probability theory.
1.4 Mathematical treatment
Consider an experiment that can produce a number of results. The collection of all results
is called the sample space of the experiment. The power set of the sample space is formed
by considering all different collections of possible results. For example, rolling a die can
produce six possible results. One collection of possible results gives an odd number on the
die. Thus, the subset {1,3,5} is an element of the power set of the sample space of die
rolls. These collections are called “events.” In this case, {1,3,5} is the event that the die
falls on some odd number. If the results that actually occur fall in a given event, the event
is said to have occurred.
A probability is a way of assigning every event a value between zero and one, with the
requirement that the event made up of all possible results (in our example, the event
{1,2,3,4,5,6}) is assigned a value of one. To qualify as a probability, the assignment of
values must satisfy the requirement that if you look at a collection of mutually exclusive
events (events with no common results, e.g., the events {1,6}, {3}, and {2,4} are all mutually exclusive), the probability that at least one of the events will occur is given by the
sum of the probabilities of all the individual events.
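The additivity requirement can be verified for the die events just mentioned; a short sketch, assuming a uniform distribution on the six faces:

```python
from fractions import Fraction

# Uniform probability on a fair six-sided die: P(event) = |event| / 6.
def P(event):
    return Fraction(len(event), 6)

# The mutually exclusive events from the text: {1,6}, {3}, {2,4}.
events = [{1, 6}, {3}, {2, 4}]
union = set().union(*events)

# Additivity: the probability that at least one of the events occurs
# equals the sum of the individual probabilities.
assert P(union) == sum(P(e) for e in events)
print(P(union))  # 5/6
```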
The probability of an event A is written as P(A), p(A) or Pr(A). This mathematical definition
of probability can extend to infinite sample spaces, and even uncountable sample spaces,
using the concept of a measure.
The opposite or complement of an event A is the event [not A] (that is, the event of A not
occurring); its probability is given by P(not A) = 1 − P(A). As an example, the chance of not rolling a six on a six-sided die is 1 − (chance of rolling a six) = 1 − 1/6 = 5/6.
The event that both A and B occur on a single performance of an experiment is called the intersection of A and B; its probability is the joint probability, denoted P(A ∩ B).
1.4.1 Independent probability
If two events A and B are independent, then the joint probability is
P(A and B) = P(A ∩ B) = P(A)P(B).
For example, if two coins are flipped, the chance of both being heads is 1/2 × 1/2 = 1/4.
Mutually exclusive If either event A or event B (or both) occurs on a single performance of an experiment, this is called the union of the events A and B, denoted as P(A ∪ B).
If two events are mutually exclusive then the probability of either occurring is
P(A or B) = P(A ∪ B) = P(A) + P(B).
For example, the chance of rolling a 1 or 2 on a six-sided die is
P(1 or 2) = P(1) + P(2) = 1/6 + 1/6 = 1/3.
Not mutually exclusive If the events are not mutually exclusive, then
P(A or B) = P(A) + P(B) − P(A and B).
For example, when drawing a single card at random from a regular deck of cards, the chance of getting a heart or a face card (J, Q, K) (or one that is both) is 13/52 + 12/52 − 3/52 = 11/26, because of the 52 cards of a deck 13 are hearts, 12 are face cards, and 3 are both: here the “3 that are both” are included in each of the “13 hearts” and the “12 face cards” but should only be counted once.
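The inclusion-exclusion count for hearts and face cards can be confirmed by building the deck explicitly; a small sketch (the deck representation is an assumption for illustration):

```python
from itertools import product
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 cards

hearts = {c for c in deck if c[1] == "hearts"}        # 13 hearts
faces = {c for c in deck if c[0] in {"J", "Q", "K"}}  # 12 face cards

def P(event):
    return Fraction(len(event), len(deck))

# Inclusion-exclusion: the 3 cards that are both a heart and a face
# card are subtracted so they are only counted once.
assert P(hearts | faces) == P(hearts) + P(faces) - P(hearts & faces)
print(P(hearts | faces))  # 11/26
```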
1.4.2 Conditional probability
Conditional probability is the probability of some event A, given the occurrence of some
other event B. Conditional probability is written P(A | B), and is read “the probability of A,
given B”. It is defined by
P(A | B) = P(A ∩ B) / P(B).
If P(B) = 0 then P(A | B) is formally undefined by this expression. However, it is possible
to define a conditional probability for some zero-probability events using a σ-algebra of
such events (such as those arising from a continuous random variable).
For example, in a bag of 2 red balls and 2 blue balls (4 balls in total), the probability of taking a red ball is 1/2; however, when taking a second ball, the probability of it being either a red ball or a blue ball depends on the ball previously taken: if a red ball was taken, the probability of picking a red ball again would be 1/3, since only 1 red and 2 blue balls would have been remaining.
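The conditional probability in the ball example can be recovered by enumerating all ordered two-ball draws; a minimal sketch, labeling the balls to keep the outcomes distinct:

```python
from itertools import permutations
from fractions import Fraction

# Ordered draws of two balls (without replacement) from 2 red + 2 blue.
balls = ["R1", "R2", "B1", "B2"]
draws = list(permutations(balls, 2))  # 12 equally likely ordered pairs

first_red = [d for d in draws if d[0].startswith("R")]
both_red = [d for d in draws if d[0].startswith("R") and d[1].startswith("R")]

# P(second red | first red) = P(both red) / P(first red),
# computed here as a ratio of outcome counts.
p_cond = Fraction(len(both_red), len(first_red))
print(p_cond)  # 1/3
```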
1.4.3 Summary of probabilities
Event        Probability
A            P(A) ∈ [0, 1]
not A        P(Aᶜ) = 1 − P(A)
A or B       P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
             P(A ∪ B) = P(A) + P(B)                {if A and B are mutually exclusive}
A and B      P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
             P(A ∩ B) = P(A)P(B)                   {if A and B are independent}
A given B    P(A | B) = P(A ∩ B)/P(B) = P(B|A)P(A)/P(B)
1.5 Reference
This section is based on http://en.wikipedia.org/wiki/Probability.
2 Mutually exclusive events
Two events are mutually exclusive if they cannot occur at the same time. An example is
tossing a coin once, which can result in either heads or tails, but not both.
In the coin-tossing example, both outcomes are collectively exhaustive, which means that
at least one of the outcomes must happen, so these two possibilities together exhaust all
the possibilities. However, not all mutually exclusive events are collectively exhaustive.
For example, the outcomes 1 and 4 of a single roll of a six-sided die are mutually exclusive
(cannot both happen) but not collectively exhaustive (there are other possible outcomes;
2,3,5,6).
2.1 Logic
In logic, two mutually exclusive propositions are propositions that logically cannot be true
at the same time. Another term for mutually exclusive is “disjoint”. To say that more than two propositions are mutually exclusive may mean, depending on context, that no two of them can be true at the same time, or merely that not all of them can be true together. The term pairwise mutually exclusive always means that no two of them can be true simultaneously.
2.2 Probability
In probability theory, events E1, E2, . . . , En are said to be mutually exclusive if the occurrence of any one of them implies the non-occurrence of the remaining n − 1 events.
Therefore, two mutually exclusive events cannot both occur. Formally said, the intersection of each two of them is empty (the null event): A ∩ B = ∅. In consequence, mutually exclusive events have the property P(A ∩ B) = 0.
For example, one cannot draw a card that is both red and a club because clubs are always
black. If one draws just one card from the deck, either a red card (heart or diamond)
or a black card (club or spade) can be drawn. When A and B are mutually exclusive,
P(A ∪ B) = P(A) + P(B). One might ask, “What is the probability of drawing a red card
or a club?” This problem would be solved by adding together the probability of drawing
a red card and the probability of drawing a club. In a standard 52-card deck, there are twenty-six red cards and thirteen clubs: 26/52 + 13/52 = 39/52, or 3/4.
One would have to draw at least two cards in order to draw both a red card and a club.
The probability of doing so in two draws would depend on whether the first card drawn
were replaced before the second drawing, since without replacement there would be one
fewer card after the first card was drawn. The probabilities of the individual events (red, and club) would be multiplied rather than added. The probability of drawing a red and a club in two drawings without replacement would be 26/52 × 13/51 = 338/2652, or 13/102. With replacement, the probability would be 26/52 × 13/52 = 338/2704, or 13/104.
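The two multiplication scenarios above reduce to a short exact computation:

```python
from fractions import Fraction

# Drawing a red card first, then a club, per the text's two scenarios.
p_red = Fraction(26, 52)

without_replacement = p_red * Fraction(13, 51)  # second draw from 51 cards
with_replacement = p_red * Fraction(13, 52)     # deck restored to 52 cards

print(without_replacement)  # 13/102
print(with_replacement)     # 13/104
```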
In probability theory the word “or” allows for the possibility of both events happening. The
probability of one or both events occurring is denoted P(A ∪ B) and in general it equals
P(A) + P(B) − P(A ∩ B). Therefore, if one asks, “What is the probability of drawing a red
card or a king?”, drawing any of a red king, a red non-king, or a black king is considered
a success. In a standard 52-card deck, there are twenty-six red cards and four kings, two of which are red, so the probability of drawing a red card or a king is 26/52 + 4/52 − 2/52 = 28/52.
However, with mutually exclusive events the last term in the formula, −P(A ∩ B), is zero,
so the formula simplifies to the one given in the previous paragraph.
Events are collectively exhaustive if all the possibilities for outcomes are exhausted by
those possible events, so at least one of those outcomes must occur. The probability that
at least one of the events will occur is equal to 1. For example, there are theoretically
only two possibilities for flipping a coin. Flipping a head and flipping a tail are collectively
exhaustive events, and there is a probability of 1 of flipping either a head or a tail. Events
can be both mutually exclusive and collectively exhaustive. In the case of flipping a coin,
flipping a head and flipping a tail are also mutually exclusive events. Both outcomes
cannot occur for a single trial (i.e., when a coin is flipped only once). The probability
of flipping a head and the probability of flipping a tail can be added to yield a probability of 1: 1/2 + 1/2 = 1.
2.3 Reference
This section is based on http://en.wikipedia.org/wiki/Mutually_exclusive_events.
3 Independence
In probability theory, to say that two events are independent (alternatively called statistically independent or stochastically independent) means that the occurrence of one does
not affect the probability of the other. Similarly, two random variables are independent if
the realization of one does not affect the probability distribution of the other.
The concept of independence extends to dealing with collections of more than two events
or random variables.
3.1 Definition for two events
Two events A and B are independent if and only if their joint probability equals the product
of their probabilities:
P(A ∩ B) = P(A)P(B)
Why this defines independence is made clear by rewriting with conditional probabilities:
P(A ∩ B) = P(A)P(B) ⇔ P(A) = P(A ∩ B)/P(B) = P(A | B)
and similarly
P(A ∩ B) = P(A)P(B) ⇔ P(B) = P(B | A).
Thus, the occurrence of B does not affect the probability of A, and vice versa. Although
the derived expressions may seem more intuitive, they are not the preferred definition, as
the conditional probabilities may be undefined if P(A) or P(B) are 0. Furthermore, the
preferred definition makes clear by symmetry that when A is independent of B, B is also
independent of A.
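The product definition can be checked on a concrete sample space; a minimal sketch with two fair dice (the choice of events is illustrative):

```python
from itertools import product
from fractions import Fraction

# Two fair dice rolled together: 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

A = {o for o in omega if o[0] == 6}      # first die shows a six
B = {o for o in omega if o[1] % 2 == 0}  # second die is even

# A and B are independent: the joint probability equals the product.
assert P(A & B) == P(A) * P(B)  # 1/12 = 1/6 * 1/2
print(P(A & B))  # 1/12
```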
3.2 Definition for more than two events
A finite set of events {Ai} is pairwise independent if and only if every pair of events is independent. That is, if and only if for all distinct pairs of indices m, n,
P(Am ∩ An) = P(Am)P(An).
A finite set of events is mutually independent if and only if every event is independent of
any intersection of the other events. That is, if and only if for every finite subset {A1, . . . , An},
P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1)P(A2) · · · P(An).
This is called the multiplication rule for independent events.
For more than two events, a mutually independent set of events is (by definition) pairwise
independent, but the converse is not necessarily true.
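A classic counterexample shows why the converse fails; a sketch with two fair coin flips, where each pair of events is independent but the three together are not:

```python
from itertools import product
from fractions import Fraction

# Two fair coin flips: 4 equally likely outcomes.
omega = list(product("HT", repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

A = {o for o in omega if o[0] == "H"}   # first flip is heads
B = {o for o in omega if o[1] == "H"}   # second flip is heads
C = {o for o in omega if o[0] == o[1]}  # both flips agree

# Every pair is independent ...
for X, Y in [(A, B), (A, C), (B, C)]:
    assert P(X & Y) == P(X) * P(Y)

# ... but the three events are not mutually independent:
assert P(A & B & C) != P(A) * P(B) * P(C)  # 1/4 vs 1/8
```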
3.3 Conditional independence
Intuitively, two random variables X and Y are conditionally independent given Z if, once
Z is known, the value of Y does not add any additional information about X. For instance, two measurements X and Y of the same underlying quantity Z are not independent, but they are conditionally independent given Z (unless the errors in the two measurements
they are conditionally independent given Z (unless the errors in the two measurements
are somehow connected).
The formal definition of conditional independence is based on the idea of conditional
distributions. If X, Y, and Z are discrete random variables, then we define X and Y to be
conditionally independent given Z if
P(X ≤ x, Y ≤ y|Z = z) = P(X ≤ x|Z = z) · P(Y ≤ y|Z = z)
for all x, y and z such that P(Z = z) > 0. On the other hand, if the random variables are
continuous and have a joint probability density function p, then X and Y are conditionally
independent given Z if
pXY|Z(x, y|z) = pX|Z(x|z) · pY|Z(y|z)
for all real numbers x, y and z such that pZ(z) > 0.
If X and Y are conditionally independent given Z, then
P(X = x | Y = y, Z = z) = P(X = x | Z = z)
for any x, y and z with P(Z = z) > 0. That is, the conditional distribution for X given
Y and Z is the same as that given Z alone. A similar equation holds for the conditional
probability density functions in the continuous case.
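The two-measurements example can be illustrated with a small simulation. The error rates and variable encodings below are assumptions made for this sketch, not taken from the text:

```python
import random

random.seed(0)

# X and Y are two noisy binary readings of the same quantity Z.
# Marginally they are correlated; given Z, they are independent by construction.
def sample():
    z = random.choice([0, 1])                   # underlying quantity
    x = z if random.random() < 0.8 else 1 - z   # reading with 20% error
    y = z if random.random() < 0.8 else 1 - z   # independent error for Y
    return x, y, z

trials = [sample() for _ in range(100_000)]

# Given Z = 1, the empirical P(X=1, Y=1 | Z=1) should be close to
# P(X=1 | Z=1) * P(Y=1 | Z=1) = 0.8 * 0.8 = 0.64.
given = [(x, y) for x, y, z in trials if z == 1]
p_xy = sum(1 for x, y in given if x == 1 and y == 1) / len(given)
print(round(p_xy, 2))  # close to 0.64
```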
Independence can be seen as a special kind of conditional independence, since probability
can be seen as a kind of conditional probability given no events.
3.4
Reference
This
section
is
based
(probability_theory).
on
http://en.wikipedia.org/wiki/Independence_
9