Tutorial on Driver Analysis and
Product Optimization with BayesiaLab
Stefan Conrady, stefan.conrady@bayesia.us
Dr. Lionel Jouffe, jouffe@bayesia.com
December 11, 2010
Revised: March 13, 2013
Table of Contents
Introduction
BayesiaLab
Acknowledgements
Abstract
Bayesian Networks
Structural Equation Models
Probabilistic Structural Equation Models
Tutorial
Notation
Model Development
Dataset
Consumer Research
Data Import
Unsupervised Learning
Preliminary Analysis
Variable Clustering
Multiple Clustering
Analysis of Factors
Completing the PSEM
Market Driver Analysis
Product Driver Analysis
Product Optimization
Conclusion
Appendix: The Bayesian Network Paradigm
Acyclic Graphs & Bayes’s Rule
Compact Representation of the Joint Probability Distribution
References
Contact Information
Bayesia USA
Bayesia S.A.S.
Bayesia Singapore Pte. Ltd.
Copyright
Introduction
This tutorial is intended for new or prospective users of BayesiaLab. The example in this tutorial is taken
from the field of marketing science and is meant to illustrate the capabilities of BayesiaLab with a real-world
case study and actual consumer data. Beyond market researchers, analysts and researchers in many other fields will hopefully find the proposed methodology valuable and intuitive. In this context, many of the technical steps, such as data preparation and network learning, are outlined in great detail, as they apply to research with BayesiaLab in general, regardless of the domain.1
BayesiaLab
Bayesia S.A.S., based in Laval, France, has been developing BayesiaLab since 1999, and it has emerged as the leading software package for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of market research, is highlighted by Bayesia’s strategic partnership with Procter & Gamble, which has deployed BayesiaLab globally since 2007.
Acknowledgements
We would like to express our gratitude to Ares Research (www.ares-etudes.com) for generously providing
data from their consumer research for our case study.
Abstract
Market driver analysis and product optimization are among the central tasks in Product Marketing and thus relevant to virtually all types of businesses. BayesiaLab provides a unified software platform, which can, based on consumer data,
1. provide a deep understanding of the market preference structure,
2. directly generate recommendations for prioritized product actions.
The proposed approach utilizes Probabilistic Structural Equation Models (PSEM), based on machine-
learned Bayesian networks. PSEMs provide an efficient alternative to Structural Equation Models (SEM),
which have been used traditionally in market research.
1 This tutorial is based on version 5.0 of BayesiaLab.
Bayesian Networks
A Bayesian network or belief network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG), and thereby encodes their joint probability distribution. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.2
Structural Equation Models
Structural Equation Modeling (SEM) is a statistical technique for testing and estimating causal relations
using a combination of statistical data and qualitative causal assumptions. This definition of SEM was ar-
ticulated by the geneticist Sewall Wright (1921), the economist Trygve Haavelmo (1943) and the cognitive
scientist Herbert Simon (1953), and formally defined by Judea Pearl (2000).
Structural Equation Models (SEM) allow both confirmatory and exploratory modeling, meaning they are
suited to both theory testing and theory development.
Probabilistic Structural Equation Models
Traditionally, specifying and estimating an SEM required a multitude of manual steps, which are typically
very time consuming, often requiring weeks or even months of an analyst’s time. PSEMs are based on the
idea of leveraging machine learning for automatically generating a structural model. As a result, creating
PSEMs with BayesiaLab is extremely fast and can thus form an immediate basis for much deeper analysis
and optimization.
2 See appendix for a brief introduction to Bayesian networks.
Tutorial
At the beginning of this tutorial, we want to emphasize the overarching objectives of this case study, so we
do not lose sight of the “big picture” as we immerse ourselves into the technicalities of BayesiaLab and
Bayesian networks.
In this study, we want to examine how product attributes perceived by consumers relate to purchase intention for specific products. Put simply, we want to understand the key drivers of purchase intent. Given the large number of attributes in our study, we also want to identify common concepts among these attributes in order to make interpretation easier and communication with managerial decision makers more effective.
Secondly, we want to utilize the generated understanding of consumer dynamics so product developers can optimize the characteristics of the products under study in order to increase purchase intent among consumers, which is our ultimate business objective.
Notation
In order to clearly distinguish between natural language, BayesiaLab-specific functions and study-specific
variable names, the following notation is used:
• BayesiaLab functions, keywords, commands, etc., are shown in bold type.
• Variable names are capitalized and italicized.
Model Development
Dataset
Consumer Research
This study is based on a monadic3 consumer survey about perfumes, which was conducted in France. In this example, we use survey responses from 1,320 women, who have evaluated a total of 11 fragrances on a wide range of attributes:
• 27 ratings on fragrance-related attributes, such as “sweet”, “flowery”, “feminine”, etc., measured on a 1-to-10 scale.
• 12 ratings on projected imagery related to someone who would be wearing the respective fragrance, e.g. “is sexy”, “is modern”, measured on a 1-to-10 scale.
• 1 variable for Intensity, a measure reflecting the level of intensity, measured on a 1-to-5 scale.4
• 1 variable for Purchase Intent, measured on a 1-to-6 scale.
• 1 nominal variable, Product, for product identification purposes.
3 A product test involving only one product, i.e. in our study each respondent evaluated only one perfume.
4 The variable Intensity is listed separately due to the a-priori knowledge of its non-linearity and the existence of a “just-
about-right” level.
Data Import
To start the analysis with BayesiaLab, we first import the data set, which is formatted as a CSV file.5 With
Data | Open Data Source | Text File, we start the Data Import wizard, which immediately provides a pre-
view of the data file.
The table displayed in the Data Import wizard shows the individual variables as columns and the responses as rows. There are a number of options available, e.g. for sampling. However, this is not necessary in our example given the relatively small size of the database.
Clicking the Next button prompts a data type analysis, which provides BayesiaLab’s best guess regarding the data type of each variable. Furthermore, the Information box provides a brief summary regarding the number of records, the number of missing values,6 filtered states, etc.
For this example, we will need to override the default data type for the Product variable, as each value is a nominal product identifier rather than a numerical scale value. We can change the data type by highlighting the Product variable and clicking the Discrete check box, which changes the color of the Product column to red.
5 CSV stands for “comma-separated values”, a common format for text-based data files.
6 There are no missing values in our database and filtered states are not applicable in this survey.
We will also define Purchase Intent and Intensity as discrete variables, as the default number of states of these variables is already adequate for our purposes.7
The next screen provides options as to how to treat any missing values. In our case, there are no missing values, so the corresponding panel is grayed-out.
Clicking the small upside-down triangle next to the variable names brings up a window with key statistics of the selected variable, in this case Fresh.
The next step is the Discretization and Aggregation dialogue, which allows the analyst to determine the type of discretization to be performed on all continuous variables.8 For this survey, and given the number of observations, it is appropriate to reduce the number of states from the original 10 states (1 through 10) to a smaller number. One could, for instance, bin the 1-10 rating into low, mid and high, or apply any other method deemed appropriate by the analyst.
7 The desired number of variable states is largely a function of the analyst’s judgment.
8 BayesiaLab requires discrete distributions for all variables.
The screenshot shows the dialogue for the Manual selection of discretization steps, which permits the analyst to select binning thresholds by point-and-click.
For this particular example, we select Equal Distances with 5 intervals for all continuous variables. This was the analyst’s choice in order to be consistent with prior research.
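For readers who want to see what this binning does outside of BayesiaLab, here is a minimal Python sketch of equal-width discretization on a simulated 1-to-10 ratings column (the column name Fresh follows the survey, but the data below is made up for illustration). Cutting the 1-10 range into five equal-width intervals yields the bin edges 2.8, 4.6, 6.4 and 8.2, which is why states such as <=2.8 and >8.2 appear in the Monitors later on.

```python
import numpy as np
import pandas as pd

# Simulated 1-to-10 ratings for one attribute; the real input is the survey file.
rng = np.random.default_rng(0)
ratings = pd.Series(rng.integers(1, 11, size=1320), name="Fresh")

# "Equal Distances" with 5 intervals: five equal-width bins over the 1-10 range.
edges = np.linspace(1, 10, 6)          # [1.0, 2.8, 4.6, 6.4, 8.2, 10.0]
binned = pd.cut(ratings, bins=edges, include_lowest=True)

print(binned.value_counts().sort_index())
```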
Clicking Select All Continuous followed by
Finish completes the import process and the 49
variables (columns) from our database are
now shown as blue nodes in the Graph Panel,
which is the main window for network editing.
Note
For choosing discretization algorithms beyond this
example, the following rule of thumb may be helpful:
• For supervised learning, choose Decision Tree.
• For unsupervised learning, choose, in the order of
priority, K-Means, Equal Distances or Equal
Frequencies.
This initial view represents a fully unconnected Bayesian network.
For reasons that will become clear later, we will initially exclude two variables, Product and Purchase Intent. We can do so by right-clicking the nodes and selecting Properties | Exclusion. Alternatively, holding “x” while double-clicking the nodes performs the same exclusion function.
Unsupervised Learning
As the next step, we will perform the first unsupervised learning of a network by selecting Learning | Association Discovering | EQ.
The resulting view shows the learned network with all the nodes in their original position.
Needless to say, this view of the network is not very intuitive. BayesiaLab has numerous built-in layout al-
gorithms, of which the Force Directed Layout is perhaps the most commonly used.
It can be invoked by View | Automatic Layout | Force Directed Layout or, alternatively, through the keyboard shortcut “p”. This shortcut is worth remembering, as it is one of the most commonly used functions.
The resulting network will look similar to the following screenshot.
To optimize the use of the available screen, clicking the Best Fit button in the toolbar “zooms to fit” the graph to the screen. In addition, rotating the graph with the Rotate Left and Rotate Right buttons helps to create a suitable view.
The final graph should closely resemble the following screenshot and, in this view, the properties of this first learned Bayesian network become immediately apparent. This network is now a compact representation of the 47 dimensions of the joint probability distribution of the underlying database.
It is very important to note that, although this learned graph happens to have a tree structure, this is not the
result of an imposed constraint.
Preliminary Analysis
The analyst can further examine this graph by switching into the Validation Mode, which immediately
opens up the Monitor Panel on the right side of the screen.
This panel is initially empty, but by clicking on any node (or multiple nodes) in the network, Monitors ap-
pear inside the Monitor Panel. The corresponding nodes are highlighted in yellow.
By default, the Monitors show the marginal distributions of all selected variables. This shows, for instance, that 9.7% of respondents rated their perfume at <=2.8 in terms of the Fresh attribute.
On this basis, one can start to experiment with the properties of this particular Bayesian network and query it. With BayesiaLab this can be done in an extremely intuitive way, i.e. by setting evidence (or observations) directly on the Monitors. For instance, we can compute the conditional probability distribution of Flowery, given that we have observed a specific value, i.e. a specific state, of Fresh. In formal notation, this would be
P(Flowery | Fresh)
We will now set Flowery to the state that represents the highest rating (>8.2), and we can immediately observe the conditional probability distribution of Fresh, i.e.
P(Fresh | Flowery = ">8.2")
The gray arrows inside the bars indicate how the distributions have changed compared to the previous distributions. This means that respondents who have rated the Flowery attribute of a perfume at the top level have a 67% probability of also assigning a top rating to the Fresh attribute.
P(Fresh = ">8.2" | Flowery = ">8.2") = 66.9%
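This conditional distribution can also be checked directly against the raw data. A minimal pandas sketch, assuming the survey responses have already been discretized into the same five states (the file name and state labels are placeholders):

```python
import pandas as pd

# Placeholder file name; the actual survey data is not distributed with this tutorial.
df = pd.read_csv("perfume_survey_discretized.csv")

# P(Fresh | Flowery = ">8.2"): restrict the data to respondents who gave Flowery the top
# rating, then look at the distribution of Fresh within that subset.
top_flowery = df[df["Flowery"] == ">8.2"]
print(top_flowery["Fresh"].value_counts(normalize=True))  # the ">8.2" share should be close to 66.9%
```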
Switching briefly back into the Modeling Mode and clicking on the Flowery node, one can see the probabilistic relationship between Flowery and Fresh in detail. By learning the network, BayesiaLab has automatically created a contingency table for every single direct relationship between nodes.
All contingency tables, together with the graph structure, thus encode the joint probability distribution of our original database.
Returning to the Validation Mode, we can
further examine the properties of our network.
Of great interest is the strength of the prob-
abilistic relationships between the variables. In
BayesiaLab this can be shown by selecting
Analysis | Graphic | Arcs’ Mutual Information.
Note
The structure of our Bayesian network may be
directed, but the directions of the arcs do not
necessarily have to be meaningful.
For observational inference, it is only necessary that
the Bayesian network correctly represents the joint
probability distribution of the underlying database.
The thickness of the arcs is now proportional to the Mutual Information, i.e. the strength of the relationship
between the nodes.
Intuitively, Mutual Information measures the information that X and Y share: it measures how much know-
ing one of these variables reduces our uncertainty about the other. For example, if X and Y are independent,
then knowing X does not provide any information about Y and vice versa, so their mutual information is
zero. At the other extreme, if X and Y are identical then all information conveyed by X is shared with Y:
knowing X determines the value of Y and vice versa.
Formal Definition of Mutual Information:
I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}
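To make this definition concrete, the following NumPy/pandas sketch computes the Mutual Information of two discrete variables from their empirical joint distribution (a hand-rolled illustration in bits, not BayesiaLab’s implementation):

```python
import numpy as np
import pandas as pd

def mutual_information(x, y):
    """Mutual Information (in bits) of two discrete series, from their empirical joint distribution."""
    joint = pd.crosstab(x, y, normalize=True).to_numpy()   # p(x, y)
    px = joint.sum(axis=1, keepdims=True)                  # p(x)
    py = joint.sum(axis=0, keepdims=True)                  # p(y)
    nonzero = joint > 0                                    # skip empty cells to avoid log(0)
    return float(np.sum(joint[nonzero] * np.log2(joint[nonzero] / (px @ py)[nonzero])))

# Two correlated discrete variables share information; independent ones would yield roughly 0.
rng = np.random.default_rng(1)
x = pd.Series(rng.integers(1, 6, size=1000))
y = pd.Series(np.clip(x + rng.integers(-1, 2, size=1000), 1, 5))
print(mutual_information(x, y))
```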
We can also show the values of the Mutual Information on the graph by clicking on Display Arc Comments.
In the top part of the comment box attached to each arc, the Mutual Information of the arc is shown. Below, expressed as a percentage and highlighted in blue, we see the relative Mutual Information in the direction of the arc (parent node ➔ child node). And, at the bottom, we have the relative Mutual Information in the opposite direction of the arc (child node ➔ parent node).
Variable Clustering
The information about the strength of the relationships between the manifest variables can also be utilized for purposes of Variable Clustering. More specifically, a concept closely related to the Mutual Information, namely the Kullback-Leibler Divergence (K-L Divergence), is utilized for clustering.
For probability distributions P and Q of a discrete random variable, their K-L divergence is defined as
D_{KL}(P \| Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}
In words, it is the average of the logarithmic difference between the probabilities P(i) and Q(i), where the average is taken using the probabilities P(i).
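A direct transcription of this formula in Python (using the natural logarithm; the base only changes the unit):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) for two discrete distributions given as arrays of probabilities."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # terms with P(i) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# The divergence is asymmetric: swapping P and Q generally changes the value.
print(kl_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))
```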
Such variable clusters will allow us to induce new latent variables, each of which represents a common concept among the manifest variables.9 From here on, we will make a very clear distinction between manifest variables, which are directly observed, such as the survey responses, and latent variables, which are derived. In traditional statistics, deriving such latent variables or factors is typically performed by means of Factor Analysis, e.g. Principal Components Analysis (PCA).
In BayesiaLab, this “factor extraction” can be done very easily via the Analysis | Graphics | Variable Clustering function, which is also accessible through the keyboard shortcut “s”.
The speed with which this is performed is one of the strengths of BayesiaLab, as the resulting variable clusters are presented instantly.
9 An alternative approach is to interpret the derived concept or factor as a hidden common cause.
In this case, BayesiaLab has identified 15 variable clusters and each node is color-coded according to its
cluster membership. To interpret these newly-found clusters, we can zoom in and visually examine the
structure in the Graph Panel.
To support the interpretation process, BayesiaLab can also display a Dendrogram, which allows the analyst
to review the linkage of nodes into variable clusters.
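Conceptually, this step resembles hierarchical clustering of the variables, with a similarity derived from Mutual Information (or, in the same spirit, K-L Divergence). The sketch below illustrates the idea with SciPy’s hierarchical clustering; it is an analogy of the approach, not BayesiaLab’s actual algorithm, and it reuses the mutual_information() helper defined above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_variables(df, n_clusters=15):
    """Group the columns of df into variable clusters, using pairwise Mutual Information
    as the similarity between variables (an analogy to BayesiaLab's Variable Clustering)."""
    cols = list(df.columns)
    n = len(cols)
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(df[cols[i]], df[cols[j]])

    dist = mi.max() - mi                                   # turn similarity into a distance
    np.fill_diagonal(dist, 0.0)
    links = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(links, t=n_clusters, criterion="maxclust")
    return dict(zip(cols, labels))                         # column name -> cluster id
```

Passing links to scipy.cluster.hierarchy.dendrogram would produce a linkage diagram comparable to the Dendrogram shown by BayesiaLab.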
The analyst may also choose a different number of clusters, based on his own judgment relating to the domain. A slider in the toolbar allows the analyst to choose various numbers of clusters, and the color association of the nodes will be updated instantly.
By clicking the Validate Clustering button in the toolbar, the clusters are saved and the color codes will
be formally associated with the nodes. A clustering report provides us with a formal summary of the new
factors and their associated manifest variables.10
10 Variable cluster = derived concept = unobserved latent variable = hidden cause = extracted factor.
The analyst also has the option to use his do-
main knowledge to modify which manifest
variables belong to specific factors. This can be
done by right-clicking on the Graph Panel and
selecting Class Editor.
Multiple Clustering
As our next step towards building the PSEM, we will introduce these newly-generated latent factors into our existing network and also estimate their probabilistic relationships with the manifest variables. This means we will create a new node for each latent factor, creating 15 new dimensions in our network. For this step, we will need to return to the Modeling Mode, because the introduction of the factor nodes into the network requires the learning algorithms.
More specifically, we select Learning | Multiple Clustering, which brings up the Multiple Clustering dialogue. There is a range of settings, but we will focus only on a subset. Firstly, we need to specify an output directory for the to-be-learned networks. Secondly, we need to set some parameters for the clustering process, such as the minimum and maximum number of states which can be created during the learning process.
In our example, we select Automatic Selection of the Number of Classes, which will allow the learning algorithm to find the optimum number of factor states up to a maximum of five states. This means that each new factor will need to represent the corresponding manifest variables with up to five states.
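Conceptually, each factor is induced by clustering the respondents on the manifest variables of one variable cluster, and the resulting cluster label becomes the state of the new latent node. The following rough sketch expresses that idea with scikit-learn’s KMeans; note that this is only an analogy (KMeans needs a fixed number of clusters, whereas BayesiaLab selects the number of states automatically, up to five), and the column list is taken from the Factor_0 example discussed below.

```python
import pandas as pd
from sklearn.cluster import KMeans

def induce_factor(df, manifest_columns, n_states=5, seed=0):
    """Derive a discrete latent factor by clustering respondents on a group of manifest variables.

    df is assumed to hold the numeric ratings, one row per respondent."""
    km = KMeans(n_clusters=n_states, n_init=10, random_state=seed)
    return pd.Series(km.fit_predict(df[manifest_columns]), index=df.index, name="Factor")

# Example: the manifest variables that were grouped into Factor_0 in this case study.
# factor_0 = induce_factor(survey_df, ["Trust", "Bold", "Fulfilled", "Active", "Character"])
```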
The Multiple Clustering process concludes with a report, which shows details regarding the generated clustering. The top portion of the report is shown in the following screenshot.
The detail section of Factor_0, as it relates to the manifest variables, is worth highlighting. Here, we can see the strength of the relationship between the manifest variables, such as Trust, Bold, etc., and Factor_0. In a traditional Factor Analysis, this would be the equivalent of the factor loadings.
After closing the report, we will now see a new (unconnected) network with 15 additional nodes, one for each factor, i.e. Factor_0 through Factor_14, highlighted in yellow in the screenshot.
Analysis of Factors
We can also further examine how the new factors relate to the manifest variables and how well they represent them. In the case of Factor_0, we want to understand how it can summarize our five manifest variables.
By going into our previously-specified output directory, using the Windows Explorer or the Mac Finder, we can see that 15 new networks (in BayesiaLab’s xbl format for networks) were generated. We open the specific network for Factor_0, either by directly double-clicking the xbl file or by selecting Network | Open. The factor-specific networks are identified by a suffix/extension of the format “_[Factor_#].xbl”, where “#” stands for the factor number. We then see a network consisting of the manifest variables and the factor, with arcs going from the factor to the manifest variables.
Returning to the Validation Mode, we can see five states for Factor_0, labeled C1 through C5, as well as
their marginal distribution. As Factor_0 is a target node by default, it automatically appears highlighted in
red in the Monitor Panel.
Here, we can also study how the states of the manifest variables relate to the states of Factor_0. This can be done easily by setting observations in the Monitors, e.g. setting C1 to 100%.
We now see that, given that Factor_0 is in state C1, the variable Active has a probability of approximately 75% of being in state <=2.8. Expressed more formally, we would state P(Active = “<=2.8” | Factor_0 = C1) = 74.57%. This means that respondents who have been assigned to C1 are likely to rate the Active attribute very low as well.
In the Monitor for Factor_0, in parentheses behind the cluster name, we find the expected mean value of the numeric equivalents of the states of the manifest variables, e.g. “C1 (2.08)”. That means that, given the state C1 of Factor_0, we expect the mean value of Trust, Bold, Fulfilled, Active and Character to be 2.08.
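Both quantities shown in the Monitors have simple data-level counterparts. The sketch below checks them on a synthetic stand-in for the survey data (the column names follow the case study, but the numbers and the factor assignment are simulated):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: one row per respondent, numeric 1-10 ratings plus an assigned Factor_0 state.
rng = np.random.default_rng(2)
manifest = ["Trust", "Bold", "Fulfilled", "Active", "Character"]
df = pd.DataFrame(rng.uniform(1, 10, size=(1320, len(manifest))), columns=manifest)
df["Factor_0"] = pd.qcut(df[manifest].mean(axis=1), 5, labels=["C1", "C2", "C3", "C4", "C5"])

# P(Active <= 2.8 | Factor_0 = C1): the share of C1 respondents with a very low Active rating.
c1 = df[df["Factor_0"] == "C1"]
print((c1["Active"] <= 2.8).mean())

# Expected mean of the five manifest ratings per factor state (the number in parentheses
# behind each cluster name, e.g. "C1 (2.08)" in the actual study).
print(df.groupby("Factor_0")[manifest].mean().mean(axis=1))
```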
To go into even greater detail, we can actually look at every single respondent, i.e. every record in the database, and see which cluster they were assigned to. We select Inference | Interactive Inference, which will bring up a record selector in the toolbar.
With this record selector, we can now scroll through the entire database, review the actual ratings of the respondents, and then see the estimate of which cluster each respondent belongs to.
In our first case, record 0, we see the ratings of this respondent indicated by the manifest Monitors. In the
highlighted Monitor for Factor_0, we read that this respondent, given her responses, has an 82% probabil-
ity of belonging to Cluster 5 (C5) in Factor_0.
Moving to our second case, record 1, we see that the respondent belongs to Cluster 3 (C3) with a 96%
probability.
We can also evaluate the performance of our new network based on Factor_0 by selecting Analysis | Network Performance | Global.
This will return the log-likelihood density function as shown in the following screenshot.
Completing the PSEM
We are now returning to our main task and our principal network, which has been augmented by the 15
new factors.
Before we re-learn our network with the new factors, we need to include Purchase Intent as a variable and
also impose a number of constraints in the form of Forbidden Arcs.
In the Modeling Mode, we can include Purchase Intent by right-clicking the node and unchecking Exclusion.
This makes the Purchase Intent variable available in the next stage of learning, which is reflected visually as
well in the node color and the icon.
Our desired SEM-type network structure stipulates that manifest variables be connected exclusively to the
factors and that all the connections with Purchase Intent must also go through the factors. We achieve such
a structure by imposing the following sets of forbidden arcs:
1. No arcs between manifest variables
2. No arcs from manifest variables to factors
3. No arcs between manifest variables and Purchase Intent
We can define these forbidden arcs by right-clicking anywhere on the Graph Panel, which brings up the fol-
lowing menu.
In BayesiaLab, all manifest variables and all factors are conveniently grouped into classes, so we can easily
define which arcs are forbidden in the Forbidden Arc Editor.
Upon completing this step, we can proceed to learning our network again: Learning | Association Discovering | EQ.
The initial result will resemble the following screenshot.
Using the Force Directed Layout algorithm (shortcut “p”), as before, we can quickly transform this network
into a much more interpretable format.
Now we see the manifest variables “laddering up” to the factors, and we also see how the factors are related
to each other. Most importantly, we can observe where the Purchase Intent node was attached to the net-
work during the learning process. The structure conveys that Purchase Intent has the strongest link with
Factor_2.
Now that we can see the big picture, it is perhaps appropriate to give the factors more descriptive names.
For obvious reasons, this task is the responsibility of the analyst. In this case study, Factor_0 was given the
name “Self-Confident”. We add this name into the node comments by double-clicking Factor_0 and scroll-
ing to the right inside the Node Editor until we see the Comments tab.
We repeat this for all other nodes, and we can subsequently display the node comments for all factors by
clicking the Display Node Comment icon in the toolbar or by selecting View | Display Node Comments
from the menu.
Market Driver Analysis
Our Probabilistic Structural Equation Model is now complete, and we can use it to perform the actual analysis part of this exercise, namely to find out what “drives” Purchase Intent.
We return to the Validation Mode, right-click on Purchase Intent, and then check Set As Target Node. Double-clicking the node while pressing “t” is a helpful shortcut.
This will also change the appearance of the node and literally give it the look of a target.
In order to understand the relationship between the factors and Purchase Intent, we want to tune out all the
manifest variables for the time being. We can do so by right-clicking the Use of Classes icon in the bottom
right corner of the screen. This will bring up a list of all classes. By default, all are checked and thus visible.
For our purposes, we want to deselect All and then only check the Factor class.
The resulting view has all the manifest variables grayed-out, so the relationship between the factors becomes
more prominent. By deselecting the manifest variables, we also exclude them from subsequent analysis.
We will now right-click inside the (currently empty) Monitor Panel and select Monitors Sorted wrt Target
Variable Correlations. The keyboard shortcut “x” will do the same.
This brings up the monitor for the target node, Purchase Intent, plus all the monitors for the factors, in the
order of the strength of relationship with the Target Node.
This immediately highlights the order of importance of the factors relative to the Target Node, Purchase Intent. Another way of comprehensively displaying the importance is by selecting Reports | Target Analysis | Correlations With the Target Node.
“Correlations” is more of a metaphor here, as BayesiaLab actually orders the factors by their Mutual Information relative to the target node, Purchase Intent.
By clicking Quadrants, we can obtain a type of opportunity graph, which shows the mean value of each factor on the x-axis and the relative Mutual Information with Purchase Intent on the y-axis. Mutual Information can be interpreted as importance in this context.
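A chart of this kind can be reproduced with a few lines of matplotlib once the mean value and the relative Mutual Information of each factor are available; the factor names below come from the case study, but the numbers are placeholders rather than the study’s results.

```python
import matplotlib.pyplot as plt

# Placeholder values: factor name -> (mean value, relative Mutual Information with Purchase Intent).
factors = {"Adequacy": (6.1, 0.32), "Seduction": (5.8, 0.24), "Self-Confident": (6.5, 0.10)}

x = [v[0] for v in factors.values()]
y = [v[1] for v in factors.values()]
fig, ax = plt.subplots()
ax.scatter(x, y)
for name, (xi, yi) in factors.items():
    ax.annotate(name, (xi, yi))
ax.axvline(sum(x) / len(x), linestyle="--")   # quadrant dividers at the mean of each axis
ax.axhline(sum(y) / len(y), linestyle="--")
ax.set_xlabel("Mean value")
ax.set_ylabel("Relative Mutual Information (importance)")
plt.show()
```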
By right-clicking on the graph, we can switch between the display of the formal factor names, e.g. Factor_0, Factor_1, etc., and the factor comments, such as Adequacy and Seduction, which are much easier to interpret.
As in the previous views, it becomes very obvious that the factor Adequacy is most important with regard to Purchase Intent, followed by the factor Seduction. This is very helpful for understanding the overall market dynamics and for communicating the key drivers to managerial decision makers.
The lines dividing the graph into quadrants reflect the mean values for each axis. The upper-left quadrant
highlights opportunities as these particular factors are “above average” in importance, but “below average”
in terms of their rating.
Product Driver Analysis
Although this insight is relevant for the whole market, it does not yet allow us to work on improving specific products. For this, we need to look at product-specific graphs. In addition, we may need to introduce constraints wherever we do not have the ability to impact certain attributes. Such information must come from the domain expert, in our case from the perfumer, who will determine if and how odoriferous compounds can affect the consumers’ perception of the product attributes.
These constraints can be entered into BayesiaLab’s Cost Editor, which is accessible by right-clicking any-
where in the Graph Panel. Those attributes, which cannot be changed (as determined by the expert), will be
set to “Not Observable”. As we proceed with our analysis, these constraints will be extremely important
when searching for realistic product scenarios.
On a side note, an example from the presumably more tangible auto industry may better illustrate such
kinds of constraints. For instance, a vehicle platform may have an inherent wheelbase limitation, which thus
sets a hard limit regarding the maximum amount of rear passenger legroom. Even if consumers perceived a
need for improvement on this attribute, making such a recommendation to the engineers would be futile. As
we search for optimum product solutions with our Bayesian network, this is very important to bear in mind
and thus we must formally encode these constraints of our domain through the Cost Editor.
Product Optimization
We now return briefly to the Modeling Mode to include the Product variable, which has been excluded
from our analysis thus far. Right-clicking the node and then unchecking Properties | Exclusion will achieve
this.
At this time, we will also move beyond the analysis of factors and actually look at the individual product
attributes, so we select Manifest from the Display Classes menu.
Back in the Validation Mode, we can perform a Multi Quadrant Analysis: Tools | Multi Quadrant Analysis.
This tool allows us to look at the attribute ratings of each product and their respective importance, as expressed by the Mutual Information. Thus, we pick Product as the Selector Node and choose Mutual Information for Analysis. In this case, we also want to check Linearize Nodes’ Values and Regenerate Values, and specify an Output Directory, where the product-specific networks will be saved. In the process of generating the Multi Quadrant Analysis, BayesiaLab produces one Bayesian network for each Product. For all Products, the network structure will be identical to the network for the entire market; however, the parameters, i.e. the contingency tables, will be specific to each Product.
However, before we proceed to the product-specific networks, we will first see a Multi Quadrant Analysis by Product, and we can select each product’s graph simply by right-clicking and choosing the appropriate product identification number.
Please note that only the observable variables are visible on the chart, i.e. those variables which were not previously defined as “Not Observable” in the Cost Editor.
For Product No. 5, Personality is at the very top of the importance scale. But how does the Personality attribute compare in the competitive context? If we select Display Scales by right-clicking on the graph, it appears that Personality is already at the best level among the competitors, i.e. at the far right of the horizontal scale. On the other hand, on the Fresh attribute, Product No. 5 marks the bottom end of the competitive range.11
11 Any similarities of identifiers with actual product names are purely coincidental.
For a perfumer it would thus be reasonable to assume that there is limited room for improvement with re-
gard to Personality, and that Fresh perhaps offers a significant opportunity for Product No. 5.
To highlight the differences between products, we will also show Product No. 1 in comparison.
For Product No. 1, it becomes apparent that Intensity is highly important, but that its rating is towards the bottom end of the scale. The perfumer may thus conclude that a bolder version of the same fragrance will improve Purchase Intent.
Finally, by hovering over any data point in the opportunity chart, BayesiaLab can also display the position
of competitors compared to the reference product for any attribute. The screenshot shows Product No. 5 as
the reference and the position of competitors on the Personality attribute.
BayesiaLab also allows us to measure and save the “gap to best level” (=variations) for each product and
each variable through the Export Variations function. This formally captures our opportunity for improve-
ment.
Please note that these variations need to be saved individually by Product.
By now we have all the components necessary for a comprehensive optimization of product attributes:
1. Constraints on “non-actionable” attributes, i.e. excluding those variables which cannot be affected through product changes.
2. A Bayesian network for each Product.
3. The current attribute rating of each Product and each attribute’s importance relative to Purchase Intent.
4. The “gap to best level” (variation) for each attribute and Product.
With the above, we are now in a position to search for product configurations, based on the existing product, which would realistically optimize Purchase Intent.
We proceed individually by Product, and for illustration purposes we use Product No. 5 again. We load the product-specific network, which was previously saved when the Multi Quadrant Analysis was performed.
One of the powerful features of BayesiaLab is Target Dynamic Profile, which we will apply here on this
network to optimize Purchase Intent: Analysis | Report | Target Analysis | Target Dynamic Profile
The Target Dynamic Profile provides a number of important options:
• Profile Search Criterion: we intend to optimize the mean of the Purchase Intent.
• Criterion Optimization: maximization is the objective.
• Search Method: We select Mean and also click on Edit Variations, which allows us to manually stipulate
the range of possible variations of each attribute. In our case, however, we had saved the actual variations
of Product No. 5 versus the competition, so we load that data set, which subsequently displays the values
in the Variation Editor. For example, Fresh could be improved by 10.7% before catching up to the
highest-rated product in this attribute.
• Search Stop Criterion: We check Maximum Number of Evidence Reached and set this parameter to 4.
This means that no more than the top-four attributes will be suggested for improvement.
Upon completion of all computations, we will obtain a list of product action priorities: Fresh, Fruity, Flowery and Wooded.
The highlighted Value/Mean column shows the successive improvement upon implementation of each action. From initially 3.76, the Purchase Intent improves to 3.92, which may seem like a fairly small step. However, the importance lies in the fact that this improvement is not based on utopian thinking, but rather on attainable product improvements within the range of competitive performance.
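Conceptually, this search can be thought of as a greedy, constrained optimization: at each step, among the attributes that are observable and still have room to improve within their recorded variation, pick the change that raises the expected Purchase Intent the most, and stop once the maximum number of evidence is reached. The sketch below captures that logic in generic Python; predict_purchase_intent() is a placeholder for inference on the product-specific Bayesian network, so this illustrates the search strategy rather than BayesiaLab’s implementation.

```python
def greedy_profile(baseline, variations, predict_purchase_intent, max_evidence=4):
    """Greedily pick attribute improvements (within their allowed variation) that raise
    the predicted mean Purchase Intent; stop after max_evidence changes."""
    profile = dict(baseline)                       # attribute -> current (mean) value
    chosen = []                                    # [(attribute, resulting Purchase Intent), ...]
    for _ in range(max_evidence):
        best = None
        for attr, max_gain in variations.items():  # only attributes with a "gap to best level"
            if attr in dict(chosen) or max_gain <= 0:
                continue
            candidate = dict(profile, **{attr: profile[attr] + max_gain})
            score = predict_purchase_intent(candidate)
            if best is None or score > best[2]:
                best = (attr, candidate, score)
        if best is None:
            break
        attr, profile, score = best
        chosen.append((attr, score))
    return chosen
```

With the variations of Product No. 5 and its product-specific network plugged in, such a search would return a priority list analogous to the one above (Fresh, Fruity, Flowery, Wooded), together with the successively improved Purchase Intent values.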
Initially, we have the marginal distribution of the attributes and the original mean value for Purchase Intent,
i.e. 3.77.
To further illustrate the impact of our product actions, we will simulate their implementation step-by-step,
which is available through Inference | Interactive Inference.
With the selector in the toolbar, we can go through each product action step-by-step in the order in which
they were recommended.
Upon implementation of the first product action, we obtain the following picture and Purchase Intent grows
to 3.9. Please note that this is not a sea change in terms of Purchase Intent, but rather a realistic consumer
response to a product change.
The second change results in further subtle improvement to Purchase Intent:
The third and fourth steps are analogous and bring us to the final value for Purchase Intent of 3.92.
Although BayesiaLab generates these recommendations effortlessly, they represent a major innovation in the field of marketing science. This particular optimization task has not been tractable with traditional methods.
Conclusion
The presented case study demonstrates how BayesiaLab can transform simple survey data into a deep understanding of consumers’ thinking and quickly provide previously-inconceivable product recommendations. As such, BayesiaLab is a revolutionary tool, especially as the workflow shown here may take no more than a few hours for an analyst to implement. This kind of rapid and “actionable”12 insight is clearly a breakthrough and creates an entirely new level of relevance of research for business applications.
12 The authors cringe at the inflationary use of “actionable”, but here, for once, it actually seems appropriate.
Appendix: The Bayesian Network Paradigm13
Acyclic Graphs & Bayes’s Rule
Probabilistic models based on directed acyclic graphs have a long and rich tradition, beginning with the
work of geneticist Sewall Wright in the 1920s. Variants have appeared in many fields. Within statistics, such
models are known as directed graphical models; within cognitive science and artificial intelligence, such
models are known as Bayesian networks. The name honors the Rev. Thomas Bayes (1702-1761), whose
rule for updating probabilities in the light of new evidence is the foundation of the approach.
Rev. Bayes addressed both the case of discrete probability distributions of data and the more complicated
case of continuous probability distributions. In the discrete case, Bayes’ theorem relates the conditional and
marginal probabilities of events A and B, provided that the probability of B does not equal zero:
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
In Bayes’ theorem, each probability has a conventional name:
• P(A) is the prior probability (or “unconditional” or “marginal” probability) of A. It is “prior” in the
sense that it does not take into account any information about B; however, the event B need not occur
after event A. In the nineteenth century, the unconditional probability P(A) in Bayes’s rule was called the
“antecedent” probability; in deductive logic, the antecedent set of propositions and the inference rule
imply consequences. The unconditional probability P(A) was called “a priori” by Ronald A. Fisher.
• P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is
derived from or depends upon the specified value of B.
• P(B|A) is the conditional probability of B given A. It is also called the likelihood.
• P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
Bayes’ theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.
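A tiny numeric illustration of the rule, using the disease-and-symptom setting mentioned earlier (the probabilities are made up for the example):

```python
# Hypothetical numbers: P(disease), P(symptom | disease), P(symptom | no disease).
p_a = 0.01            # prior P(A): the disease is rare
p_b_given_a = 0.90    # likelihood P(B | A): the symptom is common when the disease is present
p_b_given_not_a = 0.05

# Normalizing constant P(B), then the posterior via Bayes' theorem.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # ~0.154: observing the symptom raises P(disease) from 1% to about 15%
```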
The initial development of Bayesian networks in the late 1970s was motivated by the need to model the top-down (semantic) and bottom-up (perceptual) combination of evidence in reading. The capability for bidirectional inferences, combined with a rigorous probabilistic foundation, led to the rapid emergence of Bayesian networks as the method of choice for uncertain reasoning in AI and expert systems, replacing earlier, ad hoc rule-based schemes.
13 Adapted from Pearl (2000), used with permission.
The nodes in a Bayesian network represent variables of interest (e.g. the temperature of a device, the gender of a patient, a feature of an object, the occurrence of an event), and the links represent statistical (informational) or causal dependencies among the variables. The dependencies are quantified by conditional probabilities for each node given its parents in the network. The network supports the computation of the posterior probabilities of any subset of variables given evidence about any other subset.
Compact Representation of the Joint Probability Distribution
“The central paradigm of probabilistic reasoning is to identify all relevant variables x1, . . . , xN in the environment [i.e. the domain under study], and make a probabilistic model p(x1, . . . , xN) of their interaction [i.e. represent the variables’ joint probability distribution].”
Bayesian networks are very attractive for this purpose as they can, by means of factorization, compactly represent the joint probability distribution of all variables.
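Concretely, the factorization implied by the directed acyclic graph is the chain rule of Bayesian networks, where each variable only requires a conditional probability table given its parents, which is exactly what the contingency tables learned earlier represent:

p(x_1, \ldots, x_N) = \prod_{i=1}^{N} p\left(x_i \mid \mathrm{pa}(x_i)\right)

Here, pa(x_i) denotes the set of parent nodes of x_i in the graph; for nodes without parents, the factor is simply the marginal distribution p(x_i).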
“Reasoning (inference) is then performed by introducing evidence that sets variables in known states, and
subsequently computing probabilities of interest, conditioned on this evidence. The rules of probability,
combined with Bayes’ rule make for a complete reasoning system, one which includes traditional deductive
logic as a special case.” (Barber, 2012)
References
Barber, David. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
Darwiche, Adnan. Modeling and Reasoning with Bayesian Networks. 1st ed. Cambridge University Press,
2009.
Heckerman, D. “A Tutorial on Learning with Bayesian Networks.” Innovations in Bayesian Networks
(2008): 33–82.
Holmes, Dawn E., ed. Innovations in Bayesian Networks: Theory and Applications. Softcover reprint of
hardcover 1st ed. 2008. Springer, 2010.
Kjaerulff, Uffe B., and Anders L. Madsen. Bayesian Networks and Influence Diagrams: A Guide to Construction and Analysis. Softcover reprint of hardcover 1st ed. 2008. Springer, 2010.
Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. 1st ed. The
MIT Press, 2009.
Koski, Timo, and John Noble. Bayesian Networks: An Introduction. 1st ed. Wiley, 2009.
Mittal, Ankush. Bayesian Network Technologies: Applications and Graphical Models. Edited by Ankush
Mittal and Ashraf Kassim. 1st ed. IGI Publishing, 2007.
Neapolitan, Richard E. Learning Bayesian Networks. Prentice Hall, 2003.
Pearl, Judea. Causality: Models, Reasoning and Inference. 2nd ed. Cambridge University Press, 2009.
———. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. 1st ed. Morgan
Kaufmann, 1988.
Pearl, Judea, and Stuart Russell. Bayesian Networks. UCLA Cognitive Systems Laboratory, November 2000. http://bayes.cs.ucla.edu/csl_papers.html.
Pourret, Olivier, Patrick Naïm, and Bruce Marcot, eds. Bayesian Networks: A Practical Guide to Applications. 1st ed. Wiley, 2008.
Schafer, J.L., and M.K. Olsen. “Multiple Imputation for Multivariate Missing-data Problems: A Data Analyst’s Perspective.” Multivariate Behavioral Research 33, no. 4 (1998): 545–571.
Spirtes, Peter, and Clark Glymour. Causation, Prediction and Search. The MIT Press, 2001.
Contact Information
Bayesia USA
312 Hamlet’s End Way
Franklin, TN 37067
USA
Phone: +1 888-386-8383
info@bayesia.us
www.bayesia.us
Bayesia S.A.S.
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
Phone: +33(0)2 43 49 75 69
info@bayesia.com
www.bayesia.com
Bayesia Singapore Pte. Ltd.
20 Cecil Street
#14-01, Equity Plaza
Singapore 049705
Phone: +65 3158 2690
info@bayesia.sg
www.bayesia.sg
Copyright
© 2013 Bayesia USA, Bayesia S.A.S. and Bayesia Singapore. All rights reserved.
BayesiaLab 5.0 IntroductionBayesiaLab 5.0 Introduction
BayesiaLab 5.0 Introduction
 
Car And Driver Hk Interview
Car And Driver Hk InterviewCar And Driver Hk Interview
Car And Driver Hk Interview
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Driver Analysis and Product Optimization with Bayesian Networks

Bayesian Networks

A Bayesian network or belief network is a directed acyclic graphical model that represents the joint probability distribution over a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.2

Structural Equation Models

Structural Equation Modeling (SEM) is a statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions. This definition of SEM was articulated by the geneticist Sewall Wright (1921), the economist Trygve Haavelmo (1943) and the cognitive scientist Herbert Simon (1953), and formally defined by Judea Pearl (2000). Structural Equation Models (SEM) allow both confirmatory and exploratory modeling, meaning they are suited to both theory testing and theory development.

Probabilistic Structural Equation Models

Traditionally, specifying and estimating an SEM required a multitude of manual steps, which are typically very time consuming, often requiring weeks or even months of an analyst's time. PSEMs are based on the idea of leveraging machine learning for automatically generating a structural model. As a result, creating PSEMs with BayesiaLab is extremely fast and can thus form an immediate basis for much deeper analysis and optimization.

2 See appendix for a brief introduction to Bayesian networks.

Tutorial

At the beginning of this tutorial, we want to emphasize the overarching objectives of this case study, so we do not lose sight of the "big picture" as we immerse ourselves into the technicalities of BayesiaLab and Bayesian networks.

In this study, we want to examine how product attributes perceived by consumers relate to purchase intention for specific products. Put simply, we want to understand the key drivers for purchase intent. Given the large number of attributes in our study, we also want to identify common concepts among these attributes in order to make interpretation easier and communication with managerial decision makers more effective. Secondly, we want to utilize the generated understanding of consumer dynamics so product developers can optimize the characteristics of the products under study in order to increase purchase intent among consumers, which is our ultimate business objective.

Notation

In order to clearly distinguish between natural language, BayesiaLab-specific functions and study-specific variable names, the following notation is used:
• BayesiaLab functions, keywords, commands, etc., are shown in bold type.
• Variable names are capitalized and italicized.

Model Development

Dataset

Consumer Research

This study is based on a monadic3 consumer survey about perfumes, which was conducted in France. In this example, we use survey responses from 1,320 women, who have evaluated a total of 11 fragrances on a wide range of attributes:
• 27 ratings on fragrance-related attributes, such as "sweet", "flowery", "feminine", etc., measured on a 1-to-10 scale.
• 12 ratings on projected imagery related to someone who would be wearing the respective fragrance, e.g. "is sexy", "is modern", measured on a 1-to-10 scale.
• 1 variable for Intensity, a measure reflecting the level of intensity, measured on a 1-to-5 scale.4
• 1 variable for Purchase Intent, measured on a 1-to-6 scale.
• 1 nominal variable, Product, for product identification purposes.

3 A product test only involving one product, i.e. in our study each respondent evaluated only one perfume.
4 The variable Intensity is listed separately due to the a priori knowledge of its non-linearity and the existence of a "just-about-right" level.
Data Import

To start the analysis with BayesiaLab, we first import the data set, which is formatted as a CSV file.5 With Data | Open Data Source | Text File, we start the Data Import wizard, which immediately provides a preview of the data file. The table displayed in the Data Import wizard shows the individual variables as columns and the responses as rows. There are a number of options available, e.g. for sampling. However, this is not necessary in our example given the relatively small size of the database.

Clicking the Next button prompts a data type analysis, which provides BayesiaLab's best guess regarding the data type of each variable. Furthermore, the Information box provides a brief summary regarding the number of records, the number of missing values6, filtered states, etc.

For this example, we will need to override the default data type for the Product variable, as each value is a nominal product identifier rather than a numerical scale value. We can change the data type by highlighting the Product variable and clicking the Discrete check box, which changes the color of the Product column to red.

5 CSV stands for "comma-separated values", a common format for text-based data files.
6 There are no missing values in our database and filtered states are not applicable in this survey.
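As a side note, the same kind of import and type override can be sketched outside BayesiaLab, for readers who want to inspect the raw survey file first. The snippet below is a minimal illustration with pandas; the file name "perfume_survey.csv" and the exact column labels are assumptions for illustration only.

```python
# Minimal sketch of loading a survey file like the one described above.
import pandas as pd

df = pd.read_csv("perfume_survey.csv")   # hypothetical file name

# Treat the product identifier as a nominal label rather than a numeric scale,
# mirroring the "Discrete" override in the Data Import wizard.
df["Product"] = df["Product"].astype("category")

print(df.shape)                          # one row per respondent, one column per variable
print(df["Product"].value_counts())      # counts per fragrance
```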
We will also define Purchase Intent and Intensity as discrete variables, as the default number of states of these variables is already adequate for our purposes.7

The next screen provides options as to how to treat any missing values. In our case, there are no missing values, so the corresponding panel is grayed-out. Clicking the small upside-down triangle next to the variable names brings up a window with key statistics of the selected variable, in this case Fresh.

The next step is the Discretization and Aggregation dialogue, which allows the analyst to determine the type of discretization to be performed on all continuous variables.8 For this survey, and given the number of observations, it is appropriate to reduce the number of states from the original 10 states (1 through 10) to a smaller number. One could, for instance, bin the 1-10 rating into low, mid and high, or apply any other arbitrary method deemed appropriate by the analyst.

The screenshot shows the dialogue for the Manual selection of discretization steps, which permits selecting binning thresholds by point-and-click. For this particular example, we select Equal Distances with 5 intervals for all continuous variables. This was the analyst's choice in order to be consistent with prior research. Clicking Select All Continuous followed by Finish completes the import process, and the 49 variables (columns) from our database are now shown as blue nodes in the Graph Panel, which is the main window for network editing.

Note
For choosing discretization algorithms beyond this example, the following rule of thumb may be helpful:
• For supervised learning, choose Decision Tree.
• For unsupervised learning, choose, in order of priority, K-Means, Equal Distances or Equal Frequencies.

7 The desired number of variable states is largely a function of the analyst's judgment.
8 BayesiaLab requires discrete distributions for all variables.
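For readers who want to see what "Equal Distances with 5 intervals" amounts to, the sketch below reproduces the idea with pandas on a single toy rating column; BayesiaLab performs its own binning internally, so this is only an illustration of the principle.

```python
# Equal-width binning into five intervals on a toy rating series.
import pandas as pd

ratings = pd.Series([1, 3, 4, 6, 7, 8, 9, 10, 2, 5], name="Fresh")  # toy data

# pd.cut with bins=5 splits the observed range into five equal-width intervals.
fresh_binned = pd.cut(ratings, bins=5)
print(fresh_binned.value_counts().sort_index())
```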
This initial view represents a fully unconnected Bayesian network. For reasons that will become clear later, we will initially exclude two variables, Product and Purchase Intent. We can do so by right-clicking the nodes and selecting Properties | Exclusion. Alternatively, holding "x" while double-clicking the nodes performs the same exclusion function.

Unsupervised Learning

As the next step, we will perform the first unsupervised learning of a network by selecting Learning | Association Discovering | EQ. The resulting view shows the learned network with all the nodes in their original position.

Needless to say, this view of the network is not very intuitive. BayesiaLab has numerous built-in layout algorithms, of which the Force Directed Layout is perhaps the most commonly used. It can be invoked by View | Automatic Layout | Force Directed Layout or, alternatively, through the keyboard shortcut "p". This shortcut is worth remembering, as it is one of the most commonly used functions.

The resulting network will look similar to the following screenshot. To optimize the use of the available screen, clicking the Best Fit button in the toolbar "zooms to fit" the graph to the screen. In addition, rotating the graph with the Rotate Left and Rotate Right buttons helps to create a suitable view. The final graph should closely resemble the following screenshot and, in this view, the properties of this first learned Bayesian network become immediately apparent. This network is now a compact representation of the 47 dimensions of the joint probability distribution of the underlying database.
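Force-directed layouts of this kind follow the same general principle as classic graph-drawing algorithms such as Fruchterman-Reingold: edges pull connected nodes together while all nodes repel each other. As a loose illustration entirely outside BayesiaLab, the sketch below lays out a tiny toy graph with networkx; the node names and edges are invented.

```python
# Toy force-directed layout, illustrating the principle only.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Fresh", "Flowery"), ("Flowery", "Sweet"), ("Sweet", "Fruity")])

# spring_layout simulates attractive forces along edges and repulsive forces
# between all node pairs, returning 2-D coordinates for each node.
positions = nx.spring_layout(G, seed=42)
for node, (x, y) in positions.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```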
It is very important to note that, although this learned graph happens to have a tree structure, this is not the result of an imposed constraint.

Preliminary Analysis

The analyst can further examine this graph by switching into the Validation Mode, which immediately opens up the Monitor Panel on the right side of the screen. This panel is initially empty, but by clicking on any node (or multiple nodes) in the network, Monitors appear inside the Monitor Panel. The corresponding nodes are highlighted in yellow.

By default, the Monitors show the marginal distributions of all selected variables. This shows, for instance, that 9.7% of respondents rated their perfume at <=2.8 in terms of the Fresh attribute. On this basis, one can start to experiment with the properties of this particular Bayesian network and query it. With BayesiaLab this can be done in an extremely intuitive way, i.e. by setting evidence (or observations) directly on the Monitors.

For instance, we can compute the conditional probability distribution of Flowery, given that we have observed a specific value, i.e. a specific state of Fresh. In formal notation, this would be P(Flowery | Fresh). We will now set Flowery to the state that represents the highest rating (>8.2), and we can immediately observe the conditional probability distribution of Fresh, i.e. P(Fresh | Flowery = ">8.2").
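Outside BayesiaLab, the same kind of conditional distribution can be approximated directly from the discretized survey data. The sketch below uses a normalized pandas cross-tabulation on a tiny hypothetical stand-in for the survey table; the state labels mirror the ones above.

```python
# Approximating P(Fresh | Flowery) from raw (toy) data with a normalized crosstab.
import pandas as pd

df = pd.DataFrame({
    "Flowery": ["<=2.8", ">8.2", ">8.2", "<=2.8", ">8.2"],
    "Fresh":   ["<=2.8", ">8.2", ">8.2", "<=2.8", "<=2.8"],
})

# Each row of the normalized crosstab sums to 1, i.e. it is P(Fresh | Flowery = row).
conditional = pd.crosstab(df["Flowery"], df["Fresh"], normalize="index")
print(conditional.loc[">8.2"])   # distribution of Fresh given Flowery = ">8.2"
```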
The gray arrows inside the bars indicate how the distributions have changed compared to the previous distributions. This means that respondents who have rated the Flowery attribute of a perfume at the top level have a 67% probability of also assigning a top rating to the Fresh attribute:

P(Fresh = ">8.2" | Flowery = ">8.2") = 66.9%

Switching briefly back into the Modeling Mode and clicking on the Flowery node, one can see the probabilistic relationship between Flowery and Fresh in detail. By learning the network, BayesiaLab has automatically created a contingency table for every single direct relationship between nodes. All contingency tables, together with the graph structure, thus encode the joint probability distribution of our original database.

Returning to the Validation Mode, we can further examine the properties of our network. Of great interest is the strength of the probabilistic relationships between the variables. In BayesiaLab this can be shown by selecting Analysis | Graphic | Arcs' Mutual Information.

Note
The structure of our Bayesian network may be directed, but the directions of the arcs do not necessarily have to be meaningful. For observational inference, it is only necessary that the Bayesian network correctly represents the joint probability distribution of the underlying database.
The thickness of the arcs is now proportional to the Mutual Information, i.e. the strength of the relationship between the nodes. Intuitively, Mutual Information measures the information that X and Y share: it measures how much knowing one of these variables reduces our uncertainty about the other. For example, if X and Y are independent, then knowing X does not provide any information about Y and vice versa, so their mutual information is zero. At the other extreme, if X and Y are identical then all information conveyed by X is shared with Y: knowing X determines the value of Y and vice versa.

Formal Definition of Mutual Information

I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y)\,\log\!\left(\frac{p(x,y)}{p(x)\,p(y)}\right)

We can also show the values of the Mutual Information on the graph by clicking on Display Arc Comments.
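As a minimal sketch (not BayesiaLab code), this formula can be evaluated for any two discretized survey columns from their empirical joint distribution; the two toy series below are placeholders for real variables such as Fresh and Flowery.

```python
# Mutual information of two discrete variables, computed from observed data.
import numpy as np
import pandas as pd

def mutual_information(x: pd.Series, y: pd.Series) -> float:
    joint = pd.crosstab(x, y, normalize=True).to_numpy()   # p(x, y)
    px = joint.sum(axis=1, keepdims=True)                  # p(x)
    py = joint.sum(axis=0, keepdims=True)                  # p(y)
    nonzero = joint > 0                                    # 0 * log(0) terms drop out
    return float(np.sum(joint[nonzero] * np.log2((joint / (px * py))[nonzero])))

# Toy example (result in bits, because of log base 2).
x = pd.Series(["lo", "lo", "hi", "hi"])
y = pd.Series(["lo", "lo", "hi", "lo"])
print(mutual_information(x, y))
```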
In the top part of the comment box attached to each arc, the Mutual Information of the arc is shown. Below, expressed as a percentage and highlighted in blue, we see the relative Mutual Information in the direction of the arc (parent node ➔ child node). And, at the bottom, we have the relative Mutual Information in the opposite direction of the arc (child node ➔ parent node).

Variable Clustering

The information about the strength of the relationships between the manifest variables can also be utilized for purposes of Variable Clustering. More specifically, a concept closely related to the Mutual Information, namely the Kullback-Leibler Divergence (K-L Divergence), is utilized for clustering. For probability distributions P and Q of a discrete random variable, their K-L divergence is defined to be

D_{KL}(P \,\|\, Q) = \sum_i P(i)\,\log\frac{P(i)}{Q(i)}

In words, it is the average of the logarithmic difference between the probabilities P(i) and Q(i), where the average is taken using the probabilities P(i).
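A direct, generic implementation of this definition (not BayesiaLab's internal clustering code) might look as follows; the two example distributions are made up.

```python
# Kullback-Leibler divergence between two discrete distributions given as
# aligned arrays of probabilities.
import numpy as np

def kl_divergence(p, q) -> float:
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                     # terms with P(i) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

print(kl_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))   # > 0
print(kl_divergence([0.5, 0.3, 0.2], [0.5, 0.3, 0.2]))   # 0 for identical distributions
```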
Such variable clusters will allow us to induce new latent variables, each of which represents a common concept among the manifest variables.9 From here on, we will make a very clear distinction between manifest variables, which are directly observed, such as the survey responses, and latent variables, which are derived. In traditional statistics, deriving such latent variables or factors is typically performed by means of Factor Analysis, e.g. Principal Components Analysis (PCA). In BayesiaLab, this "factor extraction" can be done very easily via the Analysis | Graphics | Variable Clustering function, which is also accessible through the keyboard shortcut "s". The speed with which this is performed is one of the strengths of BayesiaLab, as the resulting variable clusters are presented instantly.

9 An alternative approach is to interpret the derived concept or factor as a hidden common cause.

In this case, BayesiaLab has identified 15 variable clusters, and each node is color-coded according to its cluster membership. To interpret these newly-found clusters, we can zoom in and visually examine the structure in the Graph Panel. To support the interpretation process, BayesiaLab can also display a Dendrogram, which allows the analyst to review the linkage of nodes into variable clusters.

The analyst may also choose a different number of clusters, based on his own judgment relating to the domain. A slider in the toolbar allows the analyst to choose various numbers of clusters, and the color association of the nodes is updated instantly. By clicking the Validate Clustering button in the toolbar, the clusters are saved and the color codes will be formally associated with the nodes. A clustering report provides us with a formal summary of the new factors and their associated manifest variables.10

10 Variable cluster = derived concept = unobserved latent variable = hidden cause = extracted factor.
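BayesiaLab's Variable Clustering algorithm is its own; as a loose analogy only, the idea of grouping variables by the strength of their pairwise association can be sketched with hierarchical clustering on a distance derived from mutual information. The small matrix below is invented for illustration.

```python
# Illustrative hierarchical clustering of variables on a (1 - normalized MI) distance.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Assumed symmetric matrix of pairwise normalized mutual information values
# (1.0 on the diagonal) for three manifest variables.
mi = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.1],
    [0.2, 0.1, 1.0],
])
distance = 1.0 - mi

# Condensed upper-triangle form, as expected by scipy's linkage function.
condensed = distance[np.triu_indices_from(distance, k=1)]
tree = linkage(condensed, method="average")
print(fcluster(tree, t=2, criterion="maxclust"))   # cluster label per variable
```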
The analyst also has the option to use his domain knowledge to modify which manifest variables belong to specific factors. This can be done by right-clicking on the Graph Panel and selecting Class Editor.

Multiple Clustering

As our next step towards building the PSEM, we will introduce these newly-generated latent factors into our existing network and also estimate their probabilistic relationships with the manifest variables. This means we will create a new node for each latent factor, creating 15 new dimensions in our network. For this step, we will need to return to the Modeling Mode, because the introduction of the factor nodes into the network requires the learning algorithms.

More specifically, we select Learning | Multiple Clustering, which brings up the Multiple Clustering dialogue. There is a range of settings, but we will focus only on a subset. Firstly, we need to specify an output directory for the to-be-learned networks. Secondly, we need to set some parameters for the clustering process, such as the minimum and maximum number of states that can be created during the learning process. In our example, we select Automatic Selection of the Number of Classes, which will allow the learning algorithm to find the optimum number of factor states up to a maximum of five states. This means that each new factor will need to represent the corresponding manifest variables with up to five states.

The Multiple Clustering process concludes with a report, which shows details regarding the generated clustering. The top portion of the report is shown in the following screenshot. The detail section of Factor_0, as it relates to the manifest variables, is worth highlighting. Here, we can see the strength of the relationship between the manifest variables, such as Trust, Bold, etc., and Factor_0. In a traditional Factor Analysis, this would be the equivalent of the factor loadings.

After closing the report, we will now see a new (unconnected) network with 15 additional nodes, one for each factor, i.e. Factor_0 through Factor_14, highlighted in yellow in the screenshot.
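Conceptually, each induced factor assigns every respondent to one of a small number of states based on her ratings on that factor's manifest variables. A loose analogy, which is not BayesiaLab's algorithm, is k-means clustering on those columns; the column names below are taken from the Factor_0 example in the text, and the four data rows are invented stand-ins for survey records.

```python
# Illustrative "factor state" assignment via k-means on a factor's manifest variables.
import pandas as pd
from sklearn.cluster import KMeans

manifest_cols = ["Trust", "Bold", "Fulfilled", "Active", "Character"]
df = pd.DataFrame(
    [[2, 1, 2, 2, 3], [8, 9, 7, 8, 8], [5, 5, 6, 5, 4], [9, 8, 9, 9, 9]],  # toy records
    columns=manifest_cols,
)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["Factor_0_state"] = kmeans.fit_predict(df[manifest_cols])
print(df)
```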
Analysis of Factors

We can also further examine how the new factors relate to the manifest variables and how well they represent them. In the case of Factor_0, we want to understand how it can summarize our five manifest variables. By going into our previously-specified output directory, using the Windows Explorer or the Mac Finder, we can see that 15 new networks (in BayesiaLab's xbl format for networks) were generated. We open the specific network for Factor_0, either by directly double-clicking the xbl file or by selecting Network | Open. The factor-specific networks are identified by a suffix/extension of the format "_[Factor_#].xbl", where "#" stands for the factor number. We then see a network in which the factor is linked to the manifest variables by arcs going from the factor to each manifest variable.

Returning to the Validation Mode, we can see five states for Factor_0, labeled C1 through C5, as well as their marginal distribution. As Factor_0 is a target node by default, it automatically appears highlighted in red in the Monitor Panel.

Here, we can also study how the states of the manifest variables relate to the states of Factor_0. This can be done easily by setting observations on the Monitors, e.g. setting C1 to 100%. We now see that, given that Factor_0 is in state C1, the variable Active has a probability of approximately 75% of being in state <=2.8. Expressed more formally, we would state P(Active = "<=2.8" | Factor_0 = C1) = 74.57%. This means that respondents who have been assigned to C1 are likely to rate the Active attribute very low as well.

In the Monitor for Factor_0, in parentheses behind the cluster name, we find the expected mean value of the numeric equivalents of the states of the manifest variables, e.g. "C1 (2.08)". That means that, given the state C1 of Factor_0, we expect the mean value of Trust, Bold, Fulfilled, Active and Character to be 2.08.
To go into even greater detail, we can actually look at every single respondent, i.e. every record in the database, and see what cluster they were assigned to. We select Inference | Interactive Inference, which will bring up a record selector in the toolbar. With this record selector, we can now scroll through the entire database, review the actual ratings of the respondents and then see an estimate of the cluster to which each respondent belongs.

In our first case, record 0, we see the ratings of this respondent indicated by the manifest Monitors. In the highlighted Monitor for Factor_0, we read that this respondent, given her responses, has an 82% probability of belonging to Cluster 5 (C5) in Factor_0. Moving to our second case, record 1, we see that the respondent belongs to Cluster 3 (C3) with a 96% probability.

We can also evaluate the performance of our new network based on Factor_0 by selecting Analysis | Network Performance | Global. This will return the log-likelihood density function, as shown in the following screenshot.
Completing the PSEM

We now return to our main task and our principal network, which has been augmented by the 15 new factors. Before we re-learn our network with the new factors, we need to include Purchase Intent as a variable and also impose a number of constraints in the form of Forbidden Arcs.

In the Modeling Mode, we can include Purchase Intent by right-clicking the node and unchecking Exclusion. This makes the Purchase Intent variable available in the next stage of learning, which is also reflected visually in the node color and icon.

Our desired SEM-type network structure stipulates that manifest variables be connected exclusively to the factors and that all connections with Purchase Intent must also go through the factors. We achieve such a structure by imposing the following sets of forbidden arcs:
1. No arcs between manifest variables
2. No arcs from manifest variables to factors
3. No arcs between manifest variables and Purchase Intent

We can define these forbidden arcs by right-clicking anywhere on the Graph Panel, which brings up the following menu. In BayesiaLab, all manifest variables and all factors are conveniently grouped into classes, so we can easily define which arcs are forbidden in the Forbidden Arc Editor.
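The forbidden-arc constraints can be thought of as a structure-learning blacklist: every arc in the set below is excluded from the learned graph. The sketch is only conceptual; the variable names are placeholders, and the three rules mirror the numbered list above.

```python
# Building a blacklist of forbidden arcs from node classes.
from itertools import permutations, product

manifest = ["Fresh", "Flowery", "Sweet"]      # ... placeholder for all manifest variables
factors = ["Factor_0", "Factor_1"]            # ... placeholder for all factor nodes
target = "Purchase Intent"

blacklist = set()
blacklist.update(permutations(manifest, 2))             # rule 1: no arcs between manifests
blacklist.update(product(manifest, factors))            # rule 2: no arcs from manifests to factors
blacklist.update((m, target) for m in manifest)         # rule 3: no arcs manifest -> target
blacklist.update((target, m) for m in manifest)         # rule 3: no arcs target -> manifest

print(len(blacklist), "forbidden arcs")
```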
Upon completing this step, we can proceed to learning our network again: Learning | Association Discovering | EQ. The initial result will resemble the following screenshot.

Using the Force Directed Layout algorithm (shortcut "p"), as before, we can quickly transform this network into a much more interpretable format. Now we see the manifest variables "laddering up" to the factors, and we also see how the factors are related to each other. Most importantly, we can observe where the Purchase Intent node was attached to the network during the learning process. The structure conveys that Purchase Intent has the strongest link with Factor_2.

Now that we can see the big picture, it is perhaps appropriate to give the factors more descriptive names. For obvious reasons, this task is the responsibility of the analyst. In this case study, Factor_0 was given the name "Self-Confident". We add this name into the node comments by double-clicking Factor_0 and scrolling to the right inside the Node Editor until we see the Comments tab. We repeat this for all other nodes, and we can subsequently display the node comments for all factors by clicking the Display Node Comment icon in the toolbar or by selecting View | Display Node Comments from the menu.

Market Driver Analysis

Our Probabilistic Structural Equation Model is now complete, and we can use it to perform the actual analysis part of this exercise, namely to find out what "drives" Purchase Intent. We return to the Validation Mode, right-click on Purchase Intent and then check Set As Target Node. Double-clicking the node while pressing "t" is a helpful shortcut.
This will also change the appearance of the node and literally give it the look of a target.

In order to understand the relationship between the factors and Purchase Intent, we want to tune out all the manifest variables for the time being. We can do so by right-clicking the Use of Classes icon in the bottom right corner of the screen. This will bring up a list of all classes. By default, all are checked and thus visible. For our purposes, we want to deselect All and then only check the Factor class. The resulting view has all the manifest variables grayed-out, so the relationship between the factors becomes more prominent. By deselecting the manifest variables, we also exclude them from subsequent analysis.

We will now right-click inside the (currently empty) Monitor Panel and select Monitors Sorted wrt Target Variable Correlations. The keyboard shortcut "x" will do the same. This brings up the monitor for the target node, Purchase Intent, plus all the monitors for the factors, in the order of the strength of their relationship with the Target Node. This immediately highlights the order of importance of the factors relative to the Target Node, Purchase Intent.

Another way of comprehensively displaying the importance is by selecting Reports | Target Analysis | Correlations With the Target Node. "Correlations" is more of a metaphor here, as BayesiaLab actually orders the factors by their Mutual Information relative to the target node, Purchase Intent.

By clicking Quadrants, we can obtain a type of opportunity graph, which shows the mean value of each factor on the x-axis and the relative Mutual Information with Purchase Intent on the y-axis. Mutual Information can be interpreted as importance in this context.
By right-clicking on the graph, we can switch between the display of the formal factor names, e.g. Factor_0, Factor_1, etc., and the factor comments, such as Adequacy or Seduction, which are much easier to interpret. As in the previous views, it becomes very obvious that the factor Adequacy is most important with regard to Purchase Intent, followed by the factor Seduction. This is very helpful for understanding the overall market dynamics and for communicating the key drivers to managerial decision makers.

The lines dividing the graph into quadrants reflect the mean values for each axis. The upper-left quadrant highlights opportunities, as these particular factors are "above average" in importance, but "below average" in terms of their rating.
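A simple way to see how such a quadrant view is constructed is to plot each factor's mean rating against its relative mutual information with Purchase Intent and draw the dividing lines at the means of each axis. The sketch below does this with matplotlib; aside from Adequacy, Seduction and Self-Confident, which are named in the text, the labels and all numeric values are invented.

```python
# Illustrative quadrant ("opportunity") chart for the factors.
import matplotlib.pyplot as plt

factors = {                    # name: (mean rating, relative mutual information)
    "Adequacy":       (6.1, 0.30),
    "Seduction":      (6.8, 0.22),
    "Self-Confident": (5.9, 0.08),
    "Factor_3":       (7.2, 0.05),
}

x = [v[0] for v in factors.values()]
y = [v[1] for v in factors.values()]

fig, ax = plt.subplots()
ax.scatter(x, y)
for name, (xi, yi) in factors.items():
    ax.annotate(name, (xi, yi))
ax.axvline(sum(x) / len(x), linestyle="--")   # vertical divider at mean rating
ax.axhline(sum(y) / len(y), linestyle="--")   # horizontal divider at mean importance
ax.set_xlabel("Mean value")
ax.set_ylabel("Relative mutual information with Purchase Intent")
plt.show()
```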
Product Driver Analysis

Although this insight is relevant for the whole market, it does not yet allow us to work on improving specific products. For this we need to look at product-specific graphs. In addition, we may need to introduce constraints for attributes we do not have the ability to change. Such information must come from the domain expert, in our case from the perfumer, who will determine if and how odoriferous compounds can affect the consumers' perception of the product attributes.

These constraints can be entered into BayesiaLab's Cost Editor, which is accessible by right-clicking anywhere in the Graph Panel. Those attributes that cannot be changed (as determined by the expert) will be set to "Not Observable". As we proceed with our analysis, these constraints will be extremely important when searching for realistic product scenarios.

On a side note, an example from the presumably more tangible auto industry may better illustrate such constraints. For instance, a vehicle platform may have an inherent wheelbase limitation, which sets a hard limit on the maximum amount of rear passenger legroom. Even if consumers perceived a need for improvement on this attribute, making such a recommendation to the engineers would be futile. As we search for optimum product solutions with our Bayesian network, this is very important to bear in mind, and thus we must formally encode these constraints of our domain through the Cost Editor.

Product Optimization

We now return briefly to the Modeling Mode to include the Product variable, which has been excluded from our analysis thus far. Right-clicking the node and then unchecking Properties | Exclusion will achieve this. At this time, we will also move beyond the analysis of factors and actually look at the individual product attributes, so we select Manifest from the Display Classes menu.

Back in the Validation Mode, we can perform a Multi Quadrant Analysis: Tools | Multi Quadrant Analysis. This tool allows us to look at the attribute ratings of each product and their respective importance, as expressed with the Mutual Information. Thus, we pick Product as the Selector Node and choose Mutual Information for Analysis. In this case, we also want to check Linearize Nodes' Values and Regenerate Values, and specify an Output Directory, where the product-specific networks will be saved. In the process of generating the Multi Quadrant Analysis, BayesiaLab produces one Bayesian network for each Product. For all Products the network structure will be identical to the network for the entire market; however, the parameters, i.e. the contingency tables, will be specific to each Product.

However, before we proceed to the product-specific networks, we will first see a Multi Quadrant Analysis by Product, and we can select each product's graph simply by right-clicking and choosing the appropriate product identification number. Please note that only the observable variables are visible on the chart, i.e. those variables which were not previously defined as "Not Observable" in the Cost Editor.

For Product No. 5, Personality is at the very top of the importance scale. But how will the Personality attribute compare in the competitive context? If we Display Scales by right-clicking on the graph, it appears that Personality is already at the best level among the competitors, i.e. to the far right of the horizontal scale. On the other hand, on the Fresh attribute, Product No. 5 marks the bottom end of the competitive range.11

11 Any similarities of identifiers with actual product names are purely coincidental.
For a perfumer it would thus be reasonable to assume that there is limited room for improvement with regard to Personality, and that Fresh perhaps offers a significant opportunity for Product No. 5.

To highlight the differences between products, we will also show Product No. 1 in comparison. For Product No. 1 it becomes apparent that Intensity is highly important, but that its rating is towards the bottom end of the scale. The perfumer may thus conclude that a bolder version of the same fragrance will improve Purchase Intent.

Finally, by hovering over any data point in the opportunity chart, BayesiaLab can also display the position of competitors compared to the reference product for any attribute. The screenshot shows Product No. 5 as the reference and the position of competitors on the Personality attribute.

BayesiaLab also allows us to measure and save the "gap to best level" (= variations) for each product and each variable through the Export Variations function. This formally captures our opportunity for improvement.
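The "gap to best level" has a straightforward interpretation: for each attribute, it is the difference between the best-rated competitor and the product under consideration. The sketch below reproduces that idea on a hypothetical product-by-attribute table of mean ratings; all numbers and product labels are invented.

```python
# Gap to best level per product and attribute, from a toy table of mean ratings.
import pandas as pd

means = pd.DataFrame(
    {"Fresh": [5.2, 6.4, 7.1], "Personality": [6.9, 7.8, 7.0]},
    index=["Product 1", "Product 5", "Product 7"],
)

gap_to_best = means.max(axis=0) - means   # 0 for the best product on an attribute
print(gap_to_best.loc["Product 5"])       # Product 5: gap on Fresh, none on Personality
```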
Please note that these variations need to be saved individually by Product.

By now we have all the components necessary for a comprehensive optimization of product attributes:
1. Constraints on "non-actionable" attributes, i.e. excluding those variables that cannot be affected through product changes.
2. A Bayesian network for each Product.
3. The current attribute rating of each Product and each attribute's importance relative to Purchase Intent.
4. The "gap to best level" (variation) for each attribute and Product.

With the above, we are now in a position to search for realistic product configurations, based on the existing product, that would optimize Purchase Intent. We proceed individually by Product, and for illustration purposes we use Product No. 5 again. We load the product-specific network, which was previously saved when the Multi Quadrant Analysis was performed.

One of the powerful features of BayesiaLab is Target Dynamic Profile, which we will apply here on this network to optimize Purchase Intent: Analysis | Report | Target Analysis | Target Dynamic Profile.

The Target Dynamic Profile provides a number of important options:
• Profile Search Criterion: we intend to optimize the mean of Purchase Intent.
• Criterion Optimization: maximization is the objective.
• Search Method: We select Mean and also click on Edit Variations, which allows us to manually stipulate the range of possible variations of each attribute. In our case, however, we had saved the actual variations of Product No. 5 versus the competition, so we load that data set, which subsequently displays the values in the Variation Editor. For example, Fresh could be improved by 10.7% before catching up to the highest-rated product in this attribute.
• Search Stop Criterion: We check Maximum Number of Evidence Reached and set this parameter to 4. This means that no more than the top four attributes will be suggested for improvement.

Upon completion of all computations, we will obtain a list of product action priorities: Fresh, Fruity, Flowery and Wooded. The highlighted Value/Mean column shows the successive improvement upon implementation of each action. From initially 3.76, the Purchase Intent improves to 3.92, which may seem like a fairly small step. However, the importance lies in the fact that this improvement is not based on utopian thinking, but rather on attainable product improvements within the range of competitive performance.
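Conceptually, this kind of search can be pictured as a greedy procedure: at each step, pick the attribute change (within its allowed variation) that raises the expected target mean the most, and stop after four pieces of evidence. The sketch below is only a toy illustration of that idea, not BayesiaLab's algorithm; expected_target_mean is a made-up stand-in for inference in the product-specific network, and all weights and variations other than the 10.7% figure are invented.

```python
# Conceptual greedy search over attribute improvements.
from typing import Dict

def expected_target_mean(evidence: Dict[str, float]) -> float:
    # Toy stand-in for computing the mean of Purchase Intent given evidence.
    base = 3.76
    weights = {"Fresh": 0.5, "Fruity": 0.4, "Flowery": 0.3, "Wooded": 0.2}  # invented
    return base + sum(weights.get(attr, 0.0) * delta for attr, delta in evidence.items())

# Allowed improvement per attribute ("gap to best level"); 0.107 echoes the 10.7% above.
variations = {"Fresh": 0.107, "Fruity": 0.09, "Flowery": 0.08, "Wooded": 0.05}

evidence: Dict[str, float] = {}
for _ in range(4):                                   # Maximum Number of Evidence = 4
    remaining = {a: d for a, d in variations.items() if a not in evidence}
    if not remaining:
        break
    best = max(remaining, key=lambda a: expected_target_mean({**evidence, a: remaining[a]}))
    evidence[best] = remaining[best]
    print(best, round(expected_target_mean(evidence), 3))
```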
Initially, we have the marginal distribution of the attributes and the original mean value for Purchase Intent, i.e. 3.77. To further illustrate the impact of our product actions, we will simulate their implementation step by step, which is available through Inference | Interactive Inference. With the selector in the toolbar, we can go through each product action in the order in which it was recommended.

Upon implementation of the first product action, we obtain the following picture, and Purchase Intent grows to 3.9. Please note that this is not a sea change in terms of Purchase Intent, but rather a realistic consumer response to a product change. The second change results in a further subtle improvement to Purchase Intent. The third and fourth steps are analogous and bring us to the final value for Purchase Intent of 3.92.

Although BayesiaLab generates these recommendations effortlessly, they represent a major innovation in the field of marketing science. This particular optimization task has not been tractable with traditional methods.

Conclusion

The presented case study demonstrates how BayesiaLab can transform simple survey data into a deep understanding of consumers' thinking and quickly provide previously inconceivable product recommendations. As such, BayesiaLab is a revolutionary tool, especially as the workflow shown here may take no more than a few hours for an analyst to implement. This kind of rapid and "actionable"12 insight is clearly a breakthrough and creates an entirely new level of relevance of research for business applications.

12 The authors cringe at the inflationary use of "actionable", but here, for once, it actually seems appropriate.
Appendix: The Bayesian Network Paradigm13

Acyclic Graphs & Bayes's Rule

Probabilistic models based on directed acyclic graphs have a long and rich tradition, beginning with the work of geneticist Sewall Wright in the 1920s. Variants have appeared in many fields. Within statistics, such models are known as directed graphical models; within cognitive science and artificial intelligence, such models are known as Bayesian networks. The name honors the Rev. Thomas Bayes (1702-1761), whose rule for updating probabilities in the light of new evidence is the foundation of the approach.

Rev. Bayes addressed both the case of discrete probability distributions of data and the more complicated case of continuous probability distributions. In the discrete case, Bayes' theorem relates the conditional and marginal probabilities of events A and B, provided that the probability of B does not equal zero:

P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}

In Bayes' theorem, each probability has a conventional name:
• P(A) is the prior probability (or "unconditional" or "marginal" probability) of A. It is "prior" in the sense that it does not take into account any information about B; however, the event B need not occur after event A. In the nineteenth century, the unconditional probability P(A) in Bayes's rule was called the "antecedent" probability; in deductive logic, the antecedent set of propositions and the inference rule imply consequences. The unconditional probability P(A) was called "a priori" by Ronald A. Fisher.
• P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.
• P(B|A) is the conditional probability of B given A. It is also called the likelihood.
• P(B) is the prior or marginal probability of B, and acts as a normalizing constant.

Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.

The initial development of Bayesian networks in the late 1970s was motivated by the need to model the top-down (semantic) and bottom-up (perceptual) combination of evidence in reading. The capability for bidirectional inferences, combined with a rigorous probabilistic foundation, led to the rapid emergence of Bayesian networks as the method of choice for uncertain reasoning in AI and expert systems, replacing earlier ad hoc rule-based schemes.

13 Adapted from Pearl (2000), used with permission.
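As a small worked instance of the rule, with entirely made-up numbers for a disease A and a symptom B:

```python
# Numeric illustration of Bayes' rule with hypothetical probabilities.
p_a = 0.01                       # prior P(A)
p_b_given_a = 0.9                # likelihood P(B | A)
p_b_given_not_a = 0.05           # P(B | not A)

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)    # normalizing constant P(B)
p_a_given_b = p_b_given_a * p_a / p_b                     # posterior P(A | B)
print(round(p_a_given_b, 3))     # about 0.154
```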
The nodes in a Bayesian network represent variables of interest (e.g. the temperature of a device, the gender of a patient, a feature of an object, the occurrence of an event) and the links represent statistical (informational) or causal dependencies among the variables. The dependencies are quantified by conditional probabilities for each node given its parents in the network. The network supports the computation of the posterior probabilities of any subset of variables given evidence about any other subset.

Compact Representation of the Joint Probability Distribution

"The central paradigm of probabilistic reasoning is to identify all relevant variables x1, . . . , xN in the environment [i.e. the domain under study], and make a probabilistic model p(x1, . . . , xN) of their interaction [i.e. represent the variables' joint probability distribution]." Bayesian networks are very attractive for this purpose as they can, by means of factorization, compactly represent the joint probability distribution of all variables. "Reasoning (inference) is then performed by introducing evidence that sets variables in known states, and subsequently computing probabilities of interest, conditioned on this evidence. The rules of probability, combined with Bayes' rule make for a complete reasoning system, one which includes traditional deductive logic as a special case." (Barber, 2012)
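The factorization mentioned above takes the familiar chain-rule form, in which each variable is conditioned only on its parents in the DAG:

P(x_1, \ldots, x_N) = \prod_{i=1}^{N} P\bigl(x_i \mid \mathrm{Pa}(x_i)\bigr)

where Pa(x_i) denotes the set of parent nodes of x_i. With few parents per node, the required conditional probability tables are far smaller than the full joint probability table.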
References

Barber, David. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
Darwiche, Adnan. Modeling and Reasoning with Bayesian Networks. 1st ed. Cambridge University Press, 2009.
Heckerman, D. "A Tutorial on Learning with Bayesian Networks." Innovations in Bayesian Networks (2008): 33–82.
Holmes, Dawn E., ed. Innovations in Bayesian Networks: Theory and Applications. Softcover reprint of hardcover 1st ed. 2008. Springer, 2010.
Kjaerulff, Uffe B., and Anders L. Madsen. Bayesian Networks and Influence Diagrams: A Guide to Construction and Analysis. Softcover reprint of hardcover 1st ed. 2008. Springer, 2010.
Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. 1st ed. The MIT Press, 2009.
Koski, Timo, and John Noble. Bayesian Networks: An Introduction. 1st ed. Wiley, 2009.
Mittal, Ankush. Bayesian Network Technologies: Applications and Graphical Models. Edited by Ankush Mittal and Ashraf Kassim. 1st ed. IGI Publishing, 2007.
Neapolitan, Richard E. Learning Bayesian Networks. Prentice Hall, 2003.
Pearl, Judea. Causality: Models, Reasoning and Inference. 2nd ed. Cambridge University Press, 2009.
———. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. 1st ed. Morgan Kaufmann, 1988.
Pearl, Judea, and Stuart Russell. Bayesian Networks. UCLA Cognitive Systems Laboratory, November 2000. http://bayes.cs.ucla.edu/csl_papers.html.
Pourret, Olivier, Patrick Naïm, and Bruce Marcot, eds. Bayesian Networks: A Practical Guide to Applications. 1st ed. Wiley, 2008.
Schafer, J.L., and M.K. Olsen. "Multiple Imputation for Multivariate Missing-data Problems: A Data Analyst's Perspective." Multivariate Behavioral Research 33, no. 4 (1998): 545–571.
Spirtes, Peter, and Clark Glymour. Causation, Prediction and Search. The MIT Press, 2001.
Contact Information

Bayesia USA
312 Hamlet's End Way
Franklin, TN 37067
USA
Phone: +1 888-386-8383
info@bayesia.us
www.bayesia.us

Bayesia S.A.S.
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
Phone: +33(0)2 43 49 75 69
info@bayesia.com
www.bayesia.com

Bayesia Singapore Pte. Ltd.
20 Cecil Street
#14-01, Equity Plaza
Singapore 049705
Phone: +65 3158 2690
info@bayesia.sg
www.bayesia.sg

Copyright
© 2013 Bayesia USA, Bayesia S.A.S. and Bayesia Singapore. All rights reserved.