The document describes how the recommendation engine in TIBCO Spotfire can help users quickly build dashboards and analyses. It presents two case studies: in the first, Recommendations produced an initial dashboard in approximately 30 seconds; in the second, it helped trace a shift in paper towel absorbency to a specific piece of equipment. Overall, Recommendations dramatically speeds data exploration and insight generation for both business users and analysts.
Examples of Spotfire Recommendations in Action
Easy dashboard setup for business users,
dramatically faster creation of full-featured data
analysis applications for analysts
The agile business intelligence market is growing rapidly, and as Gartner points
out, the transition is toward platforms that can be rapidly implemented and used
by analysts and business users to find insights quickly—as well as by IT staff to
quickly build analytics content to meet business requirements and deliver more
timely business benefits.1
This drive for speed is about business value: accuracy
and speed of interpretation for decision-making, authoring, and development
of data discovery applications, and task completion to enable developers to
implement their ideas quickly and obtain accurate insights. 2
This paper describes a recommendation engine for the TIBCO Spotfire®
interactive graphical analysis system. Spotfire Recommendations makes data
discovery fast and easy for both analysts and business users. The system uses
metadata typing and built-in graphics taxonomy to produce a collection of
inherently sensible graphics choices applied to the data at hand. The user chooses
from the suggestions, and the software builds a dashboard of linked, brushable,
configurable graphics with supporting data filters and graphics controls that can
be rapidly applied to the canvas.
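The idea of combining column typing with a graphics taxonomy can be illustrated with a minimal sketch. This is not Spotfire's implementation: the type tags, the `TAXONOMY` table, and the `recommend` function are all hypothetical, and the chart lists simply echo suggestions that appear in the case studies in this paper.

```python
from collections import Counter

# Hypothetical type tags; Spotfire's actual metadata model is not described here.
NUMERIC, CATEGORICAL, TIME, GEO = "numeric", "categorical", "time", "geo"

# Hypothetical taxonomy: a signature of selected column types maps to chart kinds.
TAXONOMY = [
    (Counter({NUMERIC: 1}), ["histogram", "density plot", "table"]),
    (Counter({NUMERIC: 1, GEO: 1}), ["map", "bar chart", "treemap"]),
    (Counter({NUMERIC: 1, TIME: 1}), ["line plot", "bar chart", "treemap", "table"]),
    (Counter({NUMERIC: 1, TIME: 1, CATEGORICAL: 2}),
     ["line plot colored by one category and trellised by the other"]),
]


def recommend(selected_columns):
    """Return chart kinds for a dict of column name -> type tag."""
    signature = Counter(selected_columns.values())
    for required, charts in TAXONOMY:
        if required == signature:
            return charts
    return ["table"]  # fall back to a plain table when no rule matches


suggestions = recommend({"Homeless": NUMERIC, "State": GEO})
print(suggestions)
```

A real engine would rank many candidate charts rather than match one exact signature; the exact-match lookup here only shows how typed columns can narrow the choice of graphics.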
1 Rita L. Sallam, Bill Hostmann, Kurt Schlegel, Joao Tapadinhas, Josh Parenteau, Thomas W. Oestreich. Magic
Quadrant for Business Intelligence and Analytics Platforms, Gartner. February 23, 2015.
2 Stephen Few. Information Dashboard Design, Analytics Press, CA. 2015.
TIME IS OF THE ESSENCE
“With a dashboard, every
unnecessary piece of
information results in time
wasted trying to filter out
what’s important, which is
intolerable when time is of
the essence.”2
2. WHITEPAPER | 2
For business users, Recommendations reduces the burden for initial setup of the
dashboard, and for analysts, it dramatically speeds the creation of full-featured
data analysis applications.
Following are two case studies showing Recommendations applied to datasets
for consumer packaged goods manufacturing and for homeless populations in
the United States.
CASE STUDY: GEOLOCATION ANALYSIS OF US HOMELESS
The Department of Housing and Urban Development (HUD) collects data
on homelessness in the US and releases two annual reports to Congress:
the Annual Homelessness Assessment Report (AHAR), Parts 1 and 2 (see notes 3 and 4). Part 1
contains information from the annual point-in-time counts (PIT) conducted by
communities nationwide on a single night in January. Part 2 includes information
obtained from homeless shelters throughout the course of a calendar year, the
Housing Inventory Count (HIC). In March 2015, HUD released the 2013 AHAR
Part 2; Part 1 was released in October 2014.
Raw data is available online at data.hud.gov. We obtained PIT and HIC data
for 2007–2013. Estimates of homeless veterans are included beginning in 2011.
HUD partners with the Veterans Administration on the Veterans Homelessness
Prevention Demonstration Program.
The Housing Inventory Count and point-in-time data are yearly measures across
~400 spatial regions in the US using HUD’s Continuums of Care (CoC) regions. A
shape file describing these regions is available at
https://www.hudexchange.info/coc/gis-tools/.
Key variables in the beds data (HIC [Housing Inventory Count]) are: Shelter
Type (ES [Emergency Shelter], TH [Transitional Housing], RRH [Rapid Re-Housing],
SH [Safe Haven], PSH [Permanent Supportive Housing]) and Household
Type (with children, without children, with only children). Key variables in
homeless data (PIT) are: Shelter Access (Sheltered, Unsheltered) and Family
Situation (Individuals, Persons in Families).
We also included US Census data in the analysis for the period 2010–2013,
including counts by state, county, and age group of total population, total male
population, total female population, and male and female populations broken
down by race.
3 https://www.hudexchange.info/resources/documents/2014-AHAR-Part1.pdf
4 https://www.hudexchange.info/onecpd/assets/File/2013-AHAR-Part-2.pdf
DATA PANEL
The analysis begins by loading the data, which results in data column names
being organized in a data panel (Figure 1). To start the analysis, the user clicks the
Recommendations icon and selects one or more columns of interest.
Figure 1. Spotfire open with data panel on the left showing homeless data. The
user clicks the Recommended visualizations icon in the center of the canvas to
start the analysis.
The numerical columns in this data panel are the homeless counts (PIT) and beds
(HIC) by year. Other data includes a time variable (YearDate), location (by state
and county) and categorical data relating to the Continuum of Care.
We select the homeless and state columns initially. Spotfire Recommendations
suggests several maps, a bar chart, and a treemap of homeless by state (Figure 2).
We add a map and treemap by state to the canvas.
Figure 2. Recommendations panel for US homeless data, state selected.
HOMELESS, BEDS, AND YEAR
We next add beds and year. Recommendations responds by suggesting cross
tables, a bar chart trellised by year, and a parallel coordinates plot (Figure 3). We
choose the cross table (lower right) to add to the canvas.
Figure 3. Recommendations panel for US homeless data, state, beds, and
year selected.
HOMELESS BY STATE
We now have a map of homeless by state, a treemap by state, and a cross table
of homeless and beds by state. Recommendations has linked and arranged these
three graphs on the canvas. With a few more mouse clicks to configure the
graphs, we have an accurate, interactive summary of homeless in the US (Figure
4). Creating this initial dashboard took approximately 30 seconds.
Figure 4. Dashboard showing map of homeless and utilization of shelters by state.
HOMELESS SHELTERS
Using this dashboard as a starting point, we are now able to build a
comprehensive analysis of homeless across the US. This enables us to assess if
there are enough shelters for the homeless on an ongoing regional basis.
Figure 5 shows such a dashboard including a map of homeless utilization by
state, trends of homeless and available beds, beds by shelter type, top states for
bed utilization, and tables of homeless and bed utilization by CoC. The dashboard
addresses the question: Do we have enough shelters for the homeless? Relevant KPIs
are shown across the top and visualizations are arranged for easy interpretation.
Figure 5. Completed dashboard providing a detailed analysis of homeless in the
US during 2007–2013.
The dashboard in Figure 5 is rapidly assembled from that shown in Figure 4. We
calculated bed utilization, configured colors on the map and bar chart, and added
the trend charts and a slider for years at the top.
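The paper does not give the formula used for bed utilization. As a minimal sketch, we assume here that it is the homeless count divided by the number of available beds for the same region and year. The function name, the row layout, and the numbers are illustrative only and are not HUD data.

```python
def bed_utilization(homeless, beds):
    """Assumed definition: homeless count divided by available beds.

    Returns None when no beds are reported, since the ratio is undefined.
    """
    if beds == 0:
        return None
    return homeless / beds


# Illustrative rows only; these are not HUD figures.
rows = [
    {"state": "A", "homeless": 1200, "beds": 1000},
    {"state": "B", "homeless": 450, "beds": 900},
    {"state": "C", "homeless": 300, "beds": 0},
]

utilization = {r["state"]: bed_utilization(r["homeless"], r["beds"]) for r in rows}
print(utilization)
```

Under this assumed definition, a value above 1 means more homeless people than beds in that region.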
This dashboard is set up for drill-down into regions and times of interest.
All the data is now in shape for continued analysis, and for combining with
additional data. We focus on Massachusetts and incorporate some weather
data into the analysis. We fit contours to the temperature data (by zip code) and
display precipitation by size of circle. Color of contour lines and circles indicates
temperature (red is warmer and blue is colder).
Figure 6 shows this updated analysis for Massachusetts. Note that the pockets
of high homeless utilization to the southeast of Boston coincide with milder
temperatures and lower precipitation.
Figure 6. Completed dashboard with drill-down to homeless in Massachusetts.
Weather data has been incorporated: contours are fit to the temperature data,
and precipitation is shown in circles (larger circles indicate more precipitation).
Color of contour lines and circles indicates temperature (red is warmer and blue
is colder). Note that the pockets of high homeless utilization to the southeast of
Boston coincide with milder temperatures and lower precipitation.
CASE STUDY: PAPER TOWEL MANUFACTURING
Paper towel manufacturing involves equipment including dryers, dyes, feeders,
cutting and pattern machines, and a series of process steps. Product quality is
assessed via measurements of quality characteristics like absorbency, strength,
and softness. Large quantities of data are collected on machines and on process
times for each batch at every process step.
The data under consideration in this simple example includes measurements
of product quality and equipment operation and performance. One goal of the
analysis is to assess effects of equipment on product quality.
DATA PANEL AND COLUMN NAMES
The analysis begins by loading the data, which results in data column names
being organized in a data panel. To start the analysis, the user clicks the
Recommendations icon and selects one or more columns of interest.
The numerical columns in this dataset are the measured product quality
characteristics. Selecting these columns produces histograms, density plots, and
tables in the Recommendations panel. Basic versions of the actual visualizations
are displayed (not canned representations of generic chart types), so the user can
see the shape of the distribution directly in the panel (Figure 7).
Figure 7. Recommendations panel with numeric columns selected. The
absorbency histogram is shown in the lower right.
HISTOGRAMS
Paper towel softness appears to be normally distributed (upper right). Selecting
each of the other product quality characteristics indicates that most of them
are normally distributed as well. However, when absorbency is selected, the
histogram shows a bi-modal distribution (lower right). This is an interesting
finding that needs an explanation. The absorbency histogram is selected and
added to the analysis by clicking on it in the Recommendations display (Figure 7).
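What the eye picks out in a bimodal histogram can be sketched in a few lines: bin the values, then look for bins that are taller than both neighbors. The helper names and the sample values below are hypothetical, and this naive peak test would need smoothing before it could be trusted on noisy production data.

```python
def histogram(values, bin_width, start):
    """Count values into equal-width bins beginning at `start`."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        counts[index] = counts.get(index, 0) + 1
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


def peaks(counts):
    """Indices of bins strictly taller than both neighbors (edges padded with 0)."""
    padded = [0] + counts + [0]
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]]


# Illustrative absorbency readings with two clusters; not the paper's data.
readings = [10, 11, 11, 12, 11, 10, 20, 21, 21, 22, 21, 20]
counts = histogram(readings, bin_width=1, start=10)
modes = peaks(counts)
print(counts, modes)
```

Two separated peaks, as in this toy data, are the pattern that prompts the search for an explanatory variable in the steps that follow.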
PROCESS DATES
To investigate whether the changes in absorbency correlate to any temporal
effects, a column with process dates is now selected. In this case, a line plot, bar
chart, treemap, table, and other graphics (not shown) are then displayed in the
Recommendations panel (Figure 8). The line plot shows a dramatic change in
absorbency over time (Figure 8, top left), so it is selected and added to the analysis.
Figure 8. Recommendations panel with absorbency numeric column and a time
column selected in the Data Panel. Why does absorbency increase in the later
part of July? The line plot is selected (clicked) to add it to the analysis.
Additional line plots are available from Recommendations when the numeric
variables are chosen along with a time variable (Figure 9). Absorbency (lower
left) stands out as being more affected by time (day of the month).
Figure 9. Recommendations panel with numeric columns and time (day of the
month) selected. Absorbency is the red trace in the lower left.
MACHINE USE IN TWO PROCESS STEPS
To investigate whether the machines may be affecting absorbency at one or more
process steps, additional categorical columns, each containing the machine used
at a process step, are selected. The most relevant recommended graph is an
absorbency line plot, colored by machines at one step and trellised by machines
at the second step (Figure 10, top right). This is chosen and added to the analysis.
Figure 10. Recommendations panel with absorbency numeric column, a time
column, and two categorical process machine columns selected.
DAY OF MONTH VS. LOW ABSORBENCY
The Recommendations panel is then closed to view the useful visualizations that
have been added. Marking is set to correlate colors for corresponding points
in the line plot and histogram (Figure 11). It is clear that low absorbency in the
histogram correlates to batches processed before July 17th, and high absorbency
correlates to batches processed on or after July 17th.
Figure 11. Analysis with line plot and histogram added from the
Recommendations panel.
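The comparison that linked marking makes visually can also be checked numerically by splitting batches at the cutoff date and comparing mean absorbency. The sketch below uses the July 17th cutoff from the text; the year, the batch values, and the variable names are hypothetical, since the paper does not list the underlying data.

```python
from datetime import date
from statistics import mean

# Illustrative (process date, absorbency) pairs; the year and values are assumed.
batches = [
    (date(2015, 7, 10), 40.5),
    (date(2015, 7, 14), 41.0),
    (date(2015, 7, 16), 39.5),
    (date(2015, 7, 17), 50.0),
    (date(2015, 7, 20), 51.5),
    (date(2015, 7, 24), 49.0),
]

cutoff = date(2015, 7, 17)  # batches on or after this date form the second group

before = [absorbency for day, absorbency in batches if day < cutoff]
after = [absorbency for day, absorbency in batches if day >= cutoff]

mean_before = mean(before)
mean_after = mean(after)
print(round(mean_before, 2), round(mean_after, 2))
```

A clear gap between the two means corresponds to the two modes of the histogram, but it does not yet say which process change caused the shift.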
ANALYSIS OF VARIANCE
To understand the paper towel absorbency variability, we use the Spotfire analysis of
variance (ANOVA) function. This enables an assessment of the effects of the machine
process steps on the measured product quality characteristics. In the ANOVA setup
dialog (Figure 12), absorbency is selected as a response variable (Y), and the process
step equipment (tool) columns are selected as explanatory variables (X).
Figure 12. ANOVA analysis setup dialog.
ANOVA results (Figure 13) are presented as a sorted summary table with one row per
response-predictor pair (product quality characteristic and process step equipment).
Each row contains a p-value indicating statistical significance. The most significant
relationship is the effect of Dryer Tool on absorbency. Marking this row produces a
drill-down box plot showing that Dryer Tool DY1 produces paper towel batches with
significantly higher absorbency than Dryer Tools DY2 and DY3.
Figure 13. ANOVA results.
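For readers who want to see what lies behind one row of such a table, the following is a minimal sketch of the textbook one-way ANOVA F statistic for a single response-predictor pair. It is not the Spotfire ANOVA function, and the absorbency values are invented for illustration. Converting F to a p-value requires the F distribution, which this standard-library sketch omits.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of label -> list of values.

    Assumes at least two groups and some variation within groups;
    otherwise the divisions below are undefined.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(values) * (mean(values) - grand_mean) ** 2
                     for values in groups.values())
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(values)) ** 2
                    for values in groups.values() for v in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative absorbency values per dryer tool; not the paper's data.
absorbency_by_dryer = {
    "DY1": [50, 52, 51],
    "DY2": [40, 41, 42],
    "DY3": [41, 40, 42],
}

f_stat, df_between, df_within = one_way_anova_f(absorbency_by_dryer)
print(f_stat, df_between, df_within)
```

A large F means the differences between dryer tools are large relative to the batch-to-batch scatter within each tool, which is the situation the box plot drill-down displays.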
DRYER TOOLS
Since the ANOVA results indicate that Dryer Tool is responsible for the variation in
absorbency, the linked line plot and histogram produced with Recommendations are
then configured to show this clearly (Figure 14). The earlier dashboard graphics are
colored to distinguish between the Dryer Tools, and the line plot X-axis is changed
to the Dryer process date. This shows that only Dryer Tools DY2 and DY3 were in use
prior to July 13th and only DY1 was used on or after July 13th. It is now clear that the
increase in paper towel absorbency is related to the switch to Dryer DY1. Additional
investigation is warranted to determine how DY2 and DY3 could be modified to
match the superior absorbency results obtained with DY1.
Figure 14. Reconfigured discovery analysis for presentation; line plots and
histograms colored according to Dryer Tool.