HTTP://WWW.ANALYTICS-MAGAZINE.ORG
JULY/AUGUST 2014
DRIVING BETTER BUSINESS DECISIONS
BROUGHT TO YOU BY:
WHY ANALYTICS
PROJECTS
FAIL
ALSO INSIDE:
• Dark side of digital world
• Real-time text analytics
• Data scientists’ time to shine
• The future of forecasting
Key considerations
for deep analytics
on big data,
learning and
insights
Executive Edge
Hewlett-Packard
V. P. Rohit Tandon:
Six ways of
value creation via
E-commerce analytics
WWW.INFORMS.ORG | ANALYTICS-MAGAZINE.ORG
What I learned today
INSIDE STORY
One of the advantages of editing
Analytics (as well as OR/MS Today, the
membership magazine of INFORMS) is I
learn something new every day, thanks to
the wide array of contributed articles we
receive. For example, just in preparing
this issue, I learned:
• Nearly 20 years ago, Amazon found-
er Jeff Bezos said that Amazon intended
to sell books at or near cost as a way
of gathering data on affluent, educated
shoppers, as reported by George Packer
in The New Yorker. The implication: The
data, once analyzed, had more value
than the loss-leader books, which proved
absolutely correct when Amazon began
selling everything under the sun to well-
targeted consumers.
Drawing on Packer’s article, as well
as a couple of books (“Who Owns the
Future?” and “The Ethics of Big Data”),
Vijay Mehrotra explores the dark side
of technology, big data and analytics –
and the perceived and/or potential threat
it poses – in his Analyze This! column.
Don’t miss it.
• A Formula 1 pit crew, working in an
optimized, well-coordinated fashion, can
change a set of four tires in less than two
seconds. That means that unless you’re
Evelyn Wood, that crew can change
12 tires in the time it takes you to read
this sentence. For the story behind the
motorsports magic, check out Andy
Boyd’s Forum column. Seeing is be-
lieving, so don’t miss the amazing videos
referenced at the end of the article.
• We all know the digital/technical
world will come to a wordy end without
acronyms, but do you know what MOOC
stands for? I do (“massive open online
course”), thanks to an interview I did with
executive search honcho Linda Burtch
regarding the red-hot analytics job market.
• Finally, I also learned from Linda
that in today’s dynamic world, young
people should plan on three or four ca-
reers during their lifetime. “It’s not good
to specialize in one thing and try to stick
with one company or one industry or one
vertical application for your entire ca-
reer,” she says in the Q&A. “It’s incredibly
dangerous, and it likely won’t carry you
through a 35-year career. You need to be
continuously learning something new.”
I got that last part going for me,
every day.
– PETER HORNER, EDITOR
peter.horner@mail.informs.org
CONTENTS
FEATURES
REAL-TIME TEXT ANALYTICS
By Aveek Mukhopadhyay and Roger Barga
How a cloud-based analytical engine yields instant insight using
unstructured social media data.
WHY DO ANALYTICS PROJECTS FAIL?
By Haluk Demirkan and Bulent Dal
Not just another IT project: Key considerations for deep analytics
on big data, learning and insights.
‘IT’S THEIR TIME TO SHINE’
By Peter Horner
Job prospects for data scientists and elite analytics professionals
have never been better – and the future is even brighter.
ANALYTICS TRANSFORMS A ‘DINOSAUR’
By Brenda Dietrich, Emily Plachy and Maureen Norton
The story of how industry giant IBM not only survived but
thrived by realizing business value from big data.
THE FUTURE OF FORECASTING
By Jack Yurkiewicz
Making predictions from hard and fast data: Biennial survey
of popular software for analytics professionals.
EXECUTIVE EDGE
Six ways of value-creation through analytics in E-commerce
BY ROHIT TANDON AND SHRUTI UPADHYAY
The increasing popularity of, and access to, the Internet
have changed the way marketers interact with customers.
These customers are smart, well informed and empowered,
as Internet connectivity is available to them at their
fingertips and on the go. It has therefore become
imperative for organizations to be on the customers’
online radar with respect to new products or services
and to be able to influence their choices.
Not surprisingly, according to one study, 34 percent of
marketers are generating leads through Twitter. India’s
online retail market grew at a staggering 88 percent in
2013 to $16 billion and continues to grow. These
examples are a testimony to the growth of e-commerce.
The Internet deluge has opened an assortment of
opportunities. Customers are able to buy high-end
fashion and designer shoes, book hotels, buy movie
tickets and you-name-it.
Therefore, an opportunity exists for business research
to capture, compile, churn and store colossal volumes
of information about customers, suppliers and
operations. This is what we call the age of “big data.”
We believe that this age is a natural progression in
online business and is here to stay. We are already
seeing a surge in adoption of digital channels such as
social media, e-mail marketing and display ads in
e-commerce. Imagine the amount of data this has created
for marketers to lay their hands on for analysis.
Despite that, in the race to utilize the online space,
marketers may be focusing more on advertising and less
on analysis of the data that could potentially increase
sales.
In our opinion, understanding customer behavior
becomes more complex in business-to-consumer
companies, and more so in a 24/7 e-commerce business
that sells technology products in an increasingly
commoditized industry. A strong analytics foundation
may make e-commerce a thriving and successful sales
channel. Businesses, therefore, are increasingly
creating customizable campaigns for their installed-base
customers and improving sales effectiveness through
e-commerce.
For example, pricing and merchandising decisions need
to be made in real time, and the need for real-time
insights is ever-increasing. To make these decisions
faster and better, marketers need to analyze their
digital marketing strategies quickly by mining data
exhaustively and cost-effectively through advanced
analytics.
KEY DRIVERS OF INCREASED REVENUES
An organization’s ability to achieve its goal of
increased revenues and margins depends heavily on its
ability to improve three key drivers: 1) volume of
customer traffic to the online store (number of
visits); 2) customer conversion (percentage of visits
that convert); and 3) basket size (average revenue per
order). Analytics has a very important role to play in
this value chain. So while organizations may have the
best talent with an analytical mindset and eagerness to
apply it, we need to equip data scientists in
organizations with the right tools and insights.
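The three drivers named above (visits, conversion and basket size) combine multiplicatively, which a short calculation makes concrete. The following is a minimal Python sketch; the function name and all figures are illustrative assumptions, not HP data.

```python
def online_revenue(visits: int, conversion_rate: float, avg_order_value: float) -> float:
    """Revenue = visits x conversion rate x average revenue per order."""
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    return visits * conversion_rate * avg_order_value


# Hypothetical baseline: 100,000 visits, 2% conversion, $150 per order.
baseline = online_revenue(100_000, 0.02, 150.0)

# Hypothetical scenario: each driver improves by 10 percent.
improved = online_revenue(110_000, 0.022, 165.0)

# Because the drivers multiply, three 10% gains compound to about 33%.
lift = improved / baseline - 1
print(f"baseline={baseline:,.0f} improved={improved:,.0f} lift={lift:.1%}")
```

Because the relationship is a product, modest gains on all three drivers compound, which is one reason to measure them separately rather than tracking revenue alone.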
Conversations with analytics professionals reinforce
our belief in the following must-haves that will
elevate an organization’s e-commerce agenda to the
next level:
1. Development of best-in-class tools and
techniques is a must to build scalable solutions and
tackle the optimization of key drivers.
Over the years various products such as SAS have
provided excellent development environments, but every
data scientist had to start from scratch and depend on
their “personal” techniques to tackle new problems. In
recent years, however, data scientists and organizations
have been moving toward using templates and building
packaged models and solutions to reuse and replicate
technologies with ease.
One of the first such pilot solutions within HP was
developed for HPDirect.com’s demand generation function,
where global analytics developed V.1 of a series of
demand generation models. These models
also paved the way for the development of
customer targeting models. In most organizations,
such initiatives, if implemented, have the potential to
lay the foundation for similar opportunities with other
business functions such as planning, store operations
and category management. When an organization reaches
such a stage of maturity, that’s when true “return on
data” (ROD) is possible.
2. The three Ws: whom, what, when. Traditionally,
marketers have used a uni-dimensional approach to
target customers. However, results show that this
approach can be sub-optimal and might have an adverse
effect on customer loyalty and brand image.
Answering questions such as whom to target, what to
offer and when to offer it brings a paradigm shift in
garnering customer interest and loyalty. The answers
help rank customers on their propensity to re-purchase,
and lead to preferential treatment of the right
customers with the right product portfolio or allow
marketers to understand when to offer discounts.
Effective tools and modeling will also surface clues
about the probability of customers picking one product
over another or about repeat customer behaviors. This
brings us back to the importance of using effective,
proven analytics tools and techniques.
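Ranking customers by propensity to re-purchase can be sketched in a few lines of Python. The scoring rule below (order count discounted by how long ago the last purchase was) is a stand-in heuristic chosen for illustration, not the authors' model; a production system would more likely fit a statistical model to purchase history. The field names, the 90-day half-life and the sample customers are all hypothetical.

```python
from datetime import date


def repurchase_score(last_purchase: date, n_orders: int, today: date,
                     half_life_days: float = 90.0) -> float:
    """Heuristic score: order count, halved for every half_life_days of inactivity."""
    days_since = (today - last_purchase).days
    return n_orders * 0.5 ** (days_since / half_life_days)


def rank_customers(customers: list, today: date) -> list:
    """Return customer ids ordered from highest to lowest score."""
    ranked = sorted(
        customers,
        key=lambda c: repurchase_score(c["last_purchase"], c["n_orders"], today),
        reverse=True,
    )
    return [c["id"] for c in ranked]


# Hypothetical customers.
customers = [
    {"id": "A", "last_purchase": date(2014, 6, 1), "n_orders": 2},
    {"id": "B", "last_purchase": date(2013, 10, 4), "n_orders": 5},
    {"id": "C", "last_purchase": date(2014, 6, 30), "n_orders": 1},
]
ranking = rank_customers(customers, today=date(2014, 7, 1))
print(ranking)
```

Customer B has the most orders but ranks last here because the most recent one is nine months old, which illustrates the "when" dimension alongside the "whom."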
3. Automate and innovate. Creating and applying big
data algorithms will help organizations take
appropriate actions. Many of these algorithms can run
automatically, saving time and allowing better
decisions faster. Creating a robust tool-based
ecosystem that allows creation of funnels that track
visitors, bounce rates, conversions, etc., is vital to
a successful Web analytics initiative.
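A funnel of the kind described here can be reduced to stage counts and step-to-step rates. The sketch below is a minimal Python illustration; the stage names, the counts and the definition of bounce rate (single-page sessions divided by all sessions) are assumptions made for the example.

```python
def funnel_report(stage_counts):
    """stage_counts: list of (stage_name, visitor_count) in funnel order.

    Returns (stage_name, visitor_count, rate_from_previous_stage) tuples;
    the rate is None for the first stage or when the previous stage is empty.
    """
    report = []
    previous = None
    for name, count in stage_counts:
        rate = None if previous in (None, 0) else count / previous
        report.append((name, count, rate))
        previous = count
    return report


def bounce_rate(single_page_sessions: int, total_sessions: int) -> float:
    """Share of sessions that left after viewing a single page."""
    return single_page_sessions / total_sessions if total_sessions else 0.0


# Hypothetical daily counts for an online store.
stages = [("visit", 10_000), ("product_view", 4_000),
          ("add_to_cart", 800), ("purchase", 200)]
report = funnel_report(stages)
overall_conversion = stages[-1][1] / stages[0][1]
for name, count, rate in report:
    print(name, count, "" if rate is None else f"{rate:.0%}")
```

Reporting the rate at each step, rather than only the overall conversion, shows where in the funnel visitors drop out.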
4. Site search analytics. Tracking site search is a
very useful way to learn what your visitors are looking
for on your website. Is the search engine directing the
customer to your website or redirecting them to the
next best option in the absence of the product? Keeping
tabs on this will help companies increase customer
loyalty and sales.
Site search analytics also shows which queries and
products are searched most often on your website. By
understanding this, marketers can influence the site
layout and design so that visitors are able to easily
locate answers to common queries or the most searched
products.
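Both uses of site search data (most-searched terms, and searches that find nothing) can be sketched from a simple query log. The log format below, a list of (query, result count) pairs, and the sample entries are assumptions for illustration.

```python
from collections import Counter


def summarize_site_search(search_log, top_n=3):
    """search_log: iterable of (query, result_count) pairs.

    Returns the top_n most frequent queries and the sorted list of
    queries that returned no results.
    """
    # Normalize case and whitespace so "Laptop" and "laptop " count together.
    normalized = [(query.strip().lower(), n) for query, n in search_log]
    top_queries = Counter(q for q, _ in normalized).most_common(top_n)
    no_result_queries = sorted({q for q, n in normalized if n == 0})
    return top_queries, no_result_queries


# Hypothetical log entries.
log = [
    ("Laptop", 12), ("laptop ", 12), ("ink cartridge", 5),
    ("tablet case", 0), ("laptop", 12), ("ink cartridge", 5),
]
top_queries, no_result_queries = summarize_site_search(log, top_n=2)
print(top_queries)
print(no_result_queries)
```

The most frequent queries suggest what to surface in the site layout; the zero-result queries point to products or content visitors expected but could not find.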
5. Marketing spend optimization. HP’s online store
uses a mix of marketing vehicles to reach different
customer segments with different communication and
buying preferences. Optimizing spend on various
marketing vehicles is critical to optimizing demand
generation efforts as well. However, determining which
marketing mix is most beneficial to the business is not
an easy process; it requires not only a scientific
approach to analyzing spend and revenue, but also a
test-learn-optimize culture. For example, ongoing
analysis of the response to different types of
marketing vehicles helps in identifying the best fit
for a particular type of message. Based on such
analysis, one can decide whether a banner would work
better than a customized landing page, or whether an
e-mail campaign would be the best option.
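A first pass at comparing marketing vehicles is to compute revenue per dollar spent for each one. The Python sketch below assumes revenue has already been attributed to a single vehicle, which is itself a hard problem in practice; the vehicle names and figures are hypothetical.

```python
def revenue_per_dollar(campaigns):
    """campaigns: dict mapping vehicle -> (spend, attributed_revenue).

    Vehicles with zero spend are skipped to avoid division by zero.
    """
    return {
        vehicle: revenue / spend
        for vehicle, (spend, revenue) in campaigns.items()
        if spend > 0
    }


def rank_vehicles(campaigns):
    """Vehicles ordered from highest to lowest revenue per dollar."""
    ratios = revenue_per_dollar(campaigns)
    return sorted(ratios, key=ratios.get, reverse=True)


# Hypothetical test results for one message type.
campaigns = {
    "banner": (10_000, 25_000),
    "email": (4_000, 18_000),
    "landing_page": (6_000, 21_000),
}
ratios = revenue_per_dollar(campaigns)
ranking = rank_vehicles(campaigns)
print(ranking)
```

These are average returns. Deciding where to put the next dollar would require marginal returns, which is where the test-learn-optimize cycle comes in.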
6. Connect marketing with warehousing. In large supply
chain environments, predictive analytics methodologies
can produce an accurate forecast of the orders shipped
out of the warehouse each day, enabling accurate
warehouse space and staffing allocation in order to
meet aggressive shipping timelines.
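The article does not say which forecasting method is used. As one simple possibility, the Python sketch below applies simple exponential smoothing to daily order counts and converts the forecast into a staffing number. The smoothing weight, the order history and the productivity figure are all hypothetical.

```python
import math


def exp_smooth_forecast(daily_orders, alpha=0.3):
    """One-step-ahead forecast using simple exponential smoothing.

    alpha near 1 tracks recent days closely; alpha near 0 smooths heavily.
    """
    if not daily_orders:
        raise ValueError("need at least one observation")
    level = daily_orders[0]
    for observed in daily_orders[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


def staff_needed(forecast_orders, orders_per_person_per_day):
    """Round up: a fractional person still means one more person on shift."""
    return math.ceil(forecast_orders / orders_per_person_per_day)


# Hypothetical history of orders shipped per day.
history = [100, 120, 110, 130]
forecast = exp_smooth_forecast(history, alpha=0.5)
staff = staff_needed(forecast, orders_per_person_per_day=50)
print(forecast, staff)
```

A real warehouse forecast would also need to account for promotions and seasonality, which is the link to marketing that this item describes.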
In conclusion, marketers can apply data mining and
advanced analytical skills to derive key insights, to
better understand drivers of Web traffic and to produce
reasonably accurate traffic forecasts for use in
business planning. We sense that companies that use
data accurately can achieve three- to five-fold growth
of the online business and make analytics easily
replicable across different functions of the
organization.
Rohit Tandon is vice president of corporate strategy
and worldwide head of Global Analytics at Hewlett-
Packard. As part of HP’s corporate strategy team, he
helps drive the analytics ecosystem to support HP’s
vision and priorities through delivery of cutting-edge
analytical capabilities across sales, marketing, supply
chain, finance and HR domains. He was recently
named one of the top-10 most influential analytics
leaders in India for 2014 by Analytics India Magazine.
Shruti Upadhyay is a manager with HP Global
Analytics.
ANALYZE THIS!
Dark side of the digital world
Big data, unintended consequences: What Amazon’s domination of the
book publishing industry could portend.
BY VIJAY MEHROTRA
Given my love of books, it is perhaps not surpris-
ing that Amazon.com – where, thanks to the digital
technologies of today, a plethora of books can imme-
diately be found about nearly any idea that pops into
my head and be delivered (free with Amazon Prime
membership!) to my doorstep with remarkable speed
– is a website that I love deeply. Like many avid read-
ers, I purport to do my best to support my local inde-
pendent booksellers, but too often there is simply no
denying the powerful pull of the super convenient,
instantly gratifying, highly personalized Amazon.com
experience.
Thanks to my bi-monthly book club, I recently read
“Who Owns the Future?” by Jaron Lanier, a celebrat-
ed technologist and MacArthur “genius” award winner
best known for his contributions to the field of virtual
reality. Lanier is known as a big thinker, and in this
book – at once rambling, provocative and thoughtful
– he once again shows why.
“WOTF” begins with a bleak assessment of where
digital technology is leading us all. The main thrust of
Lanier’s argument is as follows:
• Technology makes it very easy to give away for
free a lot of things that people find valuable – just
think about the search engine. Being
human, we are conditioned to love the
chance to get something for nothing,
and we have gratefully grabbed at it with
both hands.
• However, the value that technology
grants us is not actually free. In
exchange, we tacitly give up information
about ourselves, which is then stored
as data.
• Thanks largely to analytics
professionals, this data is then pooled
and analyzed to create a variety of
commercial opportunities that would not
otherwise exist.
• This commercial wealth confers
extraordinary power upon those who
own the technologies that capture and
analyze this data (Lanier calls them
“Siren Servers”).
• This power in turn enables the
owners of the Siren Servers to have a
huge impact on the society that we live
in, including employment, government,
culture and ideas.
• Taken to their logical conclusions,
all of this ultimately dooms the human
species to a very sad and cataclysmic
ending.
Along the way, Lanier also wanders
off into pleasantly intense digressions on
a broad variety of somewhat related top-
ics, including Aristotle, the tenure system,
biodiversity and the concept of local op-
tima. He too clearly loves to read.
IMPACT ON PUBLISHING
While still digesting this thought-
provoking book, I came across George
Packer’s recent article entitled “Is
Amazon good for books?” Taking a long
hard look at Amazon.com, the website
that perhaps most fully embodies Lanier’s
concept of a Siren Server, Packer finds
that many of Lanier’s more dire predic-
tions are already playing out there.
Packer’s particular focus is Amazon’s
impact on the publishing industry, and he
believes that the stakes here are incred-
ibly high: “In the book business the pros-
pect of a single owner of both the means
of production and the modes of distribu-
tion is especially worrisome; it would give
Amazon more control over the exchange
of ideas than any company in U.S. histo-
ry. Even in the iPhone age, books remain
central to American intellectual life, and
perhaps to democracy.”
I wholeheartedly agree.
Just as Lanier predicts, suppliers and consumers
alike originally rushed to embrace Amazon, for like
so many technologies it seemed to magically (that
is, without cost) provide all parties with something
for which they hungered.
As Packer writes, “When Amazon
emerged, publishers in New York sud-
denly had a new buyer that paid quickly,
sold their backlist as well as new titles,
and, unlike traditional bookstores, made
very few returns” – generating fresh rev-
enues for publishers with little incremen-
tal investment. Meanwhile, we readers
flocked to Amazon in droves for its con-
venience, its variety, and its low prices.
Amazon.com today accounts for
more than 40 percent of all printed books
purchased as well as 65 percent of all
eBooks, so it is probably fair to say that
book buyers by and large still love Ama-
zon. For us as readers, this is fortuitous,
since the number of independent book-
stores in business has declined by more
than 50 percent since Amazon’s found-
ing. However, as its share of overall book
sales has ballooned, Amazon has taken
advantage of its market power to aggres-
sively push the terms of its agreements
with book publishers dramatically in its
own favor, often through tactics reflect-
ing Amazon’s famously secretive and
opaque corporate culture. Meanwhile,
Packer reports, the many publishers large
and small whose businesses are now
dependent on Amazon for much of their dis-
tribution and revenues are learning firsthand
that, as Lanier sharply points out, “information
supremacy for one company becomes, as a
matter of course, a form of behavior modifica-
tion for the rest of the world.”
Packer’s article also describes an Amazon
culture that places a very low value on human
beings that are involved with development, pro-
motion and distribution of books, placing its faith
in algorithms rather than editors and relying on
volunteer (that is, free) reviewers to take the
place of staff writers. All of this serves as a real
illustration of Lanier’s premise that as more and
more aspects of the enterprise are mediated by
software, those in the business of carefully cre-
ating content (rather than digitally distributing it)
will be increasingly de-valued and many forms
of employment that have long-term value to our
culture will subsequently perish.
ELIMINATING THE GATEKEEPERS
While Amazon’s efforts at actually serving
as a publisher have so far failed, it is clear
that we can expect the company to continue to
pursue the holy grail of “eliminating the
gatekeepers” from the world of publishing by
producing its own original content. Indeed,
one comes away from Packer’s article with
the feeling that if Amazon’s founder and CEO
Jeff Bezos could eliminate the need for au-
thors and publishers by replacing them with
automated content-generating software, he
would not hesitate for an instant.
In fact, book distribution has from the
outset been only a small part of Bezos’
vision. The real prize for Bezos has been
the access to reams of consumer data
and the ability to analyze this data for fun
and profit. According to Packer, as early
as 1995, Bezos had publicly stated that
“Amazon intended to sell books as a way
of gathering data on affluent, educated
shoppers.” Indeed, today the $5.25 billion
in book sales makes up only 7 percent
of Amazon’s total revenues. This too is
just as Lanier predicts in “WOTF,” which
may be why it was somehow not available
directly from Amazon.com when I looked
for it the other day (it has since been
restored somehow).
One book that I was able to find on
Amazon.com was “Ethics of Big Data,”
in which author Kord Davis asks a num-
ber of more fundamental questions
about data and its place in the business
world. As a longtime software/IT pro-
fessional with a deep grounding in phi-
losophy and the history of technology,
Davis is equally comfortable discussing
topics as diverse as digital strategy, supply
chain optimization, application development
and values-based management. As such, he
has a unique perspective that motivates him
to take these important – and very thorny –
questions seriously. As he writes in the book’s
Preface, “nobody in history has ever had the
opportunity to innovate, or been faced with
the risks of unintended consequences, that
big data now provides.”
In particular, Davis identifies four
major aspects of any serious data ethics
discussion:
• Identity: In the digital world, who we
are is tacitly defined by the data we leave
behind and indeed our own sense of self
is often tightly intertwined with our online
activities. Davis points out that capturing
and analyzing our digital trail “provides
others the ability to quite easily summarize,
aggregate or correlate various aspects of
our identity – without our participation or
consent.”
• Privacy: Does your decision to
engage in a digital interaction confer
upon other entities the right to utilize data
captured in the course of that specific
interaction, and to link it to other sources
of data that may correspond to you? As
Davis asks, “Does privacy mean the same
thing in both online and offline worlds?…
should individuals have a legitimate ability
to control data about themselves, and to
what degree?”
• Ownership: Digital technology,
data and analytics have given some
companies the ability to turn individual
users’ data into saleable assets and
many others the capacity for improved
decision-making and increased
profitability. Intelligently utilizing
data is something that we typically
celebrate in our profession, but
Davis again challenges this view by
asking some very fundamental and
thought-provoking questions: “Does
our existence itself constitute a
creative act, over which we have
copyrights or other rights associated
with creation? If it does, then how
do those offline rights and privileges,
sanctified by everything from the
Constitution to local, state and federal
laws, apply to the online presence of
that same information?”
• Reputation: Davis hits the nail
on the head when he points out
that, thanks to the ability of data to
be combined and analyzed to drive
inferential and predictive judgments,
“the number of people who can form
an opinion about what kind of person
you are is exponentially larger and
farther removed…” And while these
online reputations are stubbornly
persistent, the accuracy of this
reputational assessment is too
often an afterthought.
CALL FOR ACTION
Unsatisfied with merely admiring the
problem, both Lanier and Davis also call
for action. Lanier proposes a technologi-
cal and marketplace solution to the oth-
erwise inevitable destiny that he believes
digital technology, user data, and busi-
ness analytics are rapidly leading us into,
problems that are so vividly illustrated
by the case of Amazon. He suggests an
elaborate (though high-level) framework
in which all personal data and creative
works are tagged so as to enable their
owner/creators to capture micropayments
whenever and however their data/works
are utilized. While his proposed remedy
is at this stage sketchy at best, from my
perspective he is to be commended for
engaging us all in a conversation about a
technology-enabled solution to a complex
set of problems that few others are even
willing to acknowledge.
Davis, like Lanier, is a technologist
rather than a Luddite (as he quite rightly
points out, “whereas big data is ethical-
ly neutral, the use of big data is not”). In
“Ethics of Big Data,” he strongly encour-
ages organizations that use data exten-
sively (as well as the policy-makers who
attempt to make judgments in support of
social good) to have meaningful discus-
sions about how and why we use data
and what the ethical implications are
of those actions. In his call for serious
ethical inquiry, Davis asserts that “Or-
ganizations realize that information has
value that can be extracted and turned
into new products…the ethical impact is
highly context-dependent. But to ignore
that there is an ethical impact is to court
an imbalance between the benefits of in-
novation and the detriment of risk.”
Especially, as Lanier would be quick
to add, “with technology itself enabling
the risk to be pushed off onto many, while
the benefits are captured by an ever
smaller few.”
As Packer reports, Amazon has giv-
en very little thought to the near-term
ethics or the long-term implications of
the way in which it has used its custom-
ers’ data to obtain its current level of
market power. But as Amazon’s current
battle [1] with publisher Hachette rages
on, with publishers, governments and
erstwhile business partners sure to fol-
low, it is clear that this particular story is
far from over.
As analytics professionals, neither is
ours. We have a significant stake in the
outcomes of these conversations about
ethics and the future. As such, we would
be wise to actively participate in those
conversations. At this particular moment,
we have considerable leverage to advo-
cate for a digital future that reflects our
own values.
The world of digital business – our
own personalized Siren Server – has
provided us with a massive, lucrative,
and free channel for our products and
services. Today’s digital enterprise de-
pends so much on our ever-expanding
ability to capture, transmit, store, inte-
grate and organize data, and our deep
capacity to use this data to summarize,
analyze, correlate, predict and optimize.
Through no fault of our own, we have
been bestowed with The Sexiest Job
of the 21st Century [2], and it is indeed
tempting to believe that we are an inte-
gral and indispensable part of the world
in which we live and work, and that we
always will be.
Turns out this is exactly what the pub-
lishers thought when Amazon first ap-
peared on the scene too. Beware: There
is no free lunch.
Vijay Mehrotra (vmehrotra@usfca.edu) is a
professor in the Department of Business Analytics
and Information Systems at the University of San
Francisco’s School of Management. He is also a
longtime member of INFORMS.
REFERENCES
1. For more on this, see http://www.nytimes.com/2014/06/21/business/booksellers-score-some-points-in-amazons-standoff-with-hachette.html
and http://www.latimes.com/books/jacketcopy/la-et-jc-amazon-and-hachette-explained-20140602-story.html#page=1.
2. http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
HEALTHCARE ANALYTICS
Will Apple, Google
usher in new era in
healthcare analytics?
BY RAJIB GHOSH
2014 is turning out to be an interesting year for
the healthcare industry. On the healthcare technology
front, this year has spurred 16 acquisitions since Jan.
1. State and federal government health insurance
exchanges finally started to operate at scale, offer-
ing affordable health insurance coverage to millions.
Twenty-six states and Washington, D.C., expanded
their Medicaid program as of May 2014, making a
large number of patients eligible for the safety net.
These are all good things that add to the success
of the Affordable Care Act (ACA), also known as
Obamacare.
At the same time we are just beginning to
see the impact of the new patient inflow on our
health system in the form of emergency room over-
crowding [1]. Opponents of the ACA argue that the
expansion of coverage without expanding the
primary care physician network across the nation
will lead to disaster. It remains to be seen which
way the pendulum will swing.
APPLE’S BIG SPLASH WITH HEALTHKIT
Meanwhile, Apple has released its HealthKit prod-
uct that connects multiple devices and apps. It has
shown promise to become the health data repository
for consumers. In essence this was the promise
of the personal health record, or PHR, a promise
that rose to the peak of inflated expectation a few
years back and then fell to the trough of disillusion-
ment quite quickly [2]. But with Apple’s foray into
the space, this time it could be different.
The key promise, however, is the fusion of
data from multiple sources and use of analytics to
generate user-facing insights. The latter, howev-
er, is not there yet. In my last column I argued that
the true empowerment of the patient consumer
is waiting on the data fusion and analytics to
become mainstream. Consumers do not want
just a data repository like a PHR. They want
actionable information that a PHR does not provide.
Apple’s announcement and subsequent ac-
tion may expedite the health data movement in
the right direction, but I am somewhat skeptical
regarding data liquidity in Apple’s “walled garden”
approach. Now that Apple has taken the lead,
how far behind can Google be? Recently, Forbes
reported that Google is planning its own version
of a health platform. By the time this column goes
live we will know what Google has up
its sleeve. These two giants have all the tech-
nology, talent and financial firepower needed to
drive analytics into the consumer health space
by enabling a platform play for various data
generating devices and apps.
Insights for the consumer, however, will come
at a price. As the insights with actionable consum-
er guidance increase, so too will the level of FDA
scrutiny, including requirement for mandatory FDA
approval. It is unclear how quickly Apple or Google
will go for that since it is unknown territory for
both companies. Having spent a decade in the
medical device industry, I know firsthand the pain
points of manufacturers when their products
come under the FDA’s purview.
APPLE-EPIC PARTNERSHIP
Apple is also partnering with Epic Systems,
the giant electronic medical record (EMR)
company that controls close to 20 percent of the
enterprise EMR market and covers 51 percent
of the patients in the United States. This is a
smart move by Apple. The ability to send user-
generated data to a healthcare professional’s
EMR system has always been a key requirement
for providers. This “end-to-end” data channel
establishes a continuum of care, which acts as
the building block for analytics-driven population
health management (PHM) initiatives.
Since the introduction of the iPhone, Apple
products have enjoyed widespread adoption
among healthcare professionals. A 2013 study by
the Black Book Rankings found that among physi-
cians who use medical apps on their smartphones,
68 percent used iPhones while 31 percent used
Android devices. Also, 59 percent of physicians ac-
cessed apps from their tablets, and most of those
users preferred the iPad. Among U.S. consumers, Apple
has lost some ground recently to its key competitor,
Google Android, but still commands a large con-
sumer following.
When a system enjoys large market share
both among patients and providers and the sys-
tem connects with the largest EMR company in
the country, we can expect seamless
bi-directional data flow to reach criti-
cal mass. This is a prerequisite to build
a cloud-based analytics solution that
can leverage data hubs at both ends of
the flow.
This is why Apple’s HealthKit
introduction is a key development,
although it does not do much in its early
incarnation. If Google wants to become
a serious player in the healthcare field
beyond fitness lovers, it has to think
in the same direction as well. Once that
happens, imagine what sort of revolution
the rivalry between these technology
companies can usher in!
The health data acquisition market is
still fragmented, and as a result EMR com-
panies have not shown much interest in
opening up their data repository to those
players. If Apple and Google can now turn
the tables and make this a true platform
play using their controlling stakes in the
mobile device market, then it becomes
meaningful for the EMR companies to
forge powerful partnerships with one or
both of them. In turn that will create the
unification of episodic data and continu-
ous user-generated data – the Holy Grail!
Interoperability standards will be
firmed up and data security solutions will
emerge. Most importantly, patients and
providers will both benefit from the ana-
lytics solutions that will get a shot in the
arm from a data-rich, holistic picture of
the patient.
So far IBM is the lone warrior creat-
ing an ecosystem around its “Watson in
the cloud” analytics solution. It still lacks
the health data source. So what can
Apple, Google, IBM and Epic do together
to shake up healthcare? I’m getting goose
bumps just thinking about the possibilities.
Rajib Ghosh (rghosh@hotmail.com) is an
independent consultant and business advisor
with 20 years of technology experience in various
industry verticals where he had senior level
management roles in software engineering, program
management, product management and business
and strategy development. Ghosh spent a decade
in the U.S. healthcare industry as part of a global
ecosystem of medical device manufacturers, medical
software companies and telehealth and telemedicine
solution providers. He’s held senior positions at
Hill-Rom, Solta Medical and Bosch Healthcare. His
recent work interest includes public health and the
field of IT-enabled sustainable healthcare delivery
in the United States as well as emerging nations.
Follow Ghosh on twitter @ghosh_r.
REFERENCES
1. Laura Ungar, “More patients flocking to ERs under Obamacare,”
http://www.courier-journal.com/story/news/2014/06/07/patients-flocking-emergency-rooms-obamacare/10181349/
2. “Hype Cycle for Healthcare Provider Applications, Analytics and Systems,” 2013, Gartner,
http://www.healthcatalyst.com/health-data-analytics-hype-cycle
INFORMS INITIATIVES
CAP exam, continuing
education, analytics
conference cluster
The Institute for Operations Research and the
Management Sciences (INFORMS), the largest
professional society in the world for professionals
in the fields of analytics, operations research (O.R.)
and management science and the publishers of
Analytics magazine, announced that its Certified
Analytics Professional (CAP®) exam will now be
given at hundreds of computer-based testing cen-
ters worldwide through an agreement with Kryterion,
the full-service provider of customizable assessment
and certification products and services.
Candidates for the CAP certification exam can
choose from Kryterion’s global network of online se-
cured testing locations to schedule their exam at a
convenient time and place. INFORMS’ online test-
ing center partner Kryterion, through strategic part-
nerships with colleges and universities, as well as
testing and training companies, provides over 700
testing locations in more than 100 countries. In the
United States alone, more than 400 testing centers
are available. CAP exams can now be scheduled al-
most any day of the week and at a time and location
that best suits the candidate.
Candidates can apply at www.informs.org/applyforcertification.
Upon acceptance into the program, candidates
receive an online voucher to present on
the Kryterion site.
Exam locations can be found at
http://www.kryteriononline.com/host_locations/.
Introduced in the spring of 2013, the
CAP program was created by subject
matter experts, many of whom are IN-
FORMS members. The CAP credential
is designed for general analytics pro-
fessionals in early- to mid-career and
is based on a rigorous job task analy-
sis and is vendor- and software-neutral.
Benefits of analytics certification include
gaining the ability to advance one’s ca-
reer by setting a professional with CAP
apart from the competition and obtain-
ing the structure to make continuing pro-
fessional development an integral part
of one’s job performance. The CAP pro-
gram assists hiring managers in finding
competent analytics talent and shows
that an organization hiring CAP profes-
sionals follows best analytics practice.
NEW INFORMS CONTINUING
EDUCATION COURSES
The INFORMS Continuing Education
program is offering two new courses this
fall: “Introduction to Monte Carlo and
Discrete-Event Simulation” and “Foun-
dations of Modern Predictive Analytics.”
The intensive, two-day, in-person
courses, like the program’s popular
current courses “Essential Practice
Skills for Analytics Professionals” and
“Data Exploration & Visualization,” pro-
vide real take-away value to implement
immediately at work. Once you leave
the classroom, you will be able to ap-
ply the real skills, tools and methods
of analytics. The courses will give par-
ticipants hands-on practice in handling
real data types, real business problems
and practical methods for delivering
business-useful results.
In the course “Introduction to
Monte Carlo and Discrete-Event
Simulation,” taught by Barry Lawson,
University of Richmond, and Lawrence
Leemis, College of William and
Mary, participants will learn the
basics of Monte Carlo and discrete-
event simulation and how to identify
real-world problem types appropriate
for simulation. They’ll also develop
skills and intuition for applying
Monte Carlo and discrete-event
simulation techniques.
Topic areas covered include Monte
Carlo modeling, sensitivity analysis,
input modeling and output analysis.
The course will be held at the
INFORMS office, Catonsville (Baltimore
area), Md., Sept. 12-13, and Chicago,
Oct. 16-17.
The second new course, “Foundations
of Modern Predictive Analytics,” will
be taught by James Drew, Worcester
Polytechnic Institute, Verizon (ret.).
Modern predictive analytics, the
science of discovering and exploiting
complex data relationships, has rapidly
changed in recent years, especially in
today’s businesses. This course will
give participants hands-on practice in
handling real data types, real business
problems and practical methods for de-
livering business-useful results.
Some of the topic areas to be covered
in this course are: linear regression, re-
gression trees, logistic regression and
CART (classification and regression
trees).
The course will be held in Washington,
D.C., Sept. 15-16, and San Francisco,
Nov. 7-8.
Learn more about these courses
including course outlines, instructor
biographies, program objectives and
how to register at: www.informs.org/continuinged.
ANALYTICS CLUSTER SET FOR
INFORMS ANNUAL MEETING IN S.F.
The Analytics Section of INFORMS
will present the analytics cluster of ses-
sions and presentations at the INFORMS
Annual Meeting in San Francisco
Nov. 9-12. The cluster encompasses
20 sessions featuring renowned
analytics practitioners and leaders. Nine
additional sessions will be jointly orga-
nized in collaboration with the Health
Applications Society (HAS), CPMS
(the Practice Section of INFORMS)
and the Section on O.R. in Sports
(SpORts).
The sessions/presentations within
the cluster cover such topics as:
• Successful application of analytics in
multiple industries such as healthcare,
transportation, defense and sports
• Analytics focus areas such as big data,
spreadsheets and predictive analytics
• Panel discussions on understand-
ing the connection between O.R. and
analytics, building analytics programs to
support organizations’ needs and busi-
ness analytics in the healthcare industry
• Winners of the Innovative Applications
in Analytics Award and the SAS Student
Paper Competition
• Why’s, how’s and what’s of analytics
certification
More information about the conference
can be found at
http://meetings2.informs.org/sanfrancisco2014/.
Help Promote Analytics Magazine
It’s fast and it’s easy! Visit:
http://analytics.informs.org/button.html
Solve key business problems utilizing big data. Earn an
AACSB-International accredited Master of Business
Administration with a specialization in Business Analytics
from the University of South Dakota.
Learn more: www.usd.edu/cde
The University of South Dakota’s
Beacom School of Business has been
continuously accredited by
AACSB-International since 1949.
Advance your career with an online Master of
Business Administration with a specialization
in Business Analytics.
DIVISION OF CONTINUING & DISTANCE EDUCATION
414 East Clark Street | Vermillion, SD 57069
605-677-6240 | 800-233-7937
www.usd.edu/cde | cde@usd.edu
FORUM
Pit stop analytics
BY E. ANDREW BOYD
Quick stop: Optimized F1 pit teams can change four tires in two seconds.
Magic shows are fun because we get to experi-
ence the impossible. Still, we know there’s trickery
afoot. But what about those times when the magic
isn’t magic? When we witness something that’s seem-
ingly impossible but proves all too real? Not only real,
but the result of optimization?
Such is the case in the Formula 1 race car pit. If
you follow F1 racing, it comes as no surprise that pit
stops have been reduced to two seconds. But if you
aren’t an F1 devotee, the idea of lifting a car, chang-
ing four tires and sending it on its way in a mere two
seconds stretches the imagination.
The role of the pit has changed dramatically over
the years. For much of racing history it was assumed
cars would only stop in the event of problems. Sched-
uled tire changes or fuel stops weren’t part of the
equation. This orthodoxy was challenged
in 1982 when an analytically minded race
team from the United Kingdom focused in
on two important facts. First, softer tires
stuck to the track better during turns than
their harder cousins, though they wore
out more quickly. Second, less gas in the
tank translated into a lighter, and there-
fore faster, car. Calculations showed
that time spent changing tires and re-
filling the tank was more than offset by
the improved performance of the car on
the track. It’s a calculation any analytics
practitioner would be proud of.
The idea quickly caught on, making
pit stops – and their efficient execution –
an integral part of racing. Refueling was
banned in 1984 out of safety concerns,
but reinstated in 1994. During that 10-year
period pit crews refined their tire chang-
ing skills to the point where the fastest pit
stops took a little over four seconds. When
refueling was again instituted, the impetus
for faster tire changes disappeared since
refueling was the bottleneck. That changed
in 2010 when F1 racing again reverted to
a no-refueling policy, setting the stage for
lightning-fast tire changes.
Achieving a two-second tire change
required optimizing the entire process.
Engineers took a look at everything from
the design of the wheel nuts (one per
wheel on F1 cars) to the special, self-
positioning pneumatic guns that remove
and tighten each nut. They then turned
their attention to the pit crews.
Teams of three work on each wheel:
one to remove the old tire, one to position
the new tire and one to operate the gun.
Their moves aren’t left to chance, but are
choreographed down to the position of
their hands and feet from start to finish.
It’s not hard to imagine Frank and Lillian
Gilbreth – progenitors of industrial engi-
neering and pioneers of time and motion
studies – standing nearby, stopwatches
in hand. They’d certainly be smiling in ap-
proval. With two jack operators and scat-
tered observers, as many as 20 people
crowd around a car during a pit stop – for
two seconds of work.
Optimization brings to mind models
and mathematical programs. But some-
times optimization is smart without being
sophisticated. And in the F1 pit, it works
like magic.
Andrew Boyd, INFORMS Fellow and INFORMS
VP of Marketing, Communications and Outreach,
served as executive and chief scientist at an
analytics firm for many years. He can be reached
at e.a.boyd@earthlink.net.
NOTES & REFERENCES
1. Gray, W., “Tech Talk: Can F1 Pit Stops Get Even Quicker?” Eurosport, April 9, 2013. See also:
https://uk.eurosport.yahoo.com/blogs/will-gray/gray-matter-f1-stops-even-quicker-101951154.html. Accessed May 24, 2014.
2. Examples of fast pit stops can be found at:
https://www.youtube.com/watch?v=aHSUp7msCIE
https://www.youtube.com/watch?v=Xvu0GlMa3xQ
CUSTOMER RELATIONSHIPS
Real-Time Text
Analytics
Cloud-based analytical engine yields instant insight
using unstructured social media data.
BY AVEEK MUKHOPADHYAY
AND ROGER BARGA
Information is generated in
today’s world more rapidly
than ever before, and it
will keep growing at an ex-
ponential rate. The rise of social media
combined with increased Internet pen-
etration has led to a significant increase
in user-generated content in the form
of product reviews and feedback, blogs,
independent news articles, Twitter
and Facebook updates. The crux of
leveraging such data lies in identifying
patterns from it and using the data to
generate actionable insights in real time.
This article proposes a cloud-based
analytical engine that analyzes com-
ments, reviews and opinions generated
by customers to understand the main
underlying themes and the general sen-
timent so that actionable insights can
be generated in real time. Algorithms
such as latent Dirichlet allocation for
topic modeling and the holistic lexicon-
based approach for sentiment mining
have been operationalized using a multi-
agent framework deployed in a cloud
environment. This process meets com-
putational demands as it allows users
to run virtual machines within managed
data centers, freeing them from worry-
ing about acquisition of new hardware
and networks.
UNSTRUCTURED SOCIAL MEDIA DATA
According to a study by International
Data Corporation (IDC), mankind cre-
ated an estimated 150 exabytes of data in
2005 (an exabyte is 1 billion gigabytes), a number
that jumped to 1,200 exabytes in 2010. A
more recent study by IDC and EMC put
the amount of data created in 2011 at 1.8
zettabytes (a zettabyte is 1 followed by 21
zeroes, counted in bytes), a
number the study researchers expected
to double every two years.
Only 5 percent of this data is struc-
tured (comes in a standard format that
can be read by computers). The remain-
ing 95 percent is unstructured (photos,
phone calls and free-flow texts). A large
chunk of such unstructured data is in
text format. Posing challenges owing to
the sheer volume, depth and complex-
ity, such data, however, holds immense
potential for organizations. The key lies
in identifying patterns from the data and
gaining relevant insights.
REAL-TIME ANALYTICS
Not long ago, analyzing data and
generating business intelligence reports
depended on the time-intensive ETL pro-
cess (extract, transform, load). Depend-
ing upon the system and data complexity,
analytics could be delayed by hours, days
or even weeks while data management
put it all together.
In today’s business landscape, mini-
mizing the lag between acquiring data
and generating actionable insight has be-
come the key differentiator. Acting in real
time to respond to an event can result in
huge profits and improved customer rela-
tionships for a firm.
Real-time analytics can deliver benefits in
multiple business scenarios, including:
• High-frequency trading (sophisticated
algorithms to rapidly trade securities)
• Real-time detection of fraudulent
transactions
• Real-time price adjustment based on
competitor information
• Real-time feedback from social
media for a product firm about its
new launch
• Real-time recommendations by retail
stores based on customer’s location
• Real-time traffic routing based on
information about vehicle frequency,
direction, etc.
Social media content comes from
users without any vested interest, thus
their opinions beget more trust. Orga-
nizations whose products and services
are mentioned in such media need to
remain current on relevant discussions
and be able to track the sentiment of ev-
ery employee, customer and investor. To
address this challenge, a cloud-based
real-time ecosystem was created for ana-
lyzing comments, reviews and opinions
mined from Twitter. In addition, tracking
trending themes in the customer space
and the evolution of these trends over
time was incorporated.
TEXT MINING ALGORITHMS
Topic modeling. Topic models are
statistical techniques that analyze words/
phrases in textual data to understand
the main themes running through them.
The algorithm used here is based on LDA
(latent Dirichlet allocation) and uses the
observed words in tweets (extracted from
Twitter) to infer the hidden topic structure.
LDA is more easily understood by its
generative process. This generative pro-
cess defines a joint probability distribution
over the observed (the words) and hidden
(the topics) random variables. This joint
distribution is used to compute the condi-
tional distribution of the hidden variables
given the observed variables. This con-
ditional distribution is called the posterior
distribution.
A topic is assumed to be a collec-
tion of words with different probabilities
of occurrence. An individual tweet can
be assumed to be generated from multiple
topics in different proportions. Now every
word generated in a tweet can be ran-
domly chosen in a two-step process:
• First, a topic is randomly selected
from the distribution of topics.
• Second, the chosen word is randomly
selected from the distribution of
words over that topic.
So, the joint probability distribution of word
W and topic T is:
Probability (W, T) = Probability (T) * Probability (W | T)
Now when the individual probability of
occurrence of a word is known (because it
has already occurred in the tweet), the pos-
terior distribution is calculated as follows:
Probability (T | W) = Probability (W, T) / Probability (W)
Given the probabilities of observed
words, latent information like the vocabu-
lary distribution of a topic and the distri-
bution of topics over the tweet are thus
inferred.
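The two-step draw and the posterior formula above can be illustrated with a small Python sketch. The topics, words and probabilities below are invented for illustration and are not taken from the authors' system. The sketch also only applies Bayes' rule to a single word with known distributions; fitting a full LDA model requires an inference procedure that is not shown here.

```python
import random

# Hypothetical toy model: Probability(T) for one tweet and Probability(W | T).
topic_probs = {"battery": 0.4, "camera": 0.6}
word_probs = {
    "battery": {"charge": 0.5, "life": 0.4, "photo": 0.1},
    "camera": {"charge": 0.1, "life": 0.1, "photo": 0.8},
}

def generate_word(rng):
    """Two-step draw: pick a topic, then pick a word from that topic."""
    topics = list(topic_probs)
    topic = rng.choices(topics, weights=[topic_probs[t] for t in topics])[0]
    words = list(word_probs[topic])
    word = rng.choices(words, weights=[word_probs[topic][w] for w in words])[0]
    return topic, word

def joint(word, topic):
    """Probability(W, T) = Probability(T) * Probability(W | T)."""
    return topic_probs[topic] * word_probs[topic][word]

def posterior(word):
    """Probability(T | W) = Probability(W, T) / Probability(W)."""
    p_word = sum(joint(word, t) for t in topic_probs)
    return {t: joint(word, t) / p_word for t in topic_probs}

print(generate_word(random.Random(0)))
print(posterior("photo"))
```

With these made-up numbers, observing the word "photo" shifts most of the probability mass to the "camera" topic, which is the kind of inference the posterior distribution expresses.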
Sentiment analysis. A holistic lexi-
con-based algorithm is used to analyze
individual feature-level sentiments as well
as cumulative sentiments over tweets.
Aggregating opinions for a feature:
The algorithm parses one tweet at a time
identifying the features present. A set of
opinion words for each feature is identi-
fied using a lexicon. An orientation score
for each feature in the sentence is then
calculated by summing up the feature-
opinion scores for that sentence. (Each
feature-opinion score is obtained from
the sentiment polarity of the opinion
word and a multiplicative inverse of the
distance between the feature and opin-
ion word. Opinion words at a distance
from the feature are assumed to be less
associated to the feature compared to
the nearer words.)
For example, “The phone is useful and
a great work of art.”
Let the feature here be “phone” and the
opinion words be “useful” and “great.”
Semantic orientation of useful = 1
Semantic orientation of great = 1
Distance between the words useful
and phone = 2
Distance between the words great
and phone = 5
score(f) = 1/2 + 1/5 = 0.7
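The feature-level calculation above can be sketched in Python as follows. The function name, the two-word lexicon and the whitespace tokenization are assumptions made for this illustration; the article does not specify them.

```python
def feature_score(tokens, feature, lexicon):
    """Sum (polarity / distance to the feature) over opinion words in a sentence."""
    feature_pos = tokens.index(feature)
    score = 0.0
    for pos, token in enumerate(tokens):
        polarity = lexicon.get(token)
        if polarity is None or pos == feature_pos:
            continue  # not an opinion word
        score += polarity / abs(pos - feature_pos)
    return score

# Hypothetical lexicon covering only the example sentence.
lexicon = {"useful": 1, "great": 1}
tokens = "the phone is useful and a great work of art".split()
print(round(feature_score(tokens, "phone", lexicon), 2))
```

Here "useful" sits two tokens from "phone" and "great" five tokens away, so the sketch reproduces the 1/2 + 1/5 sum from the worked example.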
Aggregating opinions for tweets: The
sentiment score for a tweet is the sum-
mation of the scores for all opinion words
present in the tweet.
For example, “The phone is useful
and a great work of art.”
The opinion words in the sentence are
“useful,” “great”
Semantic orientation of useful = 1
Semantic orientation of great = 1
score(t) = 1 + 1 = 2
Negation-rule: This identifies the ne-
gation word (which can be 1 or 2 places
before the opinion word) and reverses
the opinion expressed in a sentence.
For example, “The phone is not good.”
Here phone gets negative orientation.
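A minimal Python sketch of tweet-level aggregation combined with the negation rule is shown below. The set of negation words and the lexicon are assumptions for illustration; only the "1 or 2 places before the opinion word" window comes from the description above.

```python
NEGATIONS = {"not", "no", "never"}  # assumed negation words

def tweet_score(tokens, lexicon):
    """Sum opinion-word polarities, flipping any that follow a negation."""
    score = 0
    for pos, token in enumerate(tokens):
        polarity = lexicon.get(token)
        if polarity is None:
            continue  # not an opinion word
        # Look at the one or two tokens immediately before the opinion word.
        window = tokens[max(0, pos - 2):pos]
        if any(word in NEGATIONS for word in window):
            polarity = -polarity
        score += polarity
    return score

# Hypothetical lexicon covering the two example sentences.
lexicon = {"useful": 1, "great": 1, "good": 1}
print(tweet_score("the phone is useful and a great work of art".split(), lexicon))
print(tweet_score("the phone is not good".split(), lexicon))
```

The first sentence scores 2, matching the worked example, and the second scores -1 because "not" immediately precedes "good".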
Context-dependent rules: For features
with no associated opinion words, context-
dependent constructs are used to identify
the orientation score.
For example, “The phone is good but
battery-life is short.”
The only opinion word in the sentence
is “good” (“short” is a context-dependent
word).
Phone gets positive orientation be-
cause of “good.”
Battery-life gets negative orientation
because of the word “but” being present
between good and battery-life.
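The "but" rule in this example can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, and the choice to carry the known orientation over unchanged when no "but" intervenes is an assumption, since the article only describes the contrasting case.

```python
def context_orientation(tokens, feature, opinion_word, opinion_polarity):
    """Infer orientation for a feature that has no opinion word of its own.

    The polarity of the nearby opinion word is reversed when "but" lies
    between that word and the feature; otherwise it is carried over
    (the carry-over case is an assumption of this sketch).
    """
    lo, hi = sorted((tokens.index(feature), tokens.index(opinion_word)))
    has_contrast = "but" in tokens[lo + 1:hi]
    return -opinion_polarity if has_contrast else opinion_polarity

tokens = "the phone is good but battery-life is short".split()
print(context_orientation(tokens, "phone", "good", 1))
print(context_orientation(tokens, "battery-life", "good", 1))
```

For the example sentence the sketch gives "phone" a positive orientation and "battery-life" a negative one, as in the text.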
Topic Evolution. The next step after
topic modeling is to understand how top-
ics and trends develop, evolve and go viral
over time.
The algorithm maintains a fixed num-
ber of topic streams and their statistics.
Each tweet is processed as it comes in
and is assigned to the “closest” topic
stream (the topic stream most similar to
it). If no topic stream is close enough,
then a new stream is created and a stale
stream is killed to maintain a fixed number
of topic streams. Streams are constantly
monitored for the rate of arrival of tweets.
Whenever there is a burst of tweets in a
particular topic stream, an alert for the
trending topic is generated.
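The stream-maintenance logic described above can be sketched in Python. The article does not say how similarity, staleness or bursts are measured, so the Jaccard overlap, the "least recently updated" eviction rule, the sliding time window and all parameter values below are assumptions made for illustration.

```python
from collections import Counter, deque

class TopicStreams:
    """Keeps a fixed number of topic streams and flags bursts of tweets."""

    def __init__(self, max_streams=3, min_similarity=0.2, window=60.0, burst_size=3):
        self.max_streams = max_streams
        self.min_similarity = min_similarity
        self.window = window          # seconds of history used for burst checks
        self.burst_size = burst_size  # arrivals within the window that count as a burst
        self.streams = []

    @staticmethod
    def _similarity(tokens, words):
        """Jaccard overlap between a tweet's tokens and a stream's vocabulary."""
        a, b = set(tokens), set(words)
        return len(a & b) / len(a | b) if a | b else 0.0

    def add(self, tokens, timestamp):
        """Assign a tweet to the closest stream; return (stream, is_burst)."""
        best = max(self.streams,
                   key=lambda s: self._similarity(tokens, s["words"]),
                   default=None)
        if best is None or self._similarity(tokens, best["words"]) < self.min_similarity:
            if len(self.streams) >= self.max_streams:
                # Kill the stalest stream (least recently updated).
                stale = min(self.streams, key=lambda s: s["last_seen"])
                self.streams = [s for s in self.streams if s is not stale]
            best = {"words": Counter(), "arrivals": deque(), "last_seen": timestamp}
            self.streams.append(best)
        best["words"].update(tokens)
        best["last_seen"] = timestamp
        best["arrivals"].append(timestamp)
        # Drop arrivals that have fallen out of the burst window.
        while best["arrivals"][0] < timestamp - self.window:
            best["arrivals"].popleft()
        return best, len(best["arrivals"]) >= self.burst_size
```

A caller would feed tokenized tweets in arrival order and raise a trending-topic alert whenever `add` returns `True` as its second value.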
THE REAL-TIME EDGE
A multi-agent distributed framework
enables the processing of real-time data
and facilitates decision-making by al-
lowing for easy deployment of analyti-
cal tasks in the form of process flows. In
this multi-agent paradigm, an agent is a
software program designed to carry out
one or more tasks and can communicate
with other agents in the system using
agent communication language. Thus, an
analytical task can be written as an agent,
and the analytical process flow can be es-
tablished by wiring together a set of com-
municating agents (an agency) that can
run in sequence or in parallel.
These agents were written using R to
offer the analyst the benefits of a powerful
and flexible statistical modeling language.
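As a rough illustration of wiring agents into a process flow, the sketch below chains a few toy agents in Python. It is not the framework described here (which used R and an agent communication language): the `Agent` class, its `connect` and `receive` methods and the toy tasks are hypothetical, and messages are passed synchronously, so the two downstream branches run one after the other rather than truly in parallel.

```python
class Agent:
    """A named task that forwards its result to every connected agent."""

    def __init__(self, name, task):
        self.name = name
        self.task = task
        self.subscribers = []

    def connect(self, *agents):
        """Wire this agent's output to one or more downstream agents."""
        self.subscribers.extend(agents)
        return self

    def receive(self, message):
        result = self.task(message)
        for agent in self.subscribers:
            agent.receive(result)


# Collected outputs of the two terminal agents.
results = {"topics": [], "sentiment": []}

clean = Agent("clean", lambda text: text.lower().strip())
tokenize = Agent("tokenize", str.split)
topics = Agent(
    "topics",
    lambda tokens: results["topics"].append([t for t in tokens if t.startswith("#")]),
)
sentiment = Agent(
    "sentiment",
    lambda tokens: results["sentiment"].append("good" in tokens),
)

# Sequence: clean -> tokenize; fan-out: tokenize -> topics and sentiment.
clean.connect(tokenize)
tokenize.connect(topics, sentiment)

clean.receive("  The phone is GOOD #phone ")
```

In a distributed setting each `receive` call would instead be a message sent to an agent running in its own process or machine; the wiring idea stays the same.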
OPERATIONALIZATION IN THE
CLOUD
The entire real-time platform was then
deployed on a cloud ecosystem to allow
for the following processes:
Efficient resource management: The
cloud platform provides the necessary
virtual machine, network bandwidth and
other infrastructure resources. Even when
a machine goes down because of an
unexpected failure, a new virtual machine
is allocated for the application automatically.
Figure 1: Real-time text mining agency.
Dynamic scaling and load balanc-
ing: The cloud solution allows scaling
out as well as scaling back an appli-
cation depending on resource require-
ments. Multiple services running in
tandem make the whole system com-
putationally resource intensive. As re-
source demands increase, new role
instances can be provisioned to handle
the load. When demand decreases,
these instances can be removed so that
payment for unnecessary computing
power is not required.
Availability and durability: The cloud
storage services replicate data on three
different servers, guaranteeing it can be
accessed at all times, even if a server
shuts down unexpectedly.
Better mobility: The application can
be accessed from any place, as long as
there is an Internet connection. There is
no tight coupling with any physical server
or machine.
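The dynamic scaling behavior described in the list above comes down to a simple sizing rule. The sketch below is an assumption-laden illustration, not the cloud platform's actual autoscaling API: the function name `desired_instances`, the per-instance capacity figure and the instance limits are hypothetical.

```python
import math


def desired_instances(tweets_per_second, capacity_per_instance,
                      min_instances=1, max_instances=10):
    """Return how many role instances to run for the current load.

    Scales out when demand rises and back in when it falls, while staying
    within the configured minimum and maximum.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(tweets_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))
```

For example, if one instance is assumed to handle 100 tweets per second, a load of 450 tweets per second calls for five instances, and a quiet period of 20 tweets per second scales back to one.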
RESULTS
Figure 2 shows a snapshot of the topic
treemap generated in one run of the topic
modeling algorithm (different topics are
represented by different colors, with the
areas representing occurrence frequency).
Figure 2: Topic modeling treemap.
Incoming tweets over a time period
were captured in a stream graph visual-
ization as shown in the Figure 3 screen-
shot. Each topic is represented by a
stream in the visualization and is charac-
terized by the top words in that topic. At
any point of time, the top words in each
topic are displayed in a topic treemap
below the stream graph. It is possible to
retrieve the keyword treemap for any past
point in time.
Successive runs of the sentiment
analysis algorithm for batches of tweets
are represented by the visual in Figure 4.
Each bar captures the sentiment
for that feature in a particular batch
of tweets. The height of the bar rep-
resents the number of opinion words
for the feature in that batch. The col-
or of each bar represents the overall
sentiment level expressed in a batch of
data, ranging from extremely negative
(dark red) to extremely positive (dark
green). The change in color of the bars
across various batches can be used
to identify stimuli that are driving the
change.
Selection of a particular bar provides
a deeper analysis of that batch. The size
of a bubble indicates the number of ref-
erences of a particular opinion word, and
the color shows the overall sentiment
score for the particular opinion word.
Both the size and color are indicators of
which opinion words drive the sentiment
for a feature in a batch.
Figure 3: Trends stream graph.
CLOSING THOUGHTS
Trending topics represent the popular
“topics of conversation,” and when de-
tected in real time, these hot topics are
the social pulses that are usually ahead
of any standard news media. Data ana-
lyzed via managed data centers can pro-
vide key insights into the evolving nature
and patterns of social information and
opinion and the general sentiment pre-
vailing over such subjects.
Aveek Mukhopadhyay is an associate manager
at Mu Sigma where he works with the Innovation
Development Team with a core focus on driving
the adoption of advanced analytical platforms
and techniques both internally and externally. He
has interests in the fields of text mining, machine
learning and analytics automation.
Roger Barga, Ph.D., is group program manager
for the CloudML team at Microsoft Corporation
where his team is building machine learning as
a service in the cloud. Barga is also a lecturer
in the Data Science program at the University
of Washington. He joined Microsoft in 1997 as a
researcher in the Database Group of Microsoft
Research (MSR), where he was involved in a
number of systems research projects and product
incubation efforts, before joining the Cloud and
Enterprise Division of Microsoft in 2011.
Figure 4: Sentiment analysis.
NOTES & REFERENCES
1. The Economist (Feb. 25, 2010), “The Data Deluge”
(http://www.economist.com/node/15579717).
2. David M. Blei, “Probabilistic Topic Models,”
Communications of the ACM, April 2012, Vol. 55, No. 4
(http://www.cs.princeton.edu/~blei/papers/Blei2012.pdf).
3. Xiaowen Ding, Bing Liu and Philip S. Yu,
“A Holistic Lexicon-Based Approach to Opinion Mining”
(http://www.cs.uic.edu/~liub/FBS/opinion-mining-final-WSDM.pdf).
THE DATA ECONOMY
Why do so many analytics projects fail?
Key considerations for deep analytics on big data,
learning and insights.
BY HALUK DEMIRKAN AND BULENT DAL
What is big data? Big data, which means many things
to many people, is not a new technological fad. In
addition to providing innovative solutions and
operational insights to enduring challenges and
opportunities, big data with deep analytics instigates
new ways to transform processes, organizations, entire
industries and even society. Pushing the boundaries of
deep data analytics uncovers new insights and
opportunities, and “big” depends on where you start
and how you proceed.
Big data is not just “big.” The exponentially growing
volume of data is only one of many characteristics that
are often associated with big data, such as variety,
velocity, veracity and others (the six Vs; see box).
According to Gartner Research, business intelligence
and analytics will remain the top focus for CIOs
through 2017 [1]. According to Gartner, more than half
of all analytics projects fail because they aren’t
completed within budget or on schedule, or because
they fail to deliver the features and benefits that are
optimistically agreed on at their outset.
Today, an abundance of knowledge
and experience exists for building successful
data- and analytics-enabled decision
support systems. So why do so many
of these projects fail, and why are so
many executives and users still so un-
happy? While there are many reasons
for the high failure rate, the biggest rea-
son is that companies still treat these
projects as just another IT project. Big
data analytics is neither a product nor a
computer system. Instead, it should be
considered a constantly evolving strat-
egy, vision and architecture that contin-
uously seeks to align an organization’s
operations and direction with its strate-
gic business goals and tactical and op-
erational decisions. Table 1 includes a
list of common mistakes that can doom
analytics projects.
The six Vs of big data
• Volume (data at rest): terabytes, petabytes,
exabytes and zettabytes of data
• Velocity (data in motion): streaming data,
milliseconds to seconds; how fast data is being
produced and how fast the data must be processed
to meet the need or demand
• Variety (data in many forms): structured,
unstructured, text, multimedia, video, audio,
sensor data, meter data, HTML, e-mails, etc.
• Veracity (data in doubt): uncertainty due to data
inconsistency and incompleteness, ambiguities,
latency, deception, model approximations, accuracy,
quality, truthfulness or trustworthiness
• Variability (data in change): the differing ways in
which the data may be interpreted; different
questions require different interpretations
• Value (data for co-creation and deep learning): the
relative importance of different complex data from
distributed locations. Big data with deep analytics
means greater insight and better decisions,
something that every organization needs.
KEY CONSIDERATIONS FOR DEEP
ANALYTICS
We live in an era of big data. Whether
you work in financial services, consumer
goods, travel, transportation, health-
care, education, supply chain, logistics
or industrial products and professional
services, analytics are becoming a com-
petitive necessity for your organization.
But having big data – and even people
who can manipulate it successfully – is
not enough. Companies need managers
who can partner effectively with analysts
to ensure that their work yields better
strategic and tactical decisions.
Big data with deep analytics is a jour-
ney that helps organizations solve key
business issues and opportunities by
converting data into insights to influence
business actions and drive critical busi-
ness outcomes. As organizations try to
take advantage of the big data opportuni-
ty, they need not be overwhelmed by the
various challenges that might await them.
Going Deep & Wide on big data with deep analytics for
deep learning
Table 1: Common mistakes for analytics projects.
• Failing to build the need for big data within the organization
• Islands of analytics with “Excel culture”
• Data quality and reliability issues
• Not investigating vendor products enough, and blindly taking the path of least resistance
• Departmental thinking rather than looking at the big picture
• Considering this as a one-time implementation rather than a living ecosystem
• Developing silo dashboards to answer a few questions rather than strategic, tactical and operational dashboards
• Not establishing company ontology and definitions for a “single version of truth” culture
• Lack of vision and not having a strategy; not having a clear organizational communications plan
• Lack of upfront planning; overlooking the development of governance and program oversight
• Failure to re-organize for big data
• Not establishing a formal training program
• Ignoring the need to sell success and market the big data program
• Not having the adequate architecture for data integration
• Forgetting rapidly increasing complexities with … volume, velocity, variety, veracity, and many more
Managers will need to start their
journey by [2]:
Identifying clear business need and
value. Almost everything needs to be a
business rather than a technology solution.
Before companies start collecting big
data, they should have a clear idea of what
they want to do with it from a business
standpoint. Here’s what you need to consider:
Turn over part or all of big data
solution delivery to business leaders.
Project management and ownership
by the business (not IT) is the key to
success in big data solutions. At the same
time, make sure to have clear alignment
between business and IT.
Partner with business peers to
identify opportunities and solutions.
If we talk about big data, the impact of
these projects should also be “big.” Cre-
ate a cross-organization team and in-
volve all stakeholders early in the game.
Co-create value with customers.
The overall business objective
should always be about customers. If
one of the initiatives is about a big
marketing outcome, then it should be about
how to set up customer-centric marketing,
how to provide targeted dynamic
advertisement, how to engage customers and
how to manage personalized shopping.
Start small – with an eye to scale
quickly. While big data solutions may
be quite advanced, everything else
surrounding them (best practices,
methodologies, org structures, etc.) is nascent.
No one has all the answers, at least
not yet. Understand why traditional
business intelligence and data ware-
housing projects can’t solve a problem.
Small, simple and scalable. When
launching big data initiatives, avoid 1) get-
ting too complicated too fast, and 2) not
being prepared to scale once a solution
catches on. Big data solutions can quickly
grow out of control since discovering val-
ue from data prompts wanting more data.
Identify what part of the business
would benefit from quick wins. Look
for opportunities that will show quick
wins within no more than three months.
Success brings more people to the table.
This is not a one-time implementation.
Understand that this is a living and
evolving organism that will grow
exponentially. It is a culture change
in the way your company collects and
uses data and makes outcome-based
decisions.
Develop a minimal set of big data
governance directives upfront. Big
data governance is a chicken-and-egg
problem – you can’t govern or secure
what you haven’t explored. However,
exploring vast data sets without gover-
nance and security introduces risk.
New processes to manage open
source risks. Most big data solutions
are being built on open source software,
but open source has both legal and skill
implications as firms are: 1) exposed to
risk due to intellectual property issues
and complex licensing agreements; 2)
concerned about liability if systems built
on open source fail; and 3) required to
use technology that is often early re-
lease and not enterprise-class.
New agile processes for solution
delivery. Successful firms will embrace
agile practices that allow end users of
big data solutions to provide highly in-
teractive inputs throughout the imple-
mentation process.
Integrate structured and unstructured
data from multiple sources. Data
integration is one of the most important,
and most complex, processes supporting
efficient and effective decision-making.
The data involved includes machine data,
sensor data, videos, audio, documents,
enterprise content in call centers, e-mail
messages, wikis and, indeed, larger
volumes of transactional and application data.
Data sharing is key. In order for a
company to build a big data ecosystem
that drives business action, organiza-
tions have to share data.
Build a strong data infrastructure
to host and manage data. Make sure
to have a secure and reliable in-house
and/or hosted (e.g., cloud) data and
information management infrastructure.
Ask yourself: What information do I
collect today … and what analytics should
I perform that can benefit me and others?
New security and compliance
procedures to protect extreme-scale
data. In order to succeed with big data,
new processes must be developed that
recognize and protect the special nature
of extreme-scale data that may be large-
ly unexplored.
Be ready to support rapid growth.
Big data solutions can grow fast and
exponentially. They can start as a pilot with
a few terabytes of data, then become
a petabyte very quickly. Since the same
data can be used in different ways and
reanalyzed for new insights easily, nothing
ever gets deleted.
Funding must move out of IT for
big data success. Funding for these
projects should come from outside of the
CIO organization and move to a market-
ing or sales organization, for instance,
so that the business has a vested stake
in the game.
Create a road map that gradually
builds the skills of your organization.
It’s important to create a road map that
allows you to gradually build the required
skills within your staff, minimize risk and
capitalize on previous successes to gain
more support. In the organization, there
will be new roles and responsibilities such
as the data scientist, who possesses a
blend of skills that includes statistics, ap-
plied mathematics and computer science.
This is different from any current
decision support solution. With big
data, organizations should look for new
capabilities, such as: using advanced
analytics to uncover previously hidden
patterns; using visualization and
exploration to help the business find more
complete answers with new types and
greater volumes of data, representing the
data to the user in a way that highlights
important patterns to the human eye;
enabling operational decision-making with
on-demand stream data by turning floor
employees into analytic consumers; and
turning insight into action to drive a
decision, either with a manual step or an
automated process. Most important, be
ready for rapidly increasing benefits and
complexities from the six Vs.
WHAT IS NEXT IN THE DATA
ECONOMY?
Organizations have access to a
wealth of information, but they can’t get
value out of it because it is sitting in its
most raw form or in a semi-structured
or unstructured format [3]. As a result,
they don’t even know whether it’s worth
keeping.
So where is deep analytics for
deep learning headed in the next few
years? The exciting news is that many
organizations are already realizing the
value of big data analytics today. Insight-
driven, information-centric initiatives will
be deployed where the ability to capital-
ize on the six Vs of information will cre-
ate new opportunities for organizations
to exploit. By combining and integrating
deep analytics, local rules, scoring, opti-
mization techniques and machine learn-
ing with cognitive science into business
processes and systems, decision man-
agement helps deliver decisions that are
consistently optimized and aligned with
the organization’s desired outcomes.
Social analytics will ensure busi-
nesses know how, when and where to
creatively engage with individual con-
sumers and social communities to fos-
ter trusted, one-to-one relationships and
better understand and manage the way
their companies are perceived. Integrating
demographic and transactional data
with what can be learned about attitudes
and opinions allows organizations to
truly understand the motivations and
intents of their constituents to better serve
them at the right time and place.
Deep analytics will help organiza-
tions uncover previously hidden patterns,
identify classifications, associations and
segmentations, and make highly accu-
rate predictions from structured and un-
structured information. Organizations will
use real-time analysis of current activity
to anticipate what will happen and iden-
tify drivers of various business outcomes
so they can address the issues and
challenges before they occur. Many decisions
will be made automatically by computers
that also have deep-learning capabilities.
When you are starting a big data
journey, consider this question: What
should our big data with deep
analytics roadmap look like to achieve
our objectives?
Haluk Demirkan (haluk@uw.edu) is a professor
of Service Innovation and Business Analytics, and
the founder and executive director of Center for
Information Based Management at the Milgard
School of Business, University of Washington-
Tacoma. He has a Ph.D. in information systems
and operations management from the University of
Florida. He is a longtime member of INFORMS.
Bulent Dal (bulent.dal@obase.com) is a co-founder
and general manager of Obase Analytical Solutions
(http://www.obase.com/index.php/en/obase),
Istanbul, Turkey. His expertise is in scientific retail
analytical solutions. He has a Ph.D. in computer
sciences engineering from Istanbul University.
Acknowledgement
Part of this article is excerpted with permission
of the publisher, HBR Turkey, from Demirkan,
H. and Dal, B., “Big Data, Big Opportunities, Big
Decisions,” Harvard Business Review Turkish
Edition (published in Turkish), March 2014.
REFERENCES
1. Gartner, Inc., 2013, “Gartner Predicts Business
Intelligence and Analytics Will Remain Top Focus
for CIOs Through 2017,” Dec. 16, 2013,
http://www.gartner.com/newsroom/id/2637615.
2. Demirkan, H. and Dal, B., “Big Data, Big
Opportunities, Big Decisions,” Harvard Business
Review Turkish Edition (published in Turkish),
March 2014, pp. 28-30.
3. Davenport, T., 2013, “Analytics 3.0,” Harvard
Business Review, December.
DATA SCIENTISTS IN DEMAND
‘It’s their time to shine’
According to executive search firm head Linda Burtch,
the job prospects for data scientists and other elite
analytics professionals have never been better – and
the future is even brighter.
BY PETER HORNER
In April, the executive search firm Burtch Works
released the results of its first-of-its-kind salary and
demographics survey of data scientists, a follow-up to
a survey of big data professionals conducted a year
earlier. Among other findings, the 2014 survey
quantified that data scientists are well paid, relatively
young, overwhelmingly male and that almost half
(43 percent) are employed on the West Coast.
Linda Burtch, managing partner of Burtch Works, has
been involved in the recruitment and placement of
high-end analytics talent for 30 years. She started her
career with Smith-Hanley before founding her own
company five years ago. Analytics magazine editor
Peter Horner interviewed Burtch in April, not long
after the survey of data scientists was released.
Following are excerpts from the interview.
Linda Burtch, founder and managing partner of
Burtch Works.
What did you find that surprised you the most from
the salary and demographics survey of data scientists?
First of all, I find it funny that everyone is interested
in salaries and what data scientists and big data
professionals make, but it’s such a taboo subject to
actually talk about. Not to me. I talk about salaries all
the time. That’s my business.
What surprised me? That’s an interesting question. It
actually turned out the way I thought it would – a lot
of the candidates living out on the West Coast and a
higher predominance of Ph.D.s among data scientists
than the general analytics population or the big data
professionals, as I call them. It all pretty much made
sense to me. It was interesting because it was actually
quantified.
Weren’t you a little surprised by the extent of the
concentration of data scientists – nearly 50 percent –
on the West Coast?
That’s for the moment, for now, but watch and see
what happens. Analytics has been around for a long
time, yet some people still ask me, “Are you sure this
isn’t a fad?” It’s not.
Analytics has become a hugely profitable specialty
area within organizations as they try to optimize their
operations, or target their marketing or look at return
on investment issues, and that has been around for
years and years.
I would argue that those issues are sort of the
humdrum stuff of analytics. Data-driven
decision-making is really going to explode, and that’s
what we are seeing with this whole area going toward
data scientists. Data storage has become so much
cheaper, computing power has become much faster,
nanotechnology and sensors are now becoming
ubiquitous. Self-driving cars, traffic sensors, the
energy grid. The list goes on and on and on.
Right now the obvious stuff is happening with
understanding digital streams of data in applications
related to social media. That’s pretty straightforward
stuff, but wait until it hits the healthcare industry, for
example. Self-driving cars are going to be a huge,
huge deal. While a lot of it is being done out in
California now, over the next five years we are going
to see it scattered all over the United States.
When it comes to recruiting candidates and job
placement, who are you talking to?
I recruit in analytics – people who have master’s
degrees in statistics, operations research,
econometrics, people who are out there working in
business applications, solving problems related to
marketing spend or credit worthiness
or target marketing. More recently I’ve
gotten into data science. That’s a huge
umbrella description.
You mentioned operations research,
the heart and soul of INFORMS.
It is. When I started out in recruit-
ing more than 30 years ago, I focused
on operations research candidates. It’s
grown pretty dramatically since then.
They have a very fond place in my heart
because that’s how I got started. It’s one
of those things that I’ve really been in-
volved with – the INFORMS group back
in New York when I was living there,
and I’m really excited now because the
INFORMS group in Chicago is getting
re-energized. It’s really exciting to watch.
When looking at the job market-
place, do you distinguish between,
say, a data scientist and other analyt-
ics professionals?
Let me back up a little bit. Last sum-
mer, when I was putting together the big
data salary study, I saw that data scien-
tists were a breed apart, and that they
had higher compensation levels. So I
made the decision to take them out of the
general big data study and hold them for
later because it’s such an emerging field
that’s so different. They are working with
what I would call unstructured data. You
could get into a lot more detail over how a
data scientist is different from a big data
professional, but the primary distinguish-
ing feature, in my opinion, is that data
scientists are working with data that’s un-
structured. It’s something that’s going to
grow as sensors become more and more
prevalent and data streams become
continuous in so many application areas.
How would you describe the current
job market for quants, for lack of a
better word?
It’s hot. A couple of months ago we did
a flash survey in which we simply asked
how often you are contacted about a
new job opportunity through LinkedIn. We
had 400 responses; 89 percent of the re-
spondents said they were contacted at
least monthly, and 25 percent said that they
were contacted at least weekly. I’m working
with elite data scientists, and they’re telling
me that they get calls once or twice a day
from recruiters, so it’s just crazy.
Our candidates are seeing a 14 per-
cent increase in salary when they change
jobs, so there’s a lot of churn out there.
If they stay with their existing company,
they might see an annual increase of be-
tween 2 percent and 3 percent, so the
14 percent is a nice bounce if they de-
cide to make a change. One of my data
scientists in Boston said he received 30
calls in one week after he left a job and
went on the job hunt.