Information is the basis for optimizing and strengthening decision-making in any company. Not raw information, however, but information from which value can be extracted through analysis.
BETTER DECISIONS THROUGH BIG DATA
EXECUTIVE BRIEFING
To enable the correct business decisions to be made quickly, large quantities of structured and unstructured data now have to be analyzed. Analytics using big data technologies helps us to find the right answers.
It used to be that retailers could consider themselves customer friendly if they put charcoal on the shelf next to beer and ketchup in the summer. Naturally enough, customers who bought one of these items would tend to buy the others as well. Today, retailers have to do a bit more than that in the battle to find favor with customers. In e-commerce it is now common practice for retailers to know what their customers like, their order history, discount preferences and, wherever possible, where they live as well. But only if they know how to interpret this data or interlink it intelligently are they able to present customized offerings to their customers and target them wherever they currently are – online, at home on the sofa or on the road.
To do this, they need the right analytics tools – as well as the right data for the purpose. In an era of global competition and volatile markets, this data is essential to decision-makers seeking to optimize their business. It gives retailers, for instance, dedicated information about the purchasing behavior of their customers. For companies, data is the key to understanding their markets better, uncovering hidden trends and identifying new business opportunities in good time. Decisions can thus be made more quickly and with greater precision – with the aim of gaining a greater understanding of customers and being better able to meet their requirements.
Data analytics puts marketing departments, for example, in the position of being able to create fine-grained demographic or customer segments and customize products and services to suit their requirements. Detailed segmentation of target groups makes it easier to address them, reduces waste and thus cuts the cost of marketing campaigns. A telecommunications provider, for example, can use data analytics to find out why customers are leaving and counter it with targeted measures.
The role of data
Many decision-makers and executives now recognize the strategic value of data, exploit relevant sources of data that give them information about their products and customers and use business intelligence tools to analyze purchasing frequency for different products or changes in stock levels, for example. According to a study by software vendor Artegic, 75 percent of companies believe that they can be significantly more successful if they make use of personal data obtained from online marketing.
Business intelligence tools allow them to adapt and control their business and adopt a well-targeted approach. A company’s management benefits significantly from the information obtained and can use it as a strategic compass to identify changes in the market and customer behavior in good time in order to be proactive.
Data becomes big data
But however many dashboards, graphics and tables
executives have, it doesn’t mean they can just sit
back and relax. In recent years, the world of busi-
ness intelligence has been really shaken up – trig-
gered by the sheer quantity of data. Not long ago,
the amount of information available on which to
base business decision-making was relatively easy
to grasp, but in the last few years it has simply bal-
looned. Everything is essentially now digitized, and
new types of transaction data and real-time data
are emerging. Machines and computers are also
producing enormous quantities of data, and this
can be stored and analyzed on hardware that is
becoming increasingly reasonably priced and
dynamic. A modern aircraft, for example, generates
up to 10 terabytes of data for every 30 minutes of
flying time. With 25,000 flights a day, petabytes
of data are generated.
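The order of magnitude behind this claim can be checked with a short calculation. The sketch below uses the two figures from the text (10 terabytes per 30 minutes, 25,000 flights a day); the average flight duration is not given in the text and is a purely hypothetical assumption, as are all names in the code. Because the 10 terabytes is an "up to" figure, the result is an upper bound rather than a measurement.

```python
# Back-of-the-envelope estimate of the daily data volume generated by aircraft.
TB_PER_HALF_HOUR = 10      # from the text: up to 10 TB per 30 minutes of flying time
FLIGHTS_PER_DAY = 25_000   # from the text
AVG_FLIGHT_HOURS = 2.0     # ASSUMPTION: hypothetical average flight duration


def daily_volume_pb(avg_flight_hours: float = AVG_FLIGHT_HOURS) -> float:
    """Return the estimated upper bound of data generated per day, in petabytes."""
    tb_per_flight = TB_PER_HALF_HOUR * 2 * avg_flight_hours  # two half hours per hour
    total_tb = tb_per_flight * FLIGHTS_PER_DAY
    return total_tb / 1000  # decimal units: 1 PB = 1,000 TB


print(f"Upper bound: {daily_volume_pb():,.0f} PB per day")
```

Even with a much shorter assumed flight duration, the result stays well within the petabyte range, which is the point the text makes.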
The transition toward digital business models and
new applications is also contributing to data
growth. Technologies such as cloud computing,
RFID, transactional systems, data warehouses,
document management systems and enterprise
content management systems are important
developments in the context of big data. Many of
these systems are continuously generating new
data streams. However, the critical factors in this
explosion of data are the Internet, the increasing
number of mobile devices and, above all, social
media such as Facebook, Twitter and YouTube.
Facebook alone, for example, generates 2.7 billion
“likes” and 300 million photos a day and scans
105 TB of data every half hour.
In addition, not only are the sheer volumes of data
generated these days huge; the data is also significantly
less structured than the typical kinds of
business data generated in ERP systems. Social
media information such as text, photographs, audio
files or videos can no longer be allocated tidily to
rows and columns, as required by the relational
database model: this data is unstructured. Accord-
ing to an IDC study of data storage in Germany in
2013, 90 percent of data is now unstructured and
has to be captured and analyzed using quite new
techniques. (Source: IDC Storage*)
What that means is that companies now have to
deal with large volumes of unwieldy structured,
semi-structured and unstructured data from many
different sources.
These days, companies can no longer ignore
unstructured data from social networks, in particular.
A great deal can be learned from emails,
feedback forms, comments and ratings in social
networks and discussions in forums. The huge
volume of tweets generated every day – currently
amounting to around 12 terabytes of data –
provides a solid basis for trend research or product
development.
Typical types of data today
Structured data: Data that is suitable for the tables and structures of relational databases
Semi-structured data: Data that is often generated as a result of data interchange between companies and is therefore often based on XML
Unstructured data: Data from text files, speech-to-text applications, PDFs, scanned mail, presentations, photographs, videos, audio files
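The difference between the three types of data can be made concrete with a small Python sketch. The sample order, the XML layout and the function name xml_to_row are hypothetical and chosen only for illustration; the sketch merely shows that semi-structured XML can be mapped onto a table row with little effort, whereas free text offers no fields to map.

```python
import xml.etree.ElementTree as ET

# Structured: fixed fields that map directly onto a row of a relational table.
structured_row = {"order_id": 1001, "customer": "C-17", "amount_eur": 59.90}

# Semi-structured: XML as it might be exchanged between companies (hypothetical layout).
semi_structured = (
    "<order id='1002'><customer>C-18</customer>"
    "<amount currency='EUR'>12.50</amount></order>"
)

# Unstructured: free text without any predefined fields.
unstructured = "Great service, but the parcel arrived two days late."


def xml_to_row(xml_text: str) -> dict:
    """Map the semi-structured XML order onto the structured row layout."""
    root = ET.fromstring(xml_text)
    return {
        "order_id": int(root.get("id")),
        "customer": root.findtext("customer"),
        "amount_eur": float(root.findtext("amount")),
    }


print(xml_to_row(semi_structured))
```

For the unstructured comment there is no equivalent of xml_to_row: its meaning has to be extracted with text analytics before it can be stored in rows and columns.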
Which industries benefit from big data?
Depending on the technology at their disposal, com-
panies can get relatively easy access to large volumes
of useful market and customer data – and they want
to extract as much value as they can from this data.
According to an international IDC study commissioned
by T-Systems, every second company has already
implemented big data projects or has concrete
plans to do so. In a SAS survey, three out of every
four companies that had launched big data projects
described business analytics as an effective aid to
decision-making. (Source: SAS Decision Making*)
According to the study, they benefit most from
increased profitability, reduced costs, more targeted
risk management, process optimization,
more rapid decision-making and performance
improvements.
The outlay associated with big data pays off in
terms of hard cash, according to McKinsey. If big
data is analyzed correctly and in good time, retail-
ers, for example, can improve their margins by up
to 60 percent, and European public authorities can
save 250 million euros a year through more effi-
cient processes, according to the consulting firm.
If companies knew more about the locations
of their customers, they would be able to sell
additional products worth 600 million dollars.
(Source: McKinsey Big Data*)
Whereas up until recently only banks, financial ser-
vices companies and selected large corporations –
typical users of data warehousing and business
intelligence – had given any thought to automated
decision-making processes, now, according to the
Experton Group, retailers, utility companies and
companies in the life sciences, healthcare industry
and many other markets are increasingly also
recognizing that data is an important business
asset. (Source: Experton Big Data*)
In terms of departments within companies, the
benefits are felt, above all, in research and develop-
ment, sales and marketing, production, distribution
and logistics and finance and risk management. In
these five areas the business benefits of big data
are particularly marked.
Analyzing big data
Despite the undisputed benefits, converting the data collected into useful information is still a challenge for many companies. According to market research company Gartner, over 85 percent of Fortune 500 companies will not be in a position to use big data effectively in order to secure a competitive advantage by 2015.
“In terms of technology and administration, most companies are poorly prepared for the challenges associated with big data,” say the Gartner analysts. “Consequently, only a few of them will be able to exploit this trend effectively and secure themselves a competitive advantage.” (Source: Gartner PI*)
Three factors – the sheer volume of data involved, the heterogeneity of the data and the processing speed required – present a major challenge compared with conventional data processing and analysis. Given their origins and architecture, relational databases can only be used efficiently for applications involving frequent transactions at the level of data records or for scenarios with low to moderate volumes of data. They are not designed for the processing and analysis of data quantities measured in petabytes or exabytes. Above all, it is impossible, or at least very difficult, to store unstructured data in table-based relational database systems.
Given the increasing volumes of data available for
analysis, companies need new approaches and
technologies, according to Gartner in its study Big
Data Opportunities, New Answers and New Ques-
tions. (Source: Gartner Big Data*) Not only do new
“big data systems” have to cope with these huge
quantities of data, they also have to analyze un-
structured data reliably – and as quickly as possible.
These real-time analyses require systems with ex-
tremely fast database access and efficient parallel-
ization so that tasks can be distributed across large
numbers of computers – an approach known in the
past as grid computing.
Google has been the pioneer of big data tools for the analysis of unstructured data. With its MapReduce programming model, the company subdivides the processing of huge volumes of data in such a way that the infrastructure can be adapted with flexibility, depending on the volumes of data involved. This resulted in the popular open-source project Hadoop, which is now the standard for big data technology together with in-memory and NoSQL databases for unstructured data. In the context of business applications, SAP set things in motion with its SAP HANA database (High-Performance Analytic Appliance) based on in-memory technology.
Big data analytics relies on models and algorithms designed to search through mountains of data in order to find connections and identify patterns and similarities. Not only do these predictive or business analytics solutions help to quickly give an accurate picture of the current situation, they also permit predictions and forecasts about future developments. This is done on the basis of statistical and stochastic methods, data models and simulations with best- and worst-case scenarios.
People with job titles such as “data scientist” are required for this entirely new set of activities.
Figure: Business goals related to decision-making capabilities and agility/speed are significantly connected to a majority of respondents’ big data strategies and initiatives.
Question: To what extent is your organization’s big data strategy/big data initiatives connected to each of the following business goals?
Base: 155 qualified respondents who have implemented or have plans to implement big data projects (figures in percent)
Scale: (5) To a significant extent, (4), (3) To a moderate extent, (2), (1) To a limited extent
Business goal: (5) / (4) / (3) / (2) / (1)
Increasing speed of decision-making: 35 / 34 / 23 / 5 / 3
Increasing business agility: 35 / 32 / 26 / 5 / 3
Improving the quality of decision-making: 31 / 37 / 28 / 2 / 2
Improving the speed of response to IT security issues: 31 / 31 / 29 / 6 / 3
Improving planning and forecasting capabilities: 29 / 35 / 28 / 4 / 3
Meeting regulatory/compliance requirements: 26 / 33 / 30 / 8 / 3
New customer acquisition/retention: 26 / 33 / 27 / 8 / 5
Using immediate market feedback to improve customer satisfaction: 26 / 32 / 32 / 6 / 4
Building new business partnerships: 25 / 34 / 32 / 6 / 4
Improving internal communication: 25 / 32 / 35 / 5 / 3
Developing new products/services and revenue streams: 25 / 32 / 34 / 6 / 3
Strengthening existing business partnerships: 25 / 29 / 35 / 6 / 4
Improving finance/accounting and procurement processes: 23 / 30 / 33 / 9 / 5
Reducing CAPEX: 19 / 23 / 41 / 12 / 5
Reducing OPEX: 18 / 28 / 41 / 8 / 5
Source: How Organisations are approaching Big Data, IDG, September 2013 (200 decision-makers from companies with over 100 employees in the USA, Brazil, the Netherlands, Austria, South Africa and Switzerland)
Figure: About half of all respondents have either already deployed or are in the process of implementing big data projects at their organizations.
Question: At what stage is your organization currently with the planning and rollout of big data projects?
Base: 200 qualified respondents (figures in percent)
Already deployed/implemented big data initiatives: 25
In the process of implementing big data projects: 23
Planning to implement big data projects over the next 12 months: 21
Planning to implement big data projects within the next 13 – 24 months: 10
We have no immediate plans to implement big data projects: 23
Source: How Organisations are approaching Big Data, IDG, September 2013
Data to Decisions: the six steps
How is meaningful information obtained from
large quantities of unstructured Twitter and
Facebook text, video and consumer data? A lot
of work has to be done before the data that
finds its way into a company can be turned into
information on which executives can base their
decision-making. Countless selection, process-
ing and analysis steps are involved.
Based on the analysis of numerous case studies,
analytics expert Ken McLaughlin in his blog
“Data to Decisions” suggests six concrete steps
for data-driven decision-making using business
analytics.
Step 1: Establish a goal
A clearly defined goal must meet two requirements:
It must be both achievable and
measurable. “Reduce product shipping costs
by 15 percent” would be a clearly formulated
goal, for example.
Step 2: Model alternatives
The goal determines the direction, the alterna-
tives and how the goal is to be achieved. Exam-
ple: “Costs of a reasonably priced shipper” ver-
sus “costs of an automated handling process”
would be possible alternatives.
Step 3: Identify the required data
Identify the data and metrics required to model
the alternative. In the example: previous ship-
ping costs and software and hardware costs for
automated processes.
Step 4: Collect and organize data
Before the models can be evaluated, data has
to be collected and organized.
Step 5: Analyze data
To evaluate the data, the appropriate analytical
techniques and then the best alternative have
to be selected.
Step 6: Decide and execute
Finally, the action that delivers the best results
should be executed and the real results observed.
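The six steps can be walked through with the shipping example used above. The following Python sketch is only an illustration: all cost figures are invented, and the function names evaluate and decide are hypothetical. It models the two alternatives, measures each against the 15 percent goal and selects the better one.

```python
# Step 1: establish a measurable goal (from the text: reduce shipping costs by 15 percent).
TARGET_REDUCTION = 0.15

# Steps 2 to 4: model the alternatives and collect the data.
# ASSUMPTION: all cost figures below are hypothetical annual costs.
current_shipping_cost = 200_000.0
alternatives = {
    "reasonably priced shipper": 176_000.0,
    "automated handling process": 165_000.0,
}


def evaluate(current: float, options: dict, target: float) -> dict:
    """Step 5: compute the cost reduction of each alternative and check it against the goal."""
    results = {}
    for name, cost in options.items():
        reduction = (current - cost) / current
        results[name] = (reduction, reduction >= target)
    return results


def decide(results: dict):
    """Step 6: pick the alternative with the largest reduction that meets the goal."""
    qualifying = {name: red for name, (red, meets_goal) in results.items() if meets_goal}
    return max(qualifying, key=qualifying.get) if qualifying else None


results = evaluate(current_shipping_cost, alternatives, TARGET_REDUCTION)
print(decide(results))
```

With these invented figures only the automated handling process reaches the goal. Observing the real results after execution, as step 6 demands, would mean feeding the actual costs back into the same comparison.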
What are the risks?
A central question in connection with big data is
that of data quality. Does data occur more than
once, does it contain errors or inconsistencies, or
are entire records missing? Users are generally
aware of the importance of this question, as a study
by Omikron Data Quality shows. Thirty-nine percent
of those surveyed said they believed that a
big data approach is condemned to failure if the
data is of poor quality.
“It is clear that, when there is a larger volume of
data, statistical significance increases and the re-
sults of BI analytics are more reliable,” according to
the study. “However, if the initial data is incorrect,
duplicated or inconsistent, this significance is mis-
leading: in the worst-case scenario, you get appar-
ently clear results that are mathematically sound –
but in fact incorrect. If actions are then taken based
on the results of analytics, which is, of course, the
goal of BI, negative consequences are inevitable.”
(Source: Omikron Data Quality*).
If the analyses and forecasts are to be accurate,
the foundation (i.e., the data) must be correct. In
typical BI, there are proven processes and methods
in the ETL (extract, transform, load) process for
tidying up data before the information is stored
in the data warehouse. These include profiling,
cleansing, enriching and comparing with reference
data.
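Two of the steps named here, profiling and cleansing, can be sketched in a few lines of Python. The record layout (customer_id, email), the sample records and the function names are hypothetical; real ETL tools offer far more extensive rules. The sketch addresses exactly the quality problems mentioned above: duplicates, inconsistencies and missing values.

```python
def profile(records: list, required: tuple) -> dict:
    """Profiling: count duplicates and records with missing required fields."""
    seen, duplicates, incomplete = set(), 0, 0
    for rec in records:
        key = (rec.get("customer_id"), (rec.get("email") or "").strip().lower())
        if key in seen:
            duplicates += 1
        seen.add(key)
        if any(not rec.get(field) for field in required):
            incomplete += 1
    return {"total": len(records), "duplicates": duplicates, "incomplete": incomplete}


def cleanse(records: list, required: tuple) -> list:
    """Cleansing: normalize emails, drop incomplete records and duplicates."""
    clean, seen = [], set()
    for rec in records:
        if any(not rec.get(field) for field in required):
            continue  # incomplete record: cannot be repaired here
        normalized = {**rec, "email": rec["email"].strip().lower()}
        key = (normalized["customer_id"], normalized["email"])
        if key not in seen:
            seen.add(key)
            clean.append(normalized)
    return clean


# Hypothetical sample data: one inconsistent duplicate and one missing email.
REQUIRED = ("customer_id", "email")
records = [
    {"customer_id": 1, "email": "Anna@Example.com "},
    {"customer_id": 1, "email": "anna@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 3, "email": "ben@example.com"},
]

print(profile(records, REQUIRED))
print(cleanse(records, REQUIRED))
```

Running the profile first shows how much of the data is affected before any record is changed or dropped, which is the order in which these steps are usually applied.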
The challenge of data silos
A further fundamental challenge (or key question)
when dealing with big data is the distribution of the
data to parallel systems. On the one hand, for historical
reasons, data silos – from CRM, ERP or other
systems, for example – have come to dominate the
architecture of data storage and increasingly also have to
handle the archiving of historical data. On the other
hand, given rising data volumes, many companies
merely allocate the data flooding in to different
storage locations – without processing or trans-
forming it beforehand.
These distributed and heterogeneous data process-
ing and storage structures are neither cost effective
nor expedient for potential data analyses. They
prevent the exchange and integration of data and
make it difficult to maintain a holistic view of data
management.
Modern integration technologies can be used here
that turn the structured, unstructured and semi-
structured data from a variety of sources into an
integral part of the enterprise-wide data manage-
ment strategy.
To this end, software solutions tap sources of data
throughout the company, read and extract it and
load it into the storage system provided. In the next
step, this data is loaded into data models, enriched
with further data from other sources and then ana-
lyzed. Cloud-based systems help to provide storage
capacity for large volumes of data.
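The extract, load and enrich sequence described above can be illustrated with two tiny data silos. In the following Python sketch the CRM and ERP records, their field names and the function integrate are hypothetical; the point is only to show how data from separate sources is brought together into one view per customer before it is analyzed.

```python
# Hypothetical extracts from two silos: customer master data (CRM) and orders (ERP).
crm_records = [
    {"customer_id": 1, "name": "Anna"},
    {"customer_id": 2, "name": "Ben"},
]
erp_orders = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 1, "amount": 60.0},
    {"customer_id": 3, "amount": 15.0},
]


def integrate(crm: list, erp: list) -> dict:
    """Build one view per customer: master data enriched with order totals."""
    view = {
        c["customer_id"]: {"name": c["name"], "orders": 0, "revenue": 0.0}
        for c in crm
    }
    for order in erp:
        # Orders without a CRM entry are kept and flagged by a missing name.
        entry = view.setdefault(
            order["customer_id"], {"name": None, "orders": 0, "revenue": 0.0}
        )
        entry["orders"] += 1
        entry["revenue"] += order["amount"]
    return view


for customer_id, entry in integrate(crm_records, erp_orders).items():
    print(customer_id, entry)
```

The customer without a name in the result is an order that exists in one silo only, the kind of gap that stays invisible as long as the systems are analyzed separately.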
No big data without skilled staff
Successful big data analytics requires not just suit-
able technologies but also skilled staff. Big data an-
alytics can only be implemented with the help of
highly qualified specialists who can handle the rele-
vant tools and technologies and are also able to un-
derstand the requirements of specific departments
and ensure that the technology that is put in place
meets these requirements.
For some time now, a chief data officer (CDO) has
been included in the list of C-level executives in
many US companies. The focus of the CDO’s activities
is on managing data as an asset and converting
it into something with a concrete business value.
Capital One appointed the first CDO in the industry
in the year 2003.
Since then, CDOs have become increasingly com-
mon in lists of top executives, above all in large
public institutions that are overwhelmed with data.
According to Gartner, there are CDOs in 2 percent
of companies around the world and in 6 percent of
large companies. This is forecast to increase to 20
percent of large companies by 2017. In Europe the
CDO is still relatively unknown. Whether it is really
necessary to establish a CDO is a matter of debate,
particularly since the role is not precisely defined.
However, there is an urgent need for big data
experts who are able to work with data effectively.
These IT experts have to have different skills from
those required for conventional IT systems. In addi-
tion to meeting the technical requirements, these
specialists must be able to work with statistical and
stochastic methods as well as analytical models and
have sound industry expertise.
The Experton Group therefore demands that new
types of jobs are created with titles such as data
scientist or data artist. The data scientist is the data
expert who selects the analytical methods and
analyzes the data. A data scientist requires a good
general education with knowledge of mathematics
and stochastics, programming fundamentals,
SQL and databases, information technology and
networks.
Presentation and visualization of the data is then
handled by the data artist, whose training includes
graphic design, psychology, some mathematics,
IT and communications. These jobs form what you
might call the core of big data staff. Other new jobs
are being added to this core group. The table
below shows all of these.
Big data job descriptions
Position: Data scientist
Responsibilities: Decides which forms of analysis are most suitable and which raw data is required and then analyzes it
Required expertise: Mathematics, stochastics, programming, SQL and databases, information technology and networks
Position: Data artist
Responsibilities: Presents the analyses clearly in the form of charts and graphics
Required expertise: Graphic design, psychology, mathematics, IT and communications
Position: Data architect
Responsibilities: Creates data models and decides which analytical tools are to be used
Required expertise: Databases, data analysis, BI
Position: Data engineer
Responsibilities: Looks after the hardware and software, in particular the analytical systems and the network components
Required expertise: Hardware and software knowledge, programming
Position: Information broker
Responsibilities: Obtains information and makes it available, for example by providing customer data or in-house data from a variety of sources
Required expertise: Databases, communications, psychology
Who is going to train big data specialists?
Until now, however, companies have hardly ever
been able to call on staff resources like these. “Data
scientist and data artist are jobs for which a two- to
three-year period of training would be required, but
due to the cross-cutting nature of the work, they
scarcely exist today,” says Holm Landrock, a senior
advisor at the Experton Group.
Only a few companies and organizations are committed to training data scientists and data artists in any way, but what they offer is far from a comprehensive program of training. IT companies such as SAS, EMC and Oracle do offer training in this direction. Fraunhofer also offers training for data scientists.
But short courses like this are just a drop in the ocean. The Experton Group therefore recommends that the ICT industry should get together with education providers – such as vocational academies, technical colleges, industry associations and chambers of industry and commerce – to create new job profiles as quickly as possible. Training staff for a role as a data scientist or one of the other new job types is not some kind of Good Samaritan project but a foundation stone for future big data projects and the resulting sustainable business success.
What big data solutions exist?
There is no standard solution, but some processing
methods have emerged in recent years that serve
as the basis for big data analytics today and will
continue to do so in the next few years.
The ideal solution for coming to grips with huge
volumes of data is the old principle of “divide and
conquer”. Arithmetic calculations are subdivided
into many small calculations and distributed to
multiple servers. Google’s MapReduce algorithm
has emerged as the de facto standard for distributed
computing. A typical MapReduce application
processes multiple terabytes of data on thousands
of machines.
MapReduce is implemented in practice by means of
the software library Apache Hadoop. By subdivid-
ing the data into smaller chunks and processing
them in parallel on standard computers, Hadoop
has emerged as the current industry standard for
big data environments.
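The "divide and conquer" principle behind MapReduce can be sketched in plain Python using the classic word count example. This is not the Hadoop API: the function names are hypothetical, and a local thread pool merely stands in for the many servers of a real cluster. The sketch shows the three phases: map each chunk independently, group the intermediate results by key, and reduce each group to a single value.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk: str) -> list:
    """Map: turn one chunk of text into (word, 1) pairs, independently of other chunks."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks: list) -> dict:
    """Shuffle: group all intermediate values by key across the chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Reduce: collapse each group of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks: list, workers: int = 4) -> dict:
    # The thread pool stands in for a cluster; each chunk is mapped in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


chunks = ["big data needs big tools", "data beats opinion", "big decisions need data"]
print(word_count(chunks))
```

Because the map step never looks beyond its own chunk, more chunks can simply be handed to more workers, which is what allows the approach to scale with the volume of data.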
The Chinese mobile phone provider China Mobile,
for example, was able to use Hadoop to analyze the
phone usage of all of its customers and the probability
of them churning. The “scale-up” solution it
was using prior to this enabled the company to
analyze the data of only around ten percent of its
customers. Now, however, all customer data can
be taken into account, and targeted marketing
measures have been introduced to reduce churn.
In-memory permits real-time analytics
However, a Hadoop cluster is not capable of handling all big data tasks. If the data is on a hard disk, slow database accesses cannibalize the gains made through parallelization. This is why in-memory databases have established themselves for the accelerated processing of extremely large quantities of data. These databases store the data in working memory (RAM) and call it from there. That makes them faster than databases that use conventional disk technology by a factor of around 1,000.
To obtain the maximum in terms of performance, wherever possible in-memory databases therefore load the entire volume of data – together with the database applications – into main memory, which has to be large enough to cope. Business data analytics can thus be carried out virtually in real time rather than taking days or weeks.
Figure: About two-thirds of respondents are extremely/very likely to consider using or to continue to use in-memory databases.
Question: How likely are you to consider using or to continue to use each of the following big data solutions?
Base: 155 qualified respondents who have implemented or have plans to implement big data projects (figures in percent)
Scale: (5) Extremely likely, (4) Very likely, (3) Somewhat likely, (2) Not very likely, (1) Not at all likely, Not familiar with this type of solution
Solution: (5) / (4) / (3) / (2) / (1) / Not familiar
In-memory databases (e.g., SAP HANA, Oracle Exadata): 28 / 38 / 15 / 9 / 3 / 6
Log file analysis software: 20 / 32 / 26 / 10 / 3 / 9
NoSQL databases: 20 / 31 / 26 / 9 / 7 / 6
Columnar databases: 17 / 28 / 28 / 12 / 4 / 11
Hadoop/MapReduce: 15 / 25 / 26 / 12 / 6 / 15
Source: How Organisations are approaching Big Data, IDG, September 2013
SAP’s highly popular HANA (High-Performance Analytic Appliance), for example, a database system based on in-memory technology, was unveiled as a high-performance platform for the analysis of large volumes of data in mid-2010 by Hasso Plattner and SAP technology boss Vishal Sikka. Database specialist Oracle also now offers a database system based on in-memory technology: Exadata.
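The in-memory principle itself, keeping the entire database in RAM and querying it there, can be tried out with the SQLite module from the Python standard library. SQLite is used here only as a stand-in and is not comparable to SAP HANA or Oracle Exadata in scale or features; the table layout, the sample sales and the function name are hypothetical, and the sketch makes no claim about the speed-up factor mentioned in the text.

```python
import sqlite3


def revenue_by_product(sales: list) -> dict:
    """Load all sales into a RAM-only database and aggregate them there."""
    conn = sqlite3.connect(":memory:")  # the database lives entirely in main memory
    try:
        conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
        rows = conn.execute(
            "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
        ).fetchall()
    finally:
        conn.close()  # closing the connection discards the in-memory data
    return dict(rows)


# Hypothetical sample data.
sales = [("charcoal", 12.0), ("beer", 8.5), ("charcoal", 6.0), ("ketchup", 2.5)]
print(revenue_by_product(sales))
```

The sketch also shows the price of the approach: the data exists only as long as the memory holding it, so main memory has to be large enough for the entire volume, as the text points out.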
In-memory databases are no longer a niche
product. According to a study by TNS-Infratest
commissioned by T-Systems, 43 percent of
German companies are already using in-memory
technologies for data analytics or plan to do so
in the near future. Ninety percent of users say
their experience with the technology has been
good or very good. (Source: T-Systems New
Study*)
However, the majority of German companies
regard in-memory technology as complementary
to time-critical analytics as things stand. But almost
20 percent of companies see it as an important
response to the challenges of big data. They expect
in-memory systems to become a central element
of data analytics environments.
In addition, there are technologies such as NoSQL
databases for unstructured data. NoSQL is the
collective term for “non-relational” database
systems and also the term used to describe a shift
away from relational databases to new or forgotten
database models. NoSQL database systems are
an efficient way to store and process unstructured
data such as text, audio files, videos and photo-
graphic material.
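What "non-relational" means in practice can be shown with a toy document store in Python. The class DocumentStore and the sample documents are hypothetical and do not mirror the API of any particular NoSQL product; the sketch only demonstrates that records with entirely different fields can be stored side by side without a predefined table schema.

```python
import json


class DocumentStore:
    """Toy key-document store; documents do not need to share a schema."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id: str, document: dict) -> None:
        # Documents are serialized as JSON, a common storage format for such systems.
        self._docs[doc_id] = json.dumps(document)

    def get(self, doc_id: str) -> dict:
        return json.loads(self._docs[doc_id])

    def find(self, field: str, value) -> list:
        """Return the ids of all documents whose top-level field equals value."""
        return [
            doc_id
            for doc_id, raw in self._docs.items()
            if json.loads(raw).get(field) == value
        ]


store = DocumentStore()
# Two documents with different fields: no common table layout is required.
store.put("t1", {"type": "tweet", "text": "Love the new store!", "hashtags": ["retail"]})
store.put("p1", {"type": "photo", "file": "shelf.jpg", "width": 800, "height": 600})

print(store.find("type", "tweet"))
```

In a relational table, the tweet and the photo would need either separate tables or many empty columns; here each document simply carries the fields it has.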
Figure: Overall, respondents believe that in-memory databases best address big data’s challenges, but there are significant differences by region.
Question: Which of the following solutions do you believe would best address the challenges associated with big data?
Base: 147 qualified respondents who are familiar with two or more big data solutions shown in Q.3 (figures in percent)
In-memory databases (e.g., SAP HANA, Oracle Exadata): 30
NoSQL databases: 19
Log file analysis software: 15
Columnar databases: 12
Hadoop/MapReduce: 11
Not sure: 14
Respondents in EMEA are significantly more likely to favor in-memory databases (60%), compared to only 22% in the US and 14% in Brazil.
Source: How Organisations are approaching Big Data, IDG, September 2013
Make or buy?
The current market situation for big data solutions presents a final challenge on the way to big data success. Numerous providers are offering software tools based on Hadoop. These include Cloudera, Hortonworks, Datameer and HStreaming as well as big names such as IBM, Intel and EMC. But they are all coming up against the same limitation: none of them have standardized industry solutions that can be customized quickly to suit customers’ requirements. They often have to specially develop these systems in joint projects together with their customers.
Companies wanting to use the technology are faced with a typical “make or buy” decision. When analytics is carried out on a one-off basis, or there