Ziff Davis | White Paper | Big Data, Little Data and Everything in Between
Executive Summary
“Big Data” has been on corporate radar screens for years now. Unfortunately, outside of the
data scientists and statisticians who spend their days immersed in truly complex datasets,
both end users and key decision makers often struggle to make sense of the data their
organizations collect and generate. Whether an organization’s data is really “big” is even a topic
of debate.
Characterizing a company’s data, though, can’t simply be dismissed as a matter of semantics. It
frequently falls on IT to provide appropriate analytics solutions for heterogeneous users with a
wide range of skill sets, job descriptions, and analytical needs, whether the data being analyzed
is truly unstructured, web-scale data or just too many spreadsheets. Regardless of the type
or scale of data your users need to harness and analyze, they need a straightforward, visual
solution that is easy to use on the front end and highly scalable on the back end. Fortunately,
IBM SPSS Modeler, SPSS Analytical Server, and SPSS Analytical Catalyst provide just such
an ecosystem, one that can turn different kinds of data stores, from Hadoop to those proverbial
spreadsheets, into useful sources of business insight and decision support.
Introduction
Modern businesses no longer struggle to collect customer data, record internal metrics, or
even build web-scale data warehouses. As the
volume of available data continues to increase
exponentially and our ability to collect it from a
multitude of sources has improved dramatically,
the real problem has become how to turn all
of this data into usable information. How can
the data drive strategic planning and tactical
decision-making in concrete ways from the
executive boardroom down to specific lines of
business?
In the same way, IT departments have
become quite adept at building storage and
data management infrastructures. Storage
virtualization, cloud technologies, and even
Hadoop clusters let IT collect, store, and manage
all manner of data. However, as the “keepers
of the data”, IT is also frequently asked to
implement and support an analytics solution that
can do more with all of this data than merely
spit out reports. Ease of use and big data
analytics do not usually go hand in hand,
but whatever solution IT delivers
needs to be flexible and scalable on the back end
while meeting the needs of a variety of users
on the front end. It needs to connect to existing
data stores and be ready for future sources of
data, the structure and scale of which may not
even be predictable. Too often, this puts IT in the
unenviable position of rolling out a solution to a
very poorly defined problem.
Do Your Users Really Have “Big Data”? Does It Matter?
One way to approach the widely varying needs of end users is to look for multiple solutions
that suit particular analytics requirements. For example, human resources may want to analyze
metrics collected from various departments and job classifications to help determine pay
grades and compensation. The data they need to examine would hardly be considered “big
data” but would be completely overwhelming in basic pivot tables or spreadsheets.
What is Big Data?
The term “Big Data” is used
so frequently that it would
hardly seem to require a
definition. Yet it is frequently
misused and misunderstood.
IT administrators know that:
• Everyone thinks they have big data
• Everyone believes they should be
leveraging big data
• If they happen to not have big data
or the tools to analyze it, they want
them…now
In reality, data is not so easily
quantified. That said, “big data”
refers to collections of data too
large and complicated to manage
and analyze with standard tools.
Those tools were built for relational
databases that pre-date the ubiquitous
World Wide Web, machine-to-machine
data, and the unstructured data that now
dominate our most challenging analytical tasks.
The marketing department, on the other hand, may be analyzing potentially millions of records
from social media, online advertising, and overall market trends, and attempting to correlate
that information with actual brick and mortar point of sale data. They may be encountering
unstructured data, data that normalize poorly, and both transactional and historical data in
near real-time. Most people would consider this a far better example of big data than the HR
information being analyzed above.
But does it matter? Probably not, if IT can identify a single, unified analytics platform that can
scale both on the back end and for end users, no matter how “big” their data. Realistically,
the IT department can only support a finite number of tools and it is likely that others in the
organization will want to analyze aggregated data that spans the business, a task made much
harder with disparate data management and analytics tools.
Trends in Predictive Analytics
This shift away from traditional data management paradigms with statisticians and data
scientists as the sole end users of an organization’s data is paralleled by a move away from
strict analytical reporting and towards predictive analytics. Predictive analytics have been a
hallmark of business intelligence and decision support systems for some time, but again, these
systems have largely been the domain of statisticians, with executives enjoying the insights
they provide.
Now, however, systems are emerging that allow a much larger group of end users to use
historical and transactional data to model business problems and predict potential outcomes.
The idea of being “data-driven” is extending beyond the C-Suite and trickling down to the rest
of the organization. Tools for predictive analytics are:
• Becoming visual and easier to use so that they are accessible to many users
• Becoming differentiated and/or scalable, making them suitable for statisticians to build
advanced models and for line of business employees to intelligently formulate questions
and use them for front-line decision-making
• Enabling embedded features such that even customer-facing applications can include
predictive features
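The core idea behind these tools can be sketched in a few lines: fit a simple model to historical data, then project an outcome the business has not yet observed. This is a minimal, hypothetical illustration (the spend and sales figures are invented), not a representation of how SPSS Modeler builds models.

```python
# A minimal sketch of predictive modeling: fit a simple model to
# historical data, then use it to project a future outcome.
# All figures below are hypothetical and purely illustrative.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical history: monthly ad spend (thousands) vs. units sold
spend = [10, 20, 30, 40, 50]
units = [120, 190, 310, 390, 490]

slope, intercept = fit_line(spend, units)
forecast = slope * 60 + intercept  # projected units at a new spend level
```

Real predictive platforms automate model selection, validation, and deployment; the point here is only that the prediction step itself is simple enough to surface to non-statisticians.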
Opportunities Created by Effective Predictive Analytics
“Want of foresight, unwillingness to act when action would be simple and effective, lack of
clear thinking, confusion of counsel until the emergency comes…these are the features which
constitute the endless repetition of history.” - Winston Churchill
No, Winston Churchill was not talking about predictive analytics or big data in 1935 when
he made these remarks. But predictive analytics are, in fact, a key enabler of so-called
“organizational learning.” Businesses can ask how to better meet customer needs, respond
to market fluctuations, manage risk, and otherwise seek new competitive advantages
by developing predictive models based on historical data. When implemented correctly,
organizations can use predictive models to answer critical questions that are far better
addressed statistically than with intuition:
• How is the current business environment like environments the organization has
encountered in the past?
• What approaches worked well then? What approaches didn’t?
• What patterns of customer behavior can we correlate with products, marketing, and
strategic shifts?
• What changes led to emerging quality problems or customer complaints?
• What is the general perception of our products in social media? And what effects do
particular campaigns have on those perceptions?
While these are high-level questions that predictive analytics can help answer, the right
software can also suggest operational adjustments.
Recent high-profile data breaches also highlight opportunities that can be created by
predictive analytics tools. For example, companies could identify transactional patterns
associated with an ongoing attack and address vulnerabilities before they reach critical scale.
Predictive Analytics Next
Large organizations have used predictive analytics for years. Researchers have employed
predictive techniques and tools to model everything from climate change to the efficacy
of cancer drugs. However, the next generation of predictive tools is here. These tools are
accessible enough to find their way into the hands of end users, and embedded predictive
analytics are increasingly being surfaced to customers in online applications and e-commerce.
As a result, we’re seeing predictive tools pushed down to operations and moving into the realm
of not just business intelligence but “predictive intelligence.”
As businesses in all sectors look to create cultures of data, IT departments are being asked to
identify solutions that empower end users with robust predictive tools. The traditional “decision
support system” is too far removed from daily decision making and is better suited to strategic
planning.
Heterogeneous Users and Diverse Use Cases
Instead, increasingly savvy users are demanding access to streams of real-time data,
vital historical information, and far more complex data than the distilled reports that many
businesses provide. Interactive dashboards that include predictive analytics and deep
drill-down and visualization capabilities are quickly replacing simple BI scorecards.
Simultaneously, the data czars in an organization (usually statisticians and data scientists)
need to be able to develop increasingly sophisticated analytics applications to surface to
users.
Real-time and Embedded Analytics
An advanced analytics platform must at once:
• Access data stores from across an organization
• Support the development of complex applications and deep data insights
• Be nearly transparent to most end users
The multifaceted nature of current (and future) analytical needs is driving the growth of
embedded analytics. In particular, embedded predictive analytics support everything from
customer recommendation engines to line of business applications like CRM that improve
customer service and responsiveness in sales and marketing teams.
In fact, for predictive analytics to be truly transformative in an organization and accessible
to the broadest cross-section of users, a growing number of IT and BI professionals believe
that users shouldn’t even realize they are accessing predictive tools. Rather, they should be
application-embedded such that users are seamlessly provided with decision support, without
any need to conduct their own analyses. For example:
• Field agents in homeland security positions should not need to log into a separate
analytics application to gain insight into emerging threats based on increased chatter on
social media
• Customers visiting a website should automatically be presented with product
recommendations tied to past purchases, profiles built from similar users, and their current
locations
• Insurance agents should have a complete view of a client’s risk profile that aggregates
everything from credit scores to prior claims to healthcare data
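The second example above, embedded product recommendations, can be sketched with a simple co-purchase heuristic: recommend the items most often bought alongside what the customer already purchased. This is an invented toy example, not the algorithm any SPSS component actually uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history (illustrative only)
orders = [
    {"tent", "stove", "lantern"},
    {"tent", "sleeping_bag"},
    {"stove", "lantern"},
    {"tent", "lantern"},
]

def co_counts(orders):
    """Count how often each pair of items appears in the same order."""
    counts = Counter()
    for basket in orders:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(item, orders, k=2):
    """Top-k items most often bought alongside `item`."""
    counts = co_counts(orders)
    scored = Counter({b: n for (a, b), n in counts.items() if a == item})
    return [product for product, _ in scored.most_common(k)]
```

The "embedded" part is that a call like `recommend("tent", orders)` would run behind the storefront page, with the shopper never aware a model was consulted.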
The Bottom Line for IT
Businesses, and their IT departments in particular, must substantially alter their definition
of users to include customers, partners, internal end users, developers, statisticians, and
executives. Fundamentally, when IT groups are asked to implement predictive analytics
solutions, they are actually being asked to provide an ecosystem of platforms and tools
suitable for every user covered by this new definition, enabling them to make better decisions
faster.
As we’ve seen, responsibility for business analytics is increasingly being distributed from the
executive to the operational levels of modern enterprises, with statisticians and data scientists
leading the most complex and strategically important analytical initiatives. Although IT has
some analytical needs in its own right (e.g., tracking hardware capacity, application readiness,
etc.), IT’s real focus is on providing platforms. Integrated platforms that can support all of the
following, however, are hard to find in the market today:
1. Complex analytics with hooks into Hadoop and other varied data stores
2. More basic standalone analytics needs
3. Executive-level decision support
4. Embedded predictive analytics
No discussion of predictive analytics tools would be complete without addressing how they
handle Big Data. As we will see in the next section, IBM SPSS Modeler, SPSS Analytical
Server, and SPSS Analytical Catalyst form exactly the sort of integrated platform outlined
above, one that can both address Big Data needs and satisfy requirements for analysis of local
data stores.
Big Data – Complicated, Messy, and Really Useful
Actually using Big Data to generate
meaningful insights, influence customers, and,
as described in the discussion of predictive
analytics above, “make better decisions faster”
is one of the greatest challenges facing
organizations today. Big Data is messy for
several reasons:
several reasons:
• Its scale is such that many tools buckle
under the sheer volume of records involved
• Data often don’t fit (because of their
inherent structure or lack thereof) into the
neat, glorified spreadsheets to which users
are accustomed
• Data must often be aggregated from
sources that were never meant to be
merged and joined to generate insights
All of these challenges aside, organizations can’t afford to ignore their vast stores of data
if they wish to remain competitive. Similarly, IT can’t afford, in a very literal sense, to simply
accumulate massive datasets and not deploy platforms that enable users to leverage them in
real time for both operational and strategic purposes.
What is Hadoop?
Hadoop is an open source technology
for storing, indexing, and analyzing
very complicated datasets. Originally
inspired by the distributed techniques
Google developed to perform deep
analytics on unstructured search data,
Hadoop has grown into a mature tool for
distributed storage and analysis of data
that fits poorly into standard relational
tables.
Though incredibly powerful, Hadoop is
not only complicated but often poorly
understood outside the data science
community. As with Big Data, IT often
receives mandates to implement
Hadoop because every other data-driven
organization is using it…Aren’t they?
Asking the Right Questions
One of the most challenging aspects of Big Data analytics is simply being able to ask the
right questions. In traditional data collection activities like clinical drug trials or educational
assessments, questions and hypotheses are formulated in advance and data structures are
built specifically to answer those questions: “Is this curriculum associated with a statistically
significant improvement in test scores?” and “Does treatment with this medication improve
clinical outcomes when compared to placebo?”
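The pre-formulated questions above map directly onto standard statistical tests. As a minimal sketch (with invented, illustrative scores), the curriculum question reduces to comparing two samples:

```python
import statistics

# Hypothetical test scores (illustrative only): existing curriculum
# vs. the new curriculum being evaluated
control = [72, 75, 70, 68, 74, 71, 69, 73]
treatment = [78, 82, 77, 80, 79, 81, 76, 83]

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (mean_b - mean_a) / ((var_a / len(a) + var_b / len(b)) ** 0.5)

t = welch_t(control, treatment)  # large |t| suggests a real difference
```

The point of the contrast in the text is that this kind of test presumes the question, the groups, and the metric were all fixed before any data arrived, which is precisely what Big Data exploration cannot assume.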
With Big Data, however, users need to be able to explore and visualize the data before they
can start asking meaningful questions. Especially with unstructured data, questions can
rarely be precisely formulated in advance. Exploratory tools like those found in IBM
SPSS Modeler, though, let users work with statisticians and data scientists to ask much more
open-ended questions. For example, “There appears to be a group of customers who aren’t
returning while another group appears to be quite loyal. Are there underlying characteristics of
the two groups that could explain this split? And have any of our advertising campaigns been
able to bring back customers? What are defining characteristics of the customers we won
back?”
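The first step in answering such an open-ended question is usually just splitting the customers into the two groups and comparing their profiles. A minimal sketch, with entirely hypothetical records and attribute names:

```python
# Hypothetical customer records (illustrative only): whether the customer
# returned, plus two candidate explanatory attributes
customers = [
    {"returned": True,  "used_discount": True,  "tickets": 0},
    {"returned": True,  "used_discount": True,  "tickets": 1},
    {"returned": True,  "used_discount": False, "tickets": 0},
    {"returned": False, "used_discount": False, "tickets": 3},
    {"returned": False, "used_discount": False, "tickets": 2},
    {"returned": False, "used_discount": True,  "tickets": 4},
]

def profile(group):
    """Summarize a group: share using discounts, mean support tickets."""
    n = len(group)
    return {
        "discount_rate": sum(c["used_discount"] for c in group) / n,
        "mean_tickets": sum(c["tickets"] for c in group) / n,
    }

loyal = profile([c for c in customers if c["returned"]])
lapsed = profile([c for c in customers if not c["returned"]])
```

A split like this (loyal customers use discounts more and file fewer support tickets, say) is not an answer in itself; it is the raw material a statistician turns into a testable hypothesis.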
Statisticians aren’t marketers, quality control engineers, manufacturers, or sales staff. They
have the expertise to answer the questions but require input from lines of business and
subject matter experts to know what questions need answering. Again, this is where IT enters
the picture. IT needs to provide the tools that let salespeople talk to statisticians.
Yes, Your Users Can Access Hadoop
Hadoop is intimidating even to experienced users, and the data it is designed to manage
and analyze are simply too complicated for most end users to jump in and begin the kinds
of exploratory analysis described above. IBM SPSS Analytical Server, though, provides a
connection to a variety of data sources (including Hadoop) while IBM SPSS Catalyst gives
users a unique browser-based means of exploring the aggregated data, regardless of its
source. Each of these components contributes to the dialog between data scientists
and users.
Performance and Scalability, No Matter How “Big” the Data
Because this platform can scale from a single-user desktop deployment of SPSS Modeler
to a full-blown predictive analytics ecosystem, the tools include several performance
enhancements. SQL pushback, for example, is built into SPSS Analytical Server: analytical
operations are translated into SQL and pushed down to the database server, which executes
them on its own hardware rather than shipping raw records to the analytics tier.
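The pushback idea can be illustrated with SQLite standing in for a production SQL server (the table and figures are hypothetical): the aggregation runs inside the database engine, so only summary rows cross the wire instead of every transaction.

```python
import sqlite3

# SQLite as a stand-in for a production SQL server; the sales table
# and its contents are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("east", 80.0), ("west", 200.0)],
)

# The analytic step is expressed as SQL and "pushed down": the GROUP BY
# executes inside the database engine, and only two summary rows return.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall())
conn.close()
```

The alternative, fetching every row and summing in the analytics tool, produces the same totals but moves all the data; at warehouse scale that difference is what makes pushback worthwhile.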
SPSS Analytical Server also supports analysis of real-time data streams. While Hadoop is
well suited to batch processing of very large datasets, its batch-oriented design copes poorly
with real-time data. Analytical Server, on the other hand, can deliver real-time analytical
capabilities across many high-volume data streams. It also speeds analytics, whether
the results are being delivered to customers in an e-commerce setting or enterprise users
exploring potential relationships in Big Data applications.
Conclusion: Teaching Users What They Want, Giving Them What They Need
IT has a unique opportunity in IBM SPSS predictive analytics tools to deliver a robust, highly
scalable solution that meets the needs of heterogeneous users in ways that few other
platforms can. In bringing these tools to an organization, IT can then bring a range of predictive
analytics to bear on a variety of business problems. In fact, SPSS predictive software is
a complete solution for harnessing Hadoop, relational databases, and even the mass of
spreadsheets that tend to accumulate in lines of business.
When users aren’t clear on their data analysis needs (and they generally aren’t), tools
like SPSS Modeler are sufficiently flexible to help both IT and statisticians translate user
requirements into data-rich applications. Perhaps more importantly, this ecosystem of tools
can make data stores that are utterly inaccessible to most users into deeply interactive
environments that connect lines of business to decision-makers and data scientists whose
work would otherwise not be well-informed by “feet on the ground.”
To learn more about IBM SPSS Modeler, Analytical Server, and Catalyst, visit:
http://www-01.ibm.com/software/analytics/applications/big-data/