Asking the Right Questions of Your Data

Copyright © Think Big Analytics and Neustar Inc.
Mike Peterson
VP of Platforms and Data Architecture, Neustar
Jun 26, 2013

We have come a long way!!!
But where/when is the GOLD?
Unintended Consequences of Big Data
We need to ask the right Questions
Oh, and let's remember religion
and not forget GOVERNANCE

Big Data Evolution Status
» New data platform is built – 3-tier
» Collected many PB of data
» Hadoop infrastructure in place for 2 years
» Established Data Science teams
» Machine Learning is in place
» Increased technology skills
» Focused data teams
» Active in the community

Our Partners are still a part of our process
» Expertise in Technologies
» Trusted partner
» Collaborative Teams
» Open source leader
» Invested in client success
» Price/performance

Some Unintended Consequences
» More customer reporting requests
» Because we suddenly have lots of customer data available
» Meaning more work for the DW team!!!
» A DR site is needed more than ever
» More data means more critical data to protect
» Network stress to support DR and other additional access
» Data Governance is overwhelmed with requests
» Retention policies need to be rethought

Questions
» Customer Driven Questions
» Easy to understand
» Subject Questions
» Discover the pivot and you have a good start
» Exploratory Questions
» Thinking of the unformed questions
» Working from the top down
» Narrowing the answer before you test all the data

Questions - Approaches
• Understand what manual process you want to automate:
what is currently predicted manually that could be
automated, and determine whether there is any way to get
training data consisting of <input, output> pairs.
• Consider methods to augment existing data with a “pivot”
column that can be used to join. For example, geo-location
of an IP address could lead to joining with Census Data
based on zip+4.
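The pivot idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the lookup tables, the IP addresses, the zip+4 codes, and the `median_income` field are all made up, and a real pipeline would use a geo-location service and actual Census data.

```python
# Hypothetical lookup tables (all values are made up for illustration).
GEO_BY_IP = {
    "192.0.2.10": "20166-6503",
    "198.51.100.7": "94105-1804",
}
CENSUS_BY_ZIP4 = {
    "20166-6503": {"median_income": 98000},
    "94105-1804": {"median_income": 121000},
}


def add_pivot(events):
    """Augment each event with a zip+4 pivot column derived from its IP."""
    return [{**e, "zip4": GEO_BY_IP.get(e["ip"])} for e in events]


def join_on_pivot(events, census):
    """Left-join events to census rows on the zip+4 pivot column.

    Events whose pivot has no census match get median_income=None.
    """
    joined = []
    for e in events:
        row = census.get(e["zip4"])
        joined.append({**e, **(row or {"median_income": None})})
    return joined


events = [
    {"ip": "192.0.2.10", "page": "/pricing"},
    {"ip": "203.0.113.5", "page": "/home"},  # IP not in the geo table
]
enriched = join_on_pivot(add_pivot(events), CENSUS_BY_ZIP4)
```

The first event picks up a census attribute through the pivot, while the second keeps `None` values, which makes unmatched rows easy to count before trusting the join.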

• Determine if your problem is one of prediction or one of
grouping (clustering). The latter is more of a task that can
lead to better understanding rather than solving a direct
business problem.
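To make the distinction concrete, here is a minimal sketch, not a recommended method. Prediction is shown with a 1-nearest-neighbor lookup over labeled <input, output> pairs, and grouping with a two-cluster k-means on unlabeled values. The function names and the spend figures are assumptions invented for this example.

```python
def predict_nearest(train, x):
    """Prediction: return the label of the closest labeled example."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def two_means(values, iterations=10):
    """Grouping: split unlabeled 1-D values into two clusters (k-means, k=2).

    Assumes the values contain at least two distinct numbers.
    """
    lo, hi = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        b = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(a) / len(a), sum(b) / len(b)
    return a, b


# Made-up monthly spend figures.
labeled = [(10, "low"), (12, "low"), (95, "high"), (110, "high")]
unlabeled = [10, 12, 11, 95, 110, 100]

label = predict_nearest(labeled, 90)   # needs <input, output> pairs
groups = two_means(unlabeled)          # needs no labels at all
```

The prediction call answers a direct question about a new customer, while the clustering call only proposes groups that someone still has to interpret.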

• Determine if you are more interested in finding “interesting”
relationships among data columns rather than knowing the
columns. This is a task I’d call more “discovery” than
prediction, but the idea is to express one column as the
output in terms of the other columns as inputs.
• Doing this for all output columns can lead to “discovery”
of those correlations that are the strongest (e.g., every
time a customer buys beer at 5PM, he is likely to buy
diapers). This is more of a fishing expedition, but can
lead to unusual insights.
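A rough sketch of such a fishing expedition follows. As a simplification, it ranks every pair of columns by Pearson correlation rather than fitting a model per output column. The basket data and the function names are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns.

    Raises ZeroDivisionError if either column is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strongest_pairs(table):
    """Score every pair of columns and sort by absolute correlation."""
    scored = [
        (a, b, pearson(table[a], table[b]))
        for a, b in combinations(table, 2)
    ]
    return sorted(scored, key=lambda t: abs(t[2]), reverse=True)


# Made-up basket data: 1 = item bought on that visit, 0 = not bought.
baskets = {
    "beer":    [1, 1, 0, 1, 0, 0, 1, 0],
    "diapers": [1, 1, 0, 1, 0, 0, 1, 1],
    "milk":    [0, 1, 1, 0, 1, 0, 1, 0],
}
ranking = strongest_pairs(baskets)
```

In this toy data the beer and diapers pair comes out on top. With many columns, some strong-looking pairs will appear by chance, so the top of the ranking is a list of candidates to test, not a list of findings.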

Impetus Approach to Questioning Data
[Flow diagram; box labels in slide order:]
» EXISTING DATA PROPERTY
» BUSINESS STRATEGY
» CUSTOMER PROBLEM STATEMENTS
» ANALYSIS OF DATA PROPERTY
» DISCUSSION WITH STAKEHOLDERS
» ANALYSIS OF PROBLEM STATEMENT
» DATA NEEDS STATEMENT
» REFINED PROBLEM STATEMENT
» DATA ANALYTICS PLAN

Who knew there was religion in Analytics
» Statistical Analysis vs. Machine Learning
» Stats people think “truth”
» Machine Learning people think “near truth”
» Truth is easy to bound
» Cost models make sense to org
» Near Truth is hard to explain and bound
» It is where the real exploration happens
» But – it can consume the Data Scientist
» Both can net real returns – and they need to coexist

GOVERNANCE
» Don’t forget about Governance
» Contracts
» PII
» Brand
» CPO & CISO are your friends - honestly
» Protect your CUSTOMER DATA
» It will slow you down in the beginning
» But you want your results to be reputable
» At some point we need to get to an automated policy framework

About Impetus
» Accelerated consulting and services leader for Big Data;
headquartered in San Jose since 1996; 1400+; presence
in Silicon Valley, Atlanta, NYC; offices in India; expertise
through architects
» Pioneers in distributed software engineering with vertical
and functional expertise; Dedicated innovation labs; 200+
Big Data practitioners; 80+ dedicated to R&D

Data Science Approach
Drill
* Incoming Question
* Problem Landscape
* Underlying Constraints
* Specific Goals
Assess
* Goal Driven Hypotheses
* Data Requirement
* Resource Requirements
* Analysis Plan
Target
* Data Collection
* Quality Assessment
* Cross Validation
* Restructuring
Analyze
* Test Previous Hypotheses
* Explore New Hypotheses
* Test
* Quantify Results
Recommend
* Summary of Results
* Key Novel Insights
* Impact Analysis
* Action Items

Data Science Focus Areas
» Recommender Systems
» Sentiment Analysis
» Topic Identification
» Predictive Analytics
» Data Stream Analytics
Contact us at bigdata@impetus.com
