Copyright © Think Big Analytics and Neustar Inc.
Asking the Right Questions of Your Data
Mike Peterson
VP of Platforms and Data Architecture, Neustar
Jun 26, 2013
We have come a long way!!!
But where/when is the GOLD?
Unintended Consequence of Big Data
We need to ask the right Questions
Oh, and let's remember religion and not forget GOVERNANCE
Big Data Evolution Status
» New data platform is built: 3-tier
» Collected many PB of data
» Hadoop infrastructure in place for 2 years
» Established Data Science teams
» Machine Learning is in place
» Increased technology skills
» Focused data teams
» Active in the community
Our Partners are still a part of our process
» Expertise in Technologies
» Trusted partner
» Collaborative Teams
» Open source leader
» Invested in client success
» Price/performance
Some Unintended Consequences
» More customer reporting requests
» Because we suddenly have lots of customer data available
» Meaning more work for the DW team!
» A DR site is needed more than ever
» More data means more critical data to protect
» Network stress to support DR and other additional access
» Data Governance is overwhelmed with requests
» Retention policies need to be rethought
Questions
» Customer Driven Questions
» Easy to understand
» Subject Questions
» Discover the pivot and you have a good start
» Exploratory Questions
» Thinking of the unformed questions
» Working from the top down
» Narrowing the answer before you test all the data
Questions - Approaches
• Understand what manual process you want to automate: identify what is currently predicted manually that could be automated, and determine whether there is any way to get training data consisting of <input, output> pairs.
• Consider methods to augment existing data with a “pivot” column that can be used to join. For example, geo-locating an IP address could allow a join with Census data based on ZIP+4.
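A minimal sketch of the “pivot” idea, assuming a lookup from IP address to ZIP+4 is available. The table contents, field names, and helper functions below are invented for illustration; a real pipeline would use a geo-IP service and actual Census extracts.

```python
# Hypothetical lookup tables with invented values, for illustration only.
IP_TO_ZIP4 = {
    "203.0.113.7": "12345-0001",
    "198.51.100.23": "54321-0002",
}
CENSUS_BY_ZIP4 = {
    "12345-0001": {"median_income": 98000, "households": 412},
    "54321-0002": {"median_income": 121000, "households": 268},
}


def add_pivot_column(events):
    """Return copies of the event rows with a 'zip4' pivot column (None if the IP is unknown)."""
    return [{**event, "zip4": IP_TO_ZIP4.get(event["ip"])} for event in events]


def join_on_pivot(events, census):
    """Left-join event rows to census rows on the 'zip4' pivot column."""
    joined = []
    for event in events:
        extra = census.get(event["zip4"], {})  # no match: keep the event unchanged
        joined.append({**event, **extra})
    return joined


events = [
    {"ip": "203.0.113.7", "page": "/pricing"},
    {"ip": "192.0.2.99", "page": "/home"},  # IP not in the lookup table
]
enriched = join_on_pivot(add_pivot_column(events), CENSUS_BY_ZIP4)
```

The first event picks up the census attributes through the pivot column, while the second keeps only its original fields plus an empty `zip4`, which makes unmatched rows easy to count before trusting the join.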
Questions - Approaches
• Determine whether your problem is one of prediction or one of grouping (clustering). The latter is more a task that leads to better understanding than one that directly solves a business problem.
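The distinction can be sketched in a few lines. Prediction needs labeled <input, output> pairs, while grouping works on the inputs alone. The nearest-centroid predictor and one-dimensional two-means clustering below are stand-ins chosen for brevity, not methods named in the talk, and the sample values are invented.

```python
from statistics import mean


def fit_predictor(pairs):
    """Prediction: learn one centroid per label from <input, output> pairs."""
    by_label = {}
    for x, label in pairs:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, iterations=10):
    """Grouping: split unlabeled values into two clusters (1-D k-means with k=2)."""
    lo, hi = min(xs), max(xs)
    if lo == hi:  # all values identical: nothing to split
        return sorted(xs), []
    for _ in range(iterations):
        low = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        high = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(low), mean(high)
    return sorted(low), sorted(high)


# Invented sample: a numeric input with a manually assigned label.
labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]
model = fit_predictor(labeled)
label_for_new_value = predict(model, 2.0)

# The same inputs without labels: clustering only proposes groups.
groups = two_means([x for x, _ in labeled])
```

The predictor answers a direct question about a new value, whereas the clustering result only suggests structure that someone still has to interpret.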
Questions - Approaches
• Determine whether you are more interested in finding “interesting” relationships among data columns than in knowing the columns themselves. This is a task I’d call more “discovery” than prediction, but the idea is to express one column, the output, in terms of the other columns as inputs.
• Doing this for all output columns can lead to “discovery” of the strongest correlations (e.g., every time a customer buys beer at 5 PM, he is likely to buy diapers). This is more of a fishing expedition, but it can lead to unusual insights.
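One way to sketch this fishing expedition is to treat each column in turn as the output and score every other column against it. The example below swaps in pairwise Pearson correlation as a simple stand-in for fitting a model per output column. The basket data and column names are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def strongest_links(table):
    """For each output column, find the input column most strongly correlated with it."""
    links = {}
    for output in table:
        scores = {
            column: pearson(table[column], table[output])
            for column in table
            if column != output
        }
        best = max(scores, key=lambda column: abs(scores[column]))
        links[output] = (best, scores[best])
    return links


# Invented data: one row per shopping basket, 1 = item present.
baskets = {
    "beer":    [1, 1, 1, 0, 0, 0, 1, 0],
    "diapers": [1, 1, 1, 0, 0, 0, 0, 0],
    "milk":    [1, 0, 1, 0, 1, 0, 1, 0],
}
links = strongest_links(baskets)
```

With many columns, the pairs with the largest absolute scores are the candidates worth a closer look; a strong score is a prompt for investigation rather than evidence of cause.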
Impetus Approach to Questioning Data
» EXISTING DATA PROPERTY
» BUSINESS STRATEGY
» CUSTOMER PROBLEM STATEMENTS
» ANALYSIS OF DATA PROPERTY
» DISCUSSION WITH STAKEHOLDERS
» ANALYSIS OF PROBLEM STATEMENT
» DATA NEEDS STATEMENT
» REFINED PROBLEM STATEMENT
» DATA ANALYTICS PLAN
Who knew there was religion in Analytics
» Statistical Analysis vs. Machine Learning
» Stats people think “truth”
» Machine Learning people think “near truth”
» Truth is easy to bound
» Cost models make sense to the organization
» Near Truth is hard to explain and bound
» It is where the real exploration happens
» But – it can consume the Data Scientist
» Both can net real returns – and they need to coexist
GOVERNANCE
» Don’t forget about Governance
» Contracts
» PII
» Brand
» CPO & CISO are your friends - honestly
» Protect your CUSTOMER DATA
» It will slow you down in the beginning
» But you want your results to be reputable
» At some point we need to get to an automated policy framework
About Impetus
» Accelerated consulting and services leader for Big Data; headquartered in San Jose since 1996; 1400+; presence in Silicon Valley, Atlanta, NYC; offices in India; expertise through Architects
» Pioneers in distributed software engineering with vertical and functional expertise; dedicated innovation labs; 200+ Big Data practitioners; 80+ dedicated to R&D
Data Science Approach
Drill
» Incoming Question
» Problem Landscape
» Underlying Constraints
» Specific Goals
Assess
» Goal Driven Hypotheses
» Data Requirement
» Resource Requirements
» Analysis Plan
Target
» Data Collection
» Quality Assessment
» Cross Validation
» Restructuring
Analyze
» Test Previous Hypotheses
» Explore New Hypotheses
» Test
» Quantify Results
Recommend
» Summary of Results
» Key Novel Insights
» Impact Analysis
» Action Items
Data Science Focus Areas
» Recommender Systems
» Sentiment Analysis
» Topic Identification
» Predictive Analytics
» Data Stream Analytics
Contact us at bigdata@impetus.com
Thank you
Questions?

🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Último (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Asking the Right Questions of Your Data

  • 1. Asking the Right Questions of Your Data. Mike Peterson, VP of Platforms and Data Architecture, Neustar. Jun 26, 2013. Copyright © Think Big Analytics and Neustar Inc.
  • 2. Copyright © Neustar Inc.
  • 3. We have come a long way!!! But where/when is the GOLD? Unintended consequences of Big Data. We need to ask the right questions. Oh, and let's remember religion, and not forget GOVERNANCE.
  • 4. Big Data Evolution Status
      » New data platform is built (3-tier)
      » Collected many PBs of data
      » Hadoop infrastructure in place for 2 years
      » Established Data Science teams
      » Machine Learning is in place
      » Increased technology skills
      » Focused data teams
      » Active in the community
  • 5. Our Partners are still a part of our process
      » Expertise in technologies
      » Trusted partner
      » Collaborative teams
      » Open source leader
      » Invested in client success
      » Price/performance
  • 6. Some Unintended Consequences
      » More customer reporting requests
      » Because we suddenly have lots of customer data available
      » Meaning more work for the DW team!!!
      » A DR site is needed more than ever
      » More data means more critical data to protect
      » Network stress to support DR and other additional access
      » Data Governance is overwhelmed with requests
      » Retention policies need to be rethought
  • 7. Questions
      » Customer-driven questions
      » Easy to understand
      » Subject questions
      » Discover the pivot and you have a good start
      » Exploratory questions
      » Thinking of the unformed questions
      » Working from the top down
      » Narrowing the answer before you test all the data
  • 8. Questions - Approaches
      • Understand what manual process you want to automate: identify what is currently predicted manually that could be automated, and determine whether there is any way to get training data consisting of <input, output> pairs.
      • Consider methods to augment existing data with a “pivot” column that can be used to join. For example, geo-location of an IP address could lead to joining with Census data based on zip+4.
  • 9. Questions - Approaches
      • Determine if your problem is one of prediction or one of grouping (clustering). The latter is a task that tends to lead to better understanding rather than to directly solving a business problem.
  • 10. Questions - Approaches
      • Determine if you are more interested in finding “interesting” relationships among data columns than in knowing the columns themselves. This is a task I’d call “discovery” more than prediction, but the idea is to express one column as the output column in terms of the other columns as input.
      • Doing this for all output columns can lead to “discovery” of the strongest correlations (e.g., every time a customer buys beer at 5PM, he is likely to buy diapers). This is more of a fishing expedition, but it can lead to unusual insights.
  • 11. Impetus Approach to Questioning Data [diagram; box labels: Existing Data Property; Business Strategy; Customer Problem Statements; Analysis of Data Property; Discussion with Stakeholders; Analysis of Problem Statement; Data Needs Statement; Refined Problem Statement; Data Analytics Plan]
  • 12. Who knew there was religion in Analytics
      » Statistical Analysis vs. Machine Learning
      » Stats people think “truth”
      » Machine Learning people think “near truth”
      » Truth is easy to bound
      » Cost models make sense to the org
      » Near truth is hard to explain and bound
      » It is where the real exploration happens
      » But it can consume the Data Scientist
      » Both can net real returns, and they need to co-exist
  • 14. GOVERNANCE
      » Don’t forget about Governance
      » Contracts
      » PII
      » Brand
      » CPO & CISO are your friends, honestly
      » Protect your CUSTOMER DATA
      » It will slow you down in the beginning
      » But you want your results to be reputable
      » We need to get to an automated policy framework at some point
  • 15. About Impetus
      » Accelerated consulting and services leader for Big Data; headquartered in San Jose since 1996; 1400+; presence in Silicon Valley, Atlanta, NYC; offices in India; expertise through architects
      » Pioneers in distributed software engineering with vertical and functional expertise; dedicated innovation labs; 200+ Big Data practitioners; 80+ dedicated to R&D
  • 16. Data Science Approach
      » Drill: Incoming Question; Problem Landscape; Underlying Constraints; Specific Goals
      » Assess: Goal-Driven Hypotheses; Data Requirement; Resource Requirements; Analysis Plan
      » Target: Data Collection; Quality Assessment; Cross Validation; Restructuring
      » Analyze: Test Previous Hypotheses; Explore New Hypotheses; Test; Quantify Results
      » Recommend: Summary of Results; Key Novel Insights; Impact Analysis; Action Items
  • 17. Data Science Focus Areas
      » Recommender Systems
      » Sentiment Analysis
      » Topic Identification
      » Predictive Analytics
      » Data Stream Analytics
      Contact us at bigdata@impetus.com
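
The “pivot” column idea from slide 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the IP addresses, the `ip_to_zip4` lookup (a stand-in for a real geo-location service), the census attributes, and the helper names `add_pivot` and `left_join` are all hypothetical and do not come from the deck.

```python
# Hypothetical data: every name and value below is invented for illustration.
ip_events = [
    {"ip": "203.0.113.7", "clicks": 12},
    {"ip": "198.51.100.4", "clicks": 3},
    {"ip": "192.0.2.9", "clicks": 5},
]

# Stand-in for a geo-location lookup (assumption: a real system would query
# a geo-IP service instead of a hard-coded dict).
ip_to_zip4 = {
    "203.0.113.7": "20166-6503",
    "198.51.100.4": "95110-1234",
}

# Stand-in for census attributes keyed by zip+4 (values are made up).
census_by_zip4 = {
    "20166-6503": {"median_income": 98000},
    "95110-1234": {"median_income": 87000},
}


def add_pivot(events, lookup):
    """Return copies of the events with a 'zip4' pivot column (None if unknown)."""
    return [{**event, "zip4": lookup.get(event["ip"])} for event in events]


def left_join(rows, table, key):
    """Left join: rows without a match in `table` keep only their own columns."""
    return [{**row, **table.get(row[key], {})} for row in rows]


# Augment the events with the pivot column, then join on it.
enriched = left_join(add_pivot(ip_events, ip_to_zip4), census_by_zip4, "zip4")

for row in enriched:
    print(row)
```

A left join is used so that events whose IP cannot be geo-located are kept rather than silently dropped, which makes it easy to see how much of the data the pivot actually covers.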

Editor's notes

  1. Sometimes clustering could be enough to solve a business problem.
  2. We must understand the columns well before understanding the relationships.
  3. Data Science results lead to better database marketing: churn analytics, upselling, cross-selling, RFM/LTV. These are some of the areas where we’ve used data science and machine learning to come up with some interesting models.
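
The “discovery” task from slide 10 (treat each column in turn as the output and look for the inputs most strongly related to it) can be approximated with a simple pairwise scan. The sketch below is a deliberate simplification: it uses Pearson correlation between column pairs rather than fitting a predictive model per output column, and the table, column names, and helper names `pearson` and `strongest_input` are hypothetical.

```python
from math import sqrt

# Hypothetical table: column names and values are invented for illustration.
columns = {
    "beer_units": [0, 1, 2, 3, 4, 5],
    "diaper_units": [1, 1, 3, 3, 5, 6],
    "hour_of_day": [9, 17, 12, 8, 20, 11],
}


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def strongest_input(output, table):
    """Treat `output` as the output column; return (best input column, correlation)."""
    scores = {
        name: pearson(values, table[output])
        for name, values in table.items()
        if name != output
    }
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores[best]


# Repeat the scan with every column taking a turn as the output column.
discoveries = {out: strongest_input(out, columns) for out in columns}

for out, (inp, r) in discoveries.items():
    print(f"{out}: strongest input is {inp} (r = {r:.2f})")
```

As the second editor's note cautions, a scan like this only flags candidates; a strong correlation still has to be interpreted with an understanding of what each column means.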