The document discusses strategies for managing big data and turning information chaos into reliable data. It notes that data is being created at an unprecedented rate that exceeds the capacity of standard management tools. It proposes a framework that covers defining objectives and strategy, establishing organization and processes, designing the data architecture, and selecting applications. The goal is to extract useful insights from data that drive positive business outcomes and competitive advantage.
2. Information Overload
• Data creation and delivery exceed the capacity of standard management tools
• Volume, variety, velocity, and variability
• Interesting facts:
– Every 6 hours, the NSA gathers as much data as is stored in the entire Library of Congress
– Facebook’s photo collection has over 140 billion photos
– In 2012, 2.5 quintillion bytes of data were created every day, and 90% of the world’s data had been created in the previous two years alone
– Twitter averages 500 million tweets per day
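Converting two of the daily figures above to per-second rates makes the velocity dimension concrete. This is plain arithmetic on the slide's own numbers, using decimal units (1 TB = 10^12 bytes) and 86,400 seconds per day:

```latex
\begin{aligned}
\frac{2.5\times10^{18}\ \text{bytes/day}}{86\,400\ \text{s/day}} &\approx 2.9\times10^{13}\ \text{bytes/s} \approx 29\ \text{TB/s} \\
\frac{5\times10^{8}\ \text{tweets/day}}{86\,400\ \text{s/day}} &\approx 5{,}800\ \text{tweets/s}
\end{aligned}
```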
3. Analysis Evolution
From Business Intelligence
• Data collection: derived from stable, fixed sources
• Fixed data types
• Serial processing
• Reporting
• Result: actionable information
To Big Data Analytics
• Data collection: derived from diverse, dynamic sources
• Varied data types
• Iterative analysis
• Pattern analysis
• Result: actionable information
4. Business Relevance
• Provides customer/environmental insights
• Establishes a competitive advantage
• Shapes marketing strategies
• Reduces uncertainty
• Enables optimization
• Improves decision making
• Increases productivity
Ref. Turn Information into a Strategic Asset - SAP
Provides Critical Information to Drive Positive Business Outcomes
5. Data Management Framework
Strategy
• 1. Objectives: Define objectives; confirm data strategy alignment to business strategy
Controls
• 2. Organization: Define process/data owners, roles, and responsibilities
• 3. Process/Methods: Define data usage in analysis, process control, and business management; establish processes to monitor and ensure data quality
System
• 4. Data Architecture: Develop data structures to address company-wide requirements
• 5. Applications: Select, design, and implement software applications to accomplish strategic objectives
6. Strategy
• Define key business objectives or problems to solve
• Clarify the data required for strategic choices
• Identify what’s required to establish a competitive advantage
Example big data use cases:
• Acquire, grow, and retain customers
• Create new business models
• Improve IT economics
• Manage risks
• Optimize operations and reduce fraud
• Transform financial processes
Ref. IBM Use Cases (IBMbigdatahub.com)
7. Controls
• Proactively secure data and comply with privacy regulations
• Understand retention requirements
• Incorporate Data Quality Management and define quality metrics
• Document organizational roles and responsibilities
• Define data reporting, access, and latency requirements
• Establish analytics-driven business processes
• Fight bureaucracy and organizational silos
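To make "define quality metrics" concrete, here is a minimal sketch of two common data quality metrics: completeness (share of records where a field is filled in) and validity (share of filled-in values that pass a rule). The field names, sample records, and rules are hypothetical; real metrics and thresholds would come from the organization's own Data Quality Management program.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Mapping

Record = Mapping[str, Any]


@dataclass
class QualityReport:
    total: int
    completeness: dict[str, float]  # share of records where the field is present and non-empty
    validity: dict[str, float]      # share of non-empty values that pass the field's rule


def is_missing(value: Any) -> bool:
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def quality_metrics(records: Iterable[Record],
                    rules: Mapping[str, Callable[[Any], bool]]) -> QualityReport:
    rows = list(records)
    total = len(rows)
    completeness: dict[str, float] = {}
    validity: dict[str, float] = {}
    for name, rule in rules.items():
        present = [r.get(name) for r in rows if not is_missing(r.get(name))]
        completeness[name] = len(present) / total if total else 0.0
        valid = sum(1 for v in present if rule(v))
        validity[name] = valid / len(present) if present else 0.0
    return QualityReport(total, completeness, validity)


# Hypothetical customer records and validation rules.
customers = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},
    {"id": 3, "email": "c@example", "age": -5},
    {"id": 4, "email": "d@example.com", "age": None},
]
rules = {
    "email": lambda v: "@" in v and "." in v.split("@")[-1],
    "age": lambda v: 0 <= v <= 120,
}
report = quality_metrics(customers, rules)
```

Here both fields are 75% complete, and two of the three filled-in values pass each rule. Tracking these ratios over time gives the monitoring processes from the framework something measurable to alert on.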
8. Data Architecture
• Categorize data and usage
– Content format: structured, semi-structured, or unstructured
– Type: transactional, metadata, etc.
– Analysis: real-time or batch
– Processing methodology: predictive analysis, analytical, or query/reporting
– Data source: web, machine generated, data entry, etc.
• Define data structures to support cross-business needs
• Document data definitions
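One way to "categorize data and usage" and "document data definitions" together is a small data catalog whose entries carry the categories listed above. This is a minimal sketch: the enum values mirror the slide's categories (only the two data types the slide names are encoded), while the dataset names, descriptions, and the `owner` field are hypothetical. The owner ties each entry back to the roles defined under Controls.

```python
from dataclasses import dataclass
from enum import Enum


class ContentFormat(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


class DataType(Enum):
    TRANSACTIONAL = "transactional"
    METADATA = "metadata"


class AnalysisMode(Enum):
    REAL_TIME = "real-time"
    BATCH = "batch"


class Processing(Enum):
    PREDICTIVE = "predictive analysis"
    ANALYTICAL = "analytical"
    QUERY_REPORTING = "query/reporting"


class Source(Enum):
    WEB = "web"
    MACHINE_GENERATED = "machine generated"
    DATA_ENTRY = "data entry"


@dataclass(frozen=True)
class DataDefinition:
    name: str
    description: str
    content_format: ContentFormat
    data_type: DataType
    analysis: AnalysisMode
    processing: Processing
    source: Source
    owner: str  # hypothetical: the accountable data owner


# Hypothetical catalog entries.
catalog = [
    DataDefinition("web_clickstream", "Page views captured from the company website",
                   ContentFormat.SEMI_STRUCTURED, DataType.TRANSACTIONAL, AnalysisMode.REAL_TIME,
                   Processing.ANALYTICAL, Source.WEB, "Marketing"),
    DataDefinition("sensor_readings", "Temperature readings from plant equipment",
                   ContentFormat.STRUCTURED, DataType.TRANSACTIONAL, AnalysisMode.BATCH,
                   Processing.PREDICTIVE, Source.MACHINE_GENERATED, "Operations"),
    DataDefinition("order_entries", "Orders keyed in by sales staff",
                   ContentFormat.STRUCTURED, DataType.TRANSACTIONAL, AnalysisMode.BATCH,
                   Processing.QUERY_REPORTING, Source.DATA_ENTRY, "Sales"),
]


def by_analysis(entries: list[DataDefinition], mode: AnalysisMode) -> list[str]:
    """List dataset names that need the given analysis mode (e.g. to size real-time infrastructure)."""
    return [d.name for d in entries if d.analysis is mode]


realtime = by_analysis(catalog, AnalysisMode.REAL_TIME)
```

Using enums rather than free-text labels keeps the categories consistent across business units, which is the point of shared, cross-business data definitions.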
9. Applications
Acquisition
• Web crawlers
• Social media
• Network logs
• Sensor networks
• SAP
Data Management
• Flat files
• Relational databases
• Hadoop/NoSQL
• MongoDB
• JPG/PNG images
Analytics
• R
• Python
• SQL
• MapReduce/Hive/Pig
Visualization
• BI (Spotfire, Jaspersoft)
• Web apps (Ext JS, d3.js)
Various Toolsets are Available to Fulfill Data Intelligence Needs
12. Various Choices Available to Implement Analytics…
[Chart: implementation approaches (Java, Hive, Pig, commercial tools, streaming frameworks) compared on ease of learning, availability on systems, and analysis flexibility.]
Streaming: also works outside of Hadoop with no code changes!
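The streaming note above holds because Hadoop Streaming runs any executable that reads lines on stdin and writes tab-separated key/value lines to stdout. The same mapper and reducer can therefore be tested with ordinary Unix pipes before going to a cluster. The following is a minimal word-count sketch, assuming Hadoop Streaming's default tab separator. The function names `mapper`, `reducer`, `run_locally`, and `main` are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator, TextIO


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word (the default key/value format for Hadoop Streaming)."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; assumes input is sorted by key, as Hadoop's
    shuffle/sort phase (or a local `sort`) guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> list[str]:
    """Simulate map -> sort -> reduce in a single process, with no Hadoop involved."""
    return list(reducer(sorted(mapper(lines))))


def main(stage: str, stdin: TextIO, stdout: TextIO) -> None:
    """Run one stage as a stdin/stdout filter: 'map' or 'reduce'."""
    step = mapper if stage == "map" else reducer
    for out in step(stdin):
        print(out, file=stdout)


counts = run_locally(["the cat", "The dog"])
```

Save the functions as a script and append `if __name__ == "__main__": main(sys.argv[1], sys.stdin, sys.stdout)` (plus `import sys`). The script can then be run locally as `cat input.txt | python wc.py map | sort | python wc.py reduce`. The same two commands can be given to Hadoop Streaming's `-mapper` and `-reducer` options, with the script shipped to the cluster (for example via `-files`), and the functions themselves stay unchanged.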
13. Implementation Methodology
Motivation/Constraints
• Legal & compliance regulations
• Pain points
• Security concerns
• Organization’s culture
• Market pressures & mission expansion
• Data ecosystem
• Budget and resource reductions
• Existing data architecture limitations
Phases
• Business Discovery: What does the customer seek to accomplish? Output: problem statement
• Data Discovery: What data is available to work with? Where is the data located? What additional data is required? Activity: data exploration
• Design (Architect): What architecture is needed to support the data? What type of analytics is used or needed? Outputs: data architecture, infrastructure architecture, tools/product selection
• Build: infrastructure; ingest data; process (existing tools, custom code); data analytics (data mining, data scientist techniques); presentation (analytics, visualization)
• Operational: deliver and train; result evaluation; decision support; predictive modeling; action planning; continuous improvement
• New/changing requirements feed back into the cycle
14. Summary
• Begin with the end in mind
• Incorporate controls to drive data quality
• Protect the data
Speaker notes
Data quality management (DQM) is required to draw the correct conclusions. Data integrity