This document discusses the challenges and opportunities companies face in gaining competitive advantage by leveraging big data and data analytics. It notes that (1) enterprises can gain operational advantages by leveraging social, local and mobile technologies to generate insights from individual data, (2) commonly used information architectures do not effectively support collaboration and sharing of all types of information across networks, and (3) companies must address both collaboration/communication and making sense of vast information streams. The document then provides statistics on the growth of digital data and on the challenges of analyzing unstructured data to reveal relevant insights.
2. Challenges and opportunities in gaining advantage and leverage through data
Companies today are evolving into virtual networks of permanent and transient teams of people.
- Enterprises today can garner a competitive operating advantage by using social, local and mobile technology to create leverage through individuals.
- This leverage comes from applying targeted, specific data at the point and time of informational advantage.
Commonly used information architectures do not treat the delivery, collaboration and interchange of ALL types of information across networks of people as a core principle.
- Knowledge workers create, analyze, manage, decide, evaluate and synthesize information of all types as their dominant activity throughout the enterprise.
Solving the Right Problems – Companies must address two fundamental activities that run through their daily routine:
- Collaboration, communication and information sharing
- Making sense of information: separating signal from noise in the constant stream of data
3. Big Data Volume Statistics and Predictions
[Figure: Digital storage acquisition in zettabytes (ZB), 1990 to 2015. Source: IDC, Universal Digital Data Explosion Study. Labeled values: 0.13 ZB, 1.8 ZB (2011) and 8 ZB. Callout: a year's worth of data generated in the 1990s is created within one minute in 2011.]
Gartner: Unstructured data alone will explode to 650% of its present volume by 2017.
Are you positioned to take advantage of the big data predictions?
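To put the predictions in perspective, the sketch below works out the compound annual growth rate (CAGR) they imply. It goes beyond what the slide states by assuming that the 8 ZB figure refers to 2015 and that Gartner's "present volume" means 2012; the helper `cagr` is hypothetical.

```python
# Compound annual growth rate (CAGR) implied by the slide's volume figures.
# Assumptions not stated on the slide: 1.8 ZB in 2011 grows to 8 ZB in 2015,
# and Gartner's "present volume" refers to 2012.


def cagr(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


# IDC figures: 1.8 ZB (2011) to 8 ZB (assumed to be 2015)
idc_rate = cagr(1.8, 8.0, 2015 - 2011)

# Gartner: unstructured data reaches 650% of present volume (a 6.5x multiple) by 2017
gartner_rate = cagr(1.0, 6.5, 2017 - 2012)

print(f"IDC implied growth:     {idc_rate:.1%} per year")
print(f"Gartner implied growth: {gartner_rate:.1%} per year")
```

Under these assumptions both predictions work out to roughly 45% growth per year.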
4. What Is Big Data? Where Does It Come From?
Big Data includes both internal AND external content. Not all data needs to reside internally to be analyzed.
Data is organized and managed according to its type of structure:
- Structured: strictly meets its object definition. Examples: relational, flat file, web services, …
- Semi-structured: has a structure, but it may differ greatly between files. Examples: Excel, Word, XML, HTML, tweets, email, …
- Unstructured: has little to no structure and is not easily read by a machine. Examples: PDF, X-ray, legal documents, video, IM
Big Data is everywhere: search engines, instant messaging, social media, legal documents and contracts, medical records and test/scan outcomes, digital media, internal unstructured documents, stock tickers, press releases, et al.
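As a rough illustration of the three structure types, the sketch below assigns a file to a type by its extension. The extension table is a hypothetical mapping derived from the examples above (`.json` for tweets is an added assumption); real classification would need to inspect content, since one format can hold data of very different structure.

```python
from pathlib import Path

# Hypothetical extension table built from the slide's examples.
STRUCTURE_BY_EXTENSION = {
    # Structured: strictly meets its object definition (e.g. flat files)
    ".csv": "structured",
    ".tsv": "structured",
    # Semi-structured: has a structure, but it may differ greatly between files
    ".xlsx": "semi-structured",
    ".docx": "semi-structured",
    ".xml": "semi-structured",
    ".html": "semi-structured",
    ".eml": "semi-structured",
    ".json": "semi-structured",  # assumption: tweets delivered as JSON
    # Unstructured: little to no structure a machine can easily read
    ".pdf": "unstructured",
    ".mp4": "unstructured",
    ".dcm": "unstructured",  # DICOM file, e.g. an X-ray image
}


def structure_type(filename: str) -> str:
    """Classify a file by extension; unknown extensions default to 'unstructured'."""
    suffix = Path(filename).suffix.lower()
    return STRUCTURE_BY_EXTENSION.get(suffix, "unstructured")


for name in ["orders.csv", "feed.json", "contract.PDF", "notes"]:
    print(name, "->", structure_type(name))
```

Defaulting unknown files to "unstructured" is a conservative choice: anything the table cannot vouch for is treated as needing textual processing.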
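5. The search challenge with unstructured data: Data Science
[Figure: search quality quadrants. Vertical axis: % of relevant data that are returned; horizontal axis: % of returned data that are relevant. Quadrants: Inefficient (top left), Optimal (top right), Worst (bottom left), Incomplete (bottom right). Source: Brewster Kahle.]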
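The two axes correspond to the standard information-retrieval measures: the share of relevant data that is returned is recall, and the share of returned data that is relevant is precision. A minimal sketch that scores a search result and places it in one of the four quadrants follows; the 0.5 cut-off between "low" and "high" is an arbitrary assumption, and the function names are hypothetical.

```python
def precision_recall(returned: set, relevant: set) -> tuple[float, float]:
    """Precision = share of returned items that are relevant;
    recall = share of relevant items that were returned."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def quadrant(precision: float, recall: float, cutoff: float = 0.5) -> str:
    """Map a (precision, recall) pair onto the chart's four quadrants."""
    high_precision = precision >= cutoff
    high_recall = recall >= cutoff
    if high_precision and high_recall:
        return "Optimal"
    if high_recall:
        return "Inefficient"  # finds most relevant data, but buried in noise
    if high_precision:
        return "Incomplete"  # what comes back is relevant, but much is missed
    return "Worst"


relevant = {"doc1", "doc2", "doc3", "doc4"}
returned = {"doc1", "doc2", "doc3", "doc7", "doc8", "doc9", "doc10", "doc11"}
p, r = precision_recall(returned, relevant)
print(f"precision={p:.2f} recall={r:.2f} -> {quadrant(p, r)}")
```

Here the search finds three of the four relevant documents but returns five irrelevant ones alongside them, which lands it in the "Inefficient" quadrant.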
6. How to Reveal the Content in Big Data and Determine Its Relevance and Confidence
Sentiment analysis, a form of text analytics, provides the ability to filter big data to determine its relevance (social media, search engines, et al.).
[Figure: Tweets on Brand X are captured and run through sentiment analysis, which sorts them into Happy, Unhappy and Need Help.]
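A deliberately simple, lexicon-based sketch of the flow in the figure: each tweet is matched against small word lists and sorted into Happy, Unhappy or Need Help, with Neutral as a fallback. The word lists and the rule that help requests take priority are assumptions for illustration; production sentiment analysis typically relies on curated lexicons or trained models.

```python
import re

# Hypothetical word lists; a real system would use a curated lexicon or a model.
HAPPY_WORDS = {"love", "great", "awesome", "happy", "thanks"}
UNHAPPY_WORDS = {"hate", "awful", "broken", "terrible", "worst"}
# Whole-word match, so "helpful" does not count as a help request
HELP_PATTERN = re.compile(r"\b(?:help|support|how do i)\b")


def classify_tweet(text: str) -> str:
    """Sort a tweet into 'Need Help', 'Happy', 'Unhappy' or 'Neutral'."""
    lowered = text.lower()
    # Help requests take priority so they can be routed to customer support
    if HELP_PATTERN.search(lowered):
        return "Need Help"
    words = re.findall(r"[a-z']+", lowered)
    score = sum(w in HAPPY_WORDS for w in words) - sum(w in UNHAPPY_WORDS for w in words)
    if score > 0:
        return "Happy"
    if score < 0:
        return "Unhappy"
    return "Neutral"


tweets = [
    "I love Brand X, great product!",
    "Brand X is the worst, totally broken",
    "How do I reset my Brand X router?",
]
for tweet in tweets:
    print(classify_tweet(tweet), "|", tweet)
```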
Textual ETL breaks content down into granular information using taxonomies and ontologies (pdf, doc, swift, et al.).
For unstructured data:
- stop word processing
- stemming
- alternate spelling
- synonym concatenation
- homograph resolution
- spell checking
- word and phrase proximity
- final index trimming
For semi-structured data:
- textual structure mapping
- variable pattern recognition
- variable symbol recognition
- multiple index type support
- utilities, including:
  - raw data hidden character display
  - multiple path processing
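Several of the unstructured-text steps listed above (stop word processing, stemming, alternate spelling and synonym concatenation, word proximity and final index trimming) can be sketched as a tiny inverted-index pipeline. Everything here is a hypothetical stand-in: the stop word and synonym lists are toy examples, the suffix-stripping stemmer is far cruder than a real one such as Porter's, and "trimming" is interpreted as dropping terms that appear in too large a share of documents.

```python
import re
from collections import defaultdict

# Toy resources; a textual ETL tool would draw these from taxonomies/ontologies.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "by",
              "is", "are", "was", "were", "with"}
# Alternate spellings and synonyms mapped to one canonical term (after stemming)
SYNONYMS = {"agreement": "contract", "pact": "contract", "colour": "color"}


def stem(word: str) -> str:
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def canonical(word: str) -> str:
    """Stem a word, then fold synonyms/alternate spellings into one term."""
    stemmed = stem(word.lower())
    return SYNONYMS.get(stemmed, stemmed)


def normalize(text: str) -> list[str]:
    """Tokenize, drop stop words, and canonicalize the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [canonical(t) for t in tokens if t not in STOP_WORDS]


def build_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Inverted index: term -> {doc_id: [positions of the term in that doc]}."""
    index: dict = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(normalize(text)):
            index[term][doc_id].append(pos)
    return {term: dict(postings) for term, postings in index.items()}


def near(index, word_a: str, word_b: str, window: int = 3) -> list[str]:
    """Word proximity: docs where both words occur within `window` positions."""
    a = index.get(canonical(word_a), {})
    b = index.get(canonical(word_b), {})
    return sorted(doc for doc in a.keys() & b.keys()
                  if any(abs(i - j) <= window for i in a[doc] for j in b[doc]))


def trim_index(index, n_docs: int, max_doc_share: float = 0.5):
    """Final index trimming: drop terms found in too many docs to discriminate."""
    return {t: p for t, p in index.items() if len(p) / n_docs <= max_doc_share}


docs = {
    "d1": "The agreement was signed by both parties.",
    "d2": "Contracts signed in 2011 are archived.",
    "d3": "The parties met to discuss pricing.",
}
index = build_index(docs)
print(near(index, "contract", "signing"))
print(sorted(trim_index(index, len(docs))))
```

Because "agreement" and "contracts" both collapse to the canonical term "contract", a proximity search for "contract" near "signing" finds both d1 and d2 even though neither document contains those exact words.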
7. The Value of Big Data
Data Science: To Support or To Drive?
Perform analysis & exploration of Big Data.
Analyze RAW and/or integrated data, remove ‘noise’, mine for peaks and valleys, determine relevance and exploit the data for predictive analysis.
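One simple way to remove noise and mine for peaks and valleys in a numeric series, such as daily brand mentions, is to smooth it with a moving average and then look for local maxima and minima. The sketch below assumes that approach; the window size and sample data are hypothetical, and real analyses often use more robust smoothing and significance thresholds.

```python
def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Centered moving average to damp noise; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def peaks_and_valleys(values: list[float]) -> tuple[list[int], list[int]]:
    """Indices of strict local maxima (peaks) and minima (valleys)."""
    peaks, valleys = [], []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if cur > prev and cur > nxt:
            peaks.append(i)
        elif cur < prev and cur < nxt:
            valleys.append(i)
    return peaks, valleys


# Hypothetical daily mention counts for a brand
mentions = [20, 22, 21, 8, 6, 9, 25, 27, 26, 24]
smoothed = moving_average(mentions)
peaks, valleys = peaks_and_valleys(smoothed)
print("peaks on days", peaks, "valleys on days", valleys)
```

Smoothing first means small day-to-day wiggles (such as the dip from 22 to 21) are not reported as separate peaks; only the broad valley and the broad peak remain.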
[Figure: three-level value pyramid, labeled ROI]
- Top level: integrate and enrich with external data (big data utilization)
  - Reports: predictive analysis & exploration
  - Use: predictive analysis, to drive the business
  - Data: integrated and RAW internal & external data
- Mid level: integrate and enhance proprietary data
  - Reports: BI reports
  - Use: informed decisions/insights, for enhanced support
  - Data: integrated internal data & purchased external data
- Bottom level: support operational systems
  - Reports: operational reports
  - Use: operate & support
  - Data: internal proprietary business data
8. Big Data Architecture
A non-relational, distributed file system that can augment existing systems.
Provides the ability to bring optimal big data in-house while continuing to access and report on external data, positioning the business for predictive analysis.
Can use open source (Hadoop, Clojure, Storm, et al.) and/or an enterprise-level vendor for management, monitoring and support, such as Teradata, Greenplum, Netezza, Exadata, et al.
Scalable and extensible solution.
MPP (Massively Parallel Processing) reduces query response and data acquisition time.
Capable of handling RAW data.
Additional benefits:
- Increased IT agility in meeting business requirements
- Less brittle data models
- Real-time analysis
- BI positioned for next-generation architecture
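The idea behind MPP, and behind Hadoop-style processing, is to split data across many nodes, have each node work on its own shard independently, and then merge the partial results. The sketch below imitates that scatter/gather pattern on one machine, with a thread pool standing in for cluster nodes; it illustrates the data flow only, and in CPython threads will not speed up CPU-bound work like this. The function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records: list[str], n_parts: int) -> list[list[str]]:
    """Spread records round-robin across n_parts shards (one per 'node')."""
    return [records[i::n_parts] for i in range(n_parts)]


def local_count(shard: list[str]) -> Counter:
    """Work each node does on its own shard, with no shared state."""
    counts = Counter()
    for line in shard:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records: list[str], n_parts: int = 4) -> Counter:
    """Scatter shards to workers, then gather and merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for partial in pool.map(local_count, partition(records, n_parts)):
            total.update(partial)
    return total


records = ["big data big value", "data lake", "Big data"]
print(parallel_word_count(records, n_parts=2).most_common(2))
```

Because each shard is processed without reference to the others, adding nodes (shards) scales the work out, and only the small partial results need to travel back for the merge.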
9. Big Data Management
As with all forms of data, a critical aspect of getting value out of big data is applying data management best practices.
Data management practices include:
- Data Quality & Discovery
- Relationship or linking algorithms
- Data Governance
- Confidence levels and status codes
- Metadata management
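Linking algorithms and confidence levels often work together: a candidate match is scored, and the score determines a status code. The sketch below assumes simple string similarity from Python's standard-library `difflib.SequenceMatcher`; the thresholds and the status codes `LINKED`, `REVIEW` and `UNLINKED` are hypothetical. Production record linkage usually compares several fields and uses purpose-built matching rules.

```python
import re
from difflib import SequenceMatcher


def canonical(name: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace before comparing."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def link(record: str, candidates: list[str],
         accept: float = 0.9, review: float = 0.7) -> tuple[str, float, str]:
    """Return (best candidate, confidence score, status code) for a record."""
    if not candidates:
        return "", 0.0, "UNLINKED"
    scored = [(SequenceMatcher(None, canonical(record), canonical(c)).ratio(), c)
              for c in candidates]
    score, best = max(scored)
    if score >= accept:
        status = "LINKED"
    elif score >= review:
        status = "REVIEW"  # plausible match; route to a data steward
    else:
        status = "UNLINKED"
    return best, round(score, 2), status


print(link("ACME  Corp.", ["Acme Corp", "Apex Industries"]))
print(link("Zeta Labs", ["Acme Corp", "Apex Industries"]))
```

Keeping the score and status code with each link lets downstream consumers decide how much to trust it, rather than treating every match as certain.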
Information available about the data should include:
- Where did the data point come from?
- What type of cleansing/linkage or modification was performed?
- When did this data arrive?
- What is the temperature of the data?
- Who are the consumers of the data?
- When is the data required?
- What is the value of the data?
- What is it linked to?
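The questions above map naturally onto a metadata record kept alongside each data point or data set. A minimal sketch, assuming Python dataclasses; the field names, the hot/warm/cold reading of "temperature" as access frequency, the `record_step` helper and the sample source name are illustrative assumptions rather than a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Temperature(Enum):
    """How often the data is accessed (assumed meaning of 'temperature')."""
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


@dataclass
class DataPointMetadata:
    source: str  # Where did the data point come from?
    # What cleansing/linkage or modification was performed?
    transformations: list[str] = field(default_factory=list)
    # When did this data arrive?
    arrived_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    temperature: Temperature = Temperature.WARM  # What is its temperature?
    consumers: list[str] = field(default_factory=list)  # Who consumes it?
    required_by: Optional[datetime] = None  # When is the data required?
    business_value: str = "unrated"  # What is the value of the data?
    linked_to: list[str] = field(default_factory=list)  # What is it linked to?

    def record_step(self, description: str) -> None:
        """Append a cleansing, linkage or modification step to the lineage."""
        self.transformations.append(description)


meta = DataPointMetadata(source="twitter-stream", consumers=["marketing"])
meta.record_step("removed duplicate tweets")
meta.record_step("linked author handle to customer ID")
print(meta.source, meta.temperature.value, meta.transformations)
```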