As a big data technologist, you have probably heard it all: every crazy claim, myth, and outright lie about what big data is and isn't, and probably a few you could not have imagined. If your company has a big data initiative or is considering one, you should be aware of these false statements and the reasons why they are wrong.
Why Everything You Know About bigdata Is A Lie
1. Why Everything You Know About bigdata Is A Lie
-Delivering Data Driven Business Insights
Adopt
MarketInnovate
Sunil S Ranka
Director – Big Data and Advance Analytics
2. Key Topics
About Jade
About Me
What is Big Data
Key Myths
Why Everything Is a Lie
Real World Example
Next Steps
4. Global Delivery Model Serving Many Industries
Industries: Services, High-Tech, Manufacturing, Energy, Social Media & Entertainment
5 Global Delivery Centers
8 Offices Worldwide: Atlanta, Pune, Noida, San Jose, Los Angeles, London, Hyderabad, San Diego
5. Strategic Partnerships
Salesforce.com: Sales, Service, Marketing, force.com
Testing Tools/Frameworks: QC, QTP, Selenium, LoadRunner, JIRA, Bugzilla, JUnit, TestNG
Microsoft: Dynamics, SharePoint, Office 365, Lync, BI
Custom Development: Java, .Net, J2EE, Product Engineering, Open Source Technologies
Integration: Oracle SOA, Tibco, Weblogic, Oracle Cloud Platform, ICS, JCS, Mulesoft, Dell Boomi
Infrastructure Management: IBM AIX, HP-UX, RHEL, OEL Linux, Windows Server
Oracle EBS Suite: Cloud Financials, Projects, SCM, HCM and EBS Financials, Procurement, Value Chain, CRM, Demantra, Agile, GRC
ServiceNow: IT Service Automation Applications, CreateNow Development Suite, Orchestration, Discovery
Big Data & Analytics: Hadoop, KNIME, R, Tableau
9. About Me
• Venture Partner: investor in and advisor to early-stage startups focusing on data.
• Director – Big Data and Advance Analytics
• Oracle ACE (Business Intelligence with Proficiency in Big Data)
• Worked extensively with Fortune 500 leaders.
• Held positions of Head of Product Development, Architect, etc.
• http://sranka.wordpress.com, sunil_ranka
• Featured tech writer for IT Next Magazine.
• Speaking engagements at the following conferences:
• COLLABORATE (2009, 2010, 2011, 2012, 2013, 2015)
• BIWA SIG TechCast Series (2010, 2011, 2012, 2013, 2014, 2016)
• NorCal OAUG 2010 at Santa Clara Convention Center, CA
• Session speaker at NoCOUG in San Francisco
• Oracle Open World (2009, 2010, 2012)
My Tag Line :: “Superior BI is the antidote to Business Failure”
11. Data is the new Oil. Data is just like crude. It’s
valuable, but if unrefined it cannot really be used.
– Clive Humby, DunnHumby
We have for the first time an economy based on
a key resource [Information] that is not only renewable,
but self-generating. Running out of it is not a problem,
but drowning in it is.
– John Naisbitt
12. Big Data and Analytics is Helping
Smarter Revenue Management (Tax Agency): Reduced improper payments by $16 billion
Smarter Healthcare Analytics: Helps detect life-threatening conditions up to 24 hours sooner
Smarter Crime Prevention: Cut serious crime by 30%
* Courtesy - IBM
13. Analytics Maturity Pyramid
No Reporting: Struggling to get basic information
Reactive Analytics: Concerned with current issues. What happened?
Diagnostic Analytics (Hindsight): Why did it happen?
Predictive Analytics (Insight): What will happen?
Prescriptive Analytics (Foresight): What should I do?
14. What is Big Data
Big data represents new data features created by today’s data-driven organizations for decision making.
Characteristics of big data:
Volume: Data at scale. Terabytes to petabytes of data.
Variety: Data in many forms. Structured, unstructured, text, media.
Velocity: Data in motion. Analysis of streaming data to make decisions in real time.
Value: Data with insight. Deriving valuable insight from the data.
15. Harnessing Big Data
OLTP: Online Transaction Processing (DBMSs)
OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
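To make the three processing styles concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, the sample events, and the in-memory running total are all hypothetical; a real RTAP system would use a streaming engine rather than a dictionary, so treat this only as an illustration of the difference in access patterns.

```python
import sqlite3
from collections import defaultdict

# Hypothetical in-memory database standing in for an operational system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# Hypothetical incoming events: (region, amount).
events = [("west", 120.0), ("east", 80.0), ("west", 30.0)]

running_totals = defaultdict(float)
for region, amount in events:
    # OLTP style: one short transaction per business event.
    with conn:
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)",
            (region, amount),
        )
    # RTAP style (simplified): update the answer as each event arrives,
    # instead of waiting for a later batch query.
    running_totals[region] += amount

# OLAP style: scan and aggregate the accumulated history in one query.
report = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print("OLAP report:", report)
print("RTAP running totals:", dict(running_totals))
```

Both paths arrive at the same totals. The difference is when the answer is available: the running totals are current after every event, while the report is only as fresh as the last time the query ran.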
16. Who’s Generating Big Data
Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.
17. The Model Has Changed…
The Model of Generating/Consuming Data has Changed
Old Model: Few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming data
19. Myths
Big data will change everything.
Big data means 'a lot' of data
Data lake is big Data
Hive can be used for reporting
Big Data is Only for Large Corporations
You Need to Hire a Big Data Scientist to Start With Big Data
Big Data Technology Will Eliminate the Need for Data Integration
The only cost for big data is hardware and software.
20. Myths Continued…
Big data applications require little or no performance optimization.
I don’t have enough data for big data.
Big Data predicts the future.
Hadoop is the Holy Grail of big data.
Big data is an IT matter.
Data warehouses aren’t needed for advanced analytics.
Hadoop will replace enterprise data warehouses
With huge volumes of data, small data quality issues are acceptable
22. Big Data Needs Diversified Skill Sets
Math and Operations Research Expertise: Develop analytic algorithms
Visualization Expertise: Interpret data sets, determine correlations and present in meaningful ways
Tool Developers: Mask complexity and analytics to lower skills boundaries
Industry Vertical Domain Expertise: Develop hypothesis, identify relevant business issues, ask the right questions
Data Experts: Data architecture, management, governance, policy
Decision Making Executive and Management: Apply information to solve business issues
"By 2015, big data demand will reach 4.4 million jobs globally, but only one-third of those jobs will be filled."
Source: Gartner "Gartner's Top Predictions for IT Organizations and Users, 2013 and Beyond: Balancing Economics, Risk, Opportunity and Innovation" 19 Oct 2012
23. Industry Implementation Trends
Hybrid Approach (Large Enterprises)
• Building hybrid environments to leverage existing investments in their traditional environments
• Setting up their own internal cloud environments for security and regulatory reasons, as well as to achieve the cloud benefits of simplicity and elasticity
Migrating Legacy Applications (Medium Enterprises)
• All new investments are in the cloud
• Migrating existing on-premises applications to the cloud based on ROI and business objectives
Starting with Cloud (Small & Startup Enterprises)
• Embracing cloud as they do not have any legacy systems
24. Different Phases
Phase 1 (Experimental)
• Understand the capabilities of the big data ecosystem
• Develop basic skills in big data management
• Create a pilot use case
• Establish leadership commitment
• Establish working infrastructure
Phase 2 (Implementation)
• Work with the business to identify use cases
• Commit dedicated resources to development and operations
• Develop an agile project plan
• Educate business users on analytics
• Accelerate the analytics knowledge and skills required to support value creation
• Use partners to supplement analytic skills gaps
Phase 3 (Expansion)
• Expand to multiple use cases
• Establish IT SLAs, ROI metrics and growth plans
• Expand to more advanced predictive capabilities
• Enable a platform capable of managing greater volumes and variety of data
• Look to partners to simplify and modernize the existing platform with cost-effective delivery models
• Optimize and integrate apps on a converged data platform
Phase 4 (Optimization)
• Establish digital business practices as the new normal, supported by all key executive sponsors
• Provide detailed business SLAs, revenue targets, and other financial targets
• Normalize data lifecycle/governance, data monetization, microservice development
26. Data Lake Reference Architecture
Data Staging: Company 1 Data, Company 2 Data, Company 3 Data, Company 4 Data
Normalization and Integration: Measure, Feature, Master Metadata, Surrogate Keys, Key Exists, Exception Handling
Data Lake
Feature Data Set: Customer, Institution, Accounts
Measure Data Set: Key, Accounts, Partnership, Sales, GL, Margins
Derived/Aggregated Fact: Gross Margin, Aggregates, Unified Customer View, Unified Sales Views, Unified Partner Views
Predictive Analytics Layer (Machine Learning)
Predictive Analytics Outcome: Customer Retention, Cross Sell/Up Sell, Customer Segmentations, Customer 360, Revenue Forecast, Customer Churn
Reusable Jade Connectors
Data Service Layer: Real-Time Analytics, Hourly/Daily Report, Weekly/Monthly Report
API Layer
Reporting Layer
27. Data Lake Reference Architecture
Source System: File Data, DB Data, ETL Extracts, Streaming, Logs (or other unstructured data), Cloud Services
Hadoop Data Lake zones: Transient Loading Zone, Raw Data, Refined Data, Trusted Data, Discovery Sandbox
Raw data: original unaltered data attributes, tokenized data
Processing: integrate to common format, data validation, data cleansing, aggregations
Reference Data, Master Data
Discovery: data wrangling, data discovery, exploratory analytics
Shared services: Metadata, Data Quality, Data Catalog, Security
Consumption Zone: APIs, OLTP or ODS, Enterprise Data Warehouse
Consumers: Business Analysts, Researchers, Data Scientists
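The flow of records through the Raw, Refined, and Trusted zones can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `Sale` record, the function names, the sample rows, and in particular the choice to do format integration and validation on the way to Refined and cleansing on the way to Trusted. The slide names these steps but does not say which zone owns which.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    """Hypothetical common format shared by all source companies."""
    company: str
    customer: str
    amount: float


def to_refined(raw_rows):
    """Raw -> Refined: integrate to a common format and validate.

    Rows that cannot be parsed are set aside rather than silently dropped.
    """
    refined, rejected = [], []
    for row in raw_rows:
        try:
            refined.append(Sale(
                company=row["company"].strip(),
                customer=row["customer"].strip().upper(),
                amount=float(row["amount"]),
            ))
        except (KeyError, ValueError, AttributeError):
            rejected.append(row)
    return refined, rejected


def to_trusted(refined):
    """Refined -> Trusted: cleanse by removing duplicates and bad amounts."""
    seen, trusted = set(), []
    for sale in refined:
        if sale.amount > 0 and sale not in seen:
            seen.add(sale)
            trusted.append(sale)
    return trusted


def aggregate_by_customer(trusted):
    """Aggregation step feeding the consumption zone."""
    totals = defaultdict(float)
    for sale in trusted:
        totals[sale.customer] += sale.amount
    return dict(totals)


# Raw zone: original, unaltered rows exactly as the sources delivered them.
raw_zone = [
    {"company": "Company 1", "customer": " acme ", "amount": "100.0"},
    {"company": "Company 1", "customer": "ACME", "amount": "100.0"},   # duplicate once normalized
    {"company": "Company 2", "customer": "Globex", "amount": "250.5"},
    {"company": "Company 3", "customer": "Initech", "amount": "n/a"},  # fails validation
    {"company": "Company 4", "customer": "Acme", "amount": "-5"},      # removed by cleansing
]

refined, rejected = to_refined(raw_zone)
trusted = to_trusted(refined)
totals = aggregate_by_customer(trusted)
print(totals)
```

Keeping the raw rows untouched and producing each later zone from the one before it means any cleansing rule can be changed and replayed without going back to the source systems.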
30. Hybrid Analytics Cloud/On Premises
Consumers: External Dashboards, Internal Dashboards, Tableau, Excel, R, Zeppelin
Analytics Cloud/OnPrem (web interface to dashboarding, query, and data dictionary): Web interface for distributed users, Data set definition, Social metadata dictionary, Export
Ingest, Transform, Analyze (integrated ingestion, transformation, and query application for business analysts)
Data Cloud/OnPrem (world-class, elastic Big Data infrastructure): Hive Metastore, Elastic Cloud HDFS, Infinite Compute, Hadoop/Spark
31. Hybrid Analytics Cloud/OnPrem
Consumers: External Dashboards, Internal Dashboards, Tableau, Excel, R, Zeppelin
Analytics Cloud/OnPrem (web interface to dashboarding, query, and data dictionary): Web interface for distributed users, Data set definition, Social metadata dictionary, Export
Ingest, Transform, Analyze (integrated ingestion, transformation, and query application for business analysts)
World-class, elastic Big Data infrastructure: Hive Metastore, Elastic Cloud HDFS, Infinite Compute, Hadoop/Spark
Services: Build reports and dashboards; Build outgoing connectors; Business analytics and data science training; Write ETL and perform data engineering; Build connectors
Oil has been the fuel of the modern economy for more than a century. However, oil in its raw form has little value. It needs to be refined and separated into a large number of consumer products, from petrol and kerosene to asphalt and chemical reagents used to make plastics and pharmaceuticals. It is also used in manufacturing a wide variety of materials.
Big Data is just like oil: in its raw form it provides no value to the enterprise until it is processed and valuable, actionable business insights are “distilled”.
Just like the technology that became available 100 years ago to discover oil and process it into consumable products, Big Data technology is going to transform and revolutionize the way enterprises get and use data.