Big data
1. Big Data
What’s the real BIG problem?
--Brian Pereira, Editor-in-chief, InformationWeek
20:20 MSL – 17-May-2013
2. Terminology
• Data – Unprocessed, captured in raw form
• Information – Processed data – meaningful, insightful
SELECT first_name, last_name FROM student_details;
• Structured Data – Databases, tabular data
(It’s searchable, you can filter it, and extract meaningful information)
• Unstructured Data – Tweets, video, social media
updates, blogs, images
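The difference between the two kinds of data can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: it reuses the student_details table named in the query above, but the grade column, the sample rows and the sample tweet are all invented for the example.

```python
import sqlite3

# Hypothetical table contents, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_details (first_name TEXT, last_name TEXT, grade INTEGER)"
)
conn.executemany(
    "INSERT INTO student_details VALUES (?, ?, ?)",
    [("Asha", "Rao", 9), ("Ravi", "Mehta", 10), ("Meera", "Shah", 10)],
)

# Structured data: a query filters rows and extracts exactly the fields asked for.
rows = conn.execute(
    "SELECT first_name, last_name FROM student_details "
    "WHERE grade = ? ORDER BY first_name",
    (10,),
).fetchall()
conn.close()

# Unstructured data: free text has no columns, so even a simple question
# needs ad hoc text processing (here, a naive keyword check).
tweet = "Just saw the new phone at the launch event, the camera looks great!"
mentions_camera = "camera" in tweet.lower()

print(rows, mentions_camera)
```

The query returns only the matching names, while the tweet has to be scanned as raw text before it yields anything.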
3. Unstructured Data = Big Data
• Users posting comments and reviews about a particular product on
Facebook
• Users creating their own videos using their smartphones and
uploading them to YouTube
• Journalists posting tweets during the launch event of a particular
product
• CCTV cameras at traffic intersections or in stores, streaming video
feeds back to a server for storage
• Sensors around an aircraft, a spaceship or a piece of complex
machinery, relaying data about temperature, wind speed and fuel
levels back to a server
4. Big Data definition
Big Data is the trillions and quintillions of bytes
being generated, mostly in unstructured form,
by millions of users and devices
5. How BIG is Big Data?
• 90% of digital data was generated in the last two years
• 23x – the expected increase in digital data in India
during 2012 – 2020 (from 127 exabytes to 2.9 zettabytes)
• 12 terabytes of tweets are generated every day
• 5 exabytes of data was created between the dawn of
civilization and 2003; today that much information is
created every two days!
• 72 Hours of video uploaded to YouTube every minute!
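As a quick sanity check on the figures above, the arithmetic can be worked through in Python. This sketch assumes decimal (SI) units, where 1 zettabyte equals 1,000 exabytes; the variable names are made up for the example.

```python
EB_PER_ZB = 1000  # assumption: decimal (SI) units, 1 ZB = 1,000 EB

# Growth in India's digital data, 2012 to 2020
india_2012_eb = 127
india_2020_eb = 2.9 * EB_PER_ZB
growth = india_2020_eb / india_2012_eb  # roughly 22.8, i.e. about 23x

# 72 hours of video per minute, expressed as hours of video per day
video_hours_per_day = 72 * 60 * 24

print(f"Growth factor: {growth:.1f}x")
print(f"Video uploaded per day: {video_hours_per_day:,} hours")
```

Under that unit assumption the 127 exabytes to 2.9 zettabytes jump works out to just under 23x, which matches the rounded figure on the slide.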
7. What led to the Big Data explosion?
• Nexus of forces – cloud, social, mobile, information
(Intersection of cloud & mobile)
• The Internet of things (connected devices, sensors)
• Devices going digital (cameras, phones, power meters, traffic
signals, etc.)
• User generated content (digital photos, videos, social media,
blogs, SMS, email, tweets)
• Earlier – transaction systems handled structured data that was
under the organization's control
8. How much can be analysed?
0.5% or less of the digital information is
analysed in India today
36% is the share of data that technology can
analyse now
32% expected growth in the global Big Data
technology services market by 2016
9. Analyze this!
• We need to analyze all the data and look for insights that
can help us make decisions (in business)
• Can you analyze the video stream from a camera in real
time and predict a crime?
• Can a marketer analyze all the tweets to gauge how
customers feel about a product?
• Can sensors in a car monitor the “health” of different
systems in real time, predict the failure of a part
and send an SMS to the service center?
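The tweet question above can be illustrated with a deliberately naive sketch. It assumes a tiny hand-made word list and three invented tweets; real sentiment analysis uses far larger lexicons or trained models, so treat this only as a picture of the idea.

```python
import re
from collections import Counter

# Hypothetical word lists, invented for illustration only.
POSITIVE = {"love", "great", "good", "fast", "excellent"}
NEGATIVE = {"hate", "bad", "slow", "broken", "terrible"}


def classify(text: str) -> str:
    """Label a tweet by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented sample tweets about a product.
tweets = [
    "Love the new phone, great camera",
    "Battery is terrible and the screen is already broken",
    "Picked up the phone today",
]

# Tally how many tweets fall into each sentiment bucket.
summary = Counter(classify(t) for t in tweets)
print(summary)
```

The hard part at Big Data scale is not the counting itself but running it over millions of tweets quickly enough to act on the result.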
10. So what’s the real problem?
• Not all data captured is useful to business
• You need to find the right data sets in a heap
of data (harnessing the data)
• And do this fast enough to:
– make timely decisions, prevent a disaster, prevent
an outage, curb negative customer sentiment on
social media
11. Examples
• Walmart and Amazon are harnessing Big Data
to improve customer service, stock better
inventory, gauge sales trends and improve
operational efficiencies
• Big Data Analytics is used in genetic research, to
improve traffic management, generate alerts
on freak weather (storms), prevent crime,
improve power grid efficiencies, etc.
12. Quotes
• “Big Data is contextual, though in sheer numbers
I would place the mark beyond 100 TB, when
‘normal’ systems start struggling a bit” – Arun
Gupta, CIO, Cipla
• “Top challenges in managing the massive amount
of data are backup, security and incorporation of
unstructured data into business processes” – N
Jayantha Prabhu, CTO, Essar Group
13. Big Data facts
• The term “Big Data” did not exist five years ago
• It was less than a $100 million industry in 2009
(Deloitte)
• The Big Data market is worth $5 billion (IDC)
• By 2015, Big Data revenue will touch $30 billion (IDC)
• By 2017, it will cross $54 billion (IDC)
• 300 exabytes of data is stored today (IBM)
14. Players
• Leaders – IBM, Microsoft, SAS, SAP, QlikTech,
NetApp, Teradata, EMC
• Disrupters – Amazon, Google
• US Startups – Cloudera, GoodData, Parcel
• Indian Startups – Mu Sigma, Xurmo,
Metaome, Vizury, Meshlabs
• Indian companies – TCS, Infosys, Wipro, HCL
Tech
16. Tools for analyzing data
• EMC Greenplum
• NetApp E-Series
• IBM InfoSphere solutions
• IBM Smarter Analytics (hardware, software, services)
• SAS Business Analytics
• SAP HANA
• Oracle Big Data Appliance
• Hadoop
• MapReduce (programming model)
• QlikView BI Dashboards
• Jaspersoft BI Suite
• http://www.infoworld.com/d/business-intelligence/7-top-tools-taming-big-data-191131?page=0,2
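Since MapReduce is listed above as a programming model rather than a product, a small single-machine sketch may help show what that model looks like. This is the classic word-count example in plain Python, not Hadoop's API; the function names and sample documents are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Invented sample input; on a real cluster each document could be
# mapped on a different machine.
documents = ["big data is big", "data about data"]

mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call and each reduce call works independently, a framework such as Hadoop can spread them across many machines, which is what makes the model suitable for very large data sets.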
17. The people factor
• Demand for data scientists, statisticians and data
architects will grow
• Skills: analytical skills, statistics
18. Related fields
• Predictive analytics
• In-memory databases
• Data modeling
• Data visualization
• Business Intelligence