This presentation offers a basic introduction to Big Data: it defines Big Data, surveys its history, presents Big Data by the numbers, and closes with the 8 Laws of Big Data.
7. WHAT IS BIG DATA?
“Big Data Dreams”
Big Data /big ˈdā-tə/ - A collection of data sets so large
and complex that they are difficult to process using
on-hand database management tools or traditional data
processing applications.
Challenges of big data include capture, curation, storage,
search, sharing, transfer, analysis and visualization.
8. “WHAT IS BIG DATA?”
The three Vs characterize what big data is all about, and also help
define the major issues that IT needs to address:
•Volume. The massive scale and growth of unstructured data outstrips
traditional storage and analytical solutions.
•Variety. Traditional data management processes can’t cope with the
heterogeneity of big data—or “shadow” or “dark data,” such as access
traces and Web search histories.
•Velocity. Data is generated in real time, with demands for usable
information to be served up immediately.
10. “WHAT IS BIG DATA? (AND WHAT IT ISN’T)”
Big Data Analytics is…
A technology-enabled strategy for gaining richer,
deeper insights into customers, partners, and the
business—and ultimately gaining competitive
advantage.
Working with data sets whose size and variety are
beyond the ability of typical database software to
capture, store, manage, and analyze.
Processing a steady stream of real-time data in
order to make time-sensitive decisions faster than
ever before.
Distributed in nature. Analytics processing goes to
where the data is for greater speed and efficiency.
A new paradigm in which IT collaborates with
business users and “data scientists” to identify and
implement analytics that will increase operational
efficiency and solve new business problems.
Moving decision making down in the organization
and empowering people to make better, faster
decisions in real time.
Big Data Analytics Isn’t …
Just about technology. At the business level, it’s
about how to exploit the vastly enhanced sources of
data to gain insight.
Only about volume. It’s also about variety and
velocity. But perhaps most important, it’s about
value derived from the data.
Generated or used only by huge online companies
like Google or Amazon anymore. While Internet
companies may have pioneered the use of big data
at web scale, applications touch every industry.
About “one-size-fits-all” traditional relational
databases built on shared disk and memory
architecture. Big data analytics uses a grid of
computing resources for massively parallel
processing (MPP).
Meant to replace relational databases or the data
warehouse. Structured data continues to be
critically important to companies. However,
traditional systems may not be suitable for the new
sources and contexts of big data.
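The contrast above between shared-disk relational systems and a grid of computing resources doing massively parallel processing (MPP) can be sketched with a toy map-reduce word count. This is a minimal illustration only: the input text, worker count, and function names are assumptions for the example, not part of the deck, and a real MPP system would spread the work across many machines rather than local processes.

```python
from collections import Counter
from multiprocessing import Pool

def map_count(chunk):
    """Map step: each worker counts words in its own partition of the data."""
    return Counter(chunk.split())

def word_count(text, workers=4):
    """Toy MPP-style word count: partition the data, process the partitions
    in parallel, then reduce the partial results into one answer."""
    lines = text.splitlines()
    # Partition the input so each worker gets a roughly equal share.
    chunks = ["\n".join(lines[i::workers]) for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(map_count, chunks)
    # Reduce step: merge the partial counts from every worker.
    return sum(partials, Counter())

if __name__ == "__main__":
    text = "big data\nbig insights\nbig value"
    print(word_count(text).most_common(1))  # [('big', 3)]
```

The key design point, echoing the slide, is that the computation goes to where each partition of the data lives; only the small partial counts travel back to be merged.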
11. “WHAT IS BIG DATA?”
“Every two days now we create as much information as
we did from the dawn of civilization up until 2003.
That’s something like five exabytes of data”
-Eric Schmidt, CEO of Google
By 2015 the digital universe is expected to reach 8
Zettabytes -Intel
13. “WHAT IS BIG DATA?” (AND WHO WORKS IT…)
A new kind of professional is helping organizations make sense of the
massive streams of digital information: the data scientist. Data scientists are
responsible for modeling complex business problems, discovering business
insights, and identifying opportunities.
They bring to the job:
•Skills for integrating and preparing large, varied data sets
•Advanced analytics and modeling skills to reveal and understand hidden
relationships
•Business knowledge to apply context
•Communication skills to present results
30. “BY THE NUMBERS”
•More sources and more devices
•Mobile
  •Pictures
  •Video
  •SMS
  •GPS
•Social Media
  •Facebook
  •Twitter
  •YouTube
  •Reviews
•Automated Sources
  •RFID
  •Telemetry
  •Security cameras
Real-time correlation of data can be turned into golden nuggets of information.
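A minimal sketch of the real-time correlation idea above: pairing events from two of the streams just listed (say, mobile clicks and purchases) by user within a short time window. The event shape `(timestamp, user)`, the stream names, and the 60-second window are illustrative assumptions, not anything specified in the deck.

```python
def correlate(clicks, purchases, window=60):
    """Pair each purchase with any click by the same user that occurred
    no more than `window` seconds earlier. Events are (timestamp, user)
    tuples; a production system would do this over live streams, not lists."""
    matches = []
    for p_time, p_user in purchases:
        for c_time, c_user in clicks:
            # Correlate only same-user events inside the time window.
            if c_user == p_user and 0 <= p_time - c_time <= window:
                matches.append((c_user, c_time, p_time))
    return matches

clicks = [(10, "ann"), (200, "bob")]
purchases = [(40, "ann"), (500, "bob")]
print(correlate(clicks, purchases))  # [('ann', 10, 40)]
```

Here "ann" clicked at t=10 and bought at t=40, inside the window, so the pair is a nugget; "bob" bought 300 seconds after clicking, so no correlation is emitted.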
31. “8 LAWS OF BIG DATA”
Big Data Law #1
The Faster You Analyze Your Data, the Greater Its Predictive Power.
Great list developed by Dave Feinleib – Managing Director of Big Data Group.
39. Q&A
Michael C. DeAloia
Regional Vice President
Expedient Data Centers
(M) 216.212.4067
(E) michael.dealoia@expedient.com
Editor's notes
Gartner defines a strategic technology as one with the potential for significant impact on the enterprise within the next three years. Factors that denote high potential for significant impact include: high potential for disruption to IT or the business; the need for a major dollar investment; and the risk of being late to adopt.
Your new best friend the Data Scientist.
Humans have been complaining about being bombarded with too much information since the advent of clay tablets. The lament in Ecclesiastes that “of making many books there is no end” resonated in the Renaissance, when the invention of the printing press flooded Western Europe with what an alarmed Erasmus called “swarms of new books.” The U.S. Census of 1880 counted 50 million people and took eight years to complete.
The U.S. Census of 1890 used the Hollerith Tabulating System, built on punched cards holding 80 variables each. The census took one year to complete instead of eight.
In 1935, President Franklin D. Roosevelt's Social Security Act launches the U.S. government on its most ambitious data-gathering project ever, as IBM wins a government contract to keep employment records on 26 million working Americans and 3 million employers.
In 1943, at Bletchley Park, a British facility dedicated during World War II to breaking the Nazi codes, engineers develop a series of groundbreaking mass data-processing machines, culminating in the first programmable electronic computer. The device, Colossus, can read paper tape at 5,000 characters a second, reducing decoding from weeks to a few hours.
In 1961, the NSA, a nine-year-old intelligence agency that has hired 12,000 cryptologists, confronts information overload during the Cold War as it begins collecting and processing signals automatically with computers while struggling to digitize its backlog of records. In July 1961 alone, the NSA receives 17,000 reels of tape.
In 1989, British computer scientist Tim Berners-Lee proposes leveraging the Internet to share information globally through a hypertext system called the World Wide Web. He is quoted as saying, “The information contained would grow past a critical threshold, so that the usefulness of the scheme would in turn encourage its increased use.”
In 1997, NASA researchers Michael Cox and David Ellsworth use the term “big data” for the first time, describing a familiar challenge of the 1990s: supercomputers generating massive amounts of information.
Retailers begin to amass information on customers' shopping and personal habits. At the end of 2004, Walmart boasts a cache of 460 terabytes, more than double the amount of data on the Internet at the time.
In 2009, India establishes the Unique Identification Authority of India – to fingerprint, photograph and take an iris scan of all 1.2 billion people in the country and assign each person a 12-digit ID number, funneling all the data into the world’s largest database.
Scanning 200 million pages of information, or 4 terabytes of disk storage, in a matter of seconds, IBM's Watson computer system defeats two human challengers on the quiz show Jeopardy! The New York Times later dubs the moment a “triumph of Big Data computing.”
In March of 2012 the Obama administration announces a $200 million Big Data Research and Development Initiative calling for every federal agency to have a “big data strategy.”
Akamai analyzes 75 million events per day to better target advertisements.
Processes and mines petabytes of user data to power “people you may know.”
Processed over 4 TB of raw images into 11 million finished PDFs in 24 hours.
Decoding the human genome used to take ten years; now it can be done in seven days.
Its systems process 2.5 billion pieces of content and 500+ terabytes of data each day, pulling in 2.7 billion Like actions and 300 million photos per day. It has over 100 petabytes of data stored in a single Hadoop disk cluster, which the company believes is the single largest Hadoop cluster in the world.
Companies are moving away rapidly from batch processing to real-time to gain a competitive advantage.
The more you copy and move your data, the less reliable it becomes.
More diverse data leads to greater insights. Combining multiple data sources can lead to the most interesting insights of all.
Don’t throw your data away.
The number of photos, emails, and IMs, while large, is limited by the number of people. Data from networked “sensors” such as mobile phones, GPS units, and other devices is much larger.
Don’t think of big data as a shiny new standalone technology. Think about your core business problems and how to solve them by analyzing big data.
More data alone isn’t sufficient. Look for ways to broaden the use of data across your organization.
Those that fail to leverage the numerous internal and external data sources available will be leapfrogged by new entrants.