Introduction to Big Data by Manouj Bongirr
- 2. A Big Data - Technology, Consulting & Training Firm
-- Big Logic was founded in the US on the value of Apache Hadoop as a Big Data
analytics platform.
-- At Big Logic, we share our experiences after guiding many enterprises through successful Big
Data projects. We empower you to decide on build versus buy when it comes to achieving your
defined business objectives across various technical environments.
Copyright ©2012 Big Logic Technologies
- 4. Big data is a term applied to data sets whose size is beyond the ability of commonly used
software tools to capture, manage, and process the data within a tolerable elapsed time.
Gartner predicts 800% data growth over the next 5 years.
80-90% of data produced today is unstructured.
- 7. Unit             Decimal size    Binary multiple
gigabyte (GB)    10^9 bytes      1024 MB
terabyte (TB)    10^12 bytes     1024 GB
petabyte (PB)    10^15 bytes     1024 TB
exabyte (EB)     10^18 bytes     1024 PB
zettabyte (ZB)   10^21 bytes     1024 EB
yottabyte (YB)   10^24 bytes     1024 ZB
2009: 800,000 petabytes
2020: 35 zettabytes, i.e. 35 billion TB
44x as much data and content over the coming decade
Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010
1 zettabyte = 1 099 511 627 776 GB
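The figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the `UNITS` list and the `convert` helper are hypothetical names, not part of the original deck. It assumes binary (1024-based) steps for the "1 zettabyte in GB" figure and decimal (1000-based) steps for the IDC growth figures.

```python
# Unit names in ascending order; each step up is one multiple of `base`.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def convert(value, src, dst, base=1024):
    """Convert `value` from unit `src` to unit `dst`.

    base=1024 gives binary multiples, base=1000 gives decimal multiples.
    """
    steps = UNITS.index(src) - UNITS.index(dst)
    return value * base ** steps

# 1 ZB expressed in GB, using binary multiples (four steps of 1024).
gb_per_zb = convert(1, "ZB", "GB")

# 35 ZB expressed in TB, using decimal multiples (three steps of 1000).
tb_in_2020 = convert(35, "ZB", "TB", base=1000)

# Growth factor from 800,000 PB (2009) to 35 ZB (2020), decimal multiples.
growth = convert(35, "ZB", "PB", base=1000) / 800_000

print(gb_per_zb)   # 1099511627776
print(tb_in_2020)  # 35000000000
print(growth)      # 43.75
```

The growth factor of 43.75 rounds to the "44x" quoted from IDC.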
- 10. “Moore's law is the observation that, over the history of computing hardware, the
number of transistors on integrated circuits doubles approximately every two years.”
Intel co-founder Gordon E. Moore
- 11. RAM max capacity: 32 GB
HDD max size: 6 TB
[Figure: CPU max speed]
- 14. If I need to process a 100 TB dataset
• On 1 node:
– scanning @ 50 MB/s = 23 days
• On a 1000-node cluster:
– scanning @ 50 MB/s per node = 33 min
Challenges: hardware problems; processing and combining data from
multiple disks
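The scan times above follow from dividing the dataset size by the aggregate read throughput. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly linear scaling with no coordination or failure overhead; `scan_seconds` is a hypothetical helper name:

```python
def scan_seconds(dataset_bytes, nodes, mb_per_s_per_node=50):
    """Time to scan a dataset once when it is split evenly across `nodes`.

    Assumes every node reads its share in parallel at the same rate.
    """
    throughput_bytes_per_s = nodes * mb_per_s_per_node * 10**6
    return dataset_bytes / throughput_bytes_per_s

DATASET_BYTES = 100 * 10**12  # 100 TB

single_node = scan_seconds(DATASET_BYTES, nodes=1)
cluster = scan_seconds(DATASET_BYTES, nodes=1000)

print(f"1 node:     {single_node / 86400:.1f} days")  # 23.1 days
print(f"1000 nodes: {cluster / 60:.1f} min")          # 33.3 min
```

In practice a real cluster would fall short of this ideal, which is the challenge the slide points to: disks fail, and partial results from many disks have to be combined.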
- 15. • Apache Hadoop is an open source framework for storing, processing
and analysing massive amounts of multi-structured data in a
distributed environment.
• Hadoop was inspired by Google's MapReduce and Google File
System (GFS) papers.
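To give a feel for the MapReduce model mentioned above, here is a single-process word count written in plain Python. It is a sketch of the programming model only and does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and a real framework would run the map and reduce steps on many machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big logic", "big data"])
print(counts)  # {'big': 3, 'data': 2, 'logic': 1}
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across nodes, which is what makes the model suit large distributed datasets.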
- 16. If you are in any of the above segments, you would be part of the above revenue.