Big Data Boom
Volume | Velocity | Variety
What | Why | When
Author
• Astute corporate resource with 10+ years of experience in database management, programming, software development, testing, web technologies and product improvement. Combines expert software and database management skills with strong qualifications in software, data engineering and information management. Currently manages all database functions for his company. Strong understanding of the complex challenges in software development and troubleshooting. Skilled at identifying and solving problems, gaining new business contacts, reducing costs, coordinating staff and evaluating performance. Professional traits include problem-solving, decision-making, time management, multitasking, analytical thinking, effective communication, and computer competencies.
• Oracle Certified Associate (OCA) on 9i
• Oracle Certified Professional (OCP) on 9i
• Oracle Certified Professional (OCP) on 10g
• Oracle Certified Professional (OCP) on 11g
• Oracle Certified Professional (OCP) on 12c
• Oracle Certified Professional (OCP) on MySQL 5
• Oracle Certified Expert (OCE) on 10g Managing on Linux
• Oracle Certified Professional (OCP) on E-Business Apps DBA
• Microsoft Certified Technology Specialist on SQL Server 2005
• Microsoft Certified Technology Specialist on SQL Server 2008
• Microsoft Certified IT Professional on SQL Server 2005
• Microsoft Certified IT Professional on SQL Server 2008
• Sun Certified Java Programmer 5.0
• IBM Certified Database Associate (DB2 9.0)
• ITIL V3 Foundation Certified
• COBIT 5 Foundation Certified
• PRINCE2 Foundation Certified
Agenda
• What is Big Data
• Why Big Data
• When Big Data
• Traditional Databases
• Hadoop
• Hadoop Projects
• Big Data and TPL Holdings
• Hadoop Distributions
By JBH Syed | BSCS | MSDEIM | MCTS | MCITP | OCA | OCP | OCE | SCJP | ITIL V3F | COBIT 5F | PRINCE2
What is Big Data ?
• Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications. The challenges include analysis, capture, search, sharing, storage, transfer, visualization, and privacy violations.
• Big data is often defined by the three Vs: Volume, Velocity and Variety.
• Big data is data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. There are a number of concepts associated with big data: originally there were three, volume, variety and velocity; veracity was attributed to big data later. (Wikipedia)
What is Big Data ?
• Volume. Many factors contribute to the increase in data volume: transaction-based data stored through the years, unstructured data streaming in from social media, and increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.
• Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.
• Variety. Data today comes in all types of formats: structured, numeric data in traditional databases; information created from line-of-business applications; and unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.
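As a toy illustration of the velocity challenge (not from the slides, and simplified to a single process): a rolling window lets you react to a stream of readings in near-real time, without first storing the whole data set.

```python
from collections import deque

def rolling_average(stream, window=3):
    """Yield the mean of the last `window` readings as each new one arrives."""
    buf = deque(maxlen=window)  # old readings fall off automatically
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

# Simulated sensor stream: values must be handled as they arrive
readings = [10, 12, 11, 50, 13]
averages = list(rolling_average(readings))
# The spike at 50 shows up in the running average immediately
```

Real velocity workloads would use a stream-processing framework rather than a loop, but the idea of bounded state over an unbounded stream is the same.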
Why Big Data
• The hopeful vision is that organizations will be able to take data from any source, make it actionable, or harness relevant data and analyze it to find answers that enable:
• 1) Overall cost reductions
• 2) Time reductions
• 3) New product development and optimized offerings
• 4) Smarter business decision making, for instance by combining big data and high-powered analytics
• 5) Faster resolutions
When Big Data ?
• It depends on the requirements of the organization and the available organization data, as explained earlier with the 3 Vs.
• The real issue is not that you are acquiring large amounts of data. It's what you do with the data that counts.
• What actions can you take with the huge data stream?
• Industry leaders like China Mobile generate 7 terabytes of data per day, and Facebook 10 terabytes per day.
• Analysis of call records.
• Analysis of sentiments.
• Analysis of weather information.
• Analysis of vehicle traffic and location trends.
• Analysis of years of sales trends, targets and glitches.
• Analysis of biological data, for example DNA, RNA etc.
• Analysis of customer information.
• Analysis of operating system and hardware logs to prevent attacks and take action before an actual failure occurs.
• And much more.
Traditional Databases and Hadoop
• Mr. Ahmed Waleed has described the difference between RDBMS and Hadoop very well at www.w3trainingschool.com.
• Unlike Hadoop, a traditional RDBMS cannot be used to process and store large amounts of data, or simply big data. Following are some differences between Hadoop and a traditional RDBMS.
• Data Volume
• Data volume means the quantity of data that is being stored and processed. An RDBMS works better when the volume of data is low (in gigabytes). But when the data size is huge, i.e., in terabytes and petabytes, an RDBMS fails to give the desired results.
• On the other hand, Hadoop works better when the data size is big. It can process and store large amounts of data quite effectively compared to a traditional RDBMS.
• Architecture
• If we talk about architecture, Hadoop has the following core components: HDFS (Hadoop Distributed File System), Hadoop MapReduce (a programming model to process large data sets) and Hadoop YARN (used to manage computing resources in computer clusters).
• A traditional RDBMS possesses the ACID properties: Atomicity, Consistency, Isolation, and Durability.
• These properties maintain and ensure data integrity and accuracy when a transaction takes place in a database.
• Such transactions may be related to banking systems, the manufacturing industry, the telecommunication industry, online shopping, the education sector, etc.
• Throughput
• Throughput means the total volume of data processed in a particular period of time, so that the output is maximized. An RDBMS fails to achieve the throughput of the Apache Hadoop framework.
• This is one of the reasons Hadoop is used more heavily than a traditional relational database management system.
• Data Variety
• Data variety generally means the type of data to be processed: structured, semi-structured or unstructured.
• Hadoop can process and store all varieties of data, whether structured, semi-structured or unstructured, although it is mostly used to process large amounts of unstructured data.
• A traditional RDBMS can manage only structured and semi-structured data; it cannot be used to manage unstructured data. In this respect Hadoop is way better than a traditional relational database management system.
• Latency / Response Time
• Hadoop has higher throughput: you can access batches of large data sets more quickly than with a traditional RDBMS, but you cannot access a particular record from the data set very quickly. Thus Hadoop is said to have high latency.
• An RDBMS is comparatively faster at retrieving information from a data set. It takes very little time to perform the same function, provided there is a small amount of data.
• Scalability
• An RDBMS provides vertical scalability, also known as ‘scaling up’ a machine: you can add more resources or hardware, such as memory and CPU, to a machine in the computer cluster.
• Hadoop, by contrast, provides horizontal scalability, also known as ‘scaling out’: you add more machines to the existing computer cluster, as a result of which Hadoop becomes fault tolerant. There is no single point of failure: because there are more machines in the cluster, you can easily recover data even if one of the machines fails.
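Why scaling out yields fault tolerance can be shown with a toy replica-placement model (node and block names invented; HDFS defaults to 3 replicas per block, though its actual placement policy is rack-aware, not round-robin):

```python
import itertools

def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    ring = itertools.cycle(range(len(nodes)))
    for block in blocks:
        placement[block] = [nodes[next(ring)] for _ in range(replication)]
    return placement

def readable(placement, failed):
    """Blocks still readable after the nodes in `failed` are lost."""
    return [b for b, ns in placement.items() if any(n not in failed for n in ns)]

nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["blk_a", "blk_b", "blk_c"], nodes)
# With 3 replicas spread over 4 nodes, losing any one node loses no data
survivors = readable(placement, {"node2"})
```

The same idea is why a cluster keeps serving reads while a failed node's blocks are re-replicated in the background.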
• Data Processing
• Apache Hadoop supports OLAP (Online Analytical Processing), which is used in data mining techniques. OLAP involves very complex queries and aggregations. The processing speed depends on the amount of data and can take several hours. The database design is de-normalized, with fewer tables; OLAP uses star schemas.
• On the other hand, an RDBMS supports OLTP (Online Transaction Processing), which involves comparatively fast query processing. The database design is highly normalized, with a large number of tables; OLTP generally uses a 3NF (entity model) schema.
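The OLTP/OLAP distinction can be illustrated with a small sqlite3 toy (table and column names are invented for illustration): an OLTP workload is point lookups and single-row writes against normalized tables, while an OLAP workload scans and aggregates across the whole data set.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Normalized tables, as an OLTP design would have them
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 99.0), (11, 1, 25.0), (12, 2, 40.0)])

# OLTP-style query: fetch one record quickly by its key
name = cur.execute("SELECT name FROM customers WHERE id = 2").fetchone()[0]

# OLAP-style query: scan, join and aggregate across the data set
totals = cur.execute(
    "SELECT c.name, SUM(o.amount) FROM orders o "
    "JOIN customers c ON c.id = o.customer_id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
```

At gigabyte scale both queries are instant; the architectural split only matters once the scan-and-aggregate side outgrows a single machine.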
• Cost
• Hadoop is a free and open-source software framework; you don't have to pay to buy a software license.
• An RDBMS, on the other hand, is licensed software; you have to pay to buy the complete software license.
Hadoop
• The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers using
simple programming models. It is designed to scale up from single servers
to thousands of machines, each offering local computation and storage.
Rather than rely on hardware to deliver high-availability, the library itself is
designed to detect and handle failures at the application layer, so delivering
a highly-available service on top of a cluster of computers, each of which
may be prone to failures.
Hadoop Projects
• Hadoop Common: The common utilities that support the other Hadoop modules.
• Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
• Hadoop YARN: A framework for job scheduling and cluster resource management.
• Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
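MapReduce's programming model is easiest to see with the classic word-count example. This single-process Python sketch is not actual Hadoop code; it only mimics the map, shuffle and reduce phases that Hadoop runs in parallel across a cluster:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data boom", "big data big value"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

Because each map call sees only its own line and each reduce call only its own key, Hadoop can run thousands of them concurrently on different machines; the user writes just the two functions.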
Hadoop Distributions
• Cloudera Enterprise
• www.cloudera.com (online training available)
• Hortonworks Enterprise
• www.hortonworks.com (online training available)
• MapR Enterprise
• www.mapr.com (classroom training only)
Cloudera, Hortonworks and MapR Fight for
Hadoop Supremacy
• Who's going to win, Cloudera, Hortonworks or MapR? All three are battling
for Hadoop supremacy in terms of prominent customers, funding and
market share.
• The latest blow was figuratively struck by Cloudera as Intel yesterday
announced it was quitting on its own distribution and joining forces with
the Hadoop pioneer.
• http://adtmag.com/blogs/dev-watch/2014/03/hadoop-war.aspx

More Related Content

What's hot

Platforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern EngineeringPlatforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern Engineering
DATAVERSITY
 
Building the enterprise data architecture
Building the enterprise data architectureBuilding the enterprise data architecture
Building the enterprise data architecture
Costa Pissaris
 
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Erik Fransen
 

What's hot (20)

Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data Lake
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
Platforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern EngineeringPlatforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern Engineering
 
Building the Modern Data Hub
Building the Modern Data HubBuilding the Modern Data Hub
Building the Modern Data Hub
 
Building the enterprise data architecture
Building the enterprise data architectureBuilding the enterprise data architecture
Building the enterprise data architecture
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the Enterprise
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
Emerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingEmerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big Thing
 
ADV Slides: The Data Needed to Evolve an Enterprise Artificial Intelligence S...
ADV Slides: The Data Needed to Evolve an Enterprise Artificial Intelligence S...ADV Slides: The Data Needed to Evolve an Enterprise Artificial Intelligence S...
ADV Slides: The Data Needed to Evolve an Enterprise Artificial Intelligence S...
 
Slides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data LakesSlides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data Lakes
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Slides: Enterprise Architecture vs. Data Architecture
Slides: Enterprise Architecture vs. Data ArchitectureSlides: Enterprise Architecture vs. Data Architecture
Slides: Enterprise Architecture vs. Data Architecture
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Data Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterData Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words Matter
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 

Similar to Big Data Boom

NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
Adi Challa
 

Similar to Big Data Boom (20)

Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data
 
Big Data
Big DataBig Data
Big Data
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Big data rmoug
Big data rmougBig data rmoug
Big data rmoug
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
Big Data Analytics Materials, Chapter: 1
Big Data Analytics Materials, Chapter: 1Big Data Analytics Materials, Chapter: 1
Big Data Analytics Materials, Chapter: 1
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Big Data Boom

  • 2. Author • Astute corporate resource with 10+ years of corporate experience with emphasis on database management, programming, software development, testing, web technologies and product improvement for corporations. Combines expert software and database management expertise with strong qualifications in Software, Data Engineering & Information Management. Concurrently, manage all the database functions for the current company. Industry experience in Information Technology. Strong understanding of the complex challenges in Software Development and problem troubleshooting. An expert on identifying and solving problems, gaining new business contacts, reducing costs, coordinating staff and evaluating performance. Professional traits include; problem-solving, decision-making, time management, multitasking, analytical thinking, effective communication, and computer competencies. • Oracle Certified Professional OCA on 9i • Oracle Certified Professional OCP on 9i • Oracle Certified Professional OCP on 10g • Oracle Certified Professional OCP on 11g • Oracle Certified Professional OCP on 12c • Oracle Certified Professional OCP on MySQL 5 • Oracle Certified Professional OCE on 10g managing on Linux • Oracle Certified Professional OCP on E-Business Apps DBA • Microsoft Certified Technology Specialist on SQL Server 2005 • Microsoft Certified Technology Specialist on SQL Server 2008 • Microsoft Certified IT Professional on SQL Server 2005 • Microsoft Certified IT Professional on SQL Server 2008 • Sun Certified Java Programmer 5.0 • IBM Certified Database(DB2) Associate 9.0 • ITIL V3 Foundation Certified • COBIT 5 Foundation Certified • PRINCE2 Foundation Certified
  • 3. Agenda • What is Big Data • Why Big Data • When Big Data • Traditional Databases • Hadoop • Hadoop Projects • BigData andTPL Holdings • Hadoop Distributions By JBH Syed| BSCS | MSDEIM | MCTS | MCITP | OCA | OCP | OCE | SCJP | ITL V3F | COBIT 5F | PRINCE2
  • 4. What is Big Data ? • Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.The challenges include analysis, capture, search, sharing, storage, transfer, visualization, and privacy violations. • Definition of Big Data as the threeVs -Volume ,Velocity andVariety. • Big data is data sets that are so voluminous and complex that traditional data processing , application software are inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying , updating, information privacy and data source.There are a number of concepts associated with big data: originally there were 3 concepts volume, variety, velocity. Other concepts later attributed with big data are veracity ( Wikipedia ) By JBH Syed| BSCS | MSDEIM | MCTS | MCITP | OCA | OCP | OCE | SCJP | ITL V3F | COBIT 5F | PRINCE2
  • 5. What is Big Data ? • Volume. Many factors contribute to the increase in data volume.Transaction- based data stored through the years. Unstructured data streaming in from social media. Increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data. • Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations. • Variety. Data today comes in all types of formats. Structured, numeric data in traditional databases. Information created from line-of-business applications. Unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with. By JBH Syed| BSCS | MSDEIM | MCTS | MCITP | OCA | OCP | OCE | SCJP | ITL V3F | COBIT 5F | PRINCE2
  • 6. Why Big Data • The hopeful vision is that organizations will be able to take data from any source and make it to the actionable or harness relevant data and analyze it to find answers that enable • 1) Overall Cost reductions • 2)Time reductions • 3) New products development and optimized offerings • 4) Smarter business decision making. For instance, by combining big data and high- powered analytics • 5)Faster Resolutions By JBH Syed| BSCS | MSDEIM | MCTS | MCITP | OCA | OCP | OCE | SCJP | ITL V3F | COBIT 5F | PRINCE2
  • 7. When Big Data ? • It depends on the requirement of the organization and the available organization data as we explain earlier about the 3Vs. • The real issue is not that you are acquiring large amounts of data. It's what you do with the data that counts. • What actions you can take with the huge data stream. • Industry leader like China Mobile which have 7 tera bytes per Day and the Facebook which have 10 tera bytes per Day. • Analysis on calls records. • Analysis on sentiments. • Analysis on weather information. • Analysis on vehicles traffic and location trend. • Analysis on years of SalesTrend , target and glitches. • Analysis on biological data for example DNA , RNA etc. • Analysis on Customers Information • Analysis on Operating System and Hardware logs to prevent the attacks and take the actions before the actual failure will be occur • And much more. By JBH Syed| BSCS | MSDEIM | MCTS | MCITP | OCA | OCP | OCE | SCJP | ITL V3F | COBIT 5F | PRINCE2
  • 8. Traditional Databases and Hadoop • Mr. AhmedWaleed has describe very well regarding the difference between RDBMS and Hadoop , www.w3trainingschool.com • Like Hadoop, traditional RDBMS cannot be used when it comes to process and store a large amount of data or simply big data. Following are some differences between Hadoop and traditional RDBMS. • DataVolume • Data volume means the quantity of data that is being stored and processed. RDBMS works better when the volume of data is low(in Gigabytes). But when the data size is huge i.e, inTerabytes and Petabytes, RDBMS fails to give the desired results. • On the other hand, Hadoop works better when the data size is big. It can easily process and store large amount of data quite effectively as compared to the traditional RDBMS. • Architecture • If we talk about the architecture, Hadoop has the following core components: • HDFS(Hadoop Distributed File System), Hadoop Map Reduce(a programming model to process large data sets) and HadoopYARN(used to manage computing resources in computer clusters). • Traditional RDBMS possess ACID properties which are Atomicity,Consistency, Isolation, and Durability. • These properties are responsible to maintain and ensure data integrity and accuracy when a transaction takes place in a database. • These transactions may be related to Banking Systems, Manufacturing Industry,Telecommunication industry,Online Shopping, education sector etc. • Throughput • Throughput means the total volume of data processed in a particular period of time so that the output is maximum. RDBMS fails to achieve a higher throughput as compared to the Apache Hadoop Framework. • This is one of the reason behind the heavy usage of Hadoop than the traditional Relational Database Management System.
• 9. • Data Variety • Data variety generally means the type of data to be processed. It may be structured, semi-structured or unstructured. • Hadoop has the ability to process and store all varieties of data, whether structured, semi-structured or unstructured, although it is mostly used to process large amounts of unstructured data. • A traditional RDBMS is used only to manage structured and semi-structured data. It cannot be used to manage unstructured data. So in this respect Hadoop is way better than the traditional Relational Database Management System. • Latency / Response Time • Hadoop has higher throughput: you can access batches of large data sets more quickly than with a traditional RDBMS, but you cannot access a particular record from the data set very quickly. Thus Hadoop is said to have high latency. • An RDBMS is comparatively faster at retrieving information from a data set. It takes very little time to perform the same function, provided that there is a small amount of data. • Scalability • An RDBMS provides vertical scalability, also known as ‘scaling up’ a machine. It means you can add more resources or hardware, such as memory and CPU, to a machine in the computer cluster. • Whereas Hadoop provides horizontal scalability, also known as ‘scaling out’: adding more machines to the existing computer cluster, which makes Hadoop fault tolerant. There is no single point of failure. Due to the presence of more machines in the cluster, you can easily recover data irrespective of the failure of one of the machines. • Data Processing • Apache Hadoop supports OLAP (Online Analytical Processing), which is used in data mining techniques. • OLAP involves very complex queries and aggregations. The data processing speed depends on the amount of data and can take several hours. The database design is de-normalized, having fewer tables. OLAP uses star schemas.
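The "scaling out" idea can be illustrated with a toy example: spread records over a cluster by hashing their key, so adding machines adds capacity instead of requiring a bigger single server. The node names below are made up, and real Hadoop placement (HDFS block placement, YARN scheduling) is far more sophisticated than this sketch:

```python
import hashlib

def node_for(key: str, nodes: list) -> str:
    """Pick a node for a record by hashing its key."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

records = [f"user-{i}" for i in range(10_000)]

small_cluster = ["node1", "node2", "node3"]
big_cluster = small_cluster + ["node4", "node5", "node6"]

def load_per_node(nodes):
    counts = {n: 0 for n in nodes}
    for r in records:
        counts[node_for(r, nodes)] += 1
    return counts

print(load_per_node(small_cluster))  # roughly 3,333 records per node
print(load_per_node(big_cluster))    # roughly 1,667 per node: same data, halved load
```

Doubling the cluster roughly halves the per-node load, which is the horizontal-scalability property the slide describes; vertical scaling would instead mean buying a larger `node1`.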
• On the other hand, an RDBMS supports OLTP (Online Transaction Processing), which involves comparatively fast query processing. The database design is highly normalized, having a large number of tables. OLTP generally uses a 3NF (entity-model) schema. • Cost • Hadoop is a free and open-source software framework, so you don't have to pay to buy a software license. • Whereas most RDBMS products are licensed software, so you have to pay to buy the complete software license. • These are the main differences between Big Data Hadoop and a traditional RDBMS.
• 10. Hadoop • The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
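The fault-tolerance idea in the paragraph above rests on two mechanisms: a large file is split into fixed-size blocks, and each block is stored on several machines so that losing one machine loses no data. The sizes and node names below are illustrative only (real HDFS defaults are 128 MB blocks and 3 replicas):

```python
BLOCK_SIZE = 4    # bytes; tiny for demonstration (HDFS default: 128 MB)
REPLICATION = 3   # copies of each block (HDFS default: 3)
NODES = ["dn1", "dn2", "dn3", "dn4", "dn5"]

def split_into_blocks(data: bytes) -> list:
    """Chop a file into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def place_blocks(blocks):
    """Assign each block to REPLICATION distinct nodes, round-robin."""
    return {i: [NODES[(i + r) % len(NODES)] for r in range(REPLICATION)]
            for i in range(len(blocks))}

blocks = split_into_blocks(b"hello hadoop world!")
placement = place_blocks(blocks)

# Simulate losing a node: every block is still readable from a replica.
dead = "dn2"
for block_id, replicas in placement.items():
    survivors = [n for n in replicas if n != dead]
    assert survivors, f"block {block_id} lost!"  # never fires with 3 replicas
print(f"{len(blocks)} blocks, all readable after losing {dead}")
```

Handling failure in software like this is what lets Hadoop run on large clusters of inexpensive, failure-prone machines rather than on specialized high-availability hardware.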
• 11. Hadoop Projects • Hadoop Common: the common utilities that support the other Hadoop modules. • Hadoop Distributed File System (HDFS™): a distributed file system that provides high-throughput access to application data. • Hadoop YARN: a framework for job scheduling and cluster resource management. • Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.
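Word count is the canonical MapReduce example. The sketch below simulates, in a single plain-Python process, the three phases that the Hadoop MapReduce module runs at cluster scale: map (emit key/value pairs), shuffle (group by key), and reduce (aggregate each group). This is not Hadoop code, just an illustration of the programming model:

```python
from collections import defaultdict

def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big hadoop", "hadoop big"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)  # {'big': 3, 'data': 1, 'hadoop': 2}
```

In real Hadoop, the map and reduce functions run in parallel on many nodes, with YARN scheduling the tasks and HDFS supplying the input splits; the programmer writes essentially only the map and reduce logic.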
• 12. Hadoop Distributions • Cloudera Enterprise • www.cloudera.com (online training available) • Hortonworks Enterprise • www.hortonworks.com (online training available) • MapR Enterprise • www.mapr.com (only classroom training available)
• 13. Cloudera, Hortonworks and MapR Fight for Hadoop Supremacy • Who's going to win, Cloudera, Hortonworks or MapR? All three are battling for Hadoop supremacy in terms of prominent customers, funding and market share. • The latest blow was figuratively struck by Cloudera as Intel yesterday announced it was quitting on its own distribution and joining forces with the Hadoop pioneer. • http://adtmag.com/blogs/dev-watch/2014/03/hadoop-war.aspx