© Orzota, Inc. 2013
Big Data, Hadoop, NoSQL and more …
Varad Meru
Software Development Engineer, Orzota, Inc.
varad@orzota.com
in.linkedin.com/in/vmeru
@vrdmr
About Orzota
• Mission: Make big data easy for consumption
• Offers Big Data/Hadoop Solutions and Software Services to companies
• Develops Software to help companies consume Big Data
• Founded in March 2012
• Headquartered in Silicon Valley, California
• Offshore offices in Chennai, India
About Orzota (contd.)
• We work on
o Big Data
o Hadoop
o Cloud Technologies
o Data Science
o Products and Services
o Everything that it takes to be a valued Player.
About Orzota (contd.)
• Community Development
• Occasional seminars by Architects, Engineers, and Managers.
• We invite professionals and aspiring professionals to join Big Data / Hadoop communities in their geographies.
• Pune Hadoop User Group – Participant + Organizer.
• Chennai Hadoop User Group – Participant + Sponsor.
About Me
• Orzota, Inc.
• Currently working with Hadoop, Mahout, Cloud, etc.
• Past Work Experience
• Persistent Systems – Search, Recommendation Engines and User Behavior Analytics.
• Area of Interest
• Data Science, Information Retrieval
• Distributed Systems
Some of the Innovation Centers in the Technological World
Agenda
• Introduction to BigData
• Technologies and Domain
• Hadoop EcoSystem
• Introduction to MapReduce
• Architecture – HDFS + MapReduce.
• NoSQL Databases
• CAP Theorem
• Different NoSQL Databases
• Other Trends
Big Data
• What is Big Data?
• What does it mean to me?
• Why so much fuss in the industry?
• Who uses these technologies?
• How are they used in the Industry and Academia?
• When to start using them?
• How to learn them?
Big Data – 3 Vs
• Volume - Amassing terabytes, even petabytes, of information.
• 12 terabytes of Tweets created each day.
• 350 billion annual meter readings.
• Velocity - Sometimes 2 minutes is too late.
• Scrutinize 5 million trade events.
• 500 million daily call detail records.
• Variety - Big data is any type of data.
• 80% data growth in images, video and documents.
“Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
– Douglas Laney, "The Importance of 'Big Data': A Definition"
Problem
• Store and Process Data for -
• Search Engines,
• Recommendation Engines,
• Fraud Detection,
• Aadhaar (Govt. of India),
• Spam Detection, etc.
• Also, in some cases, in real time (e.g. Facebook)
Solutions?
• Classical Solutions
• Database + Programming Language (Java-Oracle, C#-SQL Server)
• Data Warehouses – Teradata, Netezza, Microsoft PDW
• Legacy Network Systems
• Novel
• CORBA
• Java RMI – RPC
Problems of the Solutions
• Problems with Classical Solutions
• CAP Theorem, by Prof. Eric Brewer (Berkeley) –
• Choose any 2 of Consistency, Availability and Partition tolerance
• ACID Properties
• For a small number of transactions, the cumulative overhead is still manageable.
• For a very large number of transactions – Facebook posts?
• Very high licensing fees.
• Closed source – you are tied to the company's eco-system.
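The CAP trade-off named above can be made concrete with a toy model. This is only a sketch under stated assumptions: the `TinyStore` class, its `mode` argument and its `partitioned` flag are hypothetical and do not correspond to any real database. In "CP" mode a write is refused while the two replicas cannot reach each other; in "AP" mode the write is accepted on one replica and the copies are allowed to diverge.

```python
class TinyStore:
    """Hypothetical two-replica key-value store, used only for illustration."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.replicas = [{}, {}]   # two copies of the data
        self.partitioned = False   # True when the replicas cannot reach each other

    def write(self, replica_id, key, value):
        if not self.partitioned:
            # Healthy network: update every replica, so all copies agree.
            for replica in self.replicas:
                replica[key] = value
            return True
        if self.mode == "CP":
            # Prefer consistency: refuse the write rather than let copies diverge.
            return False
        # Prefer availability: accept the write on the reachable replica only.
        self.replicas[replica_id][key] = value
        return True

    def read(self, replica_id, key):
        return self.replicas[replica_id].get(key)


def demo(mode):
    """Write once, cut the network, write again, then read both replicas."""
    store = TinyStore(mode)
    store.write(0, "status", "old")
    store.partitioned = True
    accepted = store.write(0, "status", "new")
    return accepted, store.read(0, "status"), store.read(1, "status")


cp_result = demo("CP")
ap_result = demo("AP")
print(cp_result)
print(ap_result)
```

In the "CP" run the second write is rejected and both replicas still agree; in the "AP" run the write succeeds but the two replicas return different values until the partition heals.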
Solution to the Problems of the Solutions
• Focus on Problem Domain
• What’s more important for your Solution?
• Consistency, Availability, and Partition tolerance
• Which industries/companies already face similar problems?
• How/Where to Collect Data?
• Technology Fields – Internet Companies
• Hadoop, NoSQL Datastores
• Open Source, Free and with Friendly Licenses.
Hadoop Eco-System
Introduction
• Started by Doug Cutting and Mike Cafarella for Nutch – an open-source search engine.
• Further developed at Yahoo! and Facebook, with contributions from people at many companies.
• Named after a little toy elephant owned by Doug’s son.
• Inspired by 2 research papers from Google
• The Google File System – 2003
• MapReduce – 2004
Introduction (contd.)
• Contains 3 modules
• Distributed File System
• MapReduce
• Common (a Java library containing common functions used by both DFS and MapReduce)
• Apache Top Level Project
• Hadoop’s Website – hadoop.apache.org
• Two Parallel Release Cycles – 1.x and 2.x
Introduction (contd.)
• A Rich Eco-System built around Hadoop
• Hive – Large Scale Data Warehouse
• HBase – NoSQL Database
• Pig – A Data-flow language on top of Hadoop
• Flume – Log Management for Hadoop
• Oozie – Workflow framework
• Mahout – Machine Learning Library on top of Hadoop
• Vaidya – Performance diagnostic tool for MapReduce jobs.
• MRUnit – Unit testing framework for MapReduce Programs.
• And many more …
MapReduce
MapReduce in 2 minutes –
Problem Statement – Sum of the doubles of a set of numbers.
Input array: 1 3 4 5 6 8 9 11 17 21
The intermediate array after processing (each number doubled): 2 6 8 10 12 16 18 22 34 42
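The problem statement above can be sketched in a few lines of plain Python. This is not Hadoop code; the function names `double` and `sum_of_doubles` are illustrative assumptions, and the input is the array shown on this slide.

```python
from functools import reduce


def double(x):
    """The per-element function f(x) applied in the map step."""
    return 2 * x


def sum_of_doubles(numbers):
    # Map step: apply f(x) to every entry independently.
    intermediate = [double(n) for n in numbers]
    # Reduce step: fold the intermediate array into a single total.
    total = reduce(lambda acc, value: acc + value, intermediate, 0)
    return intermediate, total


numbers = [1, 3, 4, 5, 6, 8, 9, 11, 17, 21]
intermediate, total = sum_of_doubles(numbers)
print(intermediate)
print(total)
```

Because each entry is doubled without looking at any other entry, the map step can be spread over many machines; only the final fold needs to see all the intermediate values.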
Introduction – contd.
Mapping Phase
• Splitting the input
• Sending slaves (datanodes) the mapping code - f(x).
• Apply the f(x) method on the data split
Slave Nodes (one input entry each): 1 1 9 8 6 11 4 3 17 21
The Master Node: This node contains the code of the function to be applied on individual entries of the array, written in the map() method in Hadoop.
Mapping Phase: Code f(x) being sent to the slave node for applying the logic on the data piece. In our case the data piece is an entry from the array.
Introduction – contd.
Spill Phase
• Master node directs the Mappers to send the processed f(x) output data to an intermediate location.
• Shuffle and Sorting
Slave Nodes (one f(x) output each): 2 2 18 16 12 22 8 6 34 42
The Master Node: The results of the processed data from the slave nodes are given to a specific node where the reducer function runs.
Spill Phase :- Shuffle and Sort
Introduction – contd.
Reduce Phase
• Master node (JobTracker) invokes the Reduce task once the spilling is over.
• Get the location of the spill output from the master node (NameNode).
The Master Node: The results of the processed data from the slave nodes are given to a specific node where the reducer function runs.
Reducer Phase: g(x) = 162, the sum of the doubled values from the slave nodes.
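The three phases walked through above can be imitated on one machine. This is a single-process sketch, not Hadoop code: the helper names (`split_input`, `map_phase`, `shuffle`, `reduce_phase`), the number of simulated slave nodes, and the single key "sum" are all assumptions made for illustration. The input values are the ones shown on the slave nodes in the Mapping Phase slide.

```python
from collections import defaultdict


def split_input(values, num_nodes):
    """Split the input round-robin across the simulated slave nodes."""
    return [values[i::num_nodes] for i in range(num_nodes)]


def map_phase(split):
    """Apply f(x) = 2x to each entry and emit (key, value) pairs."""
    return [("sum", 2 * x) for x in split]


def shuffle(mapped_outputs):
    """Group every emitted value by key and sort the values per key."""
    grouped = defaultdict(list)
    for output in mapped_outputs:
        for key, value in output:
            grouped[key].append(value)
    return {key: sorted(vals) for key, vals in grouped.items()}


def reduce_phase(grouped):
    """Apply g to the values of each key; here g is a plain sum."""
    return {key: sum(vals) for key, vals in grouped.items()}


values = [1, 1, 9, 8, 6, 11, 4, 3, 17, 21]
splits = split_input(values, num_nodes=5)
mapped = [map_phase(split) for split in splits]
grouped = shuffle(mapped)
result = reduce_phase(grouped)
print(result["sum"])
```

With the values from the slides, the reducer total equals the g(x) = 162 shown in the Reduce Phase. The number of simulated nodes does not change the result, since the sum is independent of how the input is split.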
MapReduce Programming
Steps involved in writing a MapReduce program
• Write the Mapper
• Write the Reducer
• Write the Driver
Life is simple until you start customizing and working on data cleansing.
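The three pieces listed for a MapReduce program can be mirrored in plain Python to show how they fit together. In Hadoop itself they are typically Java classes (a Mapper, a Reducer, and a driver that configures and submits the job); the sketch below is only an in-memory imitation, and the word-count task and the function names `mapper`, `reducer` and `driver` are assumptions chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Combine all counts that were emitted for one word."""
    return word, sum(counts)


def driver(lines):
    """Wire the job together: map, sort by key, group, then reduce."""
    pairs = [pair for line in lines for pair in mapper(line)]
    pairs.sort(key=itemgetter(0))
    results = {}
    for word, group in groupby(pairs, key=itemgetter(0)):
        key, total = reducer(word, (count for _, count in group))
        results[key] = total
    return results


counts = driver(["big data is big", "hadoop stores big data"])
print(counts)
```

The mapper and reducer know nothing about each other; the driver is the only place that decides the input, the ordering and where the results go, which is why customizing a job usually starts there.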
Hadoop – Bird’s Eye View
[Figure: a Name Node and a Job Tracker connected to many slave nodes, each running a DataNode (DN) and a TaskTracker (TT). Legend: DFS Message Path; MapReduce Processing Msg.]
NoSQL – Not Only SQL
Introduction
Non-Relational Databases
• Data Model not bound by a Schema.
• No Predetermined Schema, Run-Time Columns
• Sample Data
• Twitter Streams
• Web Forms
• Sensor Networks
Schema-less Systems
Entry 1
{"name": "emp1"}
Entry 2
{"name": "emp2", "e_id": "1", "e_addr": "Cupertino"}
Entry 3
{"name": "emp3", "e_id": "3"}
Entry 4
{"name": "emp4", "e_id": "6", "dob": "03-Sep-1964"}
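The four entries above can be handled directly as Python dictionaries, which shows what "no predetermined schema" means in practice. This is a minimal sketch; the helper names `all_fields` and `with_field` and the "unknown" default are assumptions, not part of any particular datastore.

```python
entries = [
    {"name": "emp1"},
    {"name": "emp2", "e_id": "1", "e_addr": "Cupertino"},
    {"name": "emp3", "e_id": "3"},
    {"name": "emp4", "e_id": "6", "dob": "03-Sep-1964"},
]


def all_fields(records):
    """Collect the union of field names; it is only known at run time."""
    fields = set()
    for record in records:
        fields.update(record)
    return sorted(fields)


def with_field(records, field):
    """Return the names of the records that happen to carry `field`."""
    return [r["name"] for r in records if field in r]


fields = all_fields(entries)
have_id = with_field(entries, "e_id")
# Readers must tolerate missing fields, for example by supplying a default.
addresses = [r.get("e_addr", "unknown") for r in entries]
print(fields)
print(have_id)
print(addresses)
```

The set of columns is discovered by scanning the data rather than declared up front, so every reader has to decide what a missing field means.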
Business Requirements
• High Writes, Low Reads – Sensor Networks, Large Hadron Collider, Click Logging.
• High Reads, Low Writes – Archival Storage.
• Don’t have any fixed Schema.
Open Question - Where Else?
NoSQL Types
• Key-Value Pair
• Riak, Voldemort, etc.
• Document Oriented
• CouchDB, MongoDB, etc.
• BigTable Implementations
• Cassandra, Hypertable, HBase, etc.
• Graph oriented
• Neo4j, etc.
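The four families above differ mainly in the shape of the data they store. The sketch below shows rough, simplified shapes using plain Python structures; it does not use the API of any product named above, and all keys, values and the `neighbours` helper are made up for illustration (a real BigTable-style store also versions each cell by timestamp, which is omitted here).

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": b"serialized-profile-bytes"}

# Document: a nested, self-describing record looked up by id.
documents = {"42": {"name": "emp2", "address": {"city": "Cupertino"}}}

# BigTable-style (wide column): row key -> column family -> column -> value.
wide_column = {
    "row-42": {"info": {"name": "emp2"}, "contact": {"city": "Cupertino"}}
}

# Graph: nodes plus labelled edges between them.
nodes = {"emp2": {"kind": "employee"}, "orzota": {"kind": "company"}}
edges = [("emp2", "WORKS_AT", "orzota")]


def neighbours(node, relation):
    """Follow edges with the given label out of `node`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


city_from_document = documents["42"]["address"]["city"]
city_from_columns = wide_column["row-42"]["contact"]["city"]
employer = neighbours("emp2", "WORKS_AT")
print(city_from_document, city_from_columns, employer)
```

The same fact (where emp2 is located, who employs emp2) is reachable in each model, but the lookup path differs: a single key, a path inside a document, a row and column coordinate, or an edge traversal.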
Introduction
Source: http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/
Wake up - Conclusion Time
• Big Data on the Rise
• Technology and the Domain
• Smart Engineers needed, with Big Data skills
• Chance to develop niche areas of expertise even before stepping into the industry
• 3rd Year Students – Select your final year projects very carefully, with the tools mentioned in this seminar
• 4th Year Students – Equip yourself with the necessary skills for better industry opportunities.
Recommendations
• I recommend aspiring professionals and young professionals read:
• How to Solve it by Computer – R. G. Dromey
• Code Complete 2 – Steve McConnell
• Advanced Programming in the UNIX Environment – W. Richard Stevens
• Many books on Hadoop, NoSQL datastores, and Big Data in general.
… and many more
Questions?
Thank You
Contact Us at –
Linkedin.com/company/orzota-inc-
Twitter.com/orzota

More Related Content

What's hot

PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 KeynotePeter Wang
 
Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019Adam Doyle
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...DataWorks Summit/Hadoop Summit
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
2014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part12014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part1Adam Muise
 
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...DataWorks Summit
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopGhassan Al-Yafie
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascienceAdam Muise
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?Hortonworks
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation17aroumougamh
 
What is Big Data Discovery, and how it complements traditional business anal...
What is Big Data Discovery, and how it complements  traditional business anal...What is Big Data Discovery, and how it complements  traditional business anal...
What is Big Data Discovery, and how it complements traditional business anal...Mark Rittman
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
HPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare TransformationHPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare TransformationHortonworks
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreTrendwise Analytics
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTAmrit Chhetri
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Seeling Cheung
 

What's hot (20)

PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
Big Data: an introduction
Big Data: an introductionBig Data: an introduction
Big Data: an introduction
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
2014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part12014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part1
 
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoop
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
What is Big Data Discovery, and how it complements traditional business anal...
What is Big Data Discovery, and how it complements  traditional business anal...What is Big Data Discovery, and how it complements  traditional business anal...
What is Big Data Discovery, and how it complements traditional business anal...
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
HPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare TransformationHPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare Transformation
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and More
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
 

Similar to Big Data, Hadoop, NoSQL and more ...

From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationCloudera, Inc.
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015Cloudera, Inc.
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigManish Chopra
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
PyData: The Next Generation
PyData: The Next GenerationPyData: The Next Generation
PyData: The Next GenerationWes McKinney
 
Hadoop Data Modeling
Hadoop Data ModelingHadoop Data Modeling
Hadoop Data ModelingAdam Doyle
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceData Science Milan
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaNeo4j
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopCloudera, Inc.
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani
 
Future of Data Strategy
Future of Data StrategyFuture of Data Strategy
Future of Data StrategyDenodo
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataHostedbyConfluent
 
Oracle Data Science Platform
Oracle Data Science PlatformOracle Data Science Platform
Oracle Data Science PlatformOracle Developers
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccionFran Navarro
 

Similar to Big Data, Hadoop, NoSQL and more ... (20)

From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your Organization
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
 
Hadoop and SAP BI
Hadoop and SAP BI   Hadoop and SAP BI
Hadoop and SAP BI
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
PyData: The Next Generation
PyData: The Next GenerationPyData: The Next Generation
PyData: The Next Generation
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop Data Modeling
Hadoop Data ModelingHadoop Data Modeling
Hadoop Data Modeling
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, Cloudera
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
 
Future of Data Strategy
Future of Data StrategyFuture of Data Strategy
Future of Data Strategy
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
 
Oracle Data Science Platform
Oracle Data Science PlatformOracle Data Science Platform
Oracle Data Science Platform
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccion
 

More from Varad Meru

Predicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesPredicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesVarad Meru
 
Generating Musical Notes and Transcription using Deep Learning
Generating Musical Notes and Transcription using Deep LearningGenerating Musical Notes and Transcription using Deep Learning
Generating Musical Notes and Transcription using Deep LearningVarad Meru
 
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Varad Meru
 
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Varad Meru
 
Kakuro: Solving the Constraint Satisfaction Problem
Kakuro: Solving the Constraint Satisfaction ProblemKakuro: Solving the Constraint Satisfaction Problem
Kakuro: Solving the Constraint Satisfaction ProblemVarad Meru
 
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...Varad Meru
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemVarad Meru
 
Cloud Computing: An Overview
Cloud Computing: An OverviewCloud Computing: An Overview
Cloud Computing: An OverviewVarad Meru
 
Live Wide-Area Migration of Virtual Machines including Local Persistent State.
Live Wide-Area Migration of Virtual Machines including Local Persistent State.Live Wide-Area Migration of Virtual Machines including Local Persistent State.
Live Wide-Area Migration of Virtual Machines including Local Persistent State.Varad Meru
 
Machine Learning and Apache Mahout : An Introduction
Machine Learning and Apache Mahout : An IntroductionMachine Learning and Apache Mahout : An Introduction
Machine Learning and Apache Mahout : An IntroductionVarad Meru
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsVarad Meru
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningVarad Meru
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduceVarad Meru
 
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...Varad Meru
 
Final Year Project Guidance
Final Year Project GuidanceFinal Year Project Guidance
Final Year Project GuidanceVarad Meru
 
OpenSourceEducation
OpenSourceEducationOpenSourceEducation
OpenSourceEducationVarad Meru
 

More from Varad Meru (16)

Predicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesPredicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensembles
 
Generating Musical Notes and Transcription using Deep Learning
Generating Musical Notes and Transcription using Deep LearningGenerating Musical Notes and Transcription using Deep Learning
Generating Musical Notes and Transcription using Deep Learning
 
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
 
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...
 
Kakuro: Solving the Constraint Satisfaction Problem
Kakuro: Solving the Constraint Satisfaction ProblemKakuro: Solving the Constraint Satisfaction Problem
Kakuro: Solving the Constraint Satisfaction Problem
 
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage System
 
Cloud Computing: An Overview
Cloud Computing: An OverviewCloud Computing: An Overview
Cloud Computing: An Overview
 
Live Wide-Area Migration of Virtual Machines including Local Persistent State.
Live Wide-Area Migration of Virtual Machines including Local Persistent State.Live Wide-Area Migration of Virtual Machines including Local Persistent State.
Live Wide-Area Migration of Virtual Machines including Local Persistent State.
 
Machine Learning and Apache Mahout : An Introduction
Machine Learning and Apache Mahout : An IntroductionMachine Learning and Apache Mahout : An Introduction
Machine Learning and Apache Mahout : An Introduction
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its Applications
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
 
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
 
Final Year Project Guidance
Final Year Project GuidanceFinal Year Project Guidance
Final Year Project Guidance
 
OpenSourceEducation
OpenSourceEducationOpenSourceEducation
OpenSourceEducation
 

Big Data, Hadoop, NoSQL and more ...

  • 2. Big Data, Hadoop, NoSQL and more … • Varad Meru, Software Development Engineer, Orzota, Inc. • varad@orzota.com • in.linkedin.com/in/vmeru • @vrdmr • © Orzota, Inc. 2013
  • 3. About Orzota • Mission: Make big data easy for consumption • Offers Big Data/Hadoop Solutions and Software Services to companies • Develops Software to help companies consume Big Data • Founded in March 2012 • Headquartered in Silicon Valley, California • Offshore offices in Chennai, India
  • 4. About Orzota (contd.) • We work on: Big Data; Hadoop; Cloud Technologies; Data Science; Products and Services; everything that it takes to be a valued player.
  • 5. About Orzota (contd.) • Community Development • Occasional seminars by Architects, Engineers, Managers. • We invite professionals and aspiring professionals to join Big Data / Hadoop communities in their geographies. • Pune Hadoop User Group – Participant + Organizer. • Chennai Hadoop User Group – Participant + Sponsor.
  • 6. About Me • Orzota, Inc.: currently working with Hadoop, Mahout, Cloud, etc. • Past Work Experience: Persistent Systems – Search, Recommendation Engines and User Behavior Analytics. • Areas of Interest: Data Science, Information Retrieval, Distributed Systems
  • 7. Some of the Innovation Centers in the Technological World
  • 8. Agenda • Introduction to BigData • Technologies and Domain • Hadoop EcoSystem • Introduction to MapReduce • Architecture – HDFS + MapReduce. • NoSQL Databases • CAP Theorem • Different NoSQL Databases • Other Trends
  • 9. Big Data
  • 10. Big Data • What is Big Data? • What does it mean to me? • Why so much fuss in the industry? • Who uses these technologies? • How are they used in the Industry and Academia? • When to start using them? • How to learn them?
  • 11. Big Data – 3 Vs • Volume - Amassing terabytes, even petabytes, of information. • 12 terabytes of Tweets created each day. • 350 billion annual meter readings. • Velocity - Sometimes 2 minutes is too late. • Scrutinize 5 million trade events. • 500 million daily call detail records. • Variety - Big data is any type of data. • 80% data growth in images, video and documents. • "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." – Douglas Laney, "The Importance of 'Big Data': A Definition"
  • 12. Problem • Store and process data for: Search Engines, Recommendation Engines, Fraud Detection, Aadhaar (Govt. of India), Spam Detection, etc. • Also, in some cases, in real time (e.g. Facebook)
  • 13. Solutions? • Classical Solutions • Database + Programming Language (Java-Oracle, C#-SQL Server) • Data Warehouses – Teradata, Netezza, Microsoft PDW • Legacy Network Systems • Novel • CORBA • Java RMI – RPC
  • 14. Problems of the Solutions • Problems with Classical Solutions • CAP Theorem, by Prof. Eric Brewer (Berkeley): choose any 2 of Consistency, Availability and Partition tolerance • ACID Properties • For a small number of transactions, the cumulative overhead is still manageable. • For a very large number of transactions – Facebook posts? • Very high licensing fees. • Closed source – you are stuck with the company's eco-system.
  • 15. Solution to the Problems of the Solutions • Focus on the Problem Domain • What's more important for your solution: Consistency, Availability, or Partition tolerance? • Which industries/companies already face similar problems? • How/where to collect data? • Technology Fields – Internet Companies • Hadoop, NoSQL Datastores • Open source, free and with friendly licenses.
  • 16. Hadoop Eco-System
  • 17. Introduction • Started by Doug Cutting and Mike Cafarella for Nutch, an open-source search engine. • Further developed at Yahoo! and Facebook, with contributions from people at many companies. • Named after a little toy elephant owned by Doug's son. • Inspired by 2 research papers from Google • The Google File System – 2003 • MapReduce – 2004
  • 18. Introduction (contd.) • Contains 3 modules • Distributed File System • MapReduce • Common (a Java library containing common functions used by both DFS and MapReduce) • Apache Top Level Project • Hadoop's Website – hadoop.apache.org • Two Parallel Release Cycles – 1.x and 2.x
  • 19. Introduction (contd.) • A Rich Eco-System built around Hadoop • Hive – Large Scale Data Warehouse • HBase – NoSQL Database • Pig – A Data-flow language on top of Hadoop • Flume – Log Management for Hadoop • Oozie – Workflow framework • Mahout – Machine Learning Library on top of Hadoop • Vaidya – Performance benchmarking framework. • MRUnit – Unit testing framework for MapReduce Programs. • And many more …
  • 20. MapReduce in 2 minutes • Problem statement: sum of the doubles of a set of numbers. • Input array: 1 3 4 5 6 8 9 11 17 21 • Intermediate array after processing (each entry doubled): 2 6 8 10 12 16 18 22 34 42
  • 21. Introduction (contd.): Mapping Phase • Split the input. • Send the mapping code f(x) to the slave nodes (datanodes). • Apply f(x) to each data split. • [Figure: the master node holds the code of the function to be applied to individual entries of the array, written in the map() method in Hadoop. The code f(x) is sent to the slave nodes, which apply the logic to their data piece; in our case a data piece is one entry of the array. Entries on the slave nodes: 1 1 9 8 6 11 4 3 17 21]
  • 22. Introduction (contd.): Spill Phase • The master node directs the mappers to send the processed f(x) output data to an intermediate location. • Shuffle and sort. • [Figure: the results of the processed data from the slave nodes are given to a specific node where the reducer function runs. Mapper output on the slave nodes: 2 2 18 16 12 22 8 6 34 42]
  • 23. Introduction (contd.): Reduce Phase • The master node (JobTracker) invokes the Reduce task once the spilling is over. • The reducer gets the location of the spill output from the master node (Namenode). • [Figure: the results of the processed data from the slave nodes are given to a specific node where the reducer function runs; g(x)=162]
  • 24. MapReduce Programming • Steps involved in writing a MapReduce program • Write the Mapper • Write the Reducer • Write the Driver • Life's simple until you start customizing and working on data cleansing.
  • 25. Hadoop – Bird's Eye View • [Figure: a Name Node and a Job Tracker coordinating many slave nodes, each running a DataNode (DN) and a TaskTracker (TT); legend: DFS message path, MapReduce processing messages]
  • 26. NoSQL – Not Only SQL
  • 27. Introduction • Non-Relational Databases • Data Model not bound by a Schema. • No Predetermined Schema, Run-Time Columns • Sample Data • Twitter Streams • Web Forms • Sensor Networks
  • 28. Schema-less Systems • Entry 1 {"name":"emp1"} • Entry 2 {"name":"emp2","e_id":"1","e_addr":"Cupertino"} • Entry 3 {"name":"emp3","e_id":"3"} • Entry 4 {"name":"emp4","e_id":"6","dob":"03-Sep-1964"}
  • 29. Business Requirements • High Writes, Low Reads – Sensor Networks, Large Hadron Collider, Click Logging. • High Reads, Low Writes – Archival Storage. • No fixed schema. • Open Question - Where else?
  • 30. NoSQL Types • Key-Value Pair • Riak, Voldemort, etc. • Document Oriented • CouchDB, MongoDB, etc. • BigTable Implementations • Cassandra, HyperTable, HBase, etc. • Graph oriented • Neo4j, etc.
  • 31. Introduction • Source: http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/
  • 32. Wake up - Conclusion Time • BigData on the Rise • Technology and the Domain • Smart Engineers needed, with BigData skills • Chance to develop niche areas of expertise even before stepping into the Industry • 3rd Year Students – Select your final year projects very carefully, with the tools mentioned in this Seminar • 4th Year Students – Equip yourself with the necessary skills for better industry opportunities.
  • 33. Recommendations • I recommend aspiring professionals and young professionals read: • How to Solve it by Computer – RG Dromey • Code Complete 2 – Steve McConnell • Advanced Programming in the Unix Environment – Richard Stevens • Many Books on Hadoop, NoSQL Datastores, and Big Data in general. • … and many more
  • 34. Questions?
  • 35. Thank You • Contact us at: Linkedin.com/company/orzota-inc- • Twitter.com/orzota
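
The sum-of-doubles walkthrough on slides 20 to 23 can be sketched in plain Python. This is only a single-process illustration of the map, shuffle-and-sort, and reduce steps, not Hadoop code: the function names, the number of splits, and the way the input is divided are all assumptions made for the example. The input uses the entries shown on the slave nodes in the mapping-phase slide, which is the array that produces the g(x)=162 shown in the reduce-phase slide.

```python
from functools import reduce

# Entries shown on the slave nodes in the mapping-phase slide.
DATA = [1, 1, 9, 8, 6, 11, 4, 3, 17, 21]


def split_input(data, num_splits):
    """Divide the input into splits, one per (hypothetical) slave node."""
    return [data[i::num_splits] for i in range(num_splits)]


def map_phase(split):
    """f(x): double every entry of one split."""
    return [2 * x for x in split]


def shuffle_and_sort(mapped_splits):
    """Collect all mapper output in one place and sort it for the reducer."""
    return sorted(value for split in mapped_splits for value in split)


def reduce_phase(values):
    """g(x): sum the doubled values."""
    return reduce(lambda acc, value: acc + value, values, 0)


def run_job(data, num_splits=5):
    """Run the three phases in order, as the master node would coordinate them."""
    splits = split_input(data, num_splits)
    mapped = [map_phase(split) for split in splits]
    intermediate = shuffle_and_sort(mapped)
    return reduce_phase(intermediate)


if __name__ == "__main__":
    print(run_job(DATA))  # 162
```

In a real Hadoop job the same three pieces would roughly correspond to the Mapper, the framework's shuffle, and the Reducer, with a Driver wiring them together. Because addition is associative, the result does not depend on how the input is split.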

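The schema-less entries on slide 28 can be modelled as plain JSON documents. The sketch below is an illustration only and does not use any particular NoSQL product: the helper names (`load_entries`, `observed_columns`, `find_by_field`) are hypothetical. It shows the two ideas from the slides: columns are discovered at run time, and a query has to tolerate documents that lack a field.

```python
import json

# The four entries from the "Schema-less Systems" slide, as JSON strings.
ENTRIES = [
    '{"name": "emp1"}',
    '{"name": "emp2", "e_id": "1", "e_addr": "Cupertino"}',
    '{"name": "emp3", "e_id": "3"}',
    '{"name": "emp4", "e_id": "6", "dob": "03-Sep-1964"}',
]


def load_entries(raw_entries):
    """Parse each entry independently; no shared schema is enforced."""
    return [json.loads(raw) for raw in raw_entries]


def observed_columns(documents):
    """Return every field name seen in any document (run-time columns)."""
    columns = set()
    for doc in documents:
        columns.update(doc)
    return sorted(columns)


def find_by_field(documents, field, value):
    """Return documents whose field equals value; a missing field never matches."""
    return [doc for doc in documents if field in doc and doc[field] == value]


if __name__ == "__main__":
    docs = load_entries(ENTRIES)
    print(observed_columns(docs))          # ['dob', 'e_addr', 'e_id', 'name']
    print(find_by_field(docs, "e_id", "3"))  # [{'name': 'emp3', 'e_id': '3'}]
```

A relational table would need all four columns declared up front, with NULLs for the missing values; here each document simply carries the fields it has.
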
Editor's Notes

  1. Complete till this in 8 mins. You have 25 minutes left.