Zero Lecture
Big Data Analytics Lab
VISHAL CHOUDHARY
8CS4-21: Big Data Analytics Lab
 Credit: 2
 Max. Marks: 50 (IA: 30, ETE: 20)
 0L + 0T + 2P
 End Term Exam: 2 Hours
List of Experiments:
1. Implement the following data structures in Java:
i) Linked Lists
ii) Stacks
iii) Queues
iv) Set
v) Map
2. Set up and install Hadoop in its three operating modes: standalone, pseudo-distributed, and fully distributed.
Hadoop mainly works in three different modes:
 Standalone Mode
 Pseudo-distributed Mode
 Fully-Distributed Mode
1. Standalone Mode
In standalone mode none of the daemons run, i.e. NameNode, DataNode, Secondary NameNode, JobTracker, and TaskTracker. (Hadoop 1 uses the JobTracker and TaskTracker for processing; Hadoop 2 uses the ResourceManager and NodeManager instead.) Standalone mode also means that Hadoop is installed on only a single system. By default, Hadoop runs in this standalone mode, also called local mode. We mainly use Hadoop in this mode for learning, testing, and debugging.
 Hadoop runs fastest in this mode of the three, because HDFS (Hadoop Distributed File System), the major Hadoop component used for storage, is not used at all; the local file system is used instead. You can think of HDFS as playing a role similar to the file systems available on Windows, i.e. NTFS (New Technology File System) and FAT32 (File Allocation Table, which uses 32-bit table entries). When Hadoop works in this mode there is no need to configure the files hdfs-site.xml, mapred-site.xml, or core-site.xml for the Hadoop environment. In this mode, all of your processes run in a single JVM (Java Virtual Machine), so it is suitable only for small development tasks.
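For reference, "no configuration needed" here means the shipped defaults already point Hadoop at the local file system. The standalone behaviour is equivalent to the following core-site.xml entry (this value is Hadoop's documented default, so the file can simply be left empty):

```xml
<!-- core-site.xml: standalone/local mode (the default, so no edit is needed) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- use the local file system, not HDFS -->
    <value>file:///</value>
  </property>
</configuration>
```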
2. Pseudo-Distributed Mode (Single-Node Cluster)
 In pseudo-distributed mode we also use only a single node, but the cluster is simulated: all the processes inside the cluster run independently of each other. All the daemons, i.e. NameNode, DataNode, Secondary NameNode, ResourceManager, and NodeManager, run as separate processes in separate JVMs (Java Virtual Machines), that is, as different Java processes; this is why it is called pseudo-distributed.
 Remember that because we are using a single-node setup, all the master and slave processes are handled by one system. The NameNode and ResourceManager act as masters, while the DataNode and NodeManager act as slaves. The Secondary NameNode also acts as a master; its purpose is to keep periodic checkpoints of the NameNode's metadata.
In this mode:
 Hadoop is used both for development and for debugging purposes.
 HDFS (Hadoop Distributed File System) is used for managing input and output.
 We need to change the configuration files core-site.xml, hdfs-site.xml, and mapred-site.xml to set up the environment.
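A minimal single-node configuration typically looks like the following sketch. The port 9000 and a replication factor of 1 are the conventional single-node choices, not values mandated by the lab text:

```xml
<!-- core-site.xml: point the default file system at the local HDFS daemon -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: one node, so keep only one copy of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```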
3. Fully Distributed Mode (Multi-Node Cluster)
 This is the most important mode, in which multiple nodes are used: a few of them run the master daemons, NameNode and ResourceManager, and the rest run the slave daemons, DataNode and NodeManager. Here Hadoop runs on a cluster of machines (nodes), and the data is distributed across the different nodes. This is the production mode of Hadoop. Let's clarify this mode in physical terms.
 Previously, you downloaded Hadoop as a tar or zip file, installed it on one system, and ran all the processes on that single system. In fully distributed mode, we instead extract this tar or zip file on each of the nodes in the Hadoop cluster and then use a particular node for a particular process. Once you distribute the processes among the nodes, you define which nodes work as masters and which of them work as slaves.
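In practice, the master/slave assignment described above is recorded in Hadoop's configuration: the workers file (called slaves in Hadoop 2) on the master lists the hostnames that run the slave daemons. A sketch with hypothetical hostnames:

```
# etc/hadoop/workers (Hadoop 3) or etc/hadoop/slaves (Hadoop 2)
# the hostnames below are placeholders for the cluster's actual nodes
datanode1.example.com
datanode2.example.com
datanode3.example.com
```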
3. Implement the following file management tasks in Hadoop:
 Adding files and directories
 Retrieving files
 Deleting files
Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them into HDFS using one of the HDFS command-line utilities.
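These three tasks map directly onto the hdfs dfs utility; a sketch of the typical commands, where the paths and file names are illustrative placeholders:

```
# add a directory and copy a local log file into HDFS
hdfs dfs -mkdir -p /user/student/logs
hdfs dfs -put access.log /user/student/logs/

# retrieve a file from HDFS back to the local file system
hdfs dfs -get /user/student/logs/access.log ./access.copy.log

# list and delete
hdfs dfs -ls /user/student/logs
hdfs dfs -rm /user/student/logs/access.log
```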
4. Run a basic Word Count MapReduce program to understand the MapReduce paradigm.
MapReduce is a programming model and an associated implementation for processing large data sets. Users specify a Map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a Reduce function that merges all intermediate values associated with the same intermediate key.
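The word-count flow can be sketched in plain Java, without the Hadoop API: the map phase emits (word, 1) pairs, the shuffle groups pairs by key, and the reduce phase sums each group. The class and method names here are illustrative, not part of any Hadoop program:

```java
import java.util.*;

public class WordCountSketch {
    // map phase: split a line into (word, 1) intermediate pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // shuffle + reduce: group intermediate pairs by key and sum each group's values
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = { "big data big ideas", "data moves fast" };
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) intermediate.addAll(map(line)); // map each input split
        Map<String, Integer> result = reduce(intermediate);       // merge values per key
        System.out.println(result); // {big=2, data=2, fast=1, ideas=1, moves=1}
    }
}
```

In real Hadoop the shuffle is done by the framework between the Mapper and Reducer classes; here it is folded into the reduce step for brevity.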
5. Write a MapReduce program that mines weather data. Weather sensors collecting data every hour at many locations across the globe gather a large volume of log data, which is a good candidate for analysis with MapReduce, since it is semi-structured and record-oriented.
 Weather sensors collect weather information across the globe, producing a large volume of log data. This weather data is semi-structured and record-oriented.
 The data is stored in a line-oriented ASCII format, where each row represents a single record. Each row has many fields, such as longitude, latitude, daily max/min temperature, and daily average temperature. For simplicity, we will focus on the main element, i.e. temperature. We will use data from the National Centers for Environmental Information (NCEI), which holds a massive amount of historical weather data that we can use for analysis.
 Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. The term MapReduce actually refers to the following two distinct tasks that Hadoop programs perform:
 The Map task: the first task, which takes input data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
 The Reduce task: takes the output from a map task as input and combines those data tuples into a smaller set of tuples. The reduce task is always performed after the map task.
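The weather-mining job can be sketched in plain Java without the Hadoop API. The record format used here ("year,temperature" per line) is a deliberately simplified stand-in for real NCEI records, and the class name is illustrative; the point is the key/value scheme, with year as the map key and a max aggregation in the reduce step:

```java
import java.util.*;

public class MaxTemperatureSketch {
    // map: key = year, value = temperature; reduce: keep the max value per key
    static Map<String, Integer> maxTempPerYear(List<String> records) {
        Map<String, Integer> max = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split(",");
            String year = fields[0];                       // map output key
            int temp = Integer.parseInt(fields[1].trim()); // map output value
            max.merge(year, temp, Math::max);              // reduce: max over each year
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "1950,0", "1950,22", "1950,-11", "1949,111", "1949,78");
        System.out.println(maxTempPerYear(records)); // {1949=111, 1950=22}
    }
}
```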
6. Implement Matrix Multiplication with Hadoop MapReduce
MapReduce is a technique in which a huge job is subdivided into small tasks that run in parallel, to make computation faster and save time; it is mostly used in distributed systems. It has two important parts:
 Mapper: takes raw input data and organizes it into key/value pairs. For example, in a dictionary you search for the word "Data" and find its associated meaning, "facts and statistics collected together for reference or analysis". Here the key is "Data" and the value is "facts and statistics collected together for reference or analysis".
 Reducer: responsible for processing the data in parallel and producing the final output.
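The MapReduce matrix-multiplication scheme can be sketched in plain Java, without the Hadoop API. In the distributed version, the mapper emits each A[i][j] and B[j][k] keyed by the output cell (i,k), and the reducer sums the products for each cell; the sketch below collapses that into loops over the same keys (class name illustrative):

```java
import java.util.*;

public class MatMulSketch {
    // C = A x B, where A is m x n and B is n x p.
    // Each c[i][k] plays the role of the reducer for output key (i,k):
    // it accumulates the sum of products a[i][j] * b[j][k] over the shared index j.
    static int[][] multiply(int[][] a, int[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        int[][] c = new int[m][p];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < p; k++)
                    c[i][k] += a[i][j] * b[j][k]; // reduce step for key (i,k)
        return c;
    }

    public static void main(String[] args) {
        int[][] a = { {1, 2}, {3, 4} };
        int[][] b = { {5, 6}, {7, 8} };
        System.out.println(Arrays.deepToString(multiply(a, b))); // [[19, 22], [43, 50]]
    }
}
```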
7. Install and run Pig, then write Pig Latin scripts to sort, group, join, project, and filter your data.
 Pig is a high-level programming language useful for analyzing large data sets. Pig was the result of a development effort at Yahoo!
 In a MapReduce framework, programs need to be translated into a series of Map and Reduce stages. However, this is not a programming model that data analysts are familiar with. In order to bridge this gap, an abstraction called Pig was built on top of Hadoop.
 A Pig Latin program consists of a series of operations or transformations applied to the input data to produce output. These operations describe a data flow that the Hadoop Pig execution environment translates into an executable representation. Underneath, the results of these transformations are a series of MapReduce jobs of which the programmer is unaware. In this way, Pig in Hadoop allows the programmer to focus on the data rather than on the nature of execution.
 Pig Latin is a fairly rigid language that uses familiar keywords from data processing, e.g., JOIN, GROUP, and FILTER.
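A short Pig Latin sketch covering the five required operations. The relation names, field names, and input files (students.txt, depts.txt) are hypothetical:

```
-- load tab-separated (name, dept, marks) and (dept, building) records
students = LOAD 'students.txt' AS (name:chararray, dept:chararray, marks:int);
depts    = LOAD 'depts.txt'    AS (dept:chararray, building:chararray);

passed   = FILTER students BY marks >= 40;           -- filter
by_dept  = GROUP passed BY dept;                     -- group
joined   = JOIN passed BY dept, depts BY dept;       -- join
names    = FOREACH passed GENERATE name, marks;      -- project
sorted   = ORDER names BY marks DESC;                -- sort
STORE sorted INTO 'output';
```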
8. Install and run Hive, then use Hive to create, alter, and drop databases, tables, views, functions, and indexes.
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.
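A HiveQL sketch of the required DDL. The database, table, and column names are illustrative; note that index DDL was removed in Hive 3.0, so the CREATE INDEX line applies only to older Hive versions:

```sql
-- create and use a database
CREATE DATABASE IF NOT EXISTS lab;
USE lab;

-- create and alter a table
CREATE TABLE students (name STRING, dept STRING, marks INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
ALTER TABLE students ADD COLUMNS (yr INT);

-- create a view
CREATE VIEW toppers AS SELECT name, marks FROM students WHERE marks >= 90;

-- index (Hive < 3.0 only)
CREATE INDEX marks_idx ON TABLE students (marks) AS 'COMPACT' WITH DEFERRED REBUILD;

-- drop everything
DROP VIEW toppers;
DROP TABLE students;
DROP DATABASE lab;
```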
9. Solve some real-life big data problems.
Program 1: Linked list in Java
 LinkedList is a part of the Collection framework present in the java.util package. This class is an implementation of the doubly-linked list data structure, a linear data structure where the elements are not stored in contiguous locations and every element is a separate object with a data part and an address part. The elements are linked using pointers and addresses, and each element is known as a node. Because of their dynamic size and the ease of insertions and deletions, linked lists are often preferred over arrays. They also have a few disadvantages: nodes cannot be accessed directly; instead, we need to start from the head and follow the links to reach the node we wish to access.
The following program creates and uses a linked list:
import java.util.*;

public class Test {
    public static void main(String args[])
    {
        LinkedList<String> ll = new LinkedList<String>();

        // Adding elements to the linked list
        ll.add("A");            // [A]
        ll.add("B");            // [A, B]
        ll.addLast("C");        // [A, B, C]
        ll.addFirst("D");       // [D, A, B, C]
        ll.add(2, "E");         // [D, A, E, B, C]
        System.out.println(ll); // prints [D, A, E, B, C]

        // Removing elements
        ll.remove("B");         // first occurrence of "B": [D, A, E, C]
        ll.remove(3);           // element at index 3:      [D, A, E]
        ll.removeFirst();       // [A, E]
        ll.removeLast();        // [A]
        System.out.println(ll); // prints [A]
    }
}
Performing Various Operations on LinkedList
1. Adding elements: In order to add an element to a LinkedList, we can use the add() method. This method is overloaded to perform multiple operations based on different parameters:
 add(Object): adds an element at the end of the LinkedList.
 add(int index, Object): adds an element at a specific index in the LinkedList.
2. Changing elements: After adding the elements, if we wish to change an element, it can be done using the set() method. Since a LinkedList is indexed, the element we wish to change is referenced by its index. Therefore, this method takes an index and the updated element to be inserted at that index.

// Java program to change elements in a LinkedList
import java.util.*;

public class GFG {
    public static void main(String args[])
    {
        LinkedList<String> ll = new LinkedList<>();
        ll.add("Geeks");
        ll.add("Geeks");
        ll.add(1, "Geeks");  // [Geeks, Geeks, Geeks]
        System.out.println("Initial LinkedList " + ll);
        ll.set(1, "For");    // [Geeks, For, Geeks]
        System.out.println("Updated LinkedList " + ll);
    }
}
3. Removing elements: In order to remove an element from a LinkedList, we can use the remove() method. This method is overloaded to perform multiple operations based on different parameters:
remove(Object): removes the given object from the LinkedList. If there are multiple such objects, the first occurrence is removed.
remove(int index): since a LinkedList is indexed, this method takes an integer value and removes the element present at that specific index. After the removal, the subsequent elements shift left and the indices of the objects are updated.
// Java program to remove elements in a LinkedList
import java.util.*;

public class GFG {
    public static void main(String args[])
    {
        LinkedList<String> ll = new LinkedList<>();
        ll.add("Geeks");
        ll.add("Geeks");
        ll.add(1, "For");    // [Geeks, For, Geeks]
        System.out.println("Initial LinkedList " + ll);
        ll.remove(1);        // removes index 1:          [Geeks, Geeks]
        System.out.println("After the Index Removal " + ll);
        ll.remove("Geeks");  // removes first occurrence: [Geeks]
        System.out.println("After the Object Removal " + ll);
    }
}

 4. Iterating over the LinkedList: There are multiple ways to iterate through a LinkedList. The most common are the basic for loop combined with the get() method to fetch the element at a specific index, and the enhanced for loop.

// Java program to iterate over the elements in a LinkedList
import java.util.*;

public class GFG {
    public static void main(String args[])
    {
        LinkedList<String> ll = new LinkedList<>();
        ll.add("Geeks");
        ll.add("Geeks");
        ll.add(1, "For");    // [Geeks, For, Geeks]

        // Using get() with an index-based for loop
        for (int i = 0; i < ll.size(); i++) {
            System.out.print(ll.get(i) + " ");
        }
        System.out.println();

        // Using the enhanced for-each loop
        for (String str : ll)
            System.out.print(str + " ");
    }
}
Más contenido relacionado

Similar a BD-zero lecture.pptx

Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big pictureJ S Jodha
 
Introduction to HADOOP.pdf
Introduction to HADOOP.pdfIntroduction to HADOOP.pdf
Introduction to HADOOP.pdf8840VinayShelke
 
DATA MINING FRAMEWORK TO ANALYZE ROAD ACCIDENT DATA
DATA MINING FRAMEWORK TO ANALYZE ROAD ACCIDENT DATADATA MINING FRAMEWORK TO ANALYZE ROAD ACCIDENT DATA
DATA MINING FRAMEWORK TO ANALYZE ROAD ACCIDENT DATAAishwarya Saseendran
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Rupak Roy
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopMr. Ankit
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architectureHarikrishnan K
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopIOSR Journals
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoopRexRamos9
 

Similar a BD-zero lecture.pptx (20)

Bigdata
BigdataBigdata
Bigdata
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big picture
 
Introduction to HADOOP.pdf
Introduction to HADOOP.pdfIntroduction to HADOOP.pdf
Introduction to HADOOP.pdf
 
DATA MINING FRAMEWORK TO ANALYZE ROAD ACCIDENT DATA
DATA MINING FRAMEWORK TO ANALYZE ROAD ACCIDENT DATADATA MINING FRAMEWORK TO ANALYZE ROAD ACCIDENT DATA
DATA MINING FRAMEWORK TO ANALYZE ROAD ACCIDENT DATA
 
hadoop.pptx
hadoop.pptxhadoop.pptx
hadoop.pptx
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
 
Big data
Big dataBig data
Big data
 
hadoop
hadoophadoop
hadoop
 
hadoop
hadoophadoop
hadoop
 
Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Introduction to hadoop ecosystem
Introduction to hadoop ecosystem
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architecture
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
hadoop.pptx
hadoop.pptxhadoop.pptx
hadoop.pptx
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
 
G017143640
G017143640G017143640
G017143640
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoop
 
Unit 5
Unit  5Unit  5
Unit 5
 

Más de vishal choudhary (20)

SE-Lecture1.ppt
SE-Lecture1.pptSE-Lecture1.ppt
SE-Lecture1.ppt
 
SE-Testing.ppt
SE-Testing.pptSE-Testing.ppt
SE-Testing.ppt
 
SE-CyclomaticComplexityand Testing.ppt
SE-CyclomaticComplexityand Testing.pptSE-CyclomaticComplexityand Testing.ppt
SE-CyclomaticComplexityand Testing.ppt
 
SE-Lecture-7.pptx
SE-Lecture-7.pptxSE-Lecture-7.pptx
SE-Lecture-7.pptx
 
Se-Lecture-6.ppt
Se-Lecture-6.pptSe-Lecture-6.ppt
Se-Lecture-6.ppt
 
SE-Lecture-5.pptx
SE-Lecture-5.pptxSE-Lecture-5.pptx
SE-Lecture-5.pptx
 
XML.pptx
XML.pptxXML.pptx
XML.pptx
 
SE-Lecture-8.pptx
SE-Lecture-8.pptxSE-Lecture-8.pptx
SE-Lecture-8.pptx
 
SE-coupling and cohesion.ppt
SE-coupling and cohesion.pptSE-coupling and cohesion.ppt
SE-coupling and cohesion.ppt
 
SE-Lecture-2.pptx
SE-Lecture-2.pptxSE-Lecture-2.pptx
SE-Lecture-2.pptx
 
SE-software design.ppt
SE-software design.pptSE-software design.ppt
SE-software design.ppt
 
SE1.ppt
SE1.pptSE1.ppt
SE1.ppt
 
SE-Lecture-4.pptx
SE-Lecture-4.pptxSE-Lecture-4.pptx
SE-Lecture-4.pptx
 
SE-Lecture=3.pptx
SE-Lecture=3.pptxSE-Lecture=3.pptx
SE-Lecture=3.pptx
 
Multimedia-Lecture-Animation.pptx
Multimedia-Lecture-Animation.pptxMultimedia-Lecture-Animation.pptx
Multimedia-Lecture-Animation.pptx
 
MultimediaLecture5.pptx
MultimediaLecture5.pptxMultimediaLecture5.pptx
MultimediaLecture5.pptx
 
Multimedia-Lecture-7.pptx
Multimedia-Lecture-7.pptxMultimedia-Lecture-7.pptx
Multimedia-Lecture-7.pptx
 
MultiMedia-Lecture-4.pptx
MultiMedia-Lecture-4.pptxMultiMedia-Lecture-4.pptx
MultiMedia-Lecture-4.pptx
 
Multimedia-Lecture-6.pptx
Multimedia-Lecture-6.pptxMultimedia-Lecture-6.pptx
Multimedia-Lecture-6.pptx
 
Multimedia-Lecture-3.pptx
Multimedia-Lecture-3.pptxMultimedia-Lecture-3.pptx
Multimedia-Lecture-3.pptx
 

Último

Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 

Último (20)

Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 

BD-zero lecture.pptx

  • 1. Zero lecture Big Data Analytics Lab VISHAL CHOUDHARY
  • 2. 8CS4-21: Big Data Analytics Lab  Credit:2  Max. Marks: 50 (IA:30, ETE:20)  0L+0T+2P  End Term Exam: 2 Hours
  • 3. List of Experiments: 1. Implement the following Data structures in Java i) nked Lists ii) ii) Stacks iii) iii) Queues iv) iv) Set v) v) Map
  • 4. 2.Perform setting up and Installing Hadoop in its three operating modes: Standalone, Pseudodistributed, Fully distributed. Hadoop Mainly works on 3 different Modes:  Standalone Mode  Pseudo-distributed Mode  Fully-Distributed Mode 1. Standalone Mode In Standalone Mode none of the Daemon will run i.e. Namenode, Datanode, Secondary Name node, Job Tracker, and Task Tracker. We use job-tracker and task-tracker for processing purposes in Hadoop1. For Hadoop2 we use Resource Manager and Node Manager. Standalone Mode also means that we are installing Hadoop only in a single system. By default, Hadoop is made to run in this Standalone Mode or we can also call it as the Local mode. We mainly use Hadoop in this Mode for the Purpose of Learning, testing, and debugging.  Hadoop works very much Fastest in this mode among all of these 3 modes. As we all know HDFS (Hadoop distributed file system) is one of the major components for Hadoop which utilized for storage Permission is not utilized in this mode. You can think of HDFS as similar to the file system’s available for windows i.e. NTFS (New Technology File System) and FAT32(File Allocation Table which stores the data in the blocks of 32 bits ). when your Hadoop works in this mode there is no need to configure the files – hdfs-site.xml, mapred- site.xml, core-site.xml for Hadoop environment. In this Mode, all of your Processes will run on a single JVM(Java Virtual Machine) and this mode can only be used for small development purposes.
  • 5. 2. Pseudo Distributed Mode (Single Node Cluster) .  In Pseudo-distributed Mode we also use only a single node, but the main thing is that the cluster is simulated, which means that all the processes inside the cluster will run independently to each other. All the daemons that are Namenode, Datanode, Secondary Name node, Resource Manager, Node Manager, etc. will be running as a separate process on separate JVM(Java Virtual Machine) or we can say run on different java processes that is why it is called a Pseudo-distributed.  One thing we should remember that as we are using only the single node set up so all the Master and Slave processes are handled by the single system. Namenode and Resource Manager are used as Master and Datanode and Node Manager is used as a slave. A secondary name node is also used as a Master. The purpose of the Secondary Name node is to just keep the hourly based backup of the Name node. In this Mode,  Hadoop is used for development and for debugging purposes both.  Our HDFS(Hadoop Distributed File System ) is utilized for managing the Input and Output processes.  We need to change the configuration files mapred-site.xml, core- site.xml, hdfs-site.xml for setting up the environment.
  • 6.
  • 7. 3. Fully Distributed Mode (Multi-Node Cluster)  This is the most important one in which multiple nodes are used few of them run the Master Daemon’s that are Namenode and Resource Manager and the rest of them run the Slave Daemon’s that are DataNode and Node Manager. Here Hadoop will run on the clusters of Machine or nodes. Here the data that is used is distributed across different nodes. This is actually the Production Mode of Hadoop let’s clarify or understand this Mode in a better way in Physical Terminology.  Once you download the Hadoop in a tar file format or zip file format then you install it in your system and you run all the processes in a single system but here in the fully distributed mode we are extracting this tar or zip file to each of the nodes in the Hadoop cluster and then we are using a particular node for a particular process. Once you distribute the process among the nodes then you’ll define which nodes are working as a master or which one of them
  • 9. 3.Implement the following file management tasks in Hadoop:  Adding files and directories  Retrieving files  Deleting files Hint: A typical Hadoop workflow creates data files (such  as log files) elsewhere and copies them into HDFS using one of the
  • 10. 4.Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm. MapReduce is a programming model and an associated implementation for processing large data sets Users specify a Map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a Reduce function that merges all intermediate values associated with
  • 11. 5. Write a Map Reduce program that mines weather data. Weather sensors collecting data every hour at many locations across the globe gather a large volume of log data, which is a good candidate for analysis with MapReduce,since it is semi structured and record-oriented.  weather sensors are collecting weather information across the globe in a large volume of log data. This weather data is semi- structured and record-oriented.  This data is stored in a line-oriented ASCII format, where each row represents a single record. Each row has lots of fields like longitude, latitude, daily max-min temperature, daily average temperature, etc. for easiness, we will focus on the main element, i.e. temperature. We will use the data from the National Centres for Environmental Information(NCEI). It has a massive amount of historical weather data that we can use for data analysis.  Hadoop MapReduce is a software framework for easily writing applications which process big amounts of data in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. The term MapReduce actually refers to the following two different tasks that Hadoop programs perform:
  • 12.  The Map Task: This is the first task, which takes input data and converts it into a set of data, where individual elements are broken down into tuples (key/value pairs).  The Reduce Task: This task takes the output from a map task as input and combines those data tuples into a smaller set of tuples. The reduce task is always performed after the map task.
  • 13. 6.Implement Matrix Multiplication with Hadoop Map Reduce MapReduce is a technique in which a huge program is subdivided into small tasks and run parallelly to make computation faster, save time, and mostly used in distributed systems. It has 2 important parts:  Mapper: It takes raw data input and organizes into key, value pairs. For example, In a dictionary, you search for the word “Data” and its associated meaning is “facts and statistics collected together for reference or analysis”. Here the Key is Data and the  Value associated with is facts and statistics collected together for reference or analysis.  Reducer: It is responsible for processing data in parallel and produce final output.
  • 14. 7. Install and run Pig, then write Pig Latin scripts to sort, group, join, project, and filter your data.  Pig is a high-level programming language useful for analyzing large data sets. Pig was the result of a development effort at Yahoo!  In the MapReduce framework, programs need to be translated into a series of Map and Reduce stages. However, this is not a programming model that data analysts are familiar with. In order to bridge this gap, an abstraction called Pig was built on top of Hadoop.  A Pig Latin program consists of a series of operations or transformations which are applied to the input data to produce output. These operations describe a data flow, which is translated into an executable representation by the Pig execution environment. Underneath, these transformations are compiled into a series of MapReduce jobs of which the programmer is unaware. In this way, Pig allows the programmer to focus on the data rather than on the nature of execution.  Pig Latin is a fairly rigid language that uses familiar keywords from data processing, e.g., JOIN, GROUP, and FILTER. 
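A short Pig Latin sketch covering the five required operations might look as follows. The file names and schemas here are hypothetical placeholders, not part of the lab's actual data set:

```pig
-- Hypothetical input files and schemas, for illustration only.
emps  = LOAD 'employees.csv' USING PigStorage(',')
        AS (id:int, name:chararray, dept:chararray, salary:int);
depts = LOAD 'departments.csv' USING PigStorage(',')
        AS (dept:chararray, city:chararray);

high_paid = FILTER emps BY salary > 50000;        -- filter
by_dept   = GROUP high_paid BY dept;              -- group
joined    = JOIN emps BY dept, depts BY dept;     -- join
names     = FOREACH emps GENERATE name, salary;   -- project
sorted    = ORDER names BY salary DESC;           -- sort

DUMP sorted;
```

Each statement defines a relation in the data flow; Pig compiles the whole flow into MapReduce jobs only when an output operator such as DUMP or STORE is reached.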
  • 15. 8. Install and run Hive, then use Hive to create, alter, and drop databases, tables, views, functions, and indexes. Apache Hive is a data warehouse software project built on top of Apache Hadoop that provides data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.
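A minimal HiveQL session for the create/alter/drop operations might look as follows. The database, table, and column names are hypothetical placeholders; note that index support (CREATE INDEX) exists only in older Hive releases and was removed in Hive 3.0:

```sql
-- Hypothetical names, for illustration only.
CREATE DATABASE IF NOT EXISTS lab_db;
USE lab_db;

CREATE TABLE employees (id INT, name STRING, salary INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Alter the table by adding a column.
ALTER TABLE employees ADD COLUMNS (dept STRING);

-- Create and drop a view.
CREATE VIEW high_paid AS
    SELECT name, salary FROM employees WHERE salary > 50000;
DROP VIEW high_paid;

-- Clean up.
DROP TABLE employees;
DROP DATABASE lab_db;
```

User-defined functions are registered with CREATE FUNCTION from a JAR containing the UDF class, which requires writing and packaging Java code first.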
  • 16. 9. Solve some real-life big data problems.
  • 17. Program 1: Linked list in Java  LinkedList is a part of the Collection framework present in the java.util package. This class is an implementation of the doubly-linked list data structure, a linear data structure where the elements are not stored in contiguous locations and every element is a separate object with a data part and an address part. The elements are linked using pointers and addresses, and each element is known as a node. Because of their dynamic size and the ease of insertions and deletions, linked lists are often preferred over arrays. They also have some disadvantages: nodes cannot be accessed directly; instead, we must start from the head and follow the links to reach the node we wish to access.
  • 18. Create and use a linked list:

   import java.util.*;

   public class Test {
       public static void main(String args[]) {
           LinkedList<String> ll = new LinkedList<String>();

           // Adding elements to the linked list
           ll.add("A");
           ll.add("B");
           ll.addLast("C");
           ll.addFirst("D");
           ll.add(2, "E");
           System.out.println(ll);   // [D, A, E, B, C]

           ll.remove("B");
           ll.remove(3);
           ll.removeFirst();
           ll.removeLast();
           System.out.println(ll);   // [A]
       }
   }
  • 19. Performing Various Operations on LinkedList 1. Adding Elements: In order to add an element to a LinkedList, we can use the add() method. This method is overloaded to perform multiple operations based on different parameters:  add(Object): This method is used to add an element at the end of the LinkedList.  add(int index, Object): This method is used to add an element at a specific index in the LinkedList.
  • 20. 2. Changing Elements: After adding the elements, if we wish to change the element, it can be done using the set() method. Since a LinkedList is indexed, the element which we wish to change is referenced by the index of the element. Therefore, this method takes an index and the updated element which needs to be inserted at that index.
  • 21. // Java program to change elements in a LinkedList

   import java.util.*;

   public class GFG {
       public static void main(String args[]) {
           LinkedList<String> ll = new LinkedList<>();
           ll.add("Geeks");
           ll.add("Geeks");
           ll.add(1, "Geeks");
           System.out.println("Initial LinkedList " + ll);
           ll.set(1, "For");
           System.out.println("Updated LinkedList " + ll);
       }
   }
  • 22. 3. Removing Elements: In order to remove an element from a LinkedList, we can use the remove() method. This method is overloaded to perform multiple operations based on different parameters:  remove(Object): This method removes the given object from the LinkedList. If there are multiple such objects, the first occurrence is removed.  remove(int index): Since a LinkedList is indexed, this method takes an integer value and removes the element present at that specific index. After the removal, the subsequent elements move left to fill the space and the indices of the objects are updated.
  • 23. // Java program to remove elements in a LinkedList

   import java.util.*;

   public class GFG {
       public static void main(String args[]) {
           LinkedList<String> ll = new LinkedList<>();
           ll.add("Geeks");
           ll.add("Geeks");
           ll.add(1, "For");
           System.out.println("Initial LinkedList " + ll);
           ll.remove(1);
           System.out.println("After the Index Removal " + ll);
           ll.remove("Geeks");
           System.out.println("After the Object Removal " + ll);
       }
   }
  • 24.  4. Iterating the LinkedList: There are multiple ways to iterate through a LinkedList. The most common are the basic for loop combined with the get() method to fetch the element at a specific index, and the enhanced for-each loop.
  • 25. // Java program to iterate over the elements in a LinkedList

   import java.util.*;

   public class GFG {
       public static void main(String args[]) {
           LinkedList<String> ll = new LinkedList<>();
           ll.add("Geeks");
           ll.add("Geeks");
           ll.add(1, "For");

           // Using the get() method and a for loop
           for (int i = 0; i < ll.size(); i++) {
               System.out.print(ll.get(i) + " ");
           }
           System.out.println();

           // Using the for-each loop
           for (String str : ll)
               System.out.print(str + " ");
       }
   }