Big Data : Concept and Applications
Big data is the term for a collection of datasets so large and complex that it becomes difficult to process with on-hand database management tools or traditional data processing applications.
Data whose volume exceeds the storage and processing capabilities of a single physical machine is called big data.
Big data?
• Large volume of data
• Existing tools were not designed to handle data at this scale
Gigabyte → Terabyte → Petabyte → Exabyte → Zettabyte
Amazon → collects social data, log data, and many other kinds of data
Walmart → handles more than 1 million customer transactions every hour
Twitter → 300,000 tweets per minute
Instagram → 250,000 new pictures uploaded per minute
Email → 5 million messages (Gmail)
WhatsApp → 400,000 pictures per minute
Google → 5 million search requests per minute
Facebook → 2.5 million pieces of content per minute
500 TB per day
Bigger data requires different approaches:
techniques, tools, and architecture.
The aim is to solve new problems, or old problems in a better way.
Big data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
Big data : 3V
• Variety
data coming from various sources
• Velocity
real-time, live streaming data
• Volume
on the order of terabytes and petabytes
Big data is everywhere.
Network Analysis
Social Network Web Graph
Big data : Volume
• The volume of data is increasing every second
• Data will be measured in TB and ZB
• The amount of data will double every two years
• 100 terabytes of data are uploaded daily to Facebook
• 100 hours of video are uploaded every minute
• Research estimates 65% annual growth in digital content, mainly unstructured data
Gigabyte → Terabyte → Petabyte → Exabyte → Zettabyte
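As a rough illustration of the growth figures above, the following Python sketch converts between a doubling period and an annual growth rate, and walks the unit ladder. The function names and the decimal convention (each unit is 1000 times the previous one) are assumptions made for this example; they are not part of the slides.

```python
import math

# Unit ladder from the slide; decimal steps of 1000 are an assumption.
UNITS = ["GB", "TB", "PB", "EB", "ZB"]


def to_gigabytes(value: float, unit: str) -> float:
    """Convert a value in the given unit to gigabytes."""
    return value * 1000 ** UNITS.index(unit)


def annual_rate_for_doubling(years: float) -> float:
    """Annual growth rate that doubles a quantity in `years` years."""
    return 2 ** (1 / years) - 1


def years_to_double(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    print(f"1 ZB = {to_gigabytes(1, 'ZB'):.0e} GB")
    print(f"Doubling every 2 years = {annual_rate_for_doubling(2):.1%} per year")
    print(f"65% per year doubles in {years_to_double(0.65):.2f} years")
```

Under these formulas, doubling every two years corresponds to roughly 41% annual growth, while 65% annual growth doubles the data in about 1.4 years, so the two figures on the slide appear to come from different estimates.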
Big data : Velocity
Data is created in real time.
The Internet of Things (IoT) and social media are major contributors to the speed at which data is generated.
Every minute:
• 25 million queries on Google
• 20 million photos are viewed on Flickr
• over 200 million emails are sent
Big data : Variety
Data comes in all shapes: structured, semi-structured, unstructured, and even complex structures.
90% of generated data is 'unstructured', ranging from text to audio, image, or video data.
Big Data Life Cycle
[Figure: storage capacity (MB to PB) and processing speed over time; axis marks 2000, 2018, 2025]
Solution : Big Data
Hadoop
Apache Hadoop is a framework for storing, processing, and analyzing big data.
• Distributed
• Scalable
• Open source
Why Hadoop?
CASE 1
• 1 TB of data is processed by 1 computer
• The computer has 4 I/O channels of 100 MB/s each
• Total time required: about 44 minutes
CASE 2
• 1 TB of data is processed by 10 computers (same configuration) in parallel
• Total time required: about 4.4 minutes
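The arithmetic behind the two cases can be checked with a short Python sketch. It assumes the channel speed is 100 megabytes per second (the reading that reproduces the 44-minute figure), that 1 TB is 1024 × 1024 MB, and that throughput scales perfectly with the number of machines, with no coordination overhead. The function name is hypothetical.

```python
MB_PER_TB = 1024 * 1024  # assumption: binary terabyte


def read_time_minutes(
    data_tb: float,
    machines: int,
    channels_per_machine: int = 4,
    mb_per_s_per_channel: float = 100.0,
) -> float:
    """Idealized time to read `data_tb` terabytes, in minutes.

    Assumes the data is spread evenly and every channel runs at full speed.
    """
    total_mb = data_tb * MB_PER_TB
    throughput_mb_per_s = machines * channels_per_machine * mb_per_s_per_channel
    return total_mb / throughput_mb_per_s / 60


if __name__ == "__main__":
    print(f"Case 1: {read_time_minutes(1, machines=1):.1f} minutes")
    print(f"Case 2: {read_time_minutes(1, machines=10):.2f} minutes")
```

Real clusters will not reach this ideal tenfold speed-up, since scheduling, network transfer, and uneven data distribution all add overhead; the sketch only reproduces the slide's back-of-the-envelope estimate.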
HDFS (Hadoop Distributed File System)
- Stores data on the cluster
HDFS is a file system written in Java.
It provides storage for massive amounts of data:
- Scalable
- Fault tolerant
- Supports efficient processing with MapReduce
Hadoop : How are files stored?
- Data files are split into blocks and distributed to DataNodes
- Each block is replicated on multiple nodes (default 3x)
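The two points above can be sketched as a toy placement planner in Python. This is a simplified illustration, not how HDFS actually chooses nodes: real HDFS uses a rack-aware placement policy, whereas this sketch uses plain round-robin. The 128 MB block size is the default in recent Hadoop releases and is an assumption here, as are the function name and the node names.

```python
import math


def plan_blocks(
    file_size_mb: float,
    data_nodes: list[str],
    block_size_mb: int = 128,  # assumed default block size
    replication: int = 3,      # default replication factor from the slide
) -> dict[int, list[str]]:
    """Map each block index to the nodes that would hold a replica."""
    if replication > len(data_nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(n_blocks):
        # Round-robin: consecutive nodes, so replicas land on distinct machines.
        placement[block] = [
            data_nodes[(block + r) % len(data_nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    plan = plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"])
    for block, nodes in plan.items():
        print(f"block {block}: {', '.join(nodes)}")
```

Because every block lives on three different machines, losing any single node still leaves two copies of each block it held, which is the basis of the fault tolerance mentioned earlier.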
HDFS (Master/Slave Architecture)
The master machine is the NameNode
The slave machines are DataNodes
MAP REDUCE
MapReduce is a framework for executing highly parallelizable and distributable algorithms across huge datasets.
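To make the programming model concrete, here is a minimal single-process word count in Python that mimics the map, shuffle, and reduce phases. It only illustrates the idea and does not use the Hadoop API; all function names are hypothetical, and in a real cluster the map calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data tools"]))
```

Each mapper needs only its own line and each reducer only the values for its own key, which is what lets the framework spread the work across many machines.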
MapReduce : Mappers run in parallel
MapReduce : Analyzing data
Basic Cluster Configuration
HADOOP ECO-SYSTEM
HADOOP
Hadoop = HDFS + MapReduce
Hadoop HDFS commands are similar to Unix commands.
MapReduce is a programming model.
Hive → data manipulation (like SQL)
Pig → data manipulation using scripts
Sqoop → import and export to and from HDFS
Import/Export using Sqoop and Flume
Sqoop : Transfers data between an RDBMS and HDFS
Flume : A service for moving large amounts of data in real time
Applications
E-commerce (Amazon)
- Recommendation engine
- User buying patterns
- Digital marketing analysis
Telecommunication
- Call drop analysis
- Network problem optimization
Entertainment
- Content analytics (Netflix)
Sports
- Fitness management (Fitbit)
Health care
- Early disease detection (Pfizer)
Applications
Technology: Websites such as eBay, Amazon, Facebook, and Google use big data.
Private sector: Applications of big data in the private sector include retail, retail banking, and real estate.
Government: Big data is also used by the Indian government.
International development: Advances in big data analysis offer cost-effective opportunities to improve decision-making in critical development areas such as health care, employment, crime, security, and natural disasters. In this way, big data is helpful for international development.
Question
Thank You.


Editor's notes

  1. INSTRUCTIONS: Standard technical results slide (2-slide version). Please keep this layout and subheadings. A template is at the end of this exemplar.

     Bar-Noy, Basu, Johnson, Ramanathan, “Minimum-cost Broadcast through Varying-size Neighborcast”, Algosensors 2011, Germany, Sept 2011.
     Johnson, Phelan, Bar-Noy, Basu, Ramanathan, “Minimum-cost Broadcast through Varying-size Neighborcast”, draft for submission to IEEE ToN (the ToN paper has some more hardness results, a simulation study, and comparisons).

     The problem of interest is to broadcast a message originating at a source node to all nodes in the network. The source node and relay nodes can multicast to a subset of their neighbors (and they may also perform multiple multicasts to disjoint sets of neighbors). If a node multicasts to a subset of k of its neighbors, the incurred cost is 1 + A k^b, where A and b are non-negative constants; the ‘1’ represents the normalized cost of the (first) transmission, and the second term the cost of ACKs and retransmissions. The work also considers the case where the second term is either a sub-linear or a super-linear function of k. The minimum-cost problem is formulated as an integer programming problem, and is NP-hard for a range of b expressed as a function of A.

     The top line in the table is, in fact, a very important result: if b > g(A) := log2(2 + 1/A), then multicast cannot outperform unicast; thus, the spanning tree is optimal.
     If b = 0, the problem reduces to the connected dominating set (CDS) problem, for which approximability results are known; the approximation ratio is H_Δ + 2.
     If b = 1, the problem reduces to minimizing the number of transmitters (equivalently, the maximum leaf spanning tree); a polynomial-time algorithm with approximation ratio 2 is known; the paper improves the approximation ratio by using a pruned CDS approach.
     For b > g(A), the spanning tree is optimal.
     For b < g(A), the problem is shown to be NP-hard.
     For 1 < b < g(A), the paper shows that a spanning tree has a very good approximation ratio (less than 2).
     For 0 < b < 1, a greedy algorithm is proposed and its approximation ratio derived. Note that the approximation ratio improves with larger b and smaller Δ.
     Overall, note that the approximation ratio becomes worse for smaller b.
     Note: the network size ‘n’ plays a part in the ‘inapproximability’ results.

     The model assumes a known cost function, but the exponent ‘b’ depends both upon the actual protocol and upon the operating environment (e.g., congestion). Thus ‘b’ may vary and may be hard to estimate. How sensitive is the proposed algorithm when there are errors in estimating ‘b’? The figure on the right shows cost (as incurred by the proposed algorithm) vs. the actual ‘b’ of the underlying cost function; the black curve is the ‘optimal’ one, which uses the true value of ‘b’; the performance of the algorithm when ‘b’ is assumed to be fixed at some value is also shown.

     Here, Δ is the maximum node degree in the graph, and H_n is the n-th harmonic number: H_n = 1 + 1/2 + 1/3 + 1/4 + … + 1/n ≈ log(n) + γ + small constant, where γ is the Euler-Mascheroni constant, approximately 0.5772.