Top Big Data Terms
Term Definition
Hadoop Open-source software framework that supports the running of applications
on large clusters of commodity hardware. Hadoop is written in Java.
HDFS Stands for Hadoop Distributed File System. HDFS is a distributed file system
that stores large files across multiple machines. The system replicates data
across multiple machines and understands what data is being processed, when,
and by whom.
MapReduce MapReduce is a programming model for processing large data sets with a
parallel, distributed algorithm on a cluster. Its Map() procedure filters and
sorts and its Reduce() procedure performs summary operations.
Hive A Data Warehouse infrastructure built on top of Hadoop for providing data
summarization, query, and analysis.
HBase HBase is an open-source, non-relational, distributed database that runs on
top of HDFS.
Cassandra Apache Cassandra is an open-source distributed database management
system designed to handle very large amounts of data spread out across
many commodity servers.
Source: Wikipedia (mainly)
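
The MapReduce definition above can be illustrated with a small word count. This is a minimal single-machine sketch in plain Python, not Hadoop's actual API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the grouping step stands in for the work a real framework would do across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (done by the framework between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: summarize all values for one key, here by summing them."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big clusters"])
print(counts)
```

On a real cluster each document (or file block) would be mapped on a different machine and the reduce step would run in parallel per key; here everything runs in one process to keep the idea visible.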
Sizes that Matter
Name Value Example
1 Bit = The smallest unit of data that a computer uses. It can be used
to represent two states of information, such as Yes or No.
1 Byte = 8 Bits. A Byte can represent 256 states of information. 1 Byte
could be equal to one character. 10 Bytes could be equal to a
word. 100 Bytes would equal an average sentence.
1 kilobyte (kB) 1024 bytes 1 Kilobyte would be equal to a paragraph.
1 megabyte (MB) 1024 kB 3-1/2 inch floppy disks can hold 1.44 Megabytes or the
equivalent of a small book. 600 Megabytes is about the
amount of data that will fit on a CD-ROM disk.
1 gigabyte (GB) 1024 MB 1 GB could hold the contents of about 10 yards of books.
1 terabyte (TB) 1024 GB 1 TB could hold 1,000 copies of the Encyclopedia Britannica.
1 petabyte (PB) 1024 TB Roughly the capacity of 500 million floppy disks.
1 exabyte (EB) 1024 PB 5 exabytes could equal all of the words ever spoken by mankind.
1 zettabyte (ZB) 1024 EB ?
Source: http://www.whatsabyte.com/
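
The table steps up by a factor of 1024 at each row, so the conversions can be sketched in a few lines of Python. The names UNITS, to_bytes, and humanize are hypothetical helpers chosen for illustration, and the sketch assumes the binary (1024-based) convention used in the table rather than the decimal (1000-based) one.

```python
# Units in the order the table lists them; each is 1024 times the previous.
UNITS = ["bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to a number of bytes."""
    return value * 1024 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps it below 1024."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:g} {unit}"
        size /= 1024


print(to_bytes(1, "GB"))                          # bytes in one gigabyte
print(humanize(1536))                             # 1.5 kB
print(to_bytes(1, "ZB") == 1024 * to_bytes(1, "EB"))  # True
```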
TRY IT @ WWW.SISENSE.COM
Glossary of Big Data Terms
