NoSQL Databases
By:
Muluken Sholaye
(mulesho2490@gmail.com)
September 2021
CAP Theorem

Consistency, Availability, Partition Tolerance (CAP)

A distributed system cannot guarantee perfect consistency,
availability, and partition tolerance all at the same time.

CAP is defined by:

Consistency: all nodes see the same data at the same time

Availability: a guarantee that every request receives a response
indicating whether it succeeded or failed

Partition tolerance: the system continues to operate despite
arbitrary message loss between nodes
CAP Theorem

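A distributed system can satisfy at most two of these three
guarantees. Because network partitions cannot be ruled out in a real
network, the practical choice during a partition is between
consistency (CP) and availability (AP).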
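The trade-off can be illustrated with a minimal Python sketch. It assumes a hypothetical two-node key-value store: the class `TwoReplicaStore`, the exception `Unavailable`, and the `partitioned` flag are invented for this illustration and do not model any real database. While the nodes are partitioned, the CP variant refuses requests rather than risk returning stale data, whereas the AP variant keeps answering and lets the two replicas disagree.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses a request it cannot keep consistent."""


class TwoReplicaStore:
    """Toy key-value store replicated on two nodes, 'a' and 'b' (hypothetical).

    mode='CP' refuses requests during a partition; mode='AP' keeps serving
    them locally and lets the replicas diverge.
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"a": {}, "b": {}}
        self.partitioned = False  # True = the two nodes cannot exchange messages

    def write(self, node, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: the write cannot reach the other replica.
                raise Unavailable(f"{node}: cannot replicate {key!r}")
            # Availability first: accept locally; the peer keeps its old value.
            self.replicas[node][key] = value
            return
        for replica in self.replicas.values():  # no partition: update both nodes
            replica[key] = value

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable(f"{node}: cannot confirm the latest {key!r}")
        return self.replicas[node].get(key)


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoReplicaStore(mode)
        store.write("a", "x", 1)
        store.partitioned = True
        try:
            store.write("a", "x", 2)
            print(mode, "a:", store.read("a", "x"), "b:", store.read("b", "x"))
        except Unavailable as err:
            print(mode, "unavailable ->", err)
    # CP unavailable -> a: cannot replicate 'x'
    # AP a: 2 b: 1
```

Real systems usually have more than two replicas and may let the majority side of a partition keep serving requests; the two-node case just makes the choice as stark as possible.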


NoSQL Databases

NoSQL databases are next-generation databases that usually have
some of the following traits:

Non-relational,

distributed,

open-source, and

horizontally scalable.

Other characteristics often apply as well: schema-free data, easy
replication support, a simple API, and eventual consistency/BASE
(basically available, soft state, eventual consistency).

Not ACID but BASE
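Eventual consistency can be sketched in a few lines of Python. This assumes a hypothetical replica class, `LWWReplica`, that resolves conflicts with last-writer-wins timestamps, which is only one of several strategies real systems use. Each replica accepts writes on its own (basically available), replicas may temporarily disagree (soft state), and once they exchange state they converge (eventual consistency).

```python
from dataclasses import dataclass, field


@dataclass
class LWWReplica:
    """One replica of a key-value store that resolves conflicts by last-writer-wins.

    Entries are stored as key -> (timestamp, value). Timestamps are assumed
    to be unique; a real system would also need a tie-breaker such as a node id.
    """
    store: dict = field(default_factory=dict)

    def put(self, key, value, ts):
        # Basically available: accept the write locally without contacting peers.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def merge(self, other):
        # Anti-entropy: pull every entry from a peer; the newer timestamp wins.
        for key, (ts, value) in other.store.items():
            self.put(key, value, ts)


if __name__ == "__main__":
    r1, r2 = LWWReplica(), LWWReplica()
    r1.put("cart", ["book"], ts=1)
    r2.put("cart", ["book", "pen"], ts=2)
    print(r1.get("cart"), r2.get("cart"))  # ['book'] ['book', 'pen']
    r1.merge(r2)
    r2.merge(r1)
    print(r1.get("cart"), r2.get("cart"))  # ['book', 'pen'] ['book', 'pen']
```

Between the two `print` calls a client reading from `r1` would see a stale cart; that window is exactly what "eventually" consistent permits and what an ACID database would rule out.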
Properties of NoSQL Databases

Non-relational

Distributed

Open-source

Horizontally scalable

Schema-free

Easy replication support

Simple API

BASE not ACID
There are currently more than 225 NoSQL databases.
NoSQL databases are widely used by well-known companies such as
Google, Yahoo, Facebook, Twitter, Taobao, and Amazon.
Categories of NoSQL Databases
● The four main types of NoSQL databases are:
– Document databases
– Key-value stores
– Column-oriented databases
– Graph databases
● According to the DB-Engines Ranking website, Apache Cassandra
and Apache HBase are among the most popular wide-column stores.
Document Databases
● A document database stores data as JSON, BSON, or XML documents.
● Documents can be nested, and particular fields can be indexed for
faster querying.
● The most widely adopted document databases are usually built on a
scale-out architecture, which provides a clear path to scaling both
data volume and traffic.
● Examples of document stores are MongoDB and CouchDB.
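The idea of nested documents with indexed fields can be sketched in Python. The `Collection` class, its methods, and the dotted-path syntax (`"address.city"`) are hypothetical and chosen for illustration; they are not the MongoDB or CouchDB API. The point is that an indexed query goes straight to the matching documents, while an unindexed one scans the whole collection.

```python
from collections import defaultdict


def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into nested dicts (None if missing)."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


class Collection:
    """Toy document collection with single-field indexes (hypothetical API).

    Indexed values are assumed to be hashable scalars such as strings or numbers.
    """

    def __init__(self):
        self.docs = []
        self.indexes = {}  # path -> {field value: [positions in self.docs]}

    def insert(self, doc):
        self.docs.append(doc)
        position = len(self.docs) - 1
        for path, index in self.indexes.items():  # keep existing indexes current
            index[get_path(doc, path)].append(position)

    def create_index(self, path):
        index = defaultdict(list)
        for position, doc in enumerate(self.docs):
            index[get_path(doc, path)].append(position)
        self.indexes[path] = index

    def find(self, path, value):
        if path in self.indexes:
            # Indexed lookup: jump straight to the matching positions.
            return [self.docs[p] for p in self.indexes[path].get(value, [])]
        # No index: scan every document in the collection.
        return [doc for doc in self.docs if get_path(doc, path) == value]


if __name__ == "__main__":
    employees = Collection()
    employees.insert({"name": "Alice", "address": {"city": "Paris", "zip": "75001"}})
    employees.insert({"name": "Bob", "address": {"city": "Lyon", "zip": "69001"}})
    employees.create_index("address.city")
    print(employees.find("address.city", "Lyon"))
```

The index costs extra work on every `insert`, which is the usual trade-off: faster reads on the indexed field in exchange for slower writes.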
Cont’d
● A collection is a group of documents. The documents within a
collection usually relate to the same subject, such as employees or
products.
● A document is a set of ordered key-value pairs, where each key is a
string used to reference a particular value, and each value can be a
scalar (such as a string or number), an array, or another document.
● JSON (JavaScript Object Notation), BSON (Binary JSON), and XML
(eXtensible Markup Language) are formats commonly used to define
documents.
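As a small sketch of collections and documents, Python's standard `json` module can parse a collection written as a JSON array of documents. The `employees` data below is made up for illustration. BSON would need a third-party library and is not shown.

```python
import json

# A hypothetical "employees" collection: a group of related documents.
employees_json = """
[
  {"id": 1, "name": "Alice", "skills": ["SQL", "Python"],
   "address": {"city": "Paris", "zip": "75001"}},
  {"id": 2, "name": "Bob", "skills": [],
   "address": {"city": "Lyon", "zip": "69001"}}
]
"""

employees = json.loads(employees_json)  # a list of dicts, one per document

first = employees[0]
# Python dicts keep insertion order, so the key order of the JSON text is preserved.
print(list(first.keys()))        # ['id', 'name', 'skills', 'address']
# A value can itself be a document (a nested dict) or an array (a list).
print(first["address"]["city"])  # Paris
# Serialize the document back to JSON text without reordering its keys.
print(json.dumps(first))
```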
KEY-VALUE STORES
● Key-value stores are the simplest NoSQL databases. As the name
suggests, they are a collection of key-value pairs.
● Data in this category of NoSQL databases is stored in the format
“Key → Value”, where:
– Key is a string that uniquely identifies a value;
– Value is an object, which can be a simple string or number, or a
complex BLOB such as a JSON object, an image, or audio.
● According to the DB-Engines Ranking website, Redis and DynamoDB
are among the most popular key-value stores.
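A key-value store's interface can be as small as put, get, and delete. The following Python sketch assumes a hypothetical in-memory `KeyValueStore` class; it is not the API of Redis or DynamoDB. The store treats values as opaque, so a plain string, a JSON string, and raw image bytes are all stored the same way.

```python
import json


class KeyValueStore:
    """Minimal in-memory key-value store (hypothetical; not a real database API)."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value) -> None:
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        self._data[key] = value  # the store never looks inside the value

    def get(self, key: str, default=None):
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        """Remove a key; return True if it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


if __name__ == "__main__":
    store = KeyValueStore()
    store.put("user:42:name", "Alice")                                      # simple string
    store.put("user:42:profile", json.dumps({"age": 30, "city": "Paris"}))  # JSON text
    store.put("user:42:avatar", b"\x89PNG\r\n\x1a\n")                       # binary blob
    profile = json.loads(store.get("user:42:profile"))
    print(profile["city"])  # Paris
```

Keys such as `user:42:profile` follow a common convention of encoding structure in the key name, because the store itself can only look values up by exact key.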
Graph Databases
● Graph databases are the most complex type, geared toward storing
relationships between entities efficiently.
● The graph database model (GDM) is composed of vertices and
edges [5], where:
– A vertex is an entity instance, equivalent to a tuple in the
relational data model (RDM);
– An edge defines the relationship between vertices;
– Each vertex and edge can have any number of attributes that store
the actual data values.
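A minimal Python sketch of the vertex-and-edge model, assuming a hypothetical `PropertyGraph` class (not the API of any graph database). Both vertices and edges carry attribute dictionaries, and a query such as "everyone Alice can reach through KNOWS edges" follows stored adjacency lists with a breadth-first search instead of repeated joins.

```python
from collections import deque


class PropertyGraph:
    """Toy property graph: vertices and edges both carry attribute dicts."""

    def __init__(self):
        self.vertices = {}   # vertex id -> attribute dict
        self.edges = []      # (source id, label, target id, attribute dict)
        self.adjacency = {}  # vertex id -> [(label, target id)]

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = attrs
        self.adjacency.setdefault(vid, [])

    def add_edge(self, src, label, dst, **attrs):
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added as vertices first")
        self.edges.append((src, label, dst, attrs))
        self.adjacency[src].append((label, dst))

    def reachable(self, start, label):
        """Vertices reachable from `start` via edges with `label` (breadth-first)."""
        seen, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for edge_label, nxt in self.adjacency[current]:
                if edge_label == label and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)  # do not report the starting vertex itself
        return seen


if __name__ == "__main__":
    g = PropertyGraph()
    g.add_vertex("alice", name="Alice")
    g.add_vertex("bob", name="Bob")
    g.add_vertex("carol", name="Carol")
    g.add_edge("alice", "KNOWS", "bob", since=2019)
    g.add_edge("bob", "KNOWS", "carol", since=2021)
    print(sorted(g.reachable("alice", "KNOWS")))  # ['bob', 'carol']
```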
Assignment
● HBase
● CouchDB
● Cassandra
● Redis
● MongoDB
● Note: choose one database from the list and study:
– The basics of the database
– Installation and usage
– A demo
● ETA: 5 days
Columnar Databases
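● Column-oriented databases organize data by columns (in wide-column
stores such as HBase, by column families) rather than storing each
record as one contiguous row.
● HBase and Cassandra are among the most commonly used.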
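The difference between row and column layouts can be sketched in Python. This is a simplified illustration with made-up data: HBase actually stores cells grouped by column family and sorted by row key, which is not modeled here. Once the rows are pivoted into columns, an aggregate over `salary` only has to read the `salary` values.

```python
def rows_to_columns(rows):
    """Pivot a list of row dicts into one list per column (rows must share keys)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


employees = [  # row-oriented: each record is stored together
    {"id": 1, "name": "Alice", "dept": "IT", "salary": 5000},
    {"id": 2, "name": "Bob", "dept": "HR", "salary": 4200},
    {"id": 3, "name": "Chen", "dept": "IT", "salary": 6100},
]

columns = rows_to_columns(employees)  # column-oriented: each column is stored together

# An aggregate over one column touches only that column's values.
total_salary = sum(columns["salary"])
print(total_salary)  # 15300

# A filtered aggregate reads just the two columns it needs.
it_salary = sum(s for d, s in zip(columns["dept"], columns["salary"]) if d == "IT")
print(it_salary)  # 11100
```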
Big Data Frameworks
Basics
● The major challenges associated with big data are:
– Capturing data
– Curation
– Storage
– Searching
– Sharing
– Transfer
– Analysis
– Presentation
● To address these challenges, organizations usually rely on
enterprise solutions built on layered frameworks.
Hadoop Ecosystem
● Apache Hadoop is an open-source framework.
● Hadoop lets businesses distribute data storage and process data in
parallel, handling data with high volume, velocity, variety, value,
and veracity.
● The Hadoop ecosystem is a platform, or suite, that provides various
services to solve big data problems. It includes many Apache projects
and related tools:
– HDFS: Hadoop Distributed File System
– YARN: Yet Another Resource Negotiator
– MapReduce: programming-based data processing
– Spark: in-memory data processing
– Pig, Hive: query-based data processing
– HBase: NoSQL database
– Mahout, Spark MLlib: machine learning libraries
– Solr, Lucene: searching and indexing
– ZooKeeper: cluster management
– Flume, Chukwa, Scribe, Kafka, Sqoop: data collection
Cont’d
● All of these tools and components revolve around one thing: data.
● That is the beauty of Hadoop: because it is built around data, its
components are easier to combine.
● Hadoop has four major elements:
– HDFS
– MapReduce
– YARN
– Hadoop Common
● Let's study each in more detail.
HDFS
● HDFS is responsible for storing large data sets of structured or
unstructured data across many nodes, while maintaining the metadata
in the form of log files.
● HDFS consists of two core components:
– NameNode
– DataNode
● The NameNode is the master node. It holds the metadata (data about
data) and needs comparatively fewer resources than the DataNodes,
which store the actual data.
● DataNodes run on commodity hardware in the distributed environment,
which makes Hadoop cost-effective.
● HDFS coordinates storage across the cluster and its hardware, so it
works at the heart of the system.
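The split between the NameNode (metadata only) and the DataNodes (actual blocks) can be sketched in Python. `MiniHDFS` and its parameters are hypothetical. Real HDFS defaults to 128 MB blocks and a replication factor of 3 and places replicas with a rack-aware policy, whereas this sketch uses 4-byte blocks and simple round-robin placement so the behaviour is easy to see.

```python
class MiniHDFS:
    """Toy model of HDFS storage: one NameNode (metadata) and several DataNodes (blocks)."""

    def __init__(self, num_datanodes=4, block_size=4, replication=3):
        if replication > num_datanodes:
            raise ValueError("replication cannot exceed the number of DataNodes")
        self.block_size = block_size
        self.replication = replication
        # Each DataNode stores the raw bytes of the blocks assigned to it.
        self.datanodes = [{} for _ in range(num_datanodes)]
        # The NameNode stores metadata only: file -> [(block id, [DataNode indexes])].
        self.namenode = {}
        self._next_block_id = 0

    def write(self, path, data: bytes):
        blocks = []
        for offset in range(0, len(data), self.block_size):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Round-robin placement on `replication` distinct DataNodes.
            targets = [(block_id + i) % len(self.datanodes) for i in range(self.replication)]
            for dn in targets:
                self.datanodes[dn][block_id] = data[offset:offset + self.block_size]
            blocks.append((block_id, targets))
        self.namenode[path] = blocks

    def read(self, path, failed=frozenset()):
        """Reassemble a file from its blocks, skipping DataNodes listed in `failed`."""
        out = bytearray()
        for block_id, targets in self.namenode[path]:
            live = [dn for dn in targets if dn not in failed]
            if not live:
                raise OSError(f"all replicas of block {block_id} are lost")
            out += self.datanodes[live[0]][block_id]
        return bytes(out)


if __name__ == "__main__":
    fs = MiniHDFS()
    fs.write("/data/example.txt", b"Dear Bear River Car")
    print(fs.namenode["/data/example.txt"])
    print(fs.read("/data/example.txt", failed={0, 1}))  # b'Dear Bear River Car'
```

Because every block has three replicas spread over four DataNodes, the read still succeeds when any two DataNodes fail.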
MapReduce
● Using distributed and parallel algorithms, MapReduce moves the
processing logic to the data and helps developers write applications
that turn big data sets into manageable ones.
● MapReduce uses two functions, Map() and Reduce():
– Map() filters and transforms the input records into intermediate
key-value pairs. The framework then sorts and groups these pairs by
key before passing them to Reduce().
– Reduce(), as the name suggests, summarizes the mapped data by
aggregating it. In short, Reduce() takes the output generated by
Map() as input and combines those tuples into a smaller set of tuples.
A Word Count Example of MapReduce
● Let us see how MapReduce works with an example. Suppose we have a
text file called example.txt with the following contents:
– Dear, Bear, River, Car, Car, River, Deer, Car and Bear
● Now suppose we want to perform a word count on example.txt using
MapReduce: we find the unique words and the number of occurrences of
each one.
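The word count can be sketched in Python with the three phases written out explicitly. The function names are hypothetical, and everything runs in one process; in Hadoop the map tasks would run in parallel on different nodes and the framework would perform the shuffle. Note that this naive tokenizer also counts the word "and".

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map(): emit a (word, 1) pair for every word in one line of input."""
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", line)]


def shuffle_phase(pairs):
    """Group intermediate pairs by key and sort by key (Hadoop's framework does this step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce(): combine all the counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    # In Hadoop each input split would be mapped on a different node;
    # here the phases simply run one after another in a single process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    example_txt = ["Dear, Bear, River, Car, Car, River, Deer, Car and Bear"]
    print(word_count(example_txt))
    # {'Bear': 2, 'Car': 3, 'Dear': 1, 'Deer': 1, 'River': 2, 'and': 1}
```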