CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx

Asst.Prof. M.Gokilavani, Professor at KL University
CCS334 BIG DATA ANALYTICS
(R-21 III (I Sem))
Department of Artificial Intelligence and Data Science
Session 3
by
Asst.Prof.M.Gokilavani
NIET
9/19/2023 Department of AI & DS 1
TEXT BOOKS
• Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data,
Big Analytics: Emerging Business Intelligence and Analytic Trends for
Today's Businesses", Wiley, 2013.
• Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
• Pramod J. Sadalage, "NoSQL Distilled", 2013.
REFERENCES
• E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive",
O'Reilly, 2012.
• Lars George, "HBase: The Definitive Guide", O'Reilly, 2011.
• Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010.
Topics covered in Unit 2 session
UNIT II NOSQL DATA MANAGEMENT
Introduction to NoSQL – aggregate data models – key-value and
document data models – relationships – graph databases – schemaless
databases – materialized views – distribution models – master-slave
replication – consistency – Cassandra – Cassandra data model
– Cassandra examples – Cassandra clients.
Distribution Models
• Depending on the distribution model, the data store can give us the ability:
• To handle larger quantities of data
• To process greater read or write traffic
• To offer more availability in the case of network slowdowns or
breakages.
• There are two paths for distribution:
• Sharding: distributes different data across multiple
servers, so each server acts as the single source for a subset of
the data.
• Replication: copies data across multiple servers,
so each bit of data can be found in multiple places.
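As a rough illustration of the two paths, the sketch below places the same keys on three servers, first by sharding and then by replication. The server names, the sample keys, and the CRC32-based placement rule are assumptions made for this example; they do not come from the slides.

```python
import zlib

SERVERS = ["server-0", "server-1", "server-2"]  # hypothetical server names


def shard_for(key: str) -> str:
    """Sharding: pick exactly one server for a key (assumed CRC32 rule)."""
    return SERVERS[zlib.crc32(key.encode("utf-8")) % len(SERVERS)]


def place_sharded(keys):
    """Each key is stored on a single server, the only source for it."""
    placement = {server: [] for server in SERVERS}
    for key in keys:
        placement[shard_for(key)].append(key)
    return placement


def place_replicated(keys):
    """Each key is copied to every server, so it can be read anywhere."""
    return {server: list(keys) for server in SERVERS}


keys = ["alice", "bob", "carol", "dave"]
sharded = place_sharded(keys)
replicated = place_replicated(keys)

# Sharding: every key appears once in total across all servers.
print(sum(len(stored) for stored in sharded.values()))     # 4
# Replication: every key appears once per server.
print(sum(len(stored) for stored in replicated.values()))  # 12
```

Sharding keeps the total amount of stored data constant while spreading it out; replication multiplies it by the number of copies in exchange for more places to read from.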
Distributed Models
• Single server
• Sharding
• Master-Slave Replication
• Peer-to-Peer Replication
• Combining Sharding & Replication
Single Server
• It is the first and simplest distribution option.
• Even though NoSQL databases are designed to run on a cluster, they can be
used in a single-server application.
• This makes sense if a NoSQL database is better suited to the
application's data model.
• Graph databases are the most obvious example.
• If data usage is mostly about processing aggregates, then a key-value or a
document store may be useful.
Sharding
• Often, a data store is busy because different people are accessing
different parts of the dataset.
• In these cases we can support horizontal scalability by putting different
parts of the data onto different servers (sharding).
• The concept of sharding is not new; it has long been done as part of application logic.
• For example, it consists of putting all customers with surnames A-D on one shard and
E-G on another.
• This complicates the programming model, as the application code
needs to distribute the load across the shards.
• In the ideal setting, each user talks to one server and the load is
balanced.
• Of course, the ideal case is rare.
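The surname example can be sketched as application-level routing. This is a minimal sketch, not a production router: the shard names and the H-Z range are hypothetical additions, since the slide only mentions the A-D and E-G ranges.

```python
# Hypothetical shard map: surname-initial ranges -> shard name.
# Only A-D and E-G come from the slide; H-Z is assumed for completeness.
SHARD_RANGES = [
    ("A", "D", "shard-1"),
    ("E", "G", "shard-2"),
    ("H", "Z", "shard-3"),
]


def shard_for_surname(surname: str) -> str:
    """Return the shard that holds customers with this surname."""
    initial = surname.strip()[:1].upper()
    for first, last, shard in SHARD_RANGES:
        if first <= initial <= last:
            return shard
    raise ValueError(f"no shard configured for surname {surname!r}")


print(shard_for_surname("Chambers"))  # shard-1
print(shard_for_surname("George"))    # shard-2
print(shard_for_surname("Sammer"))    # shard-3
```

Because surnames are not evenly spread over the alphabet, a fixed map like this tends to leave some shards busier than others, which is one reason the balanced ideal case is rare.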
Sharding and NoSQL
• In general, many NoSQL databases offer auto-sharding.
• This can make it much easier to use sharding in an application.
• Sharding is especially valuable for performance because it improves
both read and write performance.
• It scales reads and writes across the different nodes of the same cluster.
Master – Slave Replication
• In this setting, one node is designated as the master, or primary, and the
others as slaves.
• The master is the authoritative source for the data and is designed to
process updates and send them to the slaves.
• The slaves are used for read operations.
• This allows us to scale when the dataset is read-intensive.
Master – Slave Replication
• We can scale horizontally by adding more slaves.
• However, we are limited by the ability of the master to process incoming
data.
Advantage: read resilience.
• Even if the master fails, the slaves can still handle read requests.
Disadvantage:
• Writes are not allowed until the master is restored or a new master is appointed.
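Read resilience can be illustrated with a toy in-memory model. The class names, the keys, and the choice to push updates to the slaves immediately are assumptions for this sketch; a real database replicates over the network and usually with some delay.

```python
class Node:
    """One database server holding a full copy of the data."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.alive = True


class MasterSlaveCluster:
    """Toy model: writes go to the master, reads go to the slaves."""

    def __init__(self, slave_count: int = 2):
        self.master = Node("master")
        self.slaves = [Node(f"slave-{i}") for i in range(slave_count)]

    def write(self, key, value):
        if not self.master.alive:
            raise RuntimeError("master is down: writes are unavailable")
        self.master.data[key] = value
        # Assumption: updates are pushed to every live slave immediately.
        for slave in self.slaves:
            if slave.alive:
                slave.data[key] = value

    def read(self, key):
        for slave in self.slaves:
            if slave.alive:
                return slave.data.get(key)
        raise RuntimeError("no slave is available for reads")


cluster = MasterSlaveCluster()
cluster.write("user:1", "Asha")
cluster.master.alive = False          # simulate a master failure

print(cluster.read("user:1"))         # Asha (reads still work)
try:
    cluster.write("user:2", "Ravi")
except RuntimeError as error:
    print(error)                      # master is down: writes are unavailable
```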
Master – Slave Replication
• Another characteristic is that a slave can be appointed as the new master.
• Masters can be appointed manually or automatically.
• To achieve read resilience, the read and write paths must be
different.
• This is normally done using separate database connections.
• Disadvantage: master-slave replication has the advantages discussed above,
but it comes with the problem of inconsistency.
• Clients reading from the slaves may read data that has not yet been updated.
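The inconsistency problem can be made concrete with a small sketch in which replication happens as a separate, later step. The `LaggingCluster` class and its method names are hypothetical; the explicit `replicate()` call stands in for whatever delay a real system has between a write on the master and its arrival on a slave.

```python
from collections import deque


class Replica:
    def __init__(self, name: str):
        self.name = name
        self.data = {}


class LaggingCluster:
    """Toy master-slave pair where replication happens in a later step."""

    def __init__(self):
        self.master = Replica("master")
        self.slave = Replica("slave")
        self.pending = deque()  # updates accepted by the master, not yet sent

    def write(self, key, value):
        self.master.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Ship all pending updates to the slave."""
        while self.pending:
            key, value = self.pending.popleft()
            self.slave.data[key] = value

    def read_from_slave(self, key):
        return self.slave.data.get(key)

    def promote_slave(self):
        """Appoint the slave as the new master; unsent updates are lost."""
        self.master = self.slave
        self.slave = Replica("new-slave")
        self.pending.clear()


cluster = LaggingCluster()
cluster.write("price", 100)
cluster.replicate()
cluster.write("price", 120)

print(cluster.read_from_slave("price"))  # 100, a stale value
cluster.replicate()
print(cluster.read_from_slave("price"))  # 120, now up to date
```

In this model, calling `promote_slave()` while updates are still pending makes the new master start from the slave's older copy, which shows why appointing a new master can also lose recent writes.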
Master – Slave Replication in MongoDB
Peer-to-Peer Replication
• Master-slave replication helps with read scalability but does not help
with the scalability of writes.
• Moreover, it provides resilience for reads but not for writes.
• The master is still a single point of failure.
• Peer-to-peer replication attacks these problems by not having a master.
• All the replicas are equal (each accepts writes and reads).
• With peer-to-peer replication, nodes can fail without losing write
capability or losing data.
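A toy model of peer-to-peer replication is sketched below. The class names and keys are hypothetical, and the sketch assumes every write is copied to all live peers at once, which sidesteps the conflict handling a real system needs when two peers accept different updates to the same record.

```python
class Peer:
    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.alive = True


class PeerToPeerCluster:
    """Toy model: every live peer accepts both reads and writes."""

    def __init__(self, peer_count: int = 3):
        self.peers = [Peer(f"peer-{i}") for i in range(peer_count)]

    def live_peers(self):
        return [peer for peer in self.peers if peer.alive]

    def write(self, key, value):
        live = self.live_peers()
        if not live:
            raise RuntimeError("no peer is available")
        # Assumption: the write is copied to every live peer at once.
        for peer in live:
            peer.data[key] = value
        return live[0].name  # the peer that accepted the request

    def read(self, key):
        live = self.live_peers()
        if not live:
            raise RuntimeError("no peer is available")
        return live[0].data.get(key)


cluster = PeerToPeerCluster()
cluster.write("cart:7", ["book"])
cluster.peers[0].alive = False                 # one node fails

accepted_by = cluster.write("cart:7", ["book", "pen"])
print(accepted_by)                             # peer-1
print(cluster.read("cart:7"))                  # ['book', 'pen']
```

Unlike the master-slave case, the failure of one node here stops neither reads nor writes, because any remaining peer can take over both roles.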
Combining Sharding and Replication
• With master-slave replication and sharding, we have multiple masters, but
each data item has a single master.
• Depending on the configuration, we can decide the master for each
group of data.
• Peer-to-peer replication combined with sharding is a common strategy for
column-family databases.
• This is commonly achieved by replicating each shard across several nodes.
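One way to picture replicated shards is the placement sketch below. The node names, the replication factor of 3, and the "next nodes in the list" placement rule are all assumptions chosen for illustration; actual databases use their own placement strategies.

```python
import zlib

NODES = ["node-0", "node-1", "node-2", "node-3", "node-4"]  # hypothetical
REPLICATION_FACTOR = 3  # assumed number of copies per shard


def replicas_for(key: str, nodes=NODES, factor=REPLICATION_FACTOR):
    """Return the nodes holding the shard that contains this key.

    The key is hashed to a starting node (sharding); the shard is then
    copied to the next nodes in the list, wrapping around (replication).
    """
    start = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return [nodes[(start + offset) % len(nodes)] for offset in range(factor)]


def readable_after_failure(key: str, failed: set) -> bool:
    """A key stays readable while at least one of its replicas is alive."""
    return any(node not in failed for node in replicas_for(key))


owners = replicas_for("customer:42")
print(len(owners))                                             # 3
print(readable_after_failure("customer:42", set(owners[:2])))  # True
print(readable_after_failure("customer:42", set(owners)))      # False
```

Each node stores only some of the shards, so capacity grows with the cluster, while every shard still survives the loss of any two of its three holders.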
Topics to be covered in the next session (Session 4)
• Cassandra data model
Thank you!!!