Big Data trends and the rising importance of NOSQL
Abhijit Sharma, Architect, Innovation & Incubation Lab, BMC Software
Trends in cloud, web, and even enterprise-scale apps
- Unprecedented growth in the size of data sets that need to be stored and analyzed
  - Big Data: cloud-scale services generate TBs to PBs (FB, eBay, Digg, Foursquare)
- Connectedness and democratization of data
  - Social networks, feeds, blogs, wikis, tags, semantic web
  - Data APIs: mash up data using the Twitter, FB, and Flickr APIs
- Semi-structured or unstructured data
- Performance requirements of these apps
  - Humongous read/write scalability
  - High availability
  - Trading consistency for availability: ACID is not mandatory
RDBMS woes
- Challenge: storing and scaling humongous amounts of data while remaining highly available
- Mostly vertical scaling: expensive, with an upper limit
- Horizontal scaling: no automatic sharding, no rebalancing, no supporting infrastructure
- Distributed transactions and joins (a consequence of normalization) inhibit performance and availability
- Schema-less data models do not fit a rigid schema: ALTER TABLE, null columns
- Deeply connected data: not what an RDBMS was designed for
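
The rebalancing problem mentioned above can be illustrated with a minimal sketch. It assumes a hand-rolled sharding scheme (hash of the key modulo the number of shards), which is a common do-it-yourself approach and not something the slides prescribe; `shard_for` and `fraction_moved` are hypothetical helper names.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_shards) != shard_for(key, new_shards)
    )
    return moved / len(keys)


# Hypothetical key space: 10,000 user ids.
keys = [f"user:{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_shards=4, new_shards=5)
print(f"{moved:.0%} of keys change shard when going from 4 to 5 shards")
```

With modulo placement, most keys land on a different shard after adding a single server, so the application has to migrate the bulk of its data by hand. This is the kind of infrastructure work that stores with built-in sharding and rebalancing aim to take over.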
The NOSQL Alternative
- NOSQL is NOT "No SQL"
- NOSQL is simply "Not only SQL"
NOSQL – So what else is it?
- A “one size fits all” RDBMS is not working
- NOSQL alternatives are polyglot solutions that better fit the new requirements thrown up by these trends
- They can be categorized along these axes:
  - Data model: simple to complex
  - Scalability: single node to horizontal
  - Persistence
NOSQL categories
- Graph databases
  - Based on graph theory
  - Data model: graph of nodes, edges, and properties
  - Scalability: single node, high performance
  - Persistence: on-disk data structures
  - Examples: Neo4J, AllegroGraph
- Document databases
  - Based loosely on documents/Lotus Notes
  - Data model: collections of documents
  - Scalability: horizontal, auto-sharding and replication
  - Persistence: B-Tree
  - Examples: mongoDB, CouchDB
- Column stores
  - Based on Google’s BigTable design
  - Data model: big table, column families
  - Scalability: horizontal, auto-sharding and replication
  - Persistence: memory + file (on DFS)
  - Examples: HBase, Cassandra
- Key value stores
  - Based on DHT, Amazon’s Dynamo design
  - Data model: collection of key value pairs
  - Scalability: horizontal, auto-sharding and replication
  - Persistence: memory or file
  - Examples: Redis, Amazon Dynamo, Voldemort
Graph Databases
Graph oriented data
- Graphs are ubiquitous: social networks, wikis, the web, recommendation engines, etc.
- Deep trees, complex networks
- Graph traversal is apt for expressing graph-related problems (shortest path, network size, etc.)
LinkedIn Social Graph
Why not RDBMS for large scale graphs?
- Graphs are difficult to model and traverse in an RDBMS
- Recursive approaches are slow: SQL queries that span many table joins
- Hacks such as storing paths for trees
Graph Databases
- Designed for efficient storage and traversal of large-scale graphs
- Natural modeling of a graph network: nodes, relationships, and their properties
- Neo4J is a leading graph database
  - Supports billions of nodes/edges; traverses depths of 1000 levels in ms, 1000x faster than an RDBMS
  - Handles large graphs that don't fit in memory: a persistent transactional store optimized for graphs
  - REST API and various language bindings
  - Graph pattern matching, Cypher query language, Lucene-based indexer
Graph basics
All Paths & My Network size
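
As a rough illustration of the "network size" question, the sketch below counts how many people are reachable within a given number of hops using breadth-first search. The graph, the names, and the `network_size` helper are made up for this example; a graph database would run such a traversal against its own storage rather than a Python dict.

```python
from collections import deque

# Hypothetical undirected social graph as an adjacency map.
GRAPH = {
    "me": ["ann", "bob"],
    "ann": ["me", "cat"],
    "bob": ["me", "cat", "dan"],
    "cat": ["ann", "bob", "eve"],
    "dan": ["bob"],
    "eve": ["cat"],
}


def network_size(graph, start, max_depth):
    """Count people reachable from start within max_depth hops (start excluded)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return len(seen) - 1


for depth in (1, 2, 3):
    print(depth, network_size(GRAPH, "me", depth))
```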
Shortest path between …
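
A shortest-path query on an unweighted graph can be sketched with breadth-first search and parent pointers. This is a minimal in-memory illustration, assuming a made-up graph and a hypothetical `shortest_path` helper; it is not how any particular graph database implements the query. Returning `None` for unreachable targets also answers the "is connected to?" question.

```python
from collections import deque

# Hypothetical undirected social graph; "zed" is deliberately isolated.
GRAPH = {
    "me": ["ann", "bob"],
    "ann": ["me", "cat"],
    "bob": ["me", "cat", "dan"],
    "cat": ["ann", "bob", "eve"],
    "dan": ["bob"],
    "eve": ["cat"],
    "zed": [],
}


def shortest_path(graph, source, target):
    """Return one shortest path as a list of nodes, or None if unreachable."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour in parents:
                continue  # already discovered via a path that is no longer
            parents[neighbour] = node
            if neighbour == target:
                # Walk the parent pointers back to the source.
                path = [target]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


print(shortest_path(GRAPH, "me", "eve"))
print(shortest_path(GRAPH, "me", "zed"))
```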
Is connected to?
You may know…
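
One simple way to produce "you may know" suggestions is to rank friends of friends by the number of mutual friends. The sketch below assumes that heuristic and a made-up graph; real recommendation engines typically combine many more signals.

```python
from collections import Counter

# Hypothetical undirected social graph as an adjacency map.
GRAPH = {
    "me": ["ann", "bob"],
    "ann": ["me", "cat"],
    "bob": ["me", "cat", "dan"],
    "cat": ["ann", "bob", "eve"],
    "dan": ["bob"],
    "eve": ["cat"],
}


def people_you_may_know(graph, person):
    """Rank non-friends two hops away by how many friends they share with person."""
    friends = set(graph[person])
    mutual_counts = Counter()
    for friend in friends:
        for candidate in graph[friend]:
            if candidate != person and candidate not in friends:
                mutual_counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically for stable output.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))


print(people_you_may_know(GRAPH, "me"))
```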
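Mining your network
- Centrality algorithms
  - Degree: who has the most followers on Twitter
  - Closeness: who can reach everyone else in the fewest hops
  - Betweenness: who sits on the most shortest paths between other people
  - Eigenvector: who has more influential people following them (PageRank is a variant)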
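
Two of the simpler centrality measures can be sketched in a few lines: degree centrality (number of direct connections) and closeness centrality (how few hops a node needs to reach everyone else). The graph and function names below are assumptions for illustration, and the closeness formula assumes a connected, unweighted graph.

```python
from collections import deque

# Hypothetical undirected social graph as an adjacency map.
GRAPH = {
    "me": ["ann", "bob"],
    "ann": ["me", "cat"],
    "bob": ["me", "cat", "dan"],
    "cat": ["ann", "bob", "eve"],
    "dan": ["bob"],
    "eve": ["cat"],
}


def degree_centrality(graph):
    """Number of direct connections per node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def hop_distances(graph, source):
    """Breadth-first hop counts from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def closeness_centrality(graph):
    """(number of other reachable nodes) / (sum of hop distances to them)."""
    scores = {}
    for node in graph:
        distances = hop_distances(graph, node)
        total = sum(distances.values())
        scores[node] = (len(distances) - 1) / total if total else 0.0
    return scores


degree = degree_centrality(GRAPH)
closeness = closeness_centrality(GRAPH)
print(degree)
print({node: round(score, 2) for node, score in closeness.items()})
```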
Document Databases
Flexible document oriented data
- Document-style unstructured data, schema-less, e.g. JSON documents
- No ALTER TABLE needed as in an RDBMS; data is de-normalized
- Useful for iterative/agile development
- Humongous scale: billions of documents, millions of reads/writes per second, horizontal scalability, availability
- mongoDB is a leading document database
Document Database – Use cases
- Archiving historic data that has undergone many schema changes
- A flexible set of performance metrics (web site page views, unique visitors, etc.) that changes over time, with no need to update existing JSON documents
- Tracking near-real-time metrics: optimized increments of performance counters
- Geolocation-based mobile and gaming apps (geospatial indices can be key here)
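
The metrics use case can be sketched with a toy in-memory collection of schema-less documents. `MetricsCollection` and its `increment` method are hypothetical names for this illustration and do not reflect any real driver API; the point is only that new counters can appear on new documents without touching the older ones.

```python
class MetricsCollection:
    """Toy in-memory 'collection' of schema-less documents keyed by _id."""

    def __init__(self):
        self.docs = {}

    def increment(self, doc_id, field, amount=1):
        """Create the document if needed, then bump one counter in place."""
        doc = self.docs.setdefault(doc_id, {"_id": doc_id})
        doc[field] = doc.get(field, 0) + amount
        return doc[field]


metrics = MetricsCollection()

# Day one: only page views and unique visitors are tracked.
metrics.increment("/home:2011-10-01", "page_views")
metrics.increment("/home:2011-10-01", "page_views")
metrics.increment("/home:2011-10-01", "unique_visitors")

# Day two: a new metric appears; the day-one document is left untouched.
metrics.increment("/home:2011-10-02", "page_views")
metrics.increment("/home:2011-10-02", "mobile_views", 5)

for doc in metrics.docs.values():
    print(doc)
```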
Craigslist Archival Database
- A premium service let customers search their historical postings
- Archival (no purging) of 10 years of postings: billions of documents
- Schema changes across versions
- In the MySQL-based archival database, an ALTER TABLE took a month to complete
Foursquare
- Geo: optimized for geolocation queries, e.g. find Starbucks near my current GPS location
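
To make the "near my current location" query concrete, here is a brute-force sketch that computes great-circle distances with the haversine formula and filters by radius. The venue names and coordinates are made up, and a real geospatial index would avoid scanning every document as this loop does.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical venue documents with (latitude, longitude) locations.
VENUES = [
    {"name": "Cafe A", "loc": (40.7580, -73.9855)},
    {"name": "Cafe B", "loc": (40.7484, -73.9857)},
    {"name": "Cafe C", "loc": (40.6892, -74.0445)},
]


def near(venues, lat, lon, max_km):
    """Names of venues within max_km of (lat, lon), nearest first."""
    hits = []
    for venue in venues:
        distance = haversine_km(lat, lon, *venue["loc"])
        if distance <= max_km:
            hits.append((distance, venue["name"]))
    return [name for _, name in sorted(hits)]


print(near(VENUES, 40.7500, -73.9860, max_km=2.0))
```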
mongoDB Features
- JSON documents, collection-oriented storage
- Rich, document-based queries
- Indexes on document attributes
- Fast in-place updates
- Scalability features
  - Horizontal scalability
  - Configurable replication and high availability
  - Auto-sharding and rebalancing
- Language-specific drivers: Java, Scala, Ruby, etc.
Column Stores
Column Store
- Reasonably rich data model: a sparse, distributed, persistent, multi-dimensional sorted map
- Sorted row keys, columns
- Use cases: large-scale data storage and analysis, such as
  - Time series data along with associated dimension data
    - Row keys are timestamps and thus sorted, which helps time range queries
  - Google Analytics
    - Provides aggregate statistics: unique visitors per day, page views per URL per day
    - The raw click table (~200 TB) has a row for each URL + user session time, which keeps each URL's sessions contiguous and chronologically sorted
  - Data cube: CPU, OS, Time, DC
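
The benefit of sorted row keys for time range queries can be sketched with a small in-memory sorted map. `SortedTable`, the row key layout (site followed by session time), and the column names are assumptions made for this example; the idea is only that a range scan becomes two binary searches plus a contiguous read.

```python
import bisect


class SortedTable:
    """Toy sorted map: row keys are kept in order so range scans are cheap."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def scan(self, start_key, stop_key):
        """Yield (key, columns) for start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


clicks = SortedTable()
clicks.put("example.com/2011-10-01T09:15", {"stats:page": "/pricing"})
clicks.put("example.com/2011-10-01T08:00", {"stats:page": "/home"})
clicks.put("example.com/2011-10-02T10:30", {"stats:page": "/home"})
clicks.put("other.org/2011-10-01T08:30", {"stats:page": "/"})

# All sessions for example.com on 2011-10-01, already in chronological order.
day_one = list(clicks.scan("example.com/2011-10-01", "example.com/2011-10-02"))
for key, columns in day_one:
    print(key, columns)
```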
Column Store Performance
- Excellent read/write performance with large storage (PBs)
- High scalability: horizontal scaling, auto-sharding
- High availability: transparent replication of data
- HBase is a leading column store, built on Hadoop with HDFS as the underlying persistence layer
Column Store - HBase
- A table defines column families, which group similar attributes (vertical partitioning)
- A (Table, Row, ColumnFamily:Column, Timestamp) tuple maps to a cell value
- A table is split into multiple evenly distributed regions, each of which is a range of sorted keys (partitioned automatically by key)
- Rows are ordered by key; columns are ordered within a column family
- Rows can have different numbers of columns
- Columns have a value and any number of versions
- Row range, column range, and key range queries
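
The cell model described above, where a (row, family:column, timestamp) tuple maps to a value, can be sketched as a small in-memory structure. `VersionedTable` and its methods are hypothetical and are not the HBase client API; the sketch only shows fixed column families, free-form columns within them, and multiple timestamped versions per cell.

```python
class VersionedTable:
    """Toy table: (row, 'family:qualifier') -> versions, newest first."""

    def __init__(self, column_families):
        # Column families are fixed up front; columns inside them are not.
        self.column_families = set(column_families)
        self._cells = {}

    def put(self, row, column, value, timestamp):
        family = column.split(":", 1)[0]
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row, column, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        return self._cells.get((row, column), [])[:max_versions]


table = VersionedTable(["info", "stats"])
table.put("row1", "info:name", "Ann", timestamp=1)
table.put("row1", "info:name", "Ann B.", timestamp=2)
table.put("row2", "stats:visits", 10, timestamp=1)  # row2 has different columns

print(table.get("row1", "info:name"))
print(table.get("row1", "info:name", max_versions=2))
```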
HBase Architecture
Key Value Stores
Key Value Stores
- Simplest possible data model
  - Caching a user’s personalized, rendered page to avoid hitting the DB
  - S3 bucket storage for blob data against a unique id
- Range of KV stores
  - Distributed, scalable, persistent key-value storage: Dynamo, Voldemort
    - Auto-partitioned key space
    - Replicated KV
    - Highly available
  - Largely in-memory KV stores: Redis, memcached
    - Redis is blazing fast for caching and other interesting operations
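
An auto-partitioned, replicated key space is often built on consistent hashing, the technique associated with Dynamo-style stores. The sketch below is a simplified illustration under that assumption; `HashRing`, the virtual node count, and the node names are made up, and real systems add failure detection, hinted handoff, and more.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 128-bit hash used to place nodes and keys on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes and replica placement."""

    def __init__(self, nodes, vnodes=100, replicas=2):
        self.replicas = replicas
        self._node_count = len(set(nodes))
        # Each physical node owns many points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def nodes_for(self, key):
        """Distinct nodes that should hold key, walking clockwise from its hash."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        owners = []
        while len(owners) < min(self.replicas, self._node_count):
            node = self._ring[index][1]
            if node not in owners:
                owners.append(node)
            index = (index + 1) % len(self._ring)
        return owners


keys = [f"user:{i}" for i in range(2000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

# Keys whose primary owner changes after adding node-d.
moved = [k for k in keys if before.nodes_for(k)[0] != after.nodes_for(k)[0]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all of them to node-d")
```

Only the keys that the new node takes over change their primary owner, which is what lets such stores rebalance without reshuffling the whole key space.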
Redis
- In-memory KV store
- Blazing fast: about 100K reads/writes per second
- Async snapshot to disk
- More than a KV store: a data structure store
  - Supports lists, queues, and sets, and operations on them
  - Sorted set range operations
  - Set operations: UNION, INTERSECTION, DIFF
Redis – Use Cases
- Web session caching, with EXPIRE set for session expiry
- Live, real-time bit.ly URL stats such as clicks: fast increments of counters
- Autocomplete: the first few typed characters map to a sorted set, and a range query is fired
- Publish/subscribe: fan out a message to subscribers
- Set operations on my Twitter followers and followees: Followees DIFF Followers tells me whom I follow but who don't follow me back (Followers INTERSECTION Followees gives the mutual follows)
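
Two of these use cases can be sketched with plain Python data structures standing in for the server-side ones. This is not Redis client code: built-in sets play the role of the set operations and a sorted list with binary search plays the role of the range query; all names and sample data are made up.

```python
import bisect

# Hypothetical follower data for one account.
followers = {"ann", "bob", "cat"}    # people who follow me
followees = {"bob", "cat", "dan"}    # people I follow

mutual = followers & followees             # INTERSECTION: we follow each other
not_following_back = followees - followers  # DIFF: I follow them, they don't follow me
fans = followers - followees               # DIFF: they follow me, I don't follow them


def autocomplete(sorted_terms, prefix, limit=5):
    """Range query over a sorted list: all terms that start with prefix."""
    lo = bisect.bisect_left(sorted_terms, prefix)
    # "\uffff" sorts after ordinary characters, so this bounds the prefix range.
    hi = bisect.bisect_left(sorted_terms, prefix + "\uffff")
    return sorted_terms[lo:min(hi, lo + limit)]


terms = sorted(["bat", "bar", "battery", "bay", "cat"])

print(sorted(mutual), sorted(not_following_back), sorted(fans))
print(autocomplete(terms, "bat"))
```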
Thanks
Email: abhijit.sharma@gmail.com
Twitter: sharmaabhijit
Blog: abhijitsharma.blogspot.com

Big Data and the growing relevance of NoSQL

  • 1. Big Data trends and the rising importance of NOSQL Abhijit Sharma, Architect, Innovation & Incubation Lab, BMC Software
  • 2. Trends in cloud, web, and even enterprise scale apps Unprecedented growth in - Data set sizes which need to be stored, analyzed Big Data - Cloud scale services generate TB’s > PB’s – FB, eBay, Digg, Foursquare Connectedness and democratization of data social networks, feeds, blogs, wiki, tags, semantic web Data API’s - mash up data - use Twitter, FB, Flickr API’s Semi structured or unstructured data Performance requirements of these apps Humongous R/W Scalability High Availability Trading consistency for availability – ACID not mandatory
  • 3. RDBMS woes Challenge - Storing and scaling humongous amounts of data and remaining highly available Vertical scaling mostly - upper limit & expensive Horizontal scaling – no automatic sharding, no rebalancing – no infrastructure Distributed transactions & joins due to normalization inhibit performance, availability Schema less data models – rigid schema – alter table, null columns Deeply connected data – not designed for this
  • 4. NOSQL is NOT No SQL The NOSQL Alternative
  • 5. NOSQL is simply Not only SQL The NOSQL Alternative
  • 6. NOSQL – So what else is it? “One size fits all” RDBMS is not working NOSQL alternatives are polyglot solutions that better fit the new requirements thrown up by the trends. They can be categorized along these axes - Data Model - simple to complex Scalability – single to horizontal Persistence
  • 7. NOSQL categories Graph Databases Based on Graph theory Data model – graph, nodes, edges, properties Scalability – single node – high performance Persistence – On disk data structures Examples – Neo4J, AllegroGraph Document Databases Based loosely on documents/Lotus Notes Data model – collections of documents Scalability – horizontal, auto-sharding & replication Persistence – B-Tree Examples – mongoDB, CouchDB
  • 8. NOSQL categories Column Stores Based on Google’s BigTable design Data model - big table, column families Scalability – horizontal, auto-sharding & replication Persistence – Memory + File (on DFS) Examples – HBase, Cassandra Key Value Stores Based on DHT, Amazon’s Dynamo design Data model – collection of key value pairs Scalability – horizontal, auto-sharding & replication Persistence – Memory or File Examples – Redis, Amazon Dynamo, Voldemort
  • 10. Graph oriented data Graphs are ubiquitous – social networks, wikis, the web, recommendation engines, etc. Deep trees, complex networks. Graph traversal – apt for expressing graph-related problems (shortest path, network size, etc.)
  • 12. Why not RDBMS for large scale graphs? Difficult to model and traverse graphs in an RDBMS. Recursive approaches – slow SQL queries that span many table joins. Hacks like storing paths for trees.
  • 13. Graph Databases Designed for efficient storage & traversal of large scale graphs. Natural modeling of a graph network – nodes, relationships and their properties. Neo4J is a leading graph db. Supports billions of nodes/edges, traverses depths of 1000 levels in milliseconds, 1000x faster than an RDBMS. Handles large graphs that don't fit in memory – persistent transactional store optimized for graphs. REST API and various language bindings. Graph pattern matching, Cypher query language, Indexer – Lucene
  • 15. All Paths & My Network size
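The traversal problems named on these slides (shortest path, all paths, network size) can be sketched in plain Python. This is a minimal illustration over a hypothetical in-memory adjacency list, not Neo4J's API or Cypher; the graph data and the function names are assumptions made for the example.

```python
from collections import deque

# Hypothetical social graph as an adjacency list (illustrative data only).
GRAPH = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path as a list, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def all_paths(graph, start, goal, path=()):
    """Depth-first enumeration of every simple (cycle-free) path."""
    path = path + (start,)
    if start == goal:
        return [list(path)]
    found = []
    for neighbour in graph.get(start, []):
        if neighbour not in path:
            found.extend(all_paths(graph, neighbour, goal, path))
    return found


def network_size(graph, start, max_depth):
    """Count distinct nodes within max_depth hops of start (start excluded)."""
    seen = {start}
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for neighbour in graph.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return len(seen) - 1
```

Calling `network_size(GRAPH, "ann", 2)` counts friends and friends-of-friends. A graph database performs the same kind of hop-by-hop expansion against its on-disk store, whereas an RDBMS would typically need one self-join per hop.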
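  • 19. Mining your network Centrality algorithms: Degree – who has the most followers on Twitter; Closeness – who can reach everyone else in the fewest hops; Betweenness – who lies on the most shortest paths between other people; Eigenvector – who has more influential people following them (PageRank is a variant)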
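As a rough illustration of mining a follower network, the sketch below computes a simple follower count (in-degree) and a basic PageRank by power iteration. The edge list, the damping factor of 0.85 and the function names are assumptions for the example; a production system would use a graph library or the database's built-in algorithms.

```python
# Hypothetical "follows" edges as (follower, followee) pairs.
FOLLOWS = [
    ("ann", "dan"),
    ("bob", "dan"),
    ("cat", "dan"),
    ("dan", "eve"),
    ("eve", "dan"),
]


def in_degree(edges):
    """Number of followers per node."""
    counts = {}
    for follower, followee in edges:
        counts.setdefault(follower, 0)
        counts[followee] = counts.get(followee, 0) + 1
    return counts


def pagerank(edges, damping=0.85, iterations=50):
    """Basic PageRank via power iteration; ranks sum to 1."""
    nodes = sorted({name for edge in edges for name in edge})
    out_links = {name: [] for name in nodes}
    for follower, followee in edges:
        out_links[follower].append(followee)
    rank = {name: 1.0 / len(nodes) for name in nodes}
    for _ in range(iterations):
        new_rank = {name: (1.0 - damping) / len(nodes) for name in nodes}
        for node in nodes:
            # Nodes with no outgoing links spread their rank evenly.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

In this toy data `eve` has only one follower, but that follower is the best-connected node, so her PageRank ends up well above that of accounts nobody follows. That is the sense in which eigenvector-style measures weight followers by their own influence.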
  • 21. Flexible document oriented data Document style unstructured data - schema less – e.g. JSON documents No alter table needed like in an RDBMS, de-normalized data Useful for iterative/agile development Humongous scale - billions of documents, R/W traffic – millions/sec, horizontal scalability, availability mongoDB is a leading document database
  • 22. Document Database – Use cases Use cases : Archiving of historic data which has undergone many schema changes Flexible set of performance metrics – web site page views, unique visitors etc. - change over time – no need to update existing JSON documents Track near real time metrics - optimized increment of perf counters Geo Loc based mobile and gaming apps (Geospatial indices can be key here)
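The flexible-metrics use case can be sketched with plain Python dictionaries standing in for JSON documents. This is a toy in-memory model, not mongoDB's driver API; the collection contents and the `find` and `increment` helpers are hypothetical names chosen for the example.

```python
# A hypothetical in-memory "collection": documents need not share fields.
collection = [
    {"_id": 1, "url": "/home", "page_views": 10},
    {"_id": 2, "url": "/pricing", "page_views": 4, "unique_visitors": 3},
]


def find(docs, **criteria):
    """Return documents whose fields equal every key/value in criteria."""
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def increment(docs, doc_id, field, amount=1):
    """Increment a counter in place, creating the field if it is missing."""
    for doc in docs:
        if doc["_id"] == doc_id:
            doc[field] = doc.get(field, 0) + amount
            return doc
    raise KeyError(doc_id)
```

Adding a new metric such as `unique_visitors` to document 1 needs no schema migration: `increment(collection, 1, "unique_visitors")` creates the field on first use, and older documents that lack it are left untouched.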
  • 23. Craigslist Archival Database Premium service to customers allowed search over their historical postings Archival (no purging) of 10 years of postings - billions of documents Schema changes across versions MySQL based archival database ALTER TABLE took a month to complete
  • 26. mongoDB Features JSON documents, collection oriented storage Rich, document-based queries Indexes on document attributes Fast in-place updates Scalability features Horizontal scalability Configurable replication and high-availability Auto-sharding & rebalancing Language specific drivers – Java, Scala, Ruby etc.
  • 28. Column Store Reasonably rich data model – sparse, distributed, persistent multi-dimensional sorted map. Sorted row keys, columns. Use cases – large scale data storage and analysis, e.g. time series data along with associated dimension data: row keys are timestamps and thus sorted, which helps time range queries. Google Analytics: provides aggregate statistics (# unique visitors/day, page views/URL/day); the raw click table (~200 TB) has a row for each URL + user session time, which keeps rows for the same URL contiguous and chronologically sorted. Data cube dimensions – CPU, OS, Time, DC
  • 29. Column Store Performance Excellent R/W performance – large storage – PB’s. High scalability – horizontal scaling, auto-sharding. High availability – transparent replication of data. HBase is a leading column store, built on Hadoop with HDFS as the underlying persistence
  • 30. Column Store - HBase Table defines Column Families – groups of similar attributes, vertical partitioning. A (Table, Row, ColumnFamily:Column, Timestamp) tuple maps to a cell value. Table is split into multiple evenly distributed regions, each of which is a range of sorted keys (partitioned automatically by the key). Rows ordered by key, columns ordered within a Column Family. Rows can have different numbers of columns. Columns have a value and versions (any number). Row range, column range and key range queries
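The cell addressing and sorted-key range scans described here can be modelled with a small in-memory class. This is a toy sketch, not the HBase client API; the class name `TinyColumnStore`, its methods and the sample row keys are all assumptions for illustration, and there are no regions, persistence or distribution.

```python
import bisect


class TinyColumnStore:
    """Toy sorted map: (row key, 'family:qualifier', timestamp) -> value."""

    def __init__(self):
        self._row_keys = []  # row keys, kept sorted for range scans
        self._rows = {}      # row key -> {column: [(timestamp, value), ...]}

    def put(self, row_key, column, value, timestamp):
        """Store a new version of a cell; rows may have different columns."""
        if row_key not in self._rows:
            bisect.insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        versions = self._rows[row_key].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda pair: pair[0], reverse=True)  # newest first

    def get(self, row_key, column, timestamp=None):
        """Return the newest value, or the newest one at or before timestamp."""
        for ts, value in self._rows.get(row_key, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_key, stop_key):
        """Yield (row_key, columns) for start_key <= row_key < stop_key."""
        lo = bisect.bisect_left(self._row_keys, start_key)
        hi = bisect.bisect_left(self._row_keys, stop_key)
        for key in self._row_keys[lo:hi]:
            yield key, self._rows[key]
```

Because row keys are kept sorted, a time range query over timestamp-like keys reduces to two binary searches and a contiguous slice, which is why sorted row keys suit time series data.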
  • 33. Key Value Stores Simplest possible data model. Use cases: caching a user’s personalized, rendered page to avoid the DB; S3 bucket storage for blob data against a unique id. Range of KV stores: distributed, scalable, persistent key-value storage (Dynamo, Voldemort) with auto-partitioned key space, replicated KV, highly available; largely in-memory KV stores (Redis, memcached). Redis is blazing fast for caching and other interesting operations
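One common way to auto-partition and replicate a key space, and the approach described for Dynamo-style stores, is consistent hashing. The sketch below is a simplified illustration under stated assumptions: the `HashRing` class, the use of MD5, the virtual node count and the replica count are choices made for this example, not the configuration of any particular product.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes and replica selection."""

    def __init__(self, nodes, replicas=2, vnodes=64):
        self._distinct = len(set(nodes))
        self.replicas = min(replicas, self._distinct)
        # Each physical node appears at many points on the ring.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        index = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        chosen = []
        while len(chosen) < self.replicas:
            node = self._ring[index][1]
            if node not in chosen:
                chosen.append(node)
            index = (index + 1) % len(self._ring)
        return chosen
```

With `ring = HashRing(["node-a", "node-b", "node-c"])`, `ring.nodes_for("user:42")` returns the two nodes that would hold that key. Adding or removing a node only remaps the keys adjacent to its points on the ring rather than reshuffling everything.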
  • 34. Redis In-memory KV store. Blazing fast – 100K R/W operations/sec. Async snapshot to disk. More than a KV store – a data structure store: supports lists, queues, sets and operations on them. Sorted list range operations. Set operations UNION, INTERSECTION, DIFF
  • 35. Redis – Use Cases Web session caching with EXPIRE set for session expiry. Live real-time bit.ly URL stats like clicks – fast increments of counters. Auto complete – typing the first few characters maps to a sorted list and a range query is fired. Publish / Subscribe – fan out a message to subscribers. Set operations – my Twitter <Followees DIFF Followers> tells me whom I follow but who don’t follow me back; <Followers INTERSECTION Followees> gives mutual follows
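Two of these use cases can be illustrated without a running Redis server by using Python's built-in sets and the `bisect` module. This is only a model of the data-structure operations; the account names, search terms and the `autocomplete` helper are hypothetical. In Redis itself the corresponding set commands are `SINTER`, `SDIFF` and `SUNION`.

```python
import bisect

# Hypothetical Twitter relationships.
followers = {"ann", "bob", "cat"}   # people who follow me
followees = {"bob", "cat", "dan"}   # people I follow

# Intersection: accounts that follow me and that I follow (mutual follows).
mutual = followers & followees

# Difference: accounts I follow that do not follow me back.
not_following_back = followees - followers

# Hypothetical search terms kept in sorted order for prefix range queries.
TERMS = sorted(["redis", "react", "read", "ruby", "rust"])


def autocomplete(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms starting with prefix via a range scan."""
    start = bisect.bisect_left(sorted_terms, prefix)
    results = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(results) == limit:
            break
        results.append(term)
    return results
```

Because the terms are sorted, all completions for a prefix sit next to each other, so the lookup is a binary search followed by a short sequential read.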
  • 36. Thanks Email: abhijit.sharma@gmail.com Twitter: sharmaabhijit Blog: abhijitsharma.blogspot.com