SlideShare una empresa de Scribd logo
1 de 10
Descargar para leer sin conexión
Search data store for the world's largest
                            biometric identity system


                    Regunath Balasubramanian         Shashikant Soni
                      regunathb@gmail.com      soni.shashikant@gmail.com
                       twitter @regunathb




CONFIDENTIAL: For limited circulation only                                 Slide 1
India
● 1.2 billion residents
   ● 640,000 villages, ~60% lives under $2/day
   ● ~75% literacy, <3% pays Income Tax, <20% banking
   ● ~800 million mobile, ~200-300 mn migrant workers

● Govt. spends about $25-40B on direct subsidies
   ● Residents have no standard identity document
   ● Most programs plagued with ghost and multiple identities causing
     leakage of 30-40%




                                                                        Slide 2
Aadhaar
● Create a common ‘national identity’ for every ‘resident’
   ●Biometric backed identity to eliminate duplicates
   ●‘Verifiable online identity’ for portability
● Applications ecosystem using open APIs
   ●Aadhaar enabled bank account and payment platform
   ●Aadhaar enabled electronic, paperless KYC (Know Your
     Customer)




                                                             Slide 3
Search Requirements
● Multi-attribute query like:
   name contains ‘regunath’ AND city = ‘bangalore’ AND
   address contains ‘J P Nagar’ AND YearOfBirth = ……


● Search 1.2B resident data with photo, history
   ●35Kb - Average record size
● Response times in milliseconds
● Open scale out


                                                         Slide 4
Why MongoDB
● Auto-sharding
● Replication
● Failover
   … Essentially an AP (slaveOk) data store in CAP parlance

● Evolving schema
● Map-Reduce for analysis
● Full text search
   ●Compound (or) multi-keys


                                                              Slide 5
Design

               { _id:123456789, name: ‘abcde’, year:1980, ….. }
    MongoDB         2

                                             Search API                                  Client App
                                                                  Name=‘abcde’
    Solr            1
                                                                  Address=‘some place’
  Indexes     Name: ‘abcde’                                       Year= 1980
              Address: ‘some place’
              year: 1980



● Read/Search
   ●Sharded Solr indexes for search
   ●Keyed document read from MongoDB
● Write
   ●Eventual consistency (across data sources) driven by
    application
   ●Composite MongodDB-Solr app persistence handler                                                   Slide 6
Implementation and Deployment
   ● Start - 4M records in 2 shards
   Current - 250M records in 8 shards ( 8 x ~2 TB x 3 replicas)
   ● Performance , Reliability & Durability
      ●SlaveOk
      ●getLastError, Write Concern: availability vs durability
          j = journaling
          w = nodes-to-write
   ● Replica-sets / Shards – how?
            RS 1                RS 1              RS 1
            Rs 2                                  RS 2              RS 2

Primary
                     Config 1          Config 2          Config 3
Secondary

Arbiter               Router           Router            Router
                                                                           Slide 7
Monitoring and Troubleshooting
● Monitoring tools evaluated
   ●MMS
   ●munin
● Manual approach - daily ritual
   ●RS, DB, config, router - health and stats
● Problem analysis stats
   ●mongostat, iostat, currentOps, logs
   ●Client connections
● Stats for storage, shards addition
   ●Data file size
   ●Shard data distribution
   ●Replication
                                                Slide 8
Key Learnings on MongoDB
● Indexing 32 fields
   ●Compound indexes
   ●Multi-keys indexes
       {…"indexes" : [{ "email":"john.doe@email.com", "phone":"123456789“ }] }
       db.coll.find ({ "indexes.email" : "john.doe@email.com" })
   ●Indexes use b-tree
   ●Many fields to index
   ●Performs well upto 1-2M documents
   ●Best if index fits in memory
● Data replication, RS failover
   ●Rollback when RS goes out of sync
       Manual restore (physical data copy)
       Restarting a very stale node
                                                                            Slide 9
Questions?



                    Regunath Balasubramanian               Shashikant Soni
                      regunathb@gmail.com            soni.shashikant@gmail.com
                       twitter @regunathb




CONFIDENTIAL: For limited circulation only                                       Slide 10

Más contenido relacionado

La actualidad más candente

Top Trends in Retail Banking: 2020
Top Trends in Retail Banking: 2020Top Trends in Retail Banking: 2020
Top Trends in Retail Banking: 2020Capgemini
 
Online student portal
Online student portalOnline student portal
Online student portalZobia Zafar
 
Non perfoming assets @ uti bank project report mba finance
Non perfoming assets  @ uti bank project report mba financeNon perfoming assets  @ uti bank project report mba finance
Non perfoming assets @ uti bank project report mba financeBabasab Patil
 
Artificial intelligence in fintech
Artificial intelligence in fintechArtificial intelligence in fintech
Artificial intelligence in fintechSpotle.ai
 
Management Information System of SBI
Management Information System of SBIManagement Information System of SBI
Management Information System of SBISomendra Singh
 
It in banking industry
It in banking industryIt in banking industry
It in banking industryDharmik
 
Project report | Major Project | Engineering | Devansh Koolwal
Project report | Major Project | Engineering | Devansh KoolwalProject report | Major Project | Engineering | Devansh Koolwal
Project report | Major Project | Engineering | Devansh KoolwalDevansh Koolwal
 
Banking-as-a-Service 2.0 - Executive Summary
Banking-as-a-Service 2.0 - Executive SummaryBanking-as-a-Service 2.0 - Executive Summary
Banking-as-a-Service 2.0 - Executive SummaryMEDICI Inner Circle
 
Presentation on House Rent Management System
Presentation on House Rent Management SystemPresentation on House Rent Management System
Presentation on House Rent Management SystemRihab Rahman
 
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...Fatima Qayyum
 
Neo and challenger bank ppt
Neo and challenger bank pptNeo and challenger bank ppt
Neo and challenger bank pptKetaki V
 
Micro-finance
Micro-finance Micro-finance
Micro-finance Heer Shah
 
Scalable deployment options in WSO2 API Manager
Scalable deployment options in WSO2 API ManagerScalable deployment options in WSO2 API Manager
Scalable deployment options in WSO2 API ManagerWSO2
 
The Rise & Rise of Digital Lending
The Rise & Rise of Digital LendingThe Rise & Rise of Digital Lending
The Rise & Rise of Digital Lendingicenet Limited
 
PayMaster Seed Plus Pitch Deck
PayMaster Seed Plus Pitch DeckPayMaster Seed Plus Pitch Deck
PayMaster Seed Plus Pitch DeckRansikaDeSilva
 
Peer to Peer lending
Peer to Peer lendingPeer to Peer lending
Peer to Peer lendingRishab Maloo
 

La actualidad más candente (20)

Online second hand book store project report
Online second hand book store project reportOnline second hand book store project report
Online second hand book store project report
 
Top Trends in Retail Banking: 2020
Top Trends in Retail Banking: 2020Top Trends in Retail Banking: 2020
Top Trends in Retail Banking: 2020
 
Online student portal
Online student portalOnline student portal
Online student portal
 
Non perfoming assets @ uti bank project report mba finance
Non perfoming assets  @ uti bank project report mba financeNon perfoming assets  @ uti bank project report mba finance
Non perfoming assets @ uti bank project report mba finance
 
Artificial intelligence in fintech
Artificial intelligence in fintechArtificial intelligence in fintech
Artificial intelligence in fintech
 
Management Information System of SBI
Management Information System of SBIManagement Information System of SBI
Management Information System of SBI
 
It in banking industry
It in banking industryIt in banking industry
It in banking industry
 
Synopsis for Gpay
Synopsis for GpaySynopsis for Gpay
Synopsis for Gpay
 
Project report | Major Project | Engineering | Devansh Koolwal
Project report | Major Project | Engineering | Devansh KoolwalProject report | Major Project | Engineering | Devansh Koolwal
Project report | Major Project | Engineering | Devansh Koolwal
 
Banking-as-a-Service 2.0 - Executive Summary
Banking-as-a-Service 2.0 - Executive SummaryBanking-as-a-Service 2.0 - Executive Summary
Banking-as-a-Service 2.0 - Executive Summary
 
Presentation on House Rent Management System
Presentation on House Rent Management SystemPresentation on House Rent Management System
Presentation on House Rent Management System
 
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
 
Neo and challenger bank ppt
Neo and challenger bank pptNeo and challenger bank ppt
Neo and challenger bank ppt
 
Micro-finance
Micro-finance Micro-finance
Micro-finance
 
Scalable deployment options in WSO2 API Manager
Scalable deployment options in WSO2 API ManagerScalable deployment options in WSO2 API Manager
Scalable deployment options in WSO2 API Manager
 
The Rise & Rise of Digital Lending
The Rise & Rise of Digital LendingThe Rise & Rise of Digital Lending
The Rise & Rise of Digital Lending
 
SRS for Google Pay(Gpay)
SRS for Google Pay(Gpay)SRS for Google Pay(Gpay)
SRS for Google Pay(Gpay)
 
grocery management system
grocery  management systemgrocery  management system
grocery management system
 
PayMaster Seed Plus Pitch Deck
PayMaster Seed Plus Pitch DeckPayMaster Seed Plus Pitch Deck
PayMaster Seed Plus Pitch Deck
 
Peer to Peer lending
Peer to Peer lendingPeer to Peer lending
Peer to Peer lending
 

Destacado

MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaarRegunath B
 
Knowledge Engines and AI – Applications Beyond Gaming
Knowledge Engines and AI – Applications Beyond GamingKnowledge Engines and AI – Applications Beyond Gaming
Knowledge Engines and AI – Applications Beyond GamingMecklerMedia
 
AADHAR SECURE TRAVEL IDENETITY
AADHAR SECURE TRAVEL IDENETITYAADHAR SECURE TRAVEL IDENETITY
AADHAR SECURE TRAVEL IDENETITYPratik Vipul
 
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBayStoring eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBayMongoDB
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBaseAnil Gupta
 
Step by-step presentation on digital payments
Step by-step presentation on digital paymentsStep by-step presentation on digital payments
Step by-step presentation on digital paymentsMahantesh Biradar
 
India - A Cashless Economy (NPCI/UPI)
India - A Cashless Economy (NPCI/UPI)India - A Cashless Economy (NPCI/UPI)
India - A Cashless Economy (NPCI/UPI)Aravind Krishnaswamy
 
ppt on aadhar card project
ppt on aadhar card projectppt on aadhar card project
ppt on aadhar card projectPooja Verma
 

Destacado (11)

MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaar
 
Knowledge Engines and AI – Applications Beyond Gaming
Knowledge Engines and AI – Applications Beyond GamingKnowledge Engines and AI – Applications Beyond Gaming
Knowledge Engines and AI – Applications Beyond Gaming
 
AADHAR SECURE TRAVEL IDENETITY
AADHAR SECURE TRAVEL IDENETITYAADHAR SECURE TRAVEL IDENETITY
AADHAR SECURE TRAVEL IDENETITY
 
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBayStoring eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
Step by-step presentation on digital payments
Step by-step presentation on digital paymentsStep by-step presentation on digital payments
Step by-step presentation on digital payments
 
India - A Cashless Economy (NPCI/UPI)
India - A Cashless Economy (NPCI/UPI)India - A Cashless Economy (NPCI/UPI)
India - A Cashless Economy (NPCI/UPI)
 
ppt on aadhar card project
ppt on aadhar card projectppt on aadhar card project
ppt on aadhar card project
 

Similar a Mongo for aadhaar

Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...Alexey Zinoviev
 
MongoDB and Web Scrapping with the Gyes Platform
MongoDB and Web Scrapping with the Gyes PlatformMongoDB and Web Scrapping with the Gyes Platform
MongoDB and Web Scrapping with the Gyes PlatformMongoDB
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccsrisatish ambati
 
An introduction to MongoDB by César Trigo #OpenExpoDay 2014
An introduction to MongoDB by César Trigo #OpenExpoDay 2014An introduction to MongoDB by César Trigo #OpenExpoDay 2014
An introduction to MongoDB by César Trigo #OpenExpoDay 2014OpenExpoES
 
An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDBCésar Trigo
 
Graph databases in computational bioloby: case of neo4j and TitanDB
Graph databases in computational bioloby: case of neo4j and TitanDBGraph databases in computational bioloby: case of neo4j and TitanDB
Graph databases in computational bioloby: case of neo4j and TitanDBAndrei KUCHARAVY
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleScyllaDB
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleSaurabh Verma
 
Bringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searchesBringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searchesIvan Kruglov
 
There is Javascript in my SQL
There is Javascript in my SQLThere is Javascript in my SQL
There is Javascript in my SQLPGConf APAC
 
Postgrtesql as a NoSQL Document Store - The JSON/JSONB data type
Postgrtesql as a NoSQL Document Store - The JSON/JSONB data typePostgrtesql as a NoSQL Document Store - The JSON/JSONB data type
Postgrtesql as a NoSQL Document Store - The JSON/JSONB data typeJumping Bean
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Chris Richardson
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
 
Back to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQLBack to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQLJoe Drumgoole
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistAlexey Zinoviev
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...Rob Skillington
 
Social Analytics with MongoDB
Social Analytics with MongoDBSocial Analytics with MongoDB
Social Analytics with MongoDBPatrick Stokes
 
What Kiwi.com Has Learned Running ScyllaDB and Go
What Kiwi.com Has Learned Running ScyllaDB and GoWhat Kiwi.com Has Learned Running ScyllaDB and Go
What Kiwi.com Has Learned Running ScyllaDB and GoScyllaDB
 

Similar a Mongo for aadhaar (20)

MongoDB Basics Unileon
MongoDB Basics UnileonMongoDB Basics Unileon
MongoDB Basics Unileon
 
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
 
MongoDB and Web Scrapping with the Gyes Platform
MongoDB and Web Scrapping with the Gyes PlatformMongoDB and Web Scrapping with the Gyes Platform
MongoDB and Web Scrapping with the Gyes Platform
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svcc
 
An introduction to MongoDB by César Trigo #OpenExpoDay 2014
An introduction to MongoDB by César Trigo #OpenExpoDay 2014An introduction to MongoDB by César Trigo #OpenExpoDay 2014
An introduction to MongoDB by César Trigo #OpenExpoDay 2014
 
An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
 
Graph databases in computational bioloby: case of neo4j and TitanDB
Graph databases in computational bioloby: case of neo4j and TitanDBGraph databases in computational bioloby: case of neo4j and TitanDB
Graph databases in computational bioloby: case of neo4j and TitanDB
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
 
Bringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searchesBringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searches
 
There is Javascript in my SQL
There is Javascript in my SQLThere is Javascript in my SQL
There is Javascript in my SQL
 
Postgrtesql as a NoSQL Document Store - The JSON/JSONB data type
Postgrtesql as a NoSQL Document Store - The JSON/JSONB data typePostgrtesql as a NoSQL Document Store - The JSON/JSONB data type
Postgrtesql as a NoSQL Document Store - The JSON/JSONB data type
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
Back to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQLBack to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQL
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness Platform
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data Scientist
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
Social Analytics with MongoDB
Social Analytics with MongoDBSocial Analytics with MongoDB
Social Analytics with MongoDB
 
What Kiwi.com Has Learned Running ScyllaDB and Go
What Kiwi.com Has Learned Running ScyllaDB and GoWhat Kiwi.com Has Learned Running ScyllaDB and Go
What Kiwi.com Has Learned Running ScyllaDB and Go
 

Más de MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

Más de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Mongo for aadhaar

  • 1. Search data store for the world's largest biometric identity system Regunath Balasubramanian Shashikant Soni regunathb@gmail.com soni.shashikant@gmail.com twitter @regunathb CONFIDENTIAL: For limited circulation only Slide 1
  • 2. India ● 1.2 billion residents ● 640,000 villages, ~60% lives under $2/day ● ~75% literacy, <3% pays Income Tax, <20% banking ● ~800 million mobile, ~200-300 mn migrant workers ● Govt. spends about $25-40B on direct subsidies ● Residents have no standard identity document ● Most programs plagued with ghost and multiple identities causing leakage of 30-40% Slide 2
  • 3. Aadhaar ● Create a common ‘national identity’ for every ‘resident’ ●Biometric backed identity to eliminate duplicates ●‘Verifiable online identity’ for portability ● Applications ecosystem using open APIs ●Aadhaar enabled bank account and payment platform ●Aadhaar enabled electronic, paperless KYC (Know Your Customer) Slide 3
  • 4. Search Requirements ● Multi-attribute query like: name contains ‘regunath’ AND city = ‘bangalore’ AND address contains ‘J P Nagar’ AND YearOfBirth = …… ● Search 1.2B resident data with photo, history ●35Kb - Average record size ● Response times in milliseconds ● Open scale out Slide 4
  • 5. Why MongoDB ● Auto-sharding ● Replication ● Failover … Essentially an AP (slaveOk) data store in CAP parlance ● Evolving schema ● Map-Reduce for analysis ● Full text search ●Compound (or) multi-keys Slide 5
  • 6. Design { _id:123456789, name: ‘abcde’, year:1980, ….. } MongoDB 2 Search API Client App Name=‘abcde’ Solr 1 Address=‘some place’ Indexes Name: ‘abcde’ Year= 1980 Address: ‘some place’ year: 1980 ● Read/Search ●Sharded Solr indexes for search ●Keyed document read from MongoDB ● Write ●Eventual consistency (across data sources) driven by application ●Composite MongodDB-Solr app persistence handler Slide 6
  • 7. Implementation and Deployment ● Start - 4M records in 2 shards Current - 250M records in 8 shards ( 8 x ~2 TB x 3 replicas) ● Performance , Reliability & Durability ●SlaveOk ●getLastError, Write Concern: availability vs durability  j = journaling  w = nodes-to-write ● Replica-sets / Shards – how? RS 1 RS 1 RS 1 Rs 2 RS 2 RS 2 Primary Config 1 Config 2 Config 3 Secondary Arbiter Router Router Router Slide 7
  • 8. Monitoring and Troubleshooting ● Monitoring tools evaluated ●MMS ●munin ● Manual approach - daily ritual ●RS, DB, config, router - health and stats ● Problem analysis stats ●mongostat, iostat, currentOps, logs ●Client connections ● Stats for storage, shards addition ●Data file size ●Shard data distribution ●Replication Slide 8
  • 9. Key Learnings on MongoDB ● Indexing 32 fields ●Compound indexes ●Multi-keys indexes  {…"indexes" : [{ "email":"john.doe@email.com", "phone":"123456789“ }] }  db.coll.find ({ "indexes.email" : "john.doe@email.com" }) ●Indexes use b-tree ●Many fields to index ●Performs well upto 1-2M documents ●Best if index fits in memory ● Data replication, RS failover ●Rollback when RS goes out of sync  Manual restore (physical data copy)  Restarting a very stale node Slide 9
  • 10. Questions? Regunath Balasubramanian Shashikant Soni regunathb@gmail.com soni.shashikant@gmail.com twitter @regunathb CONFIDENTIAL: For limited circulation only Slide 10