Database Architecture - Case Study - SMS Gyan.pdf

  1. Database Architecture Case Study: SMS Gyan
  2. About Me Shyam Anand Senior Software Engineer at Google. Previously worked with several startups. Works on distributed systems, system architecture, etc.
  3. Introduction We'll talk about databases, with a case study of SMS Gyan. ● SMS Gyan was launched by Innoz in 2008. ● An SMS-based answering engine that came to be known as "Internet on SMS". [Diagram labels: SMS, airtel, SMS Gyan, HTTP]
  4. Database Systems Data modelling is perhaps the most important part of developing software. Decisions on how to structure, store, and retrieve data can affect the entire application throughout its life. There are several factors to consider while choosing a database, such as: ● Structure of the data ● Expected data volume ● Performance requirements
  5. Relational vs Non-Relational Databases
     Relational ● For structured data. ● Stores data in tables that may share information (hence "relational"). ● Uses JOIN queries to access data in different tables. ● Performance tuning becomes necessary with large volumes of data. ● Relatively difficult to scale out. ● Lacks flexibility in how data is stored. ● Provides Atomicity, Consistency, Isolation, and Durability (ACID) guarantees.
     Non-Relational ● For unstructured data (documents). ● No concept of tables or fields/columns. ● MongoDB and Elasticsearch store data as JSON-like documents. ● Supports data locality. ● Can easily support very large volumes of data. ● Easier to scale out, because of native support for replication, sharding, etc. ● Can support changes to the structure of stored data, making it easier to modify the application layer. ● Typically no transactions, so no ACID guarantees. Some provide eventual consistency.
  6. Consistency or Availability? ● Network partitions will inevitably happen in a distributed system. ● Choosing between a relational vs non-relational db can boil down to this question.
  7. SMS Gyan ● The first version was a simple PHP app with a MySQL database. ● Supported a few hundred users and a few hundred queries a day. [Diagram labels: Telecom Operator, <network> backend, smsgyan app, mysql]
  8. MySQL ● A Relational Database Management System (RDBMS). ● One of the most popular databases. ● Free and open-source, easy to get started. ● Reliable and scalable.
  9. Data Modelling Need to store ● The queries from users ● The answers to the queries (as a local cache) ● User details (network operator, whether a subscriber, etc.)
     Example record in a single flat table:
     phone      | network | is_subscribed | query | result     | source    | query_ts
     9876543210 | airtel  | 1             | MySQL | MySQL is … | wikipedia | 2009-11-10 12:00:00
  10. Schema
     queries
     phone      | query | query_ts
     9876543210 | MySQL | 2009-11-10 12:00:00
     users
     phone      | network | is_subscribed | last_active
     9876543210 | airtel  | 1             | 2009-11-10
     knowledge_base
     query | result                     | source
     MySQL | MySQL is an open-source ... | wikipedia
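The three tables on this slide can be sketched as DDL. This is a minimal illustration only: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the column types, primary keys, and the foreign key are assumptions that the slides do not specify.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL instance.
conn = sqlite3.connect(":memory:")

# Column types and keys are assumptions; only table and column names come from the slide.
conn.executescript("""
CREATE TABLE users (
    phone         TEXT PRIMARY KEY,
    network       TEXT NOT NULL,
    is_subscribed INTEGER NOT NULL,
    last_active   TEXT
);
CREATE TABLE queries (
    phone    TEXT NOT NULL REFERENCES users(phone),
    query    TEXT NOT NULL,
    query_ts TEXT NOT NULL
);
CREATE TABLE knowledge_base (
    query  TEXT PRIMARY KEY,
    result TEXT NOT NULL,
    source TEXT NOT NULL
);
""")

# The example rows shown on the slide.
conn.execute("INSERT INTO users VALUES (?, ?, ?, ?)",
             ("9876543210", "airtel", 1, "2009-11-10"))
conn.execute("INSERT INTO queries VALUES (?, ?, ?)",
             ("9876543210", "MySQL", "2009-11-10 12:00:00"))
conn.execute("INSERT INTO knowledge_base VALUES (?, ?, ?)",
             ("MySQL", "MySQL is an open-source ...", "wikipedia"))

# A JOIN reassembles the flat record from the Data Modelling slide.
row = conn.execute("""
    SELECT u.network, q.query, k.source
    FROM queries q
    JOIN users u ON u.phone = q.phone
    JOIN knowledge_base k ON k.query = q.query
""").fetchone()
```

Splitting the flat record this way stores each user and each cached answer once, at the cost of a JOIN when the full record is needed.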
  11. Scaling High volume of airtel 121 requests ● The application was receiving a large number of requests (> 1000 qps). ● This caused the database to become slow and requests to fail (SLA violation). [Diagram labels: Airtel, App, DB, with two X marks]
  12. MySQL Replication [Diagram labels: smsgyan, smsgyan/users (two copies), Writes, Reads, Reads, smsgyan app, network services]
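The Writes/Reads labels in the replication diagram suggest read/write splitting: writes go to one database and reads are spread over replicated copies. A minimal sketch of that routing decision follows. The `ReplicatedRouter` class and the host names are hypothetical, and the slides do not say how SMS Gyan actually routed its queries.

```python
import itertools

class ReplicatedRouter:
    """Chooses a database target per statement (hypothetical helper)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def target_for(self, sql):
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        # Read-only statements rotate over the replicas (round robin);
        # everything else goes to the primary.
        if verb in ("SELECT", "SHOW"):
            return next(self._replicas)
        return self.primary

# Host names below are illustrative assumptions.
router = ReplicatedRouter("primary", ["replica-1", "replica-2"])
```

MySQL replication is asynchronous by default, so a read sent to a replica may briefly lag behind the latest write on the primary.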
  13. Architecture Details [Diagram labels: smsgyan app, smsgyan db, Replication, smsgyan/users (two copies), network services]
  14. Improving search results ● A MySQL FULLTEXT index was used. ● The results were sometimes inaccurate, especially for queries that are sentences or phrases. ● MySQL performance was deteriorating as the data volume increased.
     queries
     query | result                    | source
     MySQL | MySQL is an open-source... | wikipedia
  15. Elasticsearch ● Designed for really fast text searches. Supports stemming, ranking, etc. ● Data is stored as documents. Provides REST APIs to read and write data. ● Highly available, scalable, and (relatively) easy to configure. ● Natively supports sharding and replication. [Diagram labels: smsgyan app, Elasticsearch cluster]
  16. Elasticsearch ● Cluster: Consists of one or more nodes. ● Node: An instance of ES. ● Index: A logical namespace that maps to one or more primary shards and can have 0 or more replica shards. ● Document: A record stored in ES. ● Shard: A single low-level worker unit managed by ES. ● Primary Shard: Each document is stored in a primary shard. ● Replica Shard: A copy of a primary shard. Each primary shard can have 0 or more replicas. Replicas help distribute ES's load, and can help in failover if a primary shard is unavailable.
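To make "each document is stored in a primary shard" concrete, here is a simplified sketch of hash-based routing: a hash of the document's routing value (its ID by default), taken modulo the number of primary shards. `zlib.crc32` is used only as a stand-in hash, the function name is hypothetical, and the real Elasticsearch routing formula has additional details.

```python
import zlib

def primary_shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a document (simplified illustration)."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # crc32 is a stand-in; Elasticsearch uses a different hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# The same ID always lands on the same shard while the shard count is fixed.
shard = primary_shard_for("MySQL", 5)
```

Because the shard is derived from the shard count, changing the number of primary shards would move most documents, which is one reason that number is normally fixed when an index is created.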
  17. Caching: Pagination of results ● SMS replies limit the length of content, so a whole Wikipedia article would be returned as several pages. ● Users need to send an SMS to retrieve each page.
  18. Redis ● A distributed, in-memory data structure store. ○ Can store simple key-value pairs, Sets, Lists, Sorted Sets, etc., and can perform operations such as Set union/intersection, push/pop to/from Lists, etc. ● Can be used as an in-memory key-value database, cache, and message broker. ● Durability is optional. ● Serves a different function from the databases discussed earlier.
  19. Redis in SMS Gyan 1. Fetch the query result (from the database, or from a source on the internet). 2. Write the entire result into the cache, with the user's phone number as key. 3. Extract a page (up to 240 characters), send it to the user, and remove the served page from the content in the cache. 4. If the user requests more pages, repeat step 3. 5. Clear the key if a. the user sends a different query, or b. there is no request from the user for a specific period of time.
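The five steps above can be sketched as follows. To keep the example self-contained, a plain dictionary with expiry timestamps stands in for Redis. The `PageCache` class, the 300-second timeout, and the `fetch_result` callback are assumptions for illustration; only the 240-character page size and the phone-number key come from the slide.

```python
import time

PAGE_SIZE = 240      # page length from the slide
TTL_SECONDS = 300    # assumed idle timeout; the slide gives no value

class PageCache:
    """Dictionary-backed stand-in for the Redis cache (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}   # phone -> (remaining_text, expires_at)
        self._clock = clock

    def handle_query(self, phone, query, fetch_result):
        # Steps 1-2: fetch the full result and cache it under the phone number.
        # Overwriting the key also covers step 5a (a different query).
        text = fetch_result(query)
        self._store[phone] = (text, self._clock() + TTL_SECONDS)
        return self.next_page(phone)

    def next_page(self, phone):
        entry = self._store.get(phone)
        if entry is None:
            return None
        remaining, expires_at = entry
        # Step 5b: drop the key after a period without requests.
        if self._clock() >= expires_at:
            del self._store[phone]
            return None
        # Step 3: serve one page and keep only the unserved remainder.
        page, rest = remaining[:PAGE_SIZE], remaining[PAGE_SIZE:]
        if rest:
            self._store[phone] = (rest, self._clock() + TTL_SECONDS)
        else:
            del self._store[phone]
        return page
```

With a real Redis server, a key with an expiry time would play the same role, so the idle timeout would be enforced by Redis rather than checked in application code.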
  20. Thank you!