CUBRID has many optimizations for SNS. In this presentation, a CUBRID architect explains the characteristics of social networking services and how the CUBRID architecture is designed to meet their demands.
2. 46 CUBRID Reference Architecture for Social Networking Service 2 /
3. Abstract: The top-ranked Facebook celebrity has 44 million fans. The top-ranked Twitter user has 11 million followers. There are over 900 million objects on the Facebook site, and people send 140 million tweets per day. Needless to say, these numbers weigh heavily on the databases behind these services, so best practice in database architecture is important. Online social networking (OSN) services have proliferated rapidly and changed the way data is stored and served. Social data is an enormous graph of small objects that are tightly interconnected. The service page of an OSN is a view of those small objects customized to a specific viewer at a specific time. Typically, the view is an aggregation of events connected by the social graph, which changes constantly with users' real-time interaction. Even though Dunbar's number suggests that the number of people with whom one can maintain stable social relationships is relatively small, about 150, celebrities on OSN sites have so many followers that the social graph is huge. These properties of the data lead to new challenges and demand a new database architecture to handle them. The main considerations of database architecture for OSN are scale-out and performance, with high availability as a mandatory requirement. The main characteristics of OSN services in terms of data are power-law scaling, data feeding frenzy, and Zipfian-distributed access. The data being delivered grows exponentially with the popularity of the service. A cost-effective database scale-out architecture matters to business requirements as well as to technical issues. In this presentation, the CUBRID Reference Architecture for social networking services is shown. The presented architectures are based on best practices developed from real business cases at NHN, the biggest portal service provider in Korea.
The features that help support the database architecture demands of OSN services are described. For example, an index scan with a top-k sorting technique was developed for fast feed aggregation. The HA, automatic sharding, and clustering features of CUBRID are also explained. Finally, nStore, a distributed database system based on CUBRID, is introduced. The concept of nStore is similar to Amazon Dynamo, but it differs in that it supports SQL.
10. Contents: Characteristics of online social networking service; Challenges and demands on database architecture; CUBRID features; CUBRID reference architecture for social networking service. Highlighted: Business demands and system requirements; Main considerations of database architecture for OSN service; Scale-out, performance, and high availability.
11. Contents: Characteristics of online social networking service; Challenges and demands on database architecture; CUBRID unique features; CUBRID reference architecture for social networking service. Highlighted: Index scan with top-k sorting technique; High availability feature; Automatic sharding component; CUBRID Cluster System; nStore, a distributed database system based on CUBRID.
12. Contents: Characteristics of online social networking service; Challenges and demands on database architecture; CUBRID features; CUBRID reference architecture for social networking service. Highlighted: CUBRID Web Reference Architecture; CUBRID SNS Reference Architecture.
13. Characteristics of online social networking service
14. Some Infographics about Online Social Networking Service: The history and evolution of OSN have unfolded over the last 10 years. Source: http://blog.skloog.com/history-social-media-history-social-media-bookmarking/
15. Some Infographics about Online Social Networking Service: 500 million Facebook users and 106 million Twitter users. These social networks have user bases larger than the population of most countries. Source: http://www.digitalsurgeons.com/facebook-vs-twitter-infographic/
16. Some Infographics about Online Social Networking Service: The top-ranked Twitter user, Lady Gaga, has 11 million followers. About 55 million Tweets are sent per day. Twitter gets about 600 million queries every day. (http://twitaholic.com) Source: http://www.digitalbuzzblog.com/infographic-twitter-statistics-facts-figures/
17. Some Infographics about Online Social Networking Service: The most followed person, Eminem, has more than 44 million fans. More than 5 billion pieces of content are shared each week. In 20 minutes on Facebook: 2,716,000 messages, 1,587,000 wall posts, and 10,208,000 comments. (http://www.independent.co.uk) Sources: http://www.digitalbuzzblog.com/facebook-statistics-facts-figures-for-2010/ and http://www.digitalbuzzblog.com/facebook-statistics-stats-facts-2011/
18. Some Infographics about Online Social Networking Service: Have we reached a world of infinite information? In a similar manner to our universe, the Internet is expanding at an incredibly rapid pace, reaching new levels of information storage and content creation every second. By 2020, roughly 25 × 10^18 (25 quintillion) information containers. Every minute, 24 hours of video. There is a growing gap between the digital content created and the available storage. Source: http://www.flowtown.com/blog/have-we-reached-a-world-of-infinite-information
19. Statistics of Facebook and Twitter: 140 million: the average number of Tweets people sent per day. 6,939: the current TPS record. More than 750 million active users. There are over 900 million objects that people interact with (pages, groups, events and community pages). Sources: http://www.facebook.com/press/info.php?statistics and http://blog.twitter.com/2011/03/numbers.html
20. Statistics of Me2Day: Postings per day: 278,461. Total postings: 123,456,727. Total photos: 10,638,089.
21. Online social networking service: Social data is an enormous graph of small objects that are tightly interconnected. The service page of an OSN is an aggregation of events connected by the social graph, which changes constantly with users' real-time interaction.
22. Feed Following Works. [Diagram, three layers. Application Layer: Feeds, Following, Contents (comment, photo, tag, …), Follower, News Feeds (personalized feeds). Content Management Layer: Outbox, Inbox, Delivery & Aggregation Engine. Data Storage Layer: Cache, Database, Database.]
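The outbox/inbox flow on this slide can be sketched in a few lines of Python. This is only a toy in-memory illustration, not CUBRID code: the `FeedStore` class, its method names, and the `inbox_limit` parameter are all hypothetical. It shows delivery (pushing a new post into each follower's inbox at write time) next to aggregation (pulling from the outboxes of followed users at read time).

```python
from collections import defaultdict, deque


class FeedStore:
    """Toy in-memory outbox/inbox store (hypothetical, for illustration)."""

    def __init__(self, inbox_limit=100):
        self.followers = defaultdict(set)   # author -> users following the author
        self.following = defaultdict(set)   # user -> authors the user follows
        self.outbox = defaultdict(list)     # author -> own posts, in write order
        # Bounded inbox: the newest item sits on the left, the oldest falls off.
        self.inbox = defaultdict(lambda: deque(maxlen=inbox_limit))

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def post(self, author, seq, text):
        """Delivery: write to the outbox, then fan out to follower inboxes."""
        item = (seq, author, text)          # seq is an increasing post number
        self.outbox[author].append(item)
        for follower in self.followers[author]:
            self.inbox[follower].appendleft(item)

    def news_feed(self, user, k=20):
        """Read the precomputed inbox (cheap read, expensive write)."""
        return list(self.inbox[user])[:k]

    def aggregate(self, user, k=20):
        """Aggregation: pull from followed outboxes (cheap write, expensive read)."""
        items = [it for a in self.following[user] for it in self.outbox[a]]
        return sorted(items, reverse=True)[:k]
```

The write cost of `post` grows with the number of followers, which is why the large and highly variable fan-out of celebrity accounts makes delivery expensive, while `aggregate` shifts that cost to read time.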
25. The highly variable and somewhat big fan-out of the follows graph makes data feeding difficult to implement and costly to operate. Online social networks have the properties of significant clustering, small diameter, and power-law degrees. Zipfian-distributed access. Data feeding frenzy. Twitter activity: 5% of users account for 75% of all activity, 10% account for 86% of activity, and the top 30% account for 97.4%.
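To get a feel for how skewed a Zipfian distribution is, the share of total activity produced by the most active users can be computed directly. The sketch below assumes activity of the user at rank r is proportional to 1 / r^s; the function name `top_share` and the exponent `s` are illustrative assumptions, and the result is not fitted to the Twitter figures quoted above.

```python
def top_share(n_users, fraction, s=1.0):
    """Share of total activity produced by the top `fraction` of users,
    assuming activity at rank r is proportional to 1 / r**s (Zipf's law)."""
    weights = [1.0 / rank ** s for rank in range(1, n_users + 1)]
    top_n = max(1, round(n_users * fraction))
    return sum(weights[:top_n]) / sum(weights)


if __name__ == "__main__":
    # Print the cumulative share for the same percentiles the slide mentions.
    for fraction in (0.05, 0.10, 0.30):
        print(f"top {fraction:.0%}: {top_share(1_000_000, fraction):.1%}")
```

A larger exponent `s` concentrates activity in fewer users, which is the access pattern that rewards caching and hot-data placement.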
26. Challenges and demands on database architecture
27.
28. Today social media generates more information in a short period of time than was previously available in the entire world a few generations ago.
29. Not only the exponential growth of Facebook, Google+, and Twitter, but also the increasing use of rich media, such as user-generated video from smartphones, is driving big data. Source: http://www.itu.int/net/itunews/issues/2010/06/35.aspx
30. Social media now produces massive amounts of data. Facebook's network, for instance, consists of 100 million entities generating tens of millions of events per second. Twitter, meanwhile, funnels 140 million public tweets a day. [GigaOM research notes] With enterprise data volumes moving past terabytes to tens of petabytes and more, business and IT leaders face significant opportunities and challenges from big data. For a large enterprise, big data may be in the petabytes or more; for a small or mid-size enterprise, data volumes that grow into tens of terabytes may become challenging to analyze and manage. When an application is being designed, software architects need to plan for much greater application load to avoid major redesigns in the future. While scaling out web servers can be done quite easily, properly scaling out database servers is far more challenging. Challenge and Demands on Database Architecture: Managing user-generated social interaction data! Coping with the explosion in data volume! Cost-effective scale-out to meet rapidly growing demands!
31. CUBRID unique features
32. CUBRID: Free open source is the choice of the modern world. Powerful, clean architecture with rich functionality for competitive performance. Unique enterprise features for stability and reliability.
41. Full SQL function support. [Timeline, axis from 2006 to 2012: The development of CUBRID DBMS started. October 2008: first internal release, CUBRID 2008 R1.0. November 2008: CUBRID became an open source project; CUBRID 2008 R1.1 stable was released.]
42. CUBRID Index Scan with Top-k Sorting Technique: CUBRID does multi-range index scan. Example, my friends' newest twenty comments: SELECT post_no FROM posts WHERE id IN (4, 15, 36, …) AND registered_date < 20000 ORDER BY registered_date DESC LIMIT 20 [Diagram comparing two panels. Labels: Multi-range scan; Single range scan with key filter; Disk I/O ?!; # of leaf pages accessed > # of keys of scan result; # of leaf pages accessed = # of keys of scan result; Filter out; On-the-fly sorting during scan; Sort after scan. Index keys shown in both panels: (4,10001) (4,9999) (4,875) …; (15,10000) (15,9999) (15,7467) …; (36,947) (36,120) (36,3) …]
43. CUBRID Index Scan with Top-k Sorting Technique: SELECT * FROM tbl WHERE a IN (2, 4, 5) AND b < 'K' ORDER BY b LIMIT 3; SELECT * FROM tbl WHERE a = 2 AND b < 'K' ORDER BY b LIMIT 3;
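The idea behind sorting on the fly during a multi-range scan can be sketched as a k-way merge: each friend's index range is already ordered by date, so the newest k rows can be produced by merging the ranges and stopping after k items, without sorting the full result. The Python below is a minimal sketch of that idea and not CUBRID's implementation; the `index` layout, the `top_k_newest` function, and the post numbers are hypothetical, while the ids and dates follow the keys shown on the slide.

```python
import heapq
from itertools import islice

# Hypothetical stand-in for an index on (id, registered_date):
# per id, (registered_date, post_no) pairs ordered newest first.
index = {
    4: [(10001, 401), (9999, 402), (875, 403)],
    15: [(10000, 1501), (9999, 1502), (7467, 1503)],
    36: [(947, 3601), (120, 3602), (3, 3603)],
}


def top_k_newest(index, friend_ids, before, k):
    """Return post_no for the k newest posts of the given friends with
    registered_date < before, merging the per-friend ranges lazily."""

    def one_range(friend_id):
        # A real index would seek straight to the first key below `before`;
        # this sketch simply skips newer entries.
        for date, post_no in index.get(friend_id, []):
            if date < before:
                yield date, post_no

    ranges = [one_range(friend_id) for friend_id in friend_ids]
    # Each range is already in descending date order, so a heap-based merge
    # yields globally descending order without a separate sort step.
    merged = heapq.merge(*ranges, key=lambda row: row[0], reverse=True)
    return [post_no for _, post_no in islice(merged, k)]
```

Because `islice` stops the merge after k rows, only the leading entries of each range are consumed, which corresponds to the slide's point about reading fewer leaf pages than a scan followed by a sort.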
44. CUBRID Test Results: Refer to http://www.cubrid.org/cubrid_mysql_sns_benchmark_test. Test case 1: user group 1 only. Test case 2: user group 2 only. Test case 3: 40% of user group 1, 50% of user group 2, 10% of user group 3. Test case 4: 10% of user group 1, 50% of user group 2, 40% of user group 3. User group 1: users with 50 or fewer friends. User group 2: users with 51 to 2,000 friends. User group 3: users with up to tens of thousands of friends.
48. Various access modes (read-write, read-only). [Diagram labels: Application; CUBRID Driver (two); UPDATE, SELECT; Broker layer with Active Broker and Backup Broker, automatic switch-over; Read-Only Mode, Read-Write Mode; Database Server layer with Active Server (Master DB), Standby-1 Server (Slave DB), and Standby-2 Server @Remote IDC (Slave DB), automatic fail-over/fail-back.]
49. CUBRID High Availability Feature: The HA feature is based on database replication with a transaction log multiplication technique. Statement-based replication could cause data inconsistency. [Diagram: A-Node (Active Server Node, Master DB), S1-Node (Standby Server Node, Slave DB), and S2-Node (Slave DB). Components shown: CUBRID Server, Log Writer, Log Applier, Transaction Log, Replication Log. Labels: UPDATE, SELECT, Heartbeat, Log Shipping (synchronous), Log Shipping (asynchronous), Log Applying.]
56. Additionally, linear scalability. [Diagram: the Application sends SELECT * FROM gtable WHERE part_key=2 AND … and INSERT INTO gtable … through a Broker (load balancing) to the Cluster Server, which presents a global schema with distributed partitions. gtable partitions: Node #1: part_01, part_05; Node #2: part_02, part_06; Node #3: part_03, part_07; Node #4: part_04, part_08.]
57. CUBRID Cluster System: The global schema is a single representation, or global view, of all nodes, where each node has its own database and schema. Users can access any database through a single schema without knowing the location of the distributed data. [Diagram: a Global Schema User issues SELECT * FROM contents WHERE auth = (SELECT name FROM author WHERE …), SELECT * FROM contents WHERE …, SELECT * FROM info, code WHERE info.id = code.id, and INSERT INTO contents…; a Local Schema User issues UPDATE local …. Tables shown: info, contents, author, code, level, local. The Global Schema spans Local Schema #1 to Local Schema #4 on Database #1 to Database #4.]
58. CUBRID Cluster. [Diagram labels: Global Schema; Logical View; Physical View; Schema; Data; Index; System Catalog.]
59. CUBRID Cluster: The distributed partition maps the global schema onto table partitioning. Partitions reside on different nodes but are accessed through the global schema. SELECT * FROM gtable, info WHERE gtable.part_key=02 AND info.id = gtable.id [Diagram: the Global Schema holds info and gtable, declared PARTITION BY HASH (part_key), with partitions part_01 to part_08; the partition data is spread over Database #1 to Database #4.]
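How a query on part_key could be routed to the node that holds the matching partition can be sketched as follows. This is an assumption-laden illustration, not CUBRID Cluster's algorithm: the hash is a plain modulo, the round-robin placement of eight partitions on four nodes is assumed, and the function names `partition_of`, `node_of`, and `route` are hypothetical.

```python
N_PARTITIONS = 8   # part_01 .. part_08
N_NODES = 4        # four nodes, each with its own database


def partition_of(part_key):
    """Assumed hash function: plain modulo over the partition count."""
    return part_key % N_PARTITIONS + 1


def node_of(partition):
    """Assumed round-robin placement: part_01 and part_05 on node 1, and so on."""
    return (partition - 1) % N_NODES + 1


def route(part_key):
    """Return the partition name and node that would serve this key."""
    partition = partition_of(part_key)
    return f"part_{partition:02d}", f"Node #{node_of(partition)}"
```

With a predicate such as gtable.part_key = 2, a router of this kind can send the query to a single node instead of broadcasting it to all four, which is what makes the equality condition on the partition key valuable.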
60.
61.
62.
63. nStore, a distributed database system based on CUBRID. [Diagram labels: Application; REST API; nStore; Management Node; Container Servers; Container (ckey=iamyaw) and Container (ckey=kieun_park), each holding Table A, Table B, and Table C with indexed columns and equi-joins; Global Table G; Distribution layer; RDBMS.]
64. nStore Test Results: Tested using YCSB (http://research.yahoo.com/Web_Information_Management/YCSB). INSERT: 50,000,000 records (1K size). READ: Zipfian distribution. READ w/ compaction: after SSTable compaction (Cassandra, HBase). READ/UPDATE: 50:50 (50,000,000 records DB). READ/INSERT: 50:50 (50,000,000 records DB).
65. CUBRID reference architecture for social networking service
66. CUBRID Web Reference Architecture. [Diagram labels: Small-size web service; Mid-size web service; Web Server (User Interface); Web Application Server (Business Logic); Cache Server; Web Server; RW and RO access; DB Sharding; CUNITOR; CUBRID HA groups of master and slave servers.]
67. Social Networking Service Architecture. [Diagram labels: Web Servers (User Interface); Cache Layer; Web Application Servers (Business Logic); Social Query Engine; Aggregation Engine; Delivery Engine; Search Engine; Recommendation Engine; User Profile DB; Social Relation DB; Analytics DB; Feed Outbox DB; Feed Inbox DB; Search Index.]
68. CUBRID SNS Reference Architecture: Analytic DB partitioned for OLAP. User profile DB sharded by user-id. Social relation DB sharded by user-id. Inbox/Outbox storage distributed according to user-id. [Diagram labels: Application servers; Cache server farm; ETL; CUBRID Cluster (node #1, node #2, … node #n); OAM; brokers with RW and RO access; DB Sharding; CUBRID HA groups of master and slave servers; nStore w/ CUBRID (containers, management, monitoring server); CUNITOR.]
69. Best Practices: A highly available database architecture is a basic business requirement and no longer a technical barrier. Automatic sharding is an effective way to scale out a DB system that stores relational-model data. nStore is a solution for petabyte-scale data that offers the benefits of a highly available and scalable distributed store.