3. BACKGROUND OSN
● Huge amount of data, diverse and changing over time. Likes,
sharing, comments, logins, page-views, search queries.
● New approaches to manipulate it.
● Distributed Databases, NoSQL.
● How to retrieve the data (Search relevance,
Recommendations, Security against abusive behavior,
Newsfeed features)
● Goal: massive scaling of demand: Unstructured, Semi-
Facebook: 2.7M likes & comments/day
Twitter: 500M tweets/day
LinkedIn: 300+ M users (2 new/s), 200 group conversations/min
4. STORING AND QUERYING AT TWITTER
● Storage:
o MySQL used as key-value store.
o FlockDB for the Twitter social graph.
● Desired queries:
o TrendingTopics
o Breaking news
o Sentiment
7. TAO
o Architecture and Data Model:
Objects: (id) → (otype, (key → value)∗)
Associations: (id1, atype, id2) → (time, (key → value)∗)
o MySQL for the storage layer.
o Main challenges:
Efficiency at scale.
Very fast response time.
High read availability.
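The two mappings on this slide can be illustrated with a minimal in-memory sketch. It is not TAO's API: the class names, the `assoc_range` helper, and the sample ids are hypothetical, and it ignores caching, sharding, and the MySQL layer entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GraphObject:
    otype: str                                          # object type, e.g. "user"
    data: Dict[str, str] = field(default_factory=dict)  # (key -> value)*


@dataclass
class GraphAssoc:
    time: int                                           # timestamp of the edge
    data: Dict[str, str] = field(default_factory=dict)  # (key -> value)*


class TinyGraphStore:
    """Hypothetical stand-in for the object and association mappings."""

    def __init__(self) -> None:
        self.objects: Dict[int, GraphObject] = {}                   # id -> object
        self.assocs: Dict[Tuple[int, str, int], GraphAssoc] = {}    # (id1, atype, id2) -> assoc

    def add_object(self, oid: int, otype: str, **data: str) -> None:
        self.objects[oid] = GraphObject(otype, dict(data))

    def add_assoc(self, id1: int, atype: str, id2: int, time: int, **data: str) -> None:
        self.assocs[(id1, atype, id2)] = GraphAssoc(time, dict(data))

    def assoc_range(self, id1: int, atype: str, limit: int = 10) -> List[int]:
        """Return up to `limit` target ids for (id1, atype), newest edge first."""
        edges = [(assoc.time, key[2]) for key, assoc in self.assocs.items()
                 if key[0] == id1 and key[1] == atype]
        edges.sort(reverse=True)
        return [id2 for _, id2 in edges[:limit]]


store = TinyGraphStore()
store.add_object(1, "user", name="alice")
store.add_object(2, "user", name="bob")
store.add_object(3, "user", name="carol")
store.add_assoc(1, "friend", 2, time=100)
store.add_assoc(1, "friend", 3, time=200)
newest_friends = store.assoc_range(1, "friend")
```

Keying associations by (id1, atype, id2) and ordering them by time matches the read-heavy pattern listed under the main challenges: the typical query asks for the most recent edges of one type for one object.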
8. Professional Social Network
Data-driven features:
● Recommendation system (People You May Know)
● People search (job search, candidates)
● Who viewed your profile?
● Events you may be
9. STORAGE - VOLDEMORT
Highly available distributed KV store
10 Voldemort clusters (+100 nodes), 9 of BDB
Layered design
All layers share a single interface:
- Put/Delete/Get
- Flexible
- Every layer decorates the next one
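The layered design can be sketched as a chain of decorators that all expose the same get/put/delete interface. This is an illustration only, not Voldemort's code: the layer names (`CompressingStore`, `CountingStore`) and the choice of zlib compression are assumptions made for the example.

```python
import zlib
from typing import Dict, Optional


class Store:
    """The single interface every layer implements."""

    def get(self, key: str) -> Optional[bytes]:
        raise NotImplementedError

    def put(self, key: str, value: bytes) -> None:
        raise NotImplementedError

    def delete(self, key: str) -> None:
        raise NotImplementedError


class MemoryStore(Store):
    """Bottom layer: a plain dictionary standing in for a storage engine."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


class CompressingStore(Store):
    """Decorates the next layer: compresses on put, decompresses on get."""

    def __init__(self, inner: Store) -> None:
        self.inner = inner

    def get(self, key: str) -> Optional[bytes]:
        raw = self.inner.get(key)
        return None if raw is None else zlib.decompress(raw)

    def put(self, key: str, value: bytes) -> None:
        self.inner.put(key, zlib.compress(value))

    def delete(self, key: str) -> None:
        self.inner.delete(key)


class CountingStore(Store):
    """Decorates the next layer: counts calls, then delegates unchanged."""

    def __init__(self, inner: Store) -> None:
        self.inner = inner
        self.calls = {"get": 0, "put": 0, "delete": 0}

    def get(self, key: str) -> Optional[bytes]:
        self.calls["get"] += 1
        return self.inner.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.calls["put"] += 1
        self.inner.put(key, value)

    def delete(self, key: str) -> None:
        self.calls["delete"] += 1
        self.inner.delete(key)


base = MemoryStore()
stack = CountingStore(CompressingStore(base))
stack.put("member:1", b"a" * 1000)
round_trip = stack.get("member:1")
```

Because every layer speaks the same three operations, layers can be added, removed, or reordered without the caller noticing, which is the flexibility the slide refers to.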
10. STORAGE - VOLDEMORT
Voldemort provides:
• High availability
• Low latency
• Distributed operation
Like a distributed hash table (DHT).
Data storage engine on nodes:
• Compact index
• Data files
12. SUMMARY
● EarlyBird. Problem solved: real-time search. Main advantages: fast indexing, concurrency management.
● TAO. Problem solved: storing the Facebook social graph. Main advantages: very fast response time, high read availability.
● Voldemort. Problem solved: simple data partitioning to meet scalability needs. Main advantages: highly scalable, seamless replication.
13. CONCLUSION
• The selection of the database systems depends on the
needs of the applications and the primary type of
information of the social network.
• Many OSNs have developed their own solutions to cope with
the ever-growing nature of big data and its challenges.
• In summary, the main features that the data solutions
should have are:
• Storage of huge amounts of data.
• Fast reads and low latency.
• Processing of big data (meaningful results).
• Streaming and indexing are critical.
14. EXTREMELY DIFFICULT QUESTIONS
1. Why did LinkedIn need to build its own solution, Voldemort?
2. How does TAO resolve the challenges it was built for?
3. How does the real-time search service work at Twitter?
15. REFERENCES
● Auradkar, A., Botev, C., Das, S., De Maagd, D., Feinberg, A., Ganti, P., … Zhang, J.
(2012). Data Infrastructure at LinkedIn. In Data Engineering (ICDE), 2012 IEEE 28th
International Conference on (pp. 1370–1381).
● N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo,
S. Kulkarni, and H. Li, “Tao: Facebook’s distributed data store for the social graph,” in
USENIX ATC, 2013.
● N. Ruflin, H. Burkhart, and S. Rizzotti, “Social-data storage-systems,” Databases Soc.
Networks - DBSocial ’11, pp. 7–12, 2011.
● A. Thusoo, Z. Shao, S. Anthony, D. Borthakur, N. Jain, J. Sen Sarma, R. Murthy, and H.
Liu, “Data warehousing and analytics infrastructure at facebook,” Proceedings of the
2010 ACM SIGMOD International Conference on Management of data. ACM,
Indianapolis, Indiana, USA, pp. 1013–1020, 2010.
● D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel, “Finding a Needle in Haystack:
Facebook’s Photo Storage,” in OSDI, 2010, vol. 2010, pp. 47–60.
● M. Busch, K. Gade, B. Larson, P. Lok, S. Luckenbill, and J. Lin, “Earlybird: Real-Time
Search at Twitter,” Proceedings of the 2012 IEEE 28th International Conference on Data
Engineering. IEEE Computer Society, pp. 1360–1369, 2012.
16. REFERENCES (II)
● D. Borthakur, J. Gray, J. Sen Sarma, K. Muthukkaruppan, N. Spiegelberg, H. Kuang, K.
Ranganathan, D. Molkov, A. Menon, S. Rash, R. Schmidt, and A. Aiyer, “Apache hadoop goes
realtime at Facebook,” Proceedings of the 2011 ACM SIGMOD International Conference on
Management of data. ACM, Athens, Greece, pp. 1071–1080, 2011.
● C. Chen, F. Li, B. C. Ooi, and S. Wu, “TI: an efficient indexing mechanism for real-time search
on tweets,” Proceedings of the 2011 ACM SIGMOD International Conference on Management of
data. ACM, Athens, Greece, pp. 649–660, 2011.
● G. Mishne, J. Dalton, Z. Li, A. Sharma, and J. Lin, “Fast data in the era of big data: Twitter’s real-
time related query suggestion architecture,” Proceedings of the 2013 ACM SIGMOD
International Conference on Management of Data. ACM, New York, New York, USA, pp. 1147–
1158, 2013.
● S. Cohen and B. Kimelfeld, “A Social Network Database that Learns How to Answer Queries,” 2013.
Web applications and online social networks (OSNs) produce a huge amount of real-time content. This includes social data generated, for example, by users of Facebook, Twitter, and LinkedIn. This content grows rapidly, creating a rising need for new approaches to manipulate it, because the data becomes difficult to capture and impossible to store in conventional database systems. For example, at Facebook or Twitter it is around 7 TB per day (2+ PB per year).
Also, social data is diverse and changes over time, which makes its structure a challenge for storage. The NoSQL community has created different types of storage systems for recording it.
Event data includes (1) user activity events corresponding to logins, page-views, clicks, “likes”, sharing, comments, and search queries; (2) operational metrics. This data now requires online consumption for (1) search relevance, (2) recommendations, which may be driven by item popularity or co-occurrence in the activity stream, (3) security applications that protect against abusive behaviours such as spam or unauthorized data scraping, (4) newsfeed features that aggregate user status updates or actions for their “friends” or “connections” to read, and (5) real-time dashboards of various service metrics.
How to retrieve the data: search relevance, recommendations (e.g., popularity-driven), security against abusive behavior (spam), newsfeed features.
Graph: Nodes (people) and Edges (interactions/relationships).
Storage: a column store, for example, could be a good option for social data because it is almost as scalable as a key-value store and can also store semi-structured data with simple indices, which allows simple queries. Twitter uses MySQL as a key-value store.
Hadoop: distributed file system (automatic replication, fault tolerance). MapReduce: based on parallel, key-value computation. Powerful (sorts data very fast). Open source. Scalable.
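The key-value style of MapReduce computation mentioned above can be shown with a single-process sketch. It does not use Hadoop; the function names and the sample activity log are hypothetical, and the "shuffle" step here is just an in-memory sort and group.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_phase(records: Iterable[str], mapper: Callable[[str], Iterable[Pair]]) -> List[Pair]:
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs: List[Pair] = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs: List[Pair]) -> List[Tuple[str, List[int]]]:
    """Sort by key and group the values of equal keys together."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


def reduce_phase(groups: List[Tuple[str, List[int]]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups}


def event_mapper(line: str) -> Iterable[Pair]:
    # Each log line is "user event"; emit (event, 1).
    _user, event = line.split()
    return [(event, 1)]


def count_reducer(_key: str, values: List[int]) -> int:
    return sum(values)


activity_log = ["u1 like", "u2 comment", "u1 like", "u3 share", "u2 like"]
event_counts = reduce_phase(shuffle(map_phase(activity_log, event_mapper)), count_reducer)
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data over the network; the data flow between the three phases is the same.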
A user queries a social network in pursuit of a desired outcome. People search Twitter to find temporally relevant information (e.g., breaking news, real-time content, and popular trends) and information related to people (e.g., content directed at the searcher, information about people of interest, and general sentiment and opinion). Twitter queries tend to be short and popular (tweets are short, frequent, and do not change after being posted).
The system incorporates abstract predicates relevant to social networks as primitive building blocks in the query language and uses machine learning as an integral part of the query processor to select and improve upon the predicate implementations.
E.g., Twitter allows users to search by words, people, places, and tweet properties. Examples of social network query languages: SoQL and SociQL, based on SQL; SNQL, etc.
Earlybird (the retrieval engine that lies at the core of Twitter’s real-time search service) is specifically designed to handle tweets. It maintains an inverted index, manages concurrency, and uses a ranking function that combines relevance signals and the user’s local social graph to compute a personalized relevance score for each tweet.
Ingested tweets first fill up a segment before proceeding to the next one. At any given time, there is at most one index segment actively being modified, whereas the remaining segments are read-only. Once an index segment ceases to accept new tweets, it converts from a write-friendly structure into an optimized read-only structure. The highest-ranking, most-recent tweets are returned to the Blender, which merges and re-ranks the results before returning them to the user.
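The segment life cycle described above (one writable segment, the rest frozen and read-only) can be sketched as a toy inverted index. This is not Earlybird's implementation: the class names and segment capacity are hypothetical, and concurrency control, ranking, and the Blender are left out.

```python
from typing import Dict, List, Sequence


class Segment:
    """One index segment: term -> tweet ids in arrival order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.count = 0
        self.read_only = False
        self.postings: Dict[str, Sequence[int]] = {}

    def add(self, tweet_id: int, text: str) -> None:
        if self.read_only:
            raise RuntimeError("segment is read-only")
        for term in set(text.lower().split()):
            self.postings.setdefault(term, []).append(tweet_id)
        self.count += 1

    def freeze(self) -> None:
        # Convert the write-friendly lists into immutable tuples.
        self.postings = {term: tuple(ids) for term, ids in self.postings.items()}
        self.read_only = True

    def search(self, term: str) -> List[int]:
        # Newest tweets first within the segment.
        return list(reversed(self.postings.get(term, ())))


class SegmentedIndex:
    def __init__(self, segment_capacity: int) -> None:
        self.segment_capacity = segment_capacity
        self.segments: List[Segment] = [Segment(segment_capacity)]

    def add(self, tweet_id: int, text: str) -> None:
        active = self.segments[-1]
        if active.count >= active.capacity:
            active.freeze()                      # full: stop accepting tweets
            active = Segment(self.segment_capacity)
            self.segments.append(active)         # next segment becomes the active one
        active.add(tweet_id, text)

    def search(self, term: str, limit: int = 10) -> List[int]:
        results: List[int] = []
        for segment in reversed(self.segments):  # newest segment first
            results.extend(segment.search(term.lower()))
            if len(results) >= limit:
                break
        return results[:limit]


index = SegmentedIndex(segment_capacity=2)
index.add(1, "Breaking news now")
index.add(2, "News at noon")
index.add(3, "More news")
hits = index.search("news")
```

Keeping writes confined to a single segment means readers of the frozen segments never contend with the writer, which is one plausible reading of why the read-only conversion is worthwhile.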
LinkedIn has a number of data-driven features, including People You May Know, Jobs you may be interested in, and LinkedIn Skills.
Building these features involves 2 phases:
Offline computation.
Online serving.
As an early adopter of Hadoop, LinkedIn was able to scale the offline computation phase successfully.
The difficult part has been bulk loading the output of this computation phase into the online serving system without causing performance degradation.
Batch-computed algorithms (MapReduce on Hadoop).
To make all this offline data available to the live site, we've developed a multi-terabyte scale data pipeline from Hadoop to our online serving layer, Project Voldemort.
How do we serve these massive outputs to our 300 million members?
RDBMS: Oracle
Espresso: document-oriented data store with hierarchical indexing.
Kafka: high-volume, low-latency messaging system
Collects and delivers event data.
Uses a messaging API to support real-time and offline consumption.
Publisher/consumer scheme.
10 billion message writes/day.
Solves real-time log processing!
Kafka solves this problem: it moves around large amounts of data in a robust and scalable manner.
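The publisher/consumer scheme with both real-time and offline consumption can be sketched as an append-only log that each consumer reads at its own pace. This is a conceptual illustration and not the Kafka client API: `TopicLog`, `Consumer`, and `poll` are hypothetical names, and everything lives in one process.

```python
from typing import List


class TopicLog:
    """Append-only message log for one topic."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def publish(self, message: str) -> int:
        """Append a message and return its offset in the log."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int) -> List[str]:
        return self.messages[offset:offset + max_messages]


class Consumer:
    """Each consumer tracks its own offset, so consumers never block each other."""

    def __init__(self, log: TopicLog) -> None:
        self.log = log
        self.offset = 0

    def poll(self, max_messages: int = 100) -> List[str]:
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


page_views = TopicLog()
realtime = Consumer(page_views)   # e.g. a newsfeed service polling often
offline = Consumer(page_views)    # e.g. a nightly batch load

for event in ["u1 /home", "u2 /jobs", "u1 /profile"]:
    page_views.publish(event)

first_realtime_batch = realtime.poll(max_messages=2)
offline_batch = offline.poll()
page_views.publish("u3 /home")
second_realtime_batch = realtime.poll()
```

Since the log is shared and the read position belongs to the consumer, the same stream can feed a low-latency subscriber and a slow batch job without publishing anything twice.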
Streaming: Databus (DB stream replication) & Kafka (pub/sub for user activity and logs)
Databus: timeline-consistent change data capture
Kafka:
Provides feeds of data to Applications & Other Data Systems
Enable near real-time processing of Data. (newsfeed & other asynch
Transport of updates to subscriber systems.
Kafka alone is not sufficient, as it lacks a processing engine and the ability to persist data over long spans.
Simple Data Partitioning to meet scalability needs.
Primary goals: High performance & Availability. DB supports only the most minimal Schema.
Schema in JSON.
Speed and availability -> Distributed key-Value system.
Bulk load massive data sets -> Offload index construction to processing system
Port: 666
Storage is secondary; it is really about distributing and recovering data across a set of nodes.
The main cluster has 60% reads and 40% writes.
Techniques: Decentralized -> no Master
Data partitioned and replicated via consistent hashing
Multinode read and writes for redundancy
Versioning for consistency, using vector clocks. Optimistic locking without locks: (nodeID, counter) tuples on each node. Each object has a vector clock, updated on each write and examined on each read.
Pluggable persistence (BDB, MySQL, Hadoop (read-only), MongoDB)
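The vector clock idea in the list above can be sketched as a mapping from nodeID to counter. This is a generic textbook version, not Voldemort's classes: the function names and node ids are hypothetical.

```python
from typing import Dict

VectorClock = Dict[str, int]  # nodeID -> counter


def increment(clock: VectorClock, node_id: str) -> VectorClock:
    """Return a copy of the clock with node_id's counter advanced (done on each write)."""
    updated = dict(clock)
    updated[node_id] = updated.get(node_id, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if a has seen every event that b has seen."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b (done on each read)."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"  # neither saw the other's write: a conflict


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise maximum, used once a conflict has been resolved."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


v1 = increment({}, "A")    # first write, coordinated by node A
v2 = increment(v1, "B")    # a later write via node B
v3 = increment(v1, "C")    # another write via node C that never saw v2
```

Here `compare(v1, v2)` reports that v1 is older and can be discarded, while v2 and v3 are concurrent, so both versions have to be handed back to the client for resolution. That is what makes the locking optimistic: writers never wait, and conflicts are detected afterwards.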
We currently house roughly ten clusters, spanning more than a hundred nodes, holding several hundred stores (database tables).
Berkeley DB (BDB) is a software library that provides a high-performance embedded database for key/value data.
Data is cached in Oracle storage.
Voldemort is a distributed key-value storage system for high-capacity storage. Data is stored under a key, partitioned, and replicated among multiple servers. In case of failure, conflicts are resolved using versioning. Voldemort is not a relational DB.
Voldemort is a “big, distributed, persistent, fault-tolerant hash table”.
The API involves only 3 operations: Get, Put, and Delete. Keys and values can be complex objects.
Client allows pluggable interfaces.
Stack
Conflict resolution: Multiple reads & writes, multiple versions. Latest version hides most of the difficulties of getting last version.
Serialization:
Network: You can choose client side or server side.
The client connects to the cluster and gets metadata describing how the nodes and the hashing are set up.
Data is replicated automatically over multiple servers. Storage is pluggable on disk using BerkeleyDB or MySQL.
Can be used with Hadoop.
Data is written to HDFS by several Hadoop MapReduce jobs and is then pulled by Voldemort.
A store is like a DB table; each store is split into partitions, and these partitions into chunk sets.
One reducer = one chunk set
Chunk Set = Index + data file
The storage engine is made up of “chunk sets”: multiple pairs of index and data files. The index file is a compact structure containing a hash of the key followed by the offset to the corresponding value in the data file.
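The index/data file pairing can be illustrated with in-memory byte strings. The exact layout below is an assumption made for the sketch (8-byte MD5 prefix, 4-byte big-endian offset, length-prefixed values, entries sorted by hash for binary search) and is not taken from Voldemort's on-disk format. Hash collisions are ignored.

```python
import hashlib
import struct
from typing import Dict, Optional, Tuple

HASH_SIZE = 8                   # assumed: first 8 bytes of the MD5 digest
ENTRY_SIZE = HASH_SIZE + 4      # hash followed by a 4-byte offset


def key_hash(key: bytes) -> bytes:
    return hashlib.md5(key).digest()[:HASH_SIZE]


def build_chunk(records: Dict[bytes, bytes]) -> Tuple[bytes, bytes]:
    """Build one (index, data) pair; index entries are sorted by key hash."""
    index = bytearray()
    data = bytearray()
    for hashed, value in sorted((key_hash(k), v) for k, v in records.items()):
        index += hashed + struct.pack(">I", len(data))     # hash, then offset into data
        data += struct.pack(">I", len(value)) + value      # length-prefixed value
    return bytes(index), bytes(data)


def lookup(index: bytes, data: bytes, key: bytes) -> Optional[bytes]:
    """Binary-search the index for the key's hash, then read the value at its offset."""
    target = key_hash(key)
    entries = len(index) // ENTRY_SIZE
    lo, hi = 0, entries
    while lo < hi:
        mid = (lo + hi) // 2
        if index[mid * ENTRY_SIZE: mid * ENTRY_SIZE + HASH_SIZE] < target:
            lo = mid + 1
        else:
            hi = mid
    start = lo * ENTRY_SIZE
    if lo == entries or index[start: start + HASH_SIZE] != target:
        return None
    (offset,) = struct.unpack(">I", index[start + HASH_SIZE: start + ENTRY_SIZE])
    (length,) = struct.unpack(">I", data[offset: offset + 4])
    return data[offset + 4: offset + 4 + length]


chunk_index, chunk_data = build_chunk({b"member:1": b"alice", b"member:2": b"bob"})
found = lookup(chunk_index, chunk_data, b"member:1")
```

Because the index holds only fixed-size entries, it stays small enough to keep in memory or in the page cache, and a read costs one search in the index plus one seek into the data file.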
Node independence: Each node is independent of other nodes with no central point of failure or coordination
For versioning and Conflict Resolution:
Vector Clocks (support optimistic locking in the client).
It is similar to shared-nothing architectures in distributed systems, which can grow indefinitely.
Also use Oracle.
It has been designed for frequent transient and short-term failures.
Each store maps to a single cluster, with the store partitioned over all nodes in the cluster.
Lookups O(1).
Voldemort uses consistent hashing to distribute load evenly across the servers.
Each key hashes to a point on a fixed circular space.
This avoids the problem with linear hashing, where removing a bucket means all your keys have to move!
Hash space (32 bit number)
Fix boundaries. Can resize buckets, very efficient for rebalancing.
In case of server failure, its keys are distributed equally among the other servers in the cluster.
The ring is divided into Q partitions, where Q is larger than the number of nodes S. Each node is assigned Q/S partitions, so it appears multiple times in the ring. Each key is held by its primary node as well as that node's K unique successors in the clockwise direction.
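The ring described above can be sketched as follows. The round-robin assignment of partitions to nodes, the use of MD5 for the 32-bit hash, and the names `Ring`, `preference_list`, and `remove_node` are all assumptions made for this example, not Voldemort's implementation.

```python
import hashlib
from typing import List


class Ring:
    def __init__(self, nodes: List[str], num_partitions: int) -> None:
        if num_partitions < len(nodes):
            raise ValueError("Q must be at least the number of nodes S")
        self.num_partitions = num_partitions
        # Partition p belongs to nodes[p % S], so each node appears about Q/S times.
        self.owner = [nodes[p % len(nodes)] for p in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        """Hash the key to a point in a 32-bit space, then to a fixed partition."""
        digest = hashlib.md5(key.encode("utf-8")).digest()
        point = int.from_bytes(digest[:4], "big")
        return point * self.num_partitions // 2 ** 32

    def preference_list(self, key: str, k: int) -> List[str]:
        """Primary node plus up to k unique successors, walking clockwise."""
        start = self.partition_for(key)
        nodes: List[str] = []
        for step in range(self.num_partitions):
            node = self.owner[(start + step) % self.num_partitions]
            if node not in nodes:
                nodes.append(node)
            if len(nodes) == k + 1:
                break
        return nodes

    def remove_node(self, failed: str) -> None:
        """Hand only the failed node's partitions to the survivors, in turn."""
        survivors = sorted(set(self.owner) - {failed})
        handed_out = 0
        for p, node in enumerate(self.owner):
            if node == failed:
                self.owner[p] = survivors[handed_out % len(survivors)]
                handed_out += 1


ring = Ring(["n1", "n2", "n3"], num_partitions=12)
owners_before = list(ring.owner)
replicas = ring.preference_list("member:42", k=2)
ring.remove_node("n2")
```

Partition boundaries never move: after `remove_node("n2")` only the four partitions that n2 owned change hands, split evenly between n1 and n3, while every key on the other eight partitions stays where it was.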
Advantage: predictable performance compared to SQL queries in a relational DB. The Get operation returns all values associated with a key, organized into lists.