Social databases - A brief overview

SOCIAL DATA AND DBS
IVAN SANCHEZ
JULIO SALINAS
MARLENE ROBLES

CONTENTS
• Background
• Case studies:
  • Twitter: real-time search (Earlybird)
  • Facebook: storage (TAO)
  • LinkedIn: storage (Voldemort)
• Conclusion
• References

BACKGROUND OSN
● Huge amount of data, diverse and changing over time. Likes,
sharing, comments, logins, page-views, search queries.
● New approaches to manipulate it.
● Distributed Databases, NoSQL.
● How to retrieve the data (Search relevance,
Recommendations, Security against abusive behavior,
Newsfeed features)
● Goal: massive scaling of demand: Unstructured, Semi-
Facebook: 2.7M likes & comments/day
Twitter: 500M tweets/day
LinkedIn: 300+ M users (2 new/s), 200 group conversations/min

STORING AND QUERYING AT TWITTER
● Storage:
o MySQL used as a key-value store.
o FlockDB for the Twitter social graph.
● Desired queries:
o Trending topics
o Breaking news
o Sentiment
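
Using a relational database as a key-value store, as the storage bullet above describes, can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `kv`, the class `KVStore`, and the key format are assumptions, not Twitter's actual schema.

```python
import sqlite3


class KVStore:
    """Key-value access on top of a single relational table (hypothetical schema)."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key, value):
        # INSERT OR REPLACE overwrites any existing row with the same key (SQLite syntax).
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def get(self, key):
        # A primary-key lookup is the only query shape the store needs.
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def delete(self, key):
        self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))
        self.conn.commit()


store = KVStore(sqlite3.connect(":memory:"))
store.put("tweet:1", "hello world")
print(store.get("tweet:1"))
```

The point of the design is that every access is a lookup by primary key, so the relational engine is used only for durable storage and indexing, not for joins.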

REAL TIME SEARCH AT TWITTER: EARLYBIRD
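
The core idea behind real-time search can be sketched as an in-memory inverted index in which each term's postings are appended in arrival order and read back newest first. This is a toy illustration, not Earlybird's implementation; the class `RealTimeIndex`, its method names, and the whitespace tokenizer are assumptions.

```python
from collections import defaultdict


class RealTimeIndex:
    """Toy inverted index: term -> tweet ids in arrival order (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.next_id = 0

    def add(self, text):
        # A tweet becomes searchable as soon as its postings are appended.
        tweet_id = self.next_id
        self.next_id += 1
        for term in set(text.lower().split()):
            self.postings[term].append(tweet_id)
        return tweet_id

    def search(self, term, limit=10):
        # Read the postings list backwards so the newest tweets come first.
        ids = self.postings.get(term.lower(), [])
        return ids[::-1][:limit]


index = RealTimeIndex()
index.add("Breaking news now")
index.add("more news")
print(index.search("news"))
```

Appending to the end of a postings list is cheap, which is what makes indexing fast; a production system additionally has to coordinate concurrent readers and writers, which this sketch ignores.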

TAO AND THE FACEBOOK SOCIAL GRAPH

TAO
o Architecture and data model:
  ▪ Objects: (id) → (otype, (key → value)*)
  ▪ Associations: (id1, atype, id2) → (time, (key → value)*)
o MySQL as the storage layer.
o Main challenges:
  ▪ Efficiency at scale.
  ▪ Very fast response time.
  ▪ High read availability.
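
The object and association model above can be sketched with two record types and an in-memory store. This is a minimal illustration of the data model only: the class `GraphStore` and its methods `add_object`, `add_assoc`, and `assoc_range` are hypothetical names chosen for this sketch, and nothing here reflects how TAO caches or shards data.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    # (id) -> (otype, (key -> value)*)
    id: int
    otype: str
    data: dict = field(default_factory=dict)


@dataclass
class Assoc:
    # (id1, atype, id2) -> (time, (key -> value)*)
    id1: int
    atype: str
    id2: int
    time: float
    data: dict = field(default_factory=dict)


class GraphStore:
    """In-memory stand-in for the storage layer (hypothetical)."""

    def __init__(self):
        self.objects = {}  # id -> Obj
        self.assocs = {}   # (id1, atype, id2) -> Assoc

    def add_object(self, oid, otype, **data):
        self.objects[oid] = Obj(oid, otype, data)

    def add_assoc(self, id1, atype, id2, time, **data):
        self.assocs[(id1, atype, id2)] = Assoc(id1, atype, id2, time, data)

    def assoc_range(self, id1, atype, limit=10):
        # Return the newest associations of one type leaving id1.
        matches = [
            a for (src, kind, _), a in self.assocs.items()
            if src == id1 and kind == atype
        ]
        matches.sort(key=lambda a: a.time, reverse=True)
        return matches[:limit]


graph = GraphStore()
graph.add_object(1, "user", name="Ana")
graph.add_object(2, "page", title="Databases")
graph.add_assoc(1, "likes", 2, time=100)
print([a.id2 for a in graph.assoc_range(1, "likes")])
```

Keying associations by (id1, atype, id2) and ordering them by time matches the typical social-graph read: "the most recent edges of this type from this object".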

Professional Social Network
Data-driven features:
● Recommendation system (people you may know)
● People search (job search, candidates)
● Who viewed your profile?
● Events you may be

STORAGE - VOLDEMORT
Highly available distributed key-value store
10 Voldemort clusters (100+ nodes), 9 of them using BDB
Layered design
All layers share a single interface:
- Put/Delete/Get
- Flexible
- Every layer decorates the next one
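
The layered design described above, where every layer exposes the same put/get/delete interface and decorates the next one, can be sketched as follows. The layer names (`InMemoryStore`, `LoggingStore`, `SerializingStore`) are hypothetical and chosen for illustration; they are not Voldemort's actual layers.

```python
import json


class InMemoryStore:
    """Bottom layer: plain dictionary storage."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)


class LoggingStore:
    """Records every operation, then delegates to the wrapped layer."""

    def __init__(self, inner, log):
        self.inner = inner
        self.log = log

    def get(self, key):
        self.log.append(("get", key))
        return self.inner.get(key)

    def put(self, key, value):
        self.log.append(("put", key))
        self.inner.put(key, value)

    def delete(self, key):
        self.log.append(("delete", key))
        self.inner.delete(key)


class SerializingStore:
    """Converts values to and from JSON text around the wrapped layer."""

    def __init__(self, inner):
        self.inner = inner

    def get(self, key):
        raw = self.inner.get(key)
        return None if raw is None else json.loads(raw)

    def put(self, key, value):
        self.inner.put(key, json.dumps(value))

    def delete(self, key):
        self.inner.delete(key)


log = []
store = SerializingStore(LoggingStore(InMemoryStore(), log))
store.put("member:1", {"name": "Ana"})
print(store.get("member:1"), log)
```

Because all layers share one interface, they can be stacked in any order or swapped out without the caller noticing, which is where the flexibility comes from.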

STORAGE - VOLDEMORT
Voldemort is:
• Highly available
• Low latency
• Distributed
It works like a distributed hash table (DHT).
Storage engine on each node:
• Compact index
• Data files

DISTRIBUTED HASHING ALGORITHM
This slide is from Roshan Sumbaly & Jay Kreps! (thanks Rosh & Jay)
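
The figure from the original slide is not reproduced here. As a rough illustration of distributed hashing, the sketch below implements a generic consistent-hash ring: each node is placed at several positions on the ring, and a key is stored on the first distinct nodes found clockwise from the key's hash. The class `HashRing`, the `vnodes` and `replicas` parameters, and the use of MD5 are assumptions made for this sketch, not details taken from Voldemort.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring with virtual nodes (hypothetical)."""

    def __init__(self, nodes, vnodes=8):
        # Place each node at `vnodes` positions on the ring.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.positions = [pos for pos, _ in self.ring]

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode()).hexdigest(), 16)

    def preference_list(self, key, replicas=2):
        # Walk clockwise from the key's position, collecting distinct nodes.
        start = bisect.bisect_right(self.positions, self._hash(key)) % len(self.ring)
        result = []
        for offset in range(len(self.ring)):
            node = self.ring[(start + offset) % len(self.ring)][1]
            if node not in result:
                result.append(node)
            if len(result) == replicas:
                break
        return result


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.preference_list("member:42"))
```

The useful property is that removing a node only moves the keys that node owned; every other key keeps its primary node, so the cluster can grow or shrink without reshuffling all data.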

SUMMARY
System | Problem solved | Main advantages
Earlybird | Real-time search | Fast indexing, concurrency management
TAO | Storing the Facebook social graph | Very fast response time, high read availability
Voldemort | Simple data partitioning to meet scalability needs | Highly scalable, seamless replication

CONCLUSION
• The choice of database system depends on the needs of the
application and on the primary type of information in the
social network.
• Many OSNs have developed their own solutions to cope with
the ever-growing nature of big data and its challenges.
• In summary, the main features a data solution should
provide are:
  • Storage of huge amounts of data.
  • Fast reads and low latency.
  • Processing of big data (meaningful results).
  • Streaming and indexing, which are critical.

EXTREMELY DIFFICULT QUESTIONS
1. Why did LinkedIn need to build its own solution,
Voldemort?
2. How does TAO resolve the challenges it was built
for?
3. How does the real-time search service work at
Twitter?

REFERENCES
● A. Auradkar, C. Botev, S. Das, D. De Maagd, A. Feinberg, P. Ganti, … J. Zhang,
“Data Infrastructure at LinkedIn,” in Data Engineering (ICDE), 2012 IEEE 28th
International Conference on, pp. 1370–1381, 2012.
● N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo,
S. Kulkarni, and H. Li, “Tao: Facebook’s distributed data store for the social graph,” in
USENIX ATC, 2013.
● N. Ruflin, H. Burkhart, and S. Rizzotti, “Social-data storage-systems,” Databases Soc.
Networks - DBSocial ’11, pp. 7–12, 2011.
● A. Thusoo, Z. Shao, S. Anthony, D. Borthakur, N. Jain, J. Sen Sarma, R. Murthy, and H.
Liu, “Data warehousing and analytics infrastructure at facebook,” Proceedings of the
2010 ACM SIGMOD International Conference on Management of data. ACM,
Indianapolis, Indiana, USA, pp. 1013–1020, 2010.
● D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel, “Finding a Needle in Haystack:
Facebook’s Photo Storage,” in OSDI, 2010, vol. 2010, pp. 47–60.
● M. Busch, K. Gade, B. Larson, P. Lok, S. Luckenbill, and J. Lin, “Earlybird: Real-Time
Search at Twitter,” Proceedings of the 2012 IEEE 28th International Conference on Data
Engineering. IEEE Computer Society, pp. 1360–1369, 2012.

REFERENCES (II)
● D. Borthakur, J. Gray, J. Sen Sarma, K. Muthukkaruppan, N. Spiegelberg, H. Kuang, K.
Ranganathan, D. Molkov, A. Menon, S. Rash, R. Schmidt, and A. Aiyer, “Apache hadoop goes
realtime at Facebook,” Proceedings of the 2011 ACM SIGMOD International Conference on
Management of data. ACM, Athens, Greece, pp. 1071–1080, 2011.
● C. Chen, F. Li, B. C. Ooi, and S. Wu, “TI: an efficient indexing mechanism for real-time search
on tweets,” Proceedings of the 2011 ACM SIGMOD International Conference on Management of
data. ACM, Athens, Greece, pp. 649–660, 2011.
● G. Mishne, J. Dalton, Z. Li, A. Sharma, and J. Lin, “Fast data in the era of big data: Twitter’s real-
time related query suggestion architecture,” Proceedings of the 2013 ACM SIGMOD
International Conference on Management of Data. ACM, New York, New York, USA, pp. 1147–
1158, 2013.
● S. Cohen and B. Kimelfeld, “A Social Network Database that Learns How to Answer Queries,”
2013.

LINKS
● https://www.usenix.org/conference/atc13/technical-sessions/presentation/bronson
● http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1105_DhrubaBorthakur.pdf
● http://www.slideshare.net/linkedin/jay-kreps-on-project-voldemort-scaling-simple-storage-at-linkedin
● http://data.linkedin.com/
● http://www.infoq.com/presentations/Project-Voldemort-at-Gilt-Groupe
