The BotNet benchmark is a benchmark for SPARQL query engines based on social network scenarios. This presentation first gives an overview of the benchmark and discusses the limitations of the current version. It then describes the aspects in which we want to improve the benchmark and the work that has already been done in this direction.
The BotNetBenchmark presentation was given by Ying Zhang (CWI) at the PlanetData project meeting held February 28 - March 4, 2011 in Innsbruck, Austria.
1. BotNetBM
A Benchmark for Social Networks
CWI
Project Meeting@Innsbruck
Feb 28 - Mar 04, 2011
Wednesday, March 02, 2011
2. Motivation
— Highly linked data
— No (good) benchmark yet for social networks
Feb 28 - Mar 4, 2011 ProjectMeeting@Innsbruck
Wednesday, March 02, 2011
3. BotNetBM
— A benchmark for social networks
— Simulates an RDF OLTP backend
— Simulates random activities of large #users
— Simulates on-site “analyst” ➠ weekly “analytic report”
— One parameter: scale (#user accounts to start BM)
4. BotNetBM Queries
— SPARQL 1.1 + SPARUL
— User Actions
◦ Interactive queries (80%)
◦ Update transactions (20%)
— Measurement: successful #clicks/min.
◦ Transactions commit, penalty for > 3 sec.
◦ Interactive queries response time < 3 sec.
— Analytic queries (must finish within simulated weekend)
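The measurement rule above can be illustrated with a small sketch. This is not the benchmark's driver code: the `Click` record and the function name are hypothetical, and since the slide does not define the penalty for slow transactions, the sketch assumes a late or uncommitted click simply scores zero.

```python
from dataclasses import dataclass

TIME_LIMIT_SEC = 3.0  # response-time bound stated on the slide


@dataclass
class Click:
    kind: str               # "query" or "update"
    seconds: float          # observed response time
    committed: bool = True  # only meaningful for updates


def successful_clicks_per_minute(clicks, run_minutes):
    """Count clicks that met the time bound, normalised per minute.

    Assumption: a late or uncommitted click scores zero; the actual
    penalty for slow transactions is not specified in the slides.
    """
    ok = 0
    for c in clicks:
        if c.seconds >= TIME_LIMIT_SEC:
            continue  # too slow to count as successful
        if c.kind == "update" and not c.committed:
            continue  # update transaction did not commit
        ok += 1
    return ok / run_minutes
```

A driver following the 80%/20% mix would feed this function the clicks it issued during a run together with the run length in minutes.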
5. Limitations
— Data generator: too uniform, not realistic for social networks
◦ 10 operations / user / simulated day
◦ all users are equally active
◦ some queries have no “meaningful” relation to each other
◦ read/write contention unrealistically frequent
◦ ...
— Query mix:
◦ Does not exploit SPARQL 1.1 advanced features
◦ No link to other RDF datasets
— Queries do not run with the open source ed. of Virtuoso Server
6. Our Goals
— Exploit SPARQL 1.1 features in queries
◦ “Property Path Expressions”
— Add links to well-known RDF data sets into the queries
◦ DBpedia
— Use real-life analysis info (e.g., Twitter)
◦ redesign data generator
◦ distribution of interactive/update queries
— Use real-life social network data
◦ Twitter, Facebook, Orkut, MySpace, ...
— Migration to MonetDB
7. Done
— Loaded into the Virtuoso Server (commercial ed.)
— Design of new query mix
— Twitter datasets
◦ http://infochimps.com/collections/twitter-census
◦ http://an.kaist.ac.kr/traces/WWW2010.html
◦ http://snap.stanford.edu/data/twitter7.html
◦ http://twitter.mpi-sws.org/
— Analysis information
◦ “The Man Your Man Could Smell Like: Twitter Analytics Report”
◦ “Characterizing user behavior in online social networks”
◦ “User Interactions in Social Networks and their Implications”
8. Interactive & Analytic Queries
Q1 - Q8: Information of Profiles & Friends
1. Find all users whose first name contains a particular string, e.g., “Minh”.
2. Return the names of people who study at the same school and are the same age as a given user. These people may be the user's classmates.
3. Find people who studied at the same school as you and are connected to you by a path of friend relationships. (Uses the SPARQL 1.1 “Property Path Expression” with an arbitrary-length path.)
4. Find all friends who like an action movie starring Tom Cruise. (Uses information from DBpedia for the movie and the actor.)
5. Find all people living in a specific location, e.g., Amsterdam, who can be reached from a user in at most 3 steps of the friend relationship.
6. Show all of your friends who live in Europe. (Uses information from DBpedia, e.g., that Amsterdam and London are cities in Europe.)
7. Find the top-10 suggested friends for a user: people who are not currently your friends but are friends of many of your friends. (Get all friends of your friends and order them by the number of people in your friends list who are connected to them.)
8. Return all users who have not joined a specific group but have more than 5 friends who joined it.
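To make the ranking rule of query 7 concrete, here is a minimal in-memory sketch. It is only an illustration of the intended semantics, not the benchmark's SPARQL query: the `friends` dictionary is a hypothetical stand-in for the RDF friend graph, and the alphabetical tie-break is an assumption added for stable output.

```python
from collections import Counter


def suggest_friends(friends, user, k=10):
    """Rank non-friends of `user` by their number of mutual friends.

    `friends` maps each user to the set of users they are friends with
    (hypothetical stand-in for the friend relationship in the dataset).
    """
    mine = friends.get(user, set())
    counts = Counter()
    for f in mine:
        for candidate in friends.get(f, set()):
            # Skip the user and people who are already friends.
            if candidate != user and candidate not in mine:
                counts[candidate] += 1
    # Most mutual friends first; ties broken by name (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

Each returned pair holds a suggested user and the number of the caller's friends connected to that user.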
9. Interactive & Analytic Queries
Q9 - Q14: Posts or Tweets
9. Show the 10 latest posts/tweets from your friends or their friends. (Order by posting time.)
10. Show active posts/tweets: the 10 most recently commented posts/tweets from your friends. (Order by the timestamp of the last comment on each post.)
11. Return the top-10 most interesting posts from your friends: order first by the number of “likes” on the posts (or, in Twitter, the number of “re-tweets”), then by the number of comments.
12. Return all posts about an event (e.g., unrest in Tunisia) from the last 10 days. (Based on hash tags if they are available. If a post has no tag, check whether its content contains the search terms for the event.)
13. Show all posts about a specific location, e.g., Egypt, from the last 10 days. (Uses information from DBpedia to identify the location of a post. For example, Cairo is the capital of Egypt, and Tahrir Square is in Cairo.)
14. Find the number of inactive users: users whose accounts have existed for at least 30 days but who have never posted, or users who have not posted for 60 days.
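The two conditions of query 14 can be spelled out as a small sketch. The data layout (a dictionary from user name to join time and last post time) is hypothetical, and treating both thresholds as inclusive is an assumption, since the slide does not say.

```python
from datetime import datetime, timedelta


def count_inactive(users, now):
    """Count inactive users.

    `users` maps a name to (joined, last_post), where `last_post` is
    None for a user who has never posted (hypothetical layout).
    """
    n = 0
    for joined, last_post in users.values():
        if last_post is None:
            # Never posted: inactive once the account is 30 days old.
            if now - joined >= timedelta(days=30):
                n += 1
        elif now - last_post >= timedelta(days=60):
            # Has posted before, but not within the last 60 days.
            n += 1
    return n
```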
10. Interactive & Analytic Queries
Q15 - Q17: Hash tags
15. Show all photos posted by my friends in which I am tagged.
16. Find the top-10 friends, or friends of friends, who share common interests with you. (Based on the similarity between the tags in your posts and the tags in their posts.)
17. What are the current hottest events/problems? (Get the hash tags from posts and order them by the number of their appearances in the last 10 days.)
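The counting behind query 17 might look as follows when done over plain text posts. This is a sketch under stated assumptions: posts are hypothetical `(timestamp, text)` pairs, hash tags are extracted with a simple regular expression, and tags are compared case-insensitively, none of which is specified in the slides.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

HASHTAG = re.compile(r"#(\w+)")  # simplistic tag pattern (assumption)


def hottest_tags(posts, now, days=10, k=10):
    """Return the k most frequent hash tags in posts from the last `days` days.

    `posts` is an iterable of (timestamp, text) pairs (hypothetical layout).
    """
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for ts, text in posts:
        if ts >= cutoff:
            # Lower-casing merges "#Egypt" and "#egypt" (assumption).
            counts.update(tag.lower() for tag in HASHTAG.findall(text))
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```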
11. Interactive & Analytic Queries
Q18 - Q19: other information
18. Which area is the most active? (Order locations by their total number of posts in the last 5 days.)
19. Return the top-10 locations with the fastest growth in the number of users. (Count the users who joined more than 10 days ago and those who joined during the last 10 days, then compute the growth rate.)
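One possible reading of the growth computation in query 19 is the ratio of recent joiners to earlier users per location. The sketch below follows that reading; the `(location, joined)` input layout is hypothetical, and skipping locations that had no users before the window (to avoid division by zero) is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta


def fastest_growing(users, now, days=10, k=10):
    """Rank locations by (users joined in the last `days`) / (users before).

    `users` is an iterable of (location, joined) pairs (hypothetical layout).
    """
    cutoff = now - timedelta(days=days)
    before = Counter()
    recent = Counter()
    for location, joined in users:
        if joined < cutoff:
            before[location] += 1
        else:
            recent[location] += 1
    # Only locations with earlier users get a rate (assumption: a location
    # with no earlier users is skipped rather than given an infinite rate).
    rates = {loc: recent[loc] / old for loc, old in before.items()}
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```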
12. SPARQL/Update Queries
1. Update user profile
2. Posts/Tweets:
2.1. Add a post (Popularity: high)
2.2. Remove a post (Popularity: low)
2.3. Add tags for your friends
2.4. Add/Remove a comment
3. Friends
3.1. Add a friend (Popularity: high)
3.2. Remove a friend (Popularity: low)
13. SPARQL/Update Queries
4. Group, Event
4.1. Join/Leave a group/event
4.2. Add/Delete post in the group/event
5. Photos
5.1. Add/Delete a photo
5.2. Add/Remove tags in the photo
5.3. Add/Remove a comment
5.4. Remove tags of me from all of my friends' pictures