PAPER
Scalable and Adaptive Graph Querying with MapReduce
Song-Hyon KIM†, Kyong-Ha LEE††, Inchul SONG†††, Hyebong CHOI††††, and Yoon-Joon LEE††††
SUMMARY We address the problem of processing multiple graph
queries over a massive set of data graphs in this letter. As the number
of data graphs is growing rapidly, it is often hard to process graph queries
with serial algorithms in a timely manner. We propose a distributed graph
querying algorithm, which employs feature-based comparison and a filter-
and-verify scheme working on the MapReduce framework. Moreover, we
devise an efficient scheme that adaptively tunes a proper feature size at
runtime by sampling data graphs. With various experiments, we show that
the proposed method outperforms conventional algorithms in terms of both
scalability and efficiency.
key words: graph query, parallel processing, MapReduce, adaptive tuning
1. Introduction
Graphs are widely used to model complex structures such
as protein-protein interactions and Web data [1]. Graph
data are growing rapidly and massively, e.g., the PubChem
project now serves more than 40 million graphs that repre-
sent chemical compounds. The overall size of the graphs
now hits tens of terabytes [2]. A graph pattern match-
ing query problem is to find all data graphs that contain a
given query graph from a set of data graphs. A graph pattern matching query requires expensive subgraph isomorphism tests, which are known to be NP-complete [3].
Thus, many studies have been reported in the literature to cut
down the cost of graph query processing [4–7]. Although these algorithms are effective, they are not scalable enough to handle a massive set of data graphs like PubChem, since they are all serial algorithms designed to run on a single machine.
Meanwhile, it is important to set a proper feature size
in filter-and-verify scheme since it dominantly determines
the overall performance of query processing (refer to Sec-
tion 6). However, it is hard to find a proper feature size before query processing since it depends on statistics of both query graphs and data graphs. If the feature size is too large, the overhead of extracting and storing features increases sharply. On the other hand, if it is too small, many false positives are produced at the filtering stage, incurring many unnecessary subgraph isomorphism tests.
To tackle the problems, we propose MR-Graph, a dis-
tributed graph query processing algorithm based on MapRe-
Manuscript received April 8, 2013.
† The author is with Korea Air Force Academy, Korea.
†† The author is with KISTI, Korea.
††† The author is with SAIT, Samsung Electronics, Korea.
†††† The authors are with CS Dept., KAIST, Korea.
duce [8]. MR-Graph tunes the feature size at runtime: it determines the most appropriate feature size by sampling a portion of the data graphs. MR-Graph processes multiple graph queries over a massive set of data graphs simultaneously. In addition, MR-Graph requires neither index building nor modification of MapReduce internals; it is carefully tailored to work in harmony with MapReduce's programming model.
2. Related Work
A major issue in the graph pattern matching query problem is reducing the number of pairwise subgraph isomorphism tests. Most approaches to this issue are based on the filter-and-verify scheme with index techniques [4–7]. They extract substructures called features from data graphs and then filter out irrelevant graphs before actual subgraph isomorphism testing. This greatly reduces the number of actual subgraph
isomorphism tests. However, all these algorithms are serial, so they are not scalable enough to process a massive set of data graphs. MapReduce is a programming model that enables processing a massive volume of data in parallel on a large cluster of commodity PCs [8]. It provides a simple
but powerful method that enables automatic parallelization
and distribution of large-scale computation. Recently, Luo et al. [9] proposed an index-based method for parallel graph
query processing on MapReduce. They build an edge index
over data graphs, which maps each distinct edge to the list of
data graphs that include the edge. However, their approach relies on a restrictive assumption: every edge in a query graph must be uniquely identified by the labels of its endpoints and itself. Thus it requires a subsequent verification phase to process general graph pattern matching queries. There are
some studies about processing graphs with MapReduce. Lin
et al. [10] introduce several design patterns suitable for iter-
ative graph algorithms such as PageRank. Pregel [11] and
Pegasus [12] are systems devised for processing large-scale
graphs. However, they differ from our work in that they focus on processing a single large graph such as a Web graph or a social network.
3. Preliminaries
We denote a graph by a tuple g = (V, E, L, l), where V is
a set of vertices and E is a set of undirected edges such that
E ⊆ V × V. L denotes a set of labels for both vertices and
Preprint submitted to the IEICE Transactions on Information and Systems
edges. l denotes a labeling function l : (V ∪ E) → L. We also define the size of a graph g, denoted by |g|, as the number of edges in g, i.e., |g| = |E(g)|.

Fig. 1 An example of query and data graph set: (a) a query graph set {q1, q2} and (b) a data graph set {g1, g2, g3, g4}
Definition 1. Given two graphs g = (V, E, L, l) and g′ = (V′, E′, L′, l′), g is subgraph isomorphic to g′, denoted by g ⊆s g′, if and only if there exists an injective function φ : V → V′ such that
1. ∀u ∈ V, φ(u) ∈ V′ and l(u) = l′(φ(u)),
2. ∀(u, v) ∈ E, (φ(u), φ(v)) ∈ E′ and l(u, v) = l′(φ(u), φ(v)).
Problem statement Let D = {g1, g2, · · · , gn} be a set of data graphs and Q = {q1, q2, · · · , qm} be a set of query graphs, where m ≪ n. For each query q ∈ Q, we find the answer set Dq = {gi | q ⊆s gi, gi ∈ D}.
Figure 1 shows an example which will be used throughout this paper. In the example, the answers for the two query graphs are Dq1 = {g1, g2, g4} and Dq2 = {g1}, where Dqi denotes the set of answers for qi.
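For concreteness, the subgraph isomorphism test ⊆s can be realized with a simple backtracking search. The sketch below is a minimal stand-in for a full matcher such as VF2 (used in our experiments, Sec. 6), and the small data graphs in the usage note are hypothetical, not the ones drawn in Fig. 1:

```python
def sub_iso(q, g):
    """Return True if query graph q is subgraph isomorphic to data graph g.

    A graph is a pair (vlabels, elabels): vlabels maps vertex -> label,
    elabels maps an undirected edge (u, v) -> label.
    """
    qv, qe = q
    gv, ge = g

    def edge(elabels, u, v):
        # undirected edge-label lookup
        return elabels.get((u, v)) or elabels.get((v, u))

    order = list(qv)  # fixed matching order over query vertices

    def extend(phi):  # phi: partial injective mapping, query -> data vertex
        if len(phi) == len(order):
            return True
        u = order[len(phi)]
        for cu in gv:
            if cu in phi.values() or gv[cu] != qv[u]:
                continue  # enforce injectivity and vertex-label condition
            # every query edge between u and an already-mapped vertex
            # must exist in g with the same edge label
            if all(edge(ge, cu, phi[w]) == edge(qe, u, w)
                   for w in phi if edge(qe, u, w) is not None):
                phi[u] = cu
                if extend(phi):
                    return True
                del phi[u]
        return False

    return extend({})
```

For instance, the triangle query q2 (vertices labeled A, B, B, all edges labeled b) is found in a hypothetical data graph that extends it with a C-labeled vertex, but not in a graph lacking an A-labeled vertex.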
Graph representation In MapReduce, all input data
are represented by key-value pairs. To parallelize graph
query processing with MapReduce, we first encode each
graph as a single key-value pair. A graph g is encoded
as a pair of (ID, CODE) where ID is a unique identifier for
each graph and CODE is a serialized version of g. CODE
is composed of the number of vertices, the number of
edges, a list of vertex labels, and a list of edges. In the
format, each edge is represented by a triple that consists
of from-vertex ID, to-vertex ID, and edge label. For example, the query graph q2 in Fig. 1(a) is encoded as (q2, ‘3,3,A,B,B,0,1,b,0,2,b,1,2,b’).
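This serialization can be sketched as follows; `encode_graph` is a hypothetical helper written for illustration, not the paper's actual C++ code:

```python
def encode_graph(gid, vlabels, edges):
    """Encode a graph as an (ID, CODE) key-value pair: CODE lists the
    number of vertices, the number of edges, the vertex labels, and
    each edge as a (from-vertex, to-vertex, label) triple."""
    parts = [str(len(vlabels)), str(len(edges))]
    parts += list(vlabels)
    for u, v, lbl in edges:
        parts += [str(u), str(v), lbl]
    return gid, ",".join(parts)
```

For the q2 example above, `encode_graph('q2', ['A', 'B', 'B'], [(0, 1, 'b'), (0, 2, 'b'), (1, 2, 'b')])` reproduces the pair (q2, ‘3,3,A,B,B,0,1,b,0,2,b,1,2,b’).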
Features in graph query processing A feature is a loosely
defined term for a substructure of a graph. There are various
kinds of features such as path, subtree, and subgraph [4–
7]. In this letter, we use subgraphs as features because they exhibit the best filtering power among them. However, any other substructure can be used in MR-Graph without loss of generality.
4. Basic algorithm
A MapReduce-based algorithm works in two stages: map and reduce. In MapReduce, input data are first partitioned into equal-sized blocks, and each block is assigned to a mapper at the map stage; mapped outputs are then stored on local disks. Finally, the outputs are shuffled and pulled to reducers for further processing at the reduce stage. Algorithm 1 is the basic algorithm of MR-Graph. Each mapper processes graph queries over data graphs, and each reducer groups mapped outputs by query graph ID.
Algorithm 1: Basic MR-Graph
input: D: a set of data graphs, Q: a set of query graphs
output: (IDq, a list of IDg) pairs
1  init() // initializing a mapper
2    F[] = ExtractFeatures(Q)
3  map(String IDg, String CODEg) // mapper
4    Fg = ExtractFeatures(CODEg)
5    foreach (IDq, CODEq) ∈ Q do
6      if CheckContainment(F[IDq], Fg) then
7        if TestSubIso(CODEq, CODEg) then
8          emit(IDq, IDg)
9  reduce(String IDq, Iterator IDg) // reducer
10   answer = Enlist(IDg)
11   emit(IDq, answer)
Algorithm 2: AdaptiveTune
input: kinit: initial feature size, Vk: trial values for k, α: number of trials for each feature size in Vk
output: k: the most appropriate feature size
1. The initial feature size kinit is set into DCS.
2. If a mapper is a trial mapper, choose one value from Vk.
Otherwise, use the feature size stored in DCS.
3. The trial mapper updates DCS.
4. DCS notifies its update to all running mappers.
5. For each running mapper,
a. Estimate the remaining time left till the mapper ends,
denoted Tcurrent, and the remaining time expected if the
mapper uses the best feature size, denoted Tbest.
b. If Tbest < Tcurrent, change the feature size with one in
DCS.
c. Process the remaining data graphs in the input split.
At map stage, we extract features from a query graph and a data graph by different rules (lines 2 and 4). Graph query processing is performed in a filter-and-verify manner. For each data graph, MR-Graph tests whether the data graph contains all the features that compose a certain query graph in a query set Q (line 6). If so, the algorithm pairs the data graph with the query graph and tests subgraph isomorphism (line 7). At reduce stage, the resulting data graphs are grouped by query graph ID and emitted. Based on this filter-and-verify scheme, MR-Graph greatly reduces the number of graphs that must undergo subgraph isomorphism testing.
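The map-side filter-and-verify flow of Algorithm 1 can be sketched as follows. This is an illustrative Python rendering (the paper's implementation is C++), and it assumes features are represented as hashable canonical codes so that CheckContainment reduces to a set-inclusion test:

```python
def mr_graph_map(gid, code_g, queries, query_features,
                 extract_features, test_sub_iso):
    """One map() invocation of Algorithm 1.

    queries:        dict query ID -> serialized query graph (CODEq)
    query_features: dict query ID -> iterable of feature codes (line 2)
    extract_features, test_sub_iso: pluggable helpers standing in for
    ExtractFeatures and TestSubIso.
    Yields (IDq, IDg) pairs for candidates that pass verification.
    """
    fg = set(extract_features(code_g))        # line 4: data-graph features
    for qid, code_q in queries.items():
        if set(query_features[qid]) <= fg:    # line 6: filter
            if test_sub_iso(code_q, code_g):  # line 7: verify
                yield qid, gid                # line 8: emit
```

At the reduce stage, pairs with the same IDq are simply collected into a list (lines 9–11).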
When it comes to feature extraction, we extract
min(k, |q|)-sized features for a query graph q where |q| is
the size of query graph q. Note that we do not use features
whose sizes are less than min(k, |q|). We explain a rationale
behind this. Suppose that a query graph q has two features
fx and fy, where fx ⊆s fy and the size of feature fy, denoted |fy|, is set to min(k, |q|). It is straightforward that a data graph g contains fx if it contains fy. Thus, testing whether g contains fx is unnecessary when g is confirmed to contain fy. Meanwhile, a data graph is processed against query graphs of various sizes. For a data graph g, we extract features whose sizes range up to min(k, |g|). In other
words, the size of feature f for a data graph g must satisfy
the condition, 1 ≤ | f| ≤ min(k, |g|). The ExtractFeatures
function in Algorithm 1 extracts features from both query
graphs and data graphs with the feature size following the
rules described thus far.
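The feature-size rules above can be sketched as follows (hypothetical helper names, written for illustration; the actual feature extraction in our system uses gSpan, as noted in Sec. 6):

```python
def query_feature_sizes(k, q_size):
    """Queries use only features of exactly min(k, |q|) edges: any smaller
    feature is a subgraph of some min(k, |q|)-sized feature, so testing it
    separately adds no filtering power."""
    return [min(k, q_size)]

def data_feature_sizes(k, g_size):
    """Data graphs keep every feature size from 1 up to min(k, |g|), since
    a data graph is matched against queries of many different sizes."""
    return list(range(1, min(k, g_size) + 1))
```

For example, with k = 3 a query of 14 edges contributes only 3-edge features, while a 2-edge data graph contributes features of sizes 1 and 2.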
5. AdaptiveTune: tuning the feature size at runtime
Note that the feature size k in Algorithm 1 is determined by
a user before runtime. However, feature size decision is not
easy since the best feature size varies according to the char-
acteristics of query graphs and data graphs. To address the
issue, we devise an adaptive feature size tuning technique.
AdaptiveTune examines a set of trial values for k, denoted
by Vk, in the first few mappers, called trial mappers, then
determines a proper feature size. Note that the number of trial mappers is set to α times |Vk|; the larger α is, the more data graphs are tested to determine k. This tuning process requires information about other mappers to determine
the most appropriate feature size and the remaining time of each mapper. We format the information as a 4-tuple (k, T, N, O), where the items represent ‘feature size’, ‘elapsed time’, ‘the number of data graphs processed’, and ‘time overhead for feature extraction (lines 1–2 in Algorithm 1)’, respectively. For example, (3, 10, 5, 4) means that some mapper processed 5 data graphs in 10 seconds using feature size 3, and that the time taken to extract features from the query graphs was 4 seconds. We exploit a distributed coordination system (DCS) to share this information among mappers.
Algorithm 2 describes AdaptiveTune. Note that this al-
gorithm begins with feature size kinit, initially set into DCS
by a user. After the third step in Algorithm 2, DCS stores a 4-tuple (k, TB, NB, OB), meaning that some trial mapper finished its input split containing NB data graphs in TB time, while it took OB time to extract features from the query graph set. Tcurrent and Tbest in the algorithm are estimated as follows:

Tcurrent = (TC / NC) × (NB − NC)    (1)
Tbest = (TB / NB) × (NB − NC) + OB    (2)
where TC is the elapsed time under the current feature size k and TB is the elapsed time under the best k. NC is the number of data graphs processed with the current feature size. We estimate the remaining input size of a mapper by NB − NC. Note that since NB is computed when a mapper ends, it also represents the total number of data graphs in the mapper's input split. As AdaptiveTune runs α times with each trial value, NB is expected to converge to the average number of data
graphs in an input split. We also estimate the execution time per data graph by TC/NC or TB/NB.

Fig. 2 An example of AdaptiveTune (feature-size choices of four map slots over time; DCS holds kinit at t0 and is updated to k2 at t1)
Example 1. Figure 2 shows an example of AdaptiveTune
that works with 4 mappers. Suppose that Vk = {k1, k2, k3}
and α = 1, then the number of trial mappers is 3. Tasks T1, T2, and T3 are trial mappers and each uses one value from Vk, while T4 uses the feature size stored in DCS, kinit. After finishing its input split, T2 updates DCS with ‘(k2, ...)’ at time t1. DCS notifies this tuple to the other mappers. The mappers then determine whether to keep or change their feature size, and process their remaining input splits with it. Finally, T1 and T4 change their current feature size to k2.
6. Experiments
Experimental setup We performed our experiments with
Hadoop version 1.0.3 working on a 9-node cluster. Each
data node is equipped with an Intel i5-2500 3.3GHz pro-
cessor, 8GB memory, and a 7200RPM HDD, running on
CentOS 6.2. All nodes are connected via a Gigabit switch-
ing hub. The number of task slots is 32, which indicates how many tasks can run simultaneously on Hadoop. The total numbers of map and reduce tasks were set to 8 times (256) and 2 times (64) the number of task slots, respectively.
We tested two datasets introduced in [7], i.e., PubChem for a
real-world dataset and synthetic.E30.D3.L50 for a synthetic
dataset. In addition, we varied the number of data graphs or
query graphs. Table 1 presents the statistics of our datasets, where the |V|, |E|, and |L| values are averages. We set α to 1 and Vk to {1, 2, . . . , 6}. The statistics of input splits are shown in Table 2.
We compared our method with a non-filtering ap-
proach, denoted Basic, and Luo et al.’s approach, denoted
Luo [9]. All algorithms are written in C++, and the verification phase was added to Luo for a fair comparison (see Sec. 2). In our system, we used VF2 [13] for
subgraph isomorphism test, gSpan [14] for feature extrac-
tion, and ZooKeeper [15] for distributed coordination.
Result analysis We first tested how the feature size k affects the overall performance of graph query processing. Fig. 3 shows that, for the real dataset, the execution time (depicted by bars) is the lowest when k is set to 4, whereas the average number of candidates (depicted by lines) is the least when k is 6. The reason is that the cost of feature extraction and feature comparison offsets the benefit of finer filtering. The best value for k on the synthetic dataset is 2, as shown in Fig. 4. Note that we omitted some results which exceeded
Table 1 Statistics of graph datasets
Dataset    D/Q   #     |V|    |E|    |L|   size
Real       D     10M   23.98  25.76  78    2.31 GB
Real       Q     1000  14.26  14.00  11    131 KB
Synthetic  D     10M   14.23  30.53  47    2.62 GB
Synthetic  Q     1000  10.25  12.00  47    120 KB
Table 2 Statistics of input splits
                                       Real     Synthetic
# of input splits                      256      256
avg. # of data graphs per input split  39062.5  39062.5
standard deviation                     6416.4   48.6
Fig. 3 Execution time (real): execution time in seconds (bars) and number of candidates ×10^6 (lines) over k = 1..6
Fig. 4 Execution time (synthetic): as Fig. 3; some results exceed 24 hrs
Fig. 5 The size of feature sets: feature set size (GB, log scale) over k = 1..6, real and synthetic
Fig. 6 Feature extraction time: average extraction time (ms, log scale) over k = 1..6, real and synthetic
Fig. 7 AdaptiveTune (real): Preset(3) 2690 s, Best(4) 1764 s, AT 1804 s
Fig. 8 AdaptiveTune (synthetic): Preset(3) 1209 s, Best(2) 851 s, AT 931 s
Fig. 9 Comparison of methods (real, 1M): execution time per query over |Q| for Luo, Basic, and MR-Graph
Fig. 10 Scalability (real): execution time over 2, 4, and 8 nodes for Basic (>24 hrs on 2 and 4 nodes) and MR-Graph
24 hours in our figures. The results also confirm that the best k differs according to the characteristics of the given data and query graph sets. Figs. 5 and 6 show how the feature size impacts the overall performance. As the feature size increases, both the total size of the feature sets and the average elapsed time for feature extraction per data graph increase exponentially. This causes not only space overhead to store the features but also a lot of I/O during query processing on Hadoop.
We now evaluate AdaptiveTune. Figs. 7 and 8 show the experimental results. In the figures, the ‘Preset’ value is the initially set feature size, whereas the ‘Best’ value is the best feature size computed after finishing all graph queries. In both cases, AdaptiveTune (AT) shows good execution time even compared to the best feature size. With the real dataset, AdaptiveTune successfully reduced the execution time by 52.5%.
We also compared our algorithm with conventional algorithms. We used 1 million graphs from the real dataset rather than the 10 million in Table 1, since Luo exhibited limited scalability. As shown in Fig. 9, our algorithm is always better than the others in terms of execution time per query, except in the case of processing a single graph query. As the number of query graphs grows, MR-Graph outperforms the two other algorithms by a larger margin. We also noticed that the performance of Luo was even poorer than that of our Basic algorithm.
The reason is that Luo suffered from frequent accesses to
the storage of index files and a skewness problem; value
distribution in the indexes is highly skewed [16]. Finally, we evaluate the scalability of MR-Graph. As shown in Fig. 10, MR-Graph scales with the size of the cluster; specifically, MR-Graph runs 25.3 times faster than Basic when we use 8 data nodes.
7. Conclusions
We proposed an adaptive and parallel algorithm, MR-
Graph, which efficiently processes multiple query graphs
over a massive set of data graphs based on MapReduce.
AdaptiveTune improves query processing as it adaptively tunes the feature size at runtime, keeping the system balanced between the effectiveness of graph filtering and the cost of feature extraction. Our experiments show that AdaptiveTune incurs very little overhead in execution time.
Acknowledgement This research was partly supported by an NRF grant funded by the Korea Government (No. 2011-0016282). It was also partly funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea, in the ICT R&D program 2013.
References
[1] C. Aggarwal and H. Wang, Managing and mining graph data,
Springer, 2010.
[2] “The pubchem project.” http://pubchem.ncbi.nlm.nih.gov.
[3] M.R. Garey and D.S. Johnson, Computers and Intractability; A
Guide to the Theory of NP-Completeness, W. H. Freeman & Co.,
New York, NY, USA, 1990.
[4] X. Yan, P. Yu, and J. Han, “Graph indexing: A frequent structure-
based approach,” Proc. SIGMOD, pp.335–346, 2004.
[5] P. Zhao et al., “Graph indexing: tree + delta >= graph,” Proc.
VLDB, pp.938–949, VLDB Endowment, 2007.
[6] J. Cheng et al., “Fg-index: towards verification-free query process-
ing on graph databases,” Proc. SIGMOD, pp.857–872, 2007.
[7] W. Han et al., “iGraph: a framework for comparisons of disk-based
graph indexing techniques,” PVLDB, vol.3, no.1, pp.449–459, 2010.
[8] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing
on large clusters,” Commun. ACM, vol.51, no.1, pp.107–113, 2008.
[9] Y. Luo et al., “Towards efficient subgraph search in cloud computing
environments,” DASFAA Workshops, pp.2–13, Springer, 2011.
[10] J. Lin and M. Schatz, “Design patterns for efficient graph algorithms
in mapreduce,” Proc. MLG, pp.78–85, ACM, 2010.
[11] G. Malewicz et al., “Pregel: a system for large-scale graph process-
ing,” Proc. SIGMOD, pp.135–146, ACM, 2010.
[12] U. Kang et al., “Pegasus: mining peta-scale graphs,” Knowl. Inf.
Syst., vol.27, no.2, pp.303–325, 2011.
[13] L. Cordella et al., “A (sub)graph isomorphism algorithm for matching large graphs,” IEEE Trans. Pattern Anal. Mach. Intell., vol.26, no.10, pp.1367–1372, 2004.
[14] X. Yan and J. Han, “gSpan: Graph-Based Substructure Pattern Min-
ing,” Proc. ICDM, pp.721–724, 2002.
[15] “Apache zookeeper.” http://zookeeper.apache.org.
[16] Y. Kwon et al., “SkewTune: mitigating skew in mapreduce applica-
tions,” Proc. SIGMOD, pp.25–36, ACM, 2012.

Aalto-yliopiston ylioppilaskunnan aikaansaannoksia 2010
 
Intro to Orthomolecular Medicine
Intro to Orthomolecular MedicineIntro to Orthomolecular Medicine
Intro to Orthomolecular Medicine
 
Czech Christmas - eTwinning project "Look at book"
Czech Christmas - eTwinning project "Look at book"Czech Christmas - eTwinning project "Look at book"
Czech Christmas - eTwinning project "Look at book"
 
Presentacio nova línia de suport a les formacions musicals estables
Presentacio nova línia de suport a les formacions musicals establesPresentacio nova línia de suport a les formacions musicals estables
Presentacio nova línia de suport a les formacions musicals estables
 
5 etwinning
5 etwinning5 etwinning
5 etwinning
 
Presentation1
Presentation1Presentation1
Presentation1
 
Luottoriskit 16042010
Luottoriskit 16042010Luottoriskit 16042010
Luottoriskit 16042010
 
Q Set
Q SetQ Set
Q Set
 
10 Taldea Euskaraz
10 Taldea Euskaraz10 Taldea Euskaraz
10 Taldea Euskaraz
 

Similar a Scalable and Adaptive Graph Querying with MapReduce

Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
Design and implementation of Parallel Prefix Adders using FPGAs
Design and implementation of Parallel Prefix Adders using FPGAsDesign and implementation of Parallel Prefix Adders using FPGAs
Design and implementation of Parallel Prefix Adders using FPGAsIOSR Journals
 
Subgraph matching with set similarity in a
Subgraph matching with set similarity in aSubgraph matching with set similarity in a
Subgraph matching with set similarity in anexgentech15
 
SUBGRAPH MATCHING WITH SET SIMILARITY IN A LARGE GRAPH DATABASE - IEEE PROJE...
SUBGRAPH MATCHING WITH SET SIMILARITY IN A LARGE GRAPH DATABASE  - IEEE PROJE...SUBGRAPH MATCHING WITH SET SIMILARITY IN A LARGE GRAPH DATABASE  - IEEE PROJE...
SUBGRAPH MATCHING WITH SET SIMILARITY IN A LARGE GRAPH DATABASE - IEEE PROJE...Nexgen Technology
 
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...Kyong-Ha Lee
 
Paired-end alignments in sequence graphs
Paired-end alignments in sequence graphsPaired-end alignments in sequence graphs
Paired-end alignments in sequence graphsChirag Jain
 
Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres...
Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres...Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres...
Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres...IJERA Editor
 
GRAPH PARTITIONING FOR IMAGE SEGMENTATION USING ISOPERIMETRIC APPROACH: A REVIEW
GRAPH PARTITIONING FOR IMAGE SEGMENTATION USING ISOPERIMETRIC APPROACH: A REVIEWGRAPH PARTITIONING FOR IMAGE SEGMENTATION USING ISOPERIMETRIC APPROACH: A REVIEW
GRAPH PARTITIONING FOR IMAGE SEGMENTATION USING ISOPERIMETRIC APPROACH: A REVIEWDrm Kapoor
 
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...Subhajit Sahu
 
Automated Machine Learning via Sequential Uniform Designs
Automated Machine Learning via Sequential Uniform DesignsAutomated Machine Learning via Sequential Uniform Designs
Automated Machine Learning via Sequential Uniform DesignsAijun Zhang
 
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...IRJET Journal
 
Graph analysis platform comparison, pregel/goldenorb/giraph
Graph analysis platform comparison, pregel/goldenorb/giraphGraph analysis platform comparison, pregel/goldenorb/giraph
Graph analysis platform comparison, pregel/goldenorb/giraphAndrew Yongjoon Kong
 
Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tr...
Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tr...Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tr...
Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tr...IJMER
 
Performance Analysis of Iterative Closest Point (ICP) Algorithm using Modifie...
Performance Analysis of Iterative Closest Point (ICP) Algorithm using Modifie...Performance Analysis of Iterative Closest Point (ICP) Algorithm using Modifie...
Performance Analysis of Iterative Closest Point (ICP) Algorithm using Modifie...IRJET Journal
 
Wms Performance Tests Map Server Vs Geo Server
Wms Performance Tests Map Server Vs Geo ServerWms Performance Tests Map Server Vs Geo Server
Wms Performance Tests Map Server Vs Geo ServerDonnyV
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache CalciteJulian Hyde
 
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORSCOMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORSIRJET Journal
 
Stockage, manipulation et analyse de données matricielles avec PostGIS Raster
Stockage, manipulation et analyse de données matricielles avec PostGIS RasterStockage, manipulation et analyse de données matricielles avec PostGIS Raster
Stockage, manipulation et analyse de données matricielles avec PostGIS RasterACSG Section Montréal
 

Similar a Scalable and Adaptive Graph Querying with MapReduce (20)

Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
I017425763
I017425763I017425763
I017425763
 
Design and implementation of Parallel Prefix Adders using FPGAs
Design and implementation of Parallel Prefix Adders using FPGAsDesign and implementation of Parallel Prefix Adders using FPGAs
Design and implementation of Parallel Prefix Adders using FPGAs
 
Subgraph matching with set similarity in a
Subgraph matching with set similarity in aSubgraph matching with set similarity in a
Subgraph matching with set similarity in a
 
SUBGRAPH MATCHING WITH SET SIMILARITY IN A LARGE GRAPH DATABASE - IEEE PROJE...
SUBGRAPH MATCHING WITH SET SIMILARITY IN A LARGE GRAPH DATABASE  - IEEE PROJE...SUBGRAPH MATCHING WITH SET SIMILARITY IN A LARGE GRAPH DATABASE  - IEEE PROJE...
SUBGRAPH MATCHING WITH SET SIMILARITY IN A LARGE GRAPH DATABASE - IEEE PROJE...
 
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
 
Paired-end alignments in sequence graphs
Paired-end alignments in sequence graphsPaired-end alignments in sequence graphs
Paired-end alignments in sequence graphs
 
Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres...
Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres...Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres...
Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres...
 
GRAPH PARTITIONING FOR IMAGE SEGMENTATION USING ISOPERIMETRIC APPROACH: A REVIEW
GRAPH PARTITIONING FOR IMAGE SEGMENTATION USING ISOPERIMETRIC APPROACH: A REVIEWGRAPH PARTITIONING FOR IMAGE SEGMENTATION USING ISOPERIMETRIC APPROACH: A REVIEW
GRAPH PARTITIONING FOR IMAGE SEGMENTATION USING ISOPERIMETRIC APPROACH: A REVIEW
 
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
 
Automated Machine Learning via Sequential Uniform Designs
Automated Machine Learning via Sequential Uniform DesignsAutomated Machine Learning via Sequential Uniform Designs
Automated Machine Learning via Sequential Uniform Designs
 
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
 
Graph analysis platform comparison, pregel/goldenorb/giraph
Graph analysis platform comparison, pregel/goldenorb/giraphGraph analysis platform comparison, pregel/goldenorb/giraph
Graph analysis platform comparison, pregel/goldenorb/giraph
 
Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tr...
Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tr...Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tr...
Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tr...
 
Performance Analysis of Iterative Closest Point (ICP) Algorithm using Modifie...
Performance Analysis of Iterative Closest Point (ICP) Algorithm using Modifie...Performance Analysis of Iterative Closest Point (ICP) Algorithm using Modifie...
Performance Analysis of Iterative Closest Point (ICP) Algorithm using Modifie...
 
Wms Performance Tests Map Server Vs Geo Server
Wms Performance Tests Map Server Vs Geo ServerWms Performance Tests Map Server Vs Geo Server
Wms Performance Tests Map Server Vs Geo Server
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Real Time Geodemographics
Real Time GeodemographicsReal Time Geodemographics
Real Time Geodemographics
 
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORSCOMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
 
Stockage, manipulation et analyse de données matricielles avec PostGIS Raster
Stockage, manipulation et analyse de données matricielles avec PostGIS RasterStockage, manipulation et analyse de données matricielles avec PostGIS Raster
Stockage, manipulation et analyse de données matricielles avec PostGIS Raster
 

Más de Kyong-Ha Lee

좋은 논문 찾기
좋은 논문 찾기좋은 논문 찾기
좋은 논문 찾기Kyong-Ha Lee
 
A poster version of HadoopXML
A poster version of HadoopXMLA poster version of HadoopXML
A poster version of HadoopXMLKyong-Ha Lee
 
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...Kyong-Ha Lee
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.Kyong-Ha Lee
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementKyong-Ha Lee
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyKyong-Ha Lee
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureKyong-Ha Lee
 
Bitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingBitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingKyong-Ha Lee
 
Bitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingBitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingKyong-Ha Lee
 

Más de Kyong-Ha Lee (9)

좋은 논문 찾기
좋은 논문 찾기좋은 논문 찾기
좋은 논문 찾기
 
A poster version of HadoopXML
A poster version of HadoopXMLA poster version of HadoopXML
A poster version of HadoopXML
 
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvement
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A Survey
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing Architecture
 
Bitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingBitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query Processing
 
Bitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingBitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query Processing
 

Último

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 

Último (20)

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 

Scalable and Adaptive Graph Querying with MapReduce

tion 6).
However, it is hard to find a proper feature size before query processing, since it depends on statistics about both the query graphs and the data graphs. If the feature size is too large, the overhead of extracting and storing features increases sharply. On the other hand, if the feature size is too small, many false positives are produced at the filtering stage, incurring many needless subgraph isomorphism tests.

To tackle these problems, we propose MR-Graph, a distributed graph query processing algorithm based on MapReduce [8]. MR-Graph tunes the feature size at runtime: it determines the most appropriate feature size by sampling a portion of the data graphs. MR-Graph processes multiple graph queries over a massive set of data graphs simultaneously. In addition, MR-Graph requires neither index building nor any modification of MapReduce internals; it is carefully tailored to work in harmony with MapReduce's programming model.

Manuscript received April 8, 2013.
† The author is with Korea Air Force Academy, Korea.
†† The author is with KISTI, Korea.
††† The author is with SAIT, Samsung Electronics, Korea.
†††† The authors are with CS Dept., KAIST, Korea.

2. Related Work

A major issue in the graph pattern matching query problem is reducing the number of pairwise subgraph isomorphism tests. Most approaches to this issue are based on a filter-and-verify scheme with index techniques [4–7]. They extract substructures called features from data graphs and then filter out irrelevant graphs before actual subgraph isomorphism testing, which greatly reduces the number of actual subgraph isomorphism tests. However, all of these algorithms are serial, so they are not scalable enough to process a massive set of data graphs.

MapReduce is a programming model that enables processing of a massive volume of data in parallel on a large cluster of commodity PCs [8].
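As background, the map/shuffle/reduce flow can be sketched in a few lines. The following is a toy, single-process simulation of the model (not Hadoop itself, and not the authors' code), shown with word count as a sanity check:

```python
from collections import defaultdict

# Toy single-process MapReduce: run the mapper over each record,
# group the emitted pairs by key (the "shuffle"), then reduce each group.
def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map stage
            groups[key].append(value)       # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word count as a sanity check.
counts = map_reduce(
    ["a b a", "b c"],
    mapper=lambda line: [(w, 1) for w in line.split()],
    reducer=lambda w, ones: sum(ones),
)
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

In the real framework the mapper and reducer run on many machines in parallel, and the shuffle moves data over the network; the programmer supplies only the two functions.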
It provides a simple but powerful method that enables automatic parallelization and distribution of large-scale computation. Recently, Luo et al. [9] proposed an index-based method for parallel graph query processing on MapReduce. They build an edge index over data graphs, which maps each distinct edge to the list of data graphs that include the edge. However, their approach makes the restrictive assumption that every edge in a query graph is uniquely identified by the labels of its endpoints and of the edge itself; thus it requires a subsequent verification phase to process general graph pattern matching queries. There are also studies on processing graphs with MapReduce. Lin et al. [10] introduce several design patterns suitable for iterative graph algorithms such as PageRank. Pregel [11] and Pegasus [12] are systems devised for processing large-scale graphs. However, they differ from our work in that they focus on processing a single large graph, such as Web data or a social network.

3. Preliminaries

We denote a graph by a tuple g = (V, E, L, l), where V is a set of vertices and E is a set of undirected edges such that E ⊆ V × V. L denotes a set of labels for both vertices and

Preprint submitted to the IEICE Transactions on Information and Systems
[Fig. 1: An example of query and data graph set — (a) query graph set; (b) data graph set.]

edges. l denotes a labeling function: (V ∪ E) → L. We also define the size of a graph g, denoted by |g|, as the number of edges in g, i.e., |g| = |E(g)|.

Definition 1. Given two graphs g = (V, E, L, l) and g′ = (V′, E′, L′, l′), g is subgraph isomorphic to g′, denoted by g ⊆s g′, if and only if there exists an injective function φ : V → V′ such that
1. ∀u ∈ V, φ(u) ∈ V′ and l(u) = l′(φ(u)),
2. ∀(u, v) ∈ E, (φ(u), φ(v)) ∈ E′ and l(u, v) = l′(φ(u), φ(v)).

Problem statement: Let D = {g1, g2, · · · , gn} be a set of data graphs and Q = {q1, q2, · · · , qm} be a set of query graphs, where m << n. For each query q ∈ Q, we find all data graphs Dq such that Dq = {gi | q ⊆s gi, gi ∈ D}.

Figure 1 shows an example which will be used throughout this paper. In the example, the answers for the two query graphs are Dq1 = {g1, g2, g4} and Dq2 = {g1}, where Dqi denotes the set of answers for qi.

Graph representation: In MapReduce, all input data are represented as key-value pairs. To parallelize graph query processing with MapReduce, we first encode each graph as a single key-value pair: a graph g is encoded as a pair (ID, CODE), where ID is a unique identifier for the graph and CODE is a serialized version of g.
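Definition 1 can be checked directly, if inefficiently, by enumerating injective vertex mappings. The sketch below is for intuition only; the data layout and helper names are ours, not the letter's, and real systems use far better verification algorithms:

```python
from itertools import permutations

# Naive subgraph isomorphism test per Definition 1 (exponential; illustration only).
# A graph is (vertex_labels, edge_labels): vertex_labels[i] is the label of vertex i;
# edge_labels maps an undirected edge, stored as both (u, v) and (v, u), to its label.
def undirected(edges):
    return {**{(u, v): l for u, v, l in edges}, **{(v, u): l for u, v, l in edges}}

def sub_iso(g, h):
    gl, ge = g
    hl, he = h
    for phi in permutations(range(len(hl)), len(gl)):      # injective φ: V(g) → V(h)
        if all(gl[u] == hl[phi[u]] for u in range(len(gl))) and \
           all(he.get((phi[u], phi[v])) == l for (u, v), l in ge.items()):
            return True                                    # conditions 1 and 2 hold
    return False

q = (["A", "B"], undirected([(0, 1, "b")]))
g = (["A", "D", "B"], undirected([(0, 1, "d"), (0, 2, "b")]))
print(sub_iso(q, g))  # True: map A->A, B->B along the b-labeled edge
```

Since this check is NP-complete in general, the point of the filter-and-verify scheme is to invoke it as rarely as possible.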
CODE consists of the number of vertices, the number of edges, a list of vertex labels, and a list of edges. In this format, each edge is represented by a triple consisting of a from-vertex ID, a to-vertex ID, and an edge label. For example, the query graph q2 in Fig. 1(a) is encoded as (q2, ‘3,3,A,B,B,0,1,b,0,2,b,1,2,b’).

Features in graph query processing A feature is a loosely defined term for a substructure of a graph. There are various kinds of features, such as paths, subtrees, and subgraphs [4–7]. In this letter, we use subgraphs as features because they exhibit the best filtering power among these; however, any other substructure can be used in MR-Graph without loss of generality.

4. Basic algorithm

A MapReduce-based algorithm works in two stages: map and reduce. In MapReduce, the input data are first partitioned into equal-sized blocks, and each block is assigned to a mapper at the map stage; the mapped outputs are then stored on local disks. Finally, the outputs are shuffled and pulled to reducers for further processing at the reduce stage. Algorithm 1 is the basic algorithm of MR-Graph: each mapper processes graph queries over data graphs, and each reducer groups the mapped outputs by query graph ID.

Algorithm 1: Basic MR-Graph
  input : D: a set of data graphs, Q: a set of query graphs
  output: (IDq, a list of IDg) pairs
  1  init()                                  // initializing a mapper
  2      F[] = ExtractFeatures(Q)
  3  map(String IDg, String CODEg)           // mapper
  4      Fg = ExtractFeatures(CODEg)
  5      foreach (IDq, CODEq) ∈ Q do
  6          if CheckContainment(F[IDq], Fg) then
  7              if TestSubIso(CODEq, CODEg) then
  8                  emit(IDq, IDg)
  9  reduce(String IDq, Iterator IDg)        // reducer
  10     answer = Enlist(IDg)
  11     emit(IDq, answer)

Algorithm 2: AdaptiveTune
  input : kinit: the initial feature size, Vk: trial values for k, α: the number of trials for each feature size in Vk
  output: k: the most appropriate feature size
  1. The initial feature size kinit is stored in the DCS.
  2. If a mapper is a trial mapper, it chooses one value from Vk; otherwise, it uses the feature size stored in the DCS.
  3. The trial mapper updates the DCS.
  4. The DCS notifies all running mappers of the update.
  5. Each running mapper:
     a. estimates the remaining time until it finishes, denoted Tcurrent, and the remaining time expected if it used the best feature size, denoted Tbest;
     b. if Tbest < Tcurrent, switches to the feature size stored in the DCS;
     c. processes the remaining data graphs in its input split.

At the map stage, we extract features from a query graph and from a data graph by different rules (lines 2 and 4). Graph query processing is performed in a filter-and-verify scheme. For each data graph, MR-Graph tests whether the data graph contains all the features of a query graph in the query set Q (line 6). If it does, the algorithm pairs the data graph with the query graph and tests subgraph isomorphism (line 7). At the reduce stage, the resulting data graphs are grouped by query graph ID and emitted. Based on this filter-and-verify scheme, MR-Graph greatly reduces the number of graphs on which subgraph isomorphism must be tested.

When it comes to feature extraction, we extract min(k, |q|)-sized features for a query graph q, where |q| is the size of q. Note that we do not use features whose sizes are less than min(k, |q|); the rationale is as follows. Suppose that a query graph q has two features fx and fy, where fx ⊆s fy and the size of fy, denoted |fy|, equals min(k, |q|). It is straightforward that a data graph g contains fx if it contains fy. Thus, testing whether g contains fx is unnecessary once g is confirmed to contain fy. Meanwhile, a data graph is processed against a set of query graphs of various sizes. For a data graph g, we therefore extract features of every size up to min(k, |g|); in other words, the size of a feature f for a data graph g must satisfy 1 ≤ |f| ≤ min(k, |g|). The ExtractFeatures function in Algorithm 1 extracts features from both query graphs and data graphs with the feature sizes following the rules described thus far.

2128 IEICE TRANS. INF. & SYST., VOL.XX, NO.XX SEPTEMBER 2013

5. AdaptiveTune: tuning the feature size at runtime

Note that the feature size k in Algorithm 1 is determined by a user before runtime. However, deciding on a feature size is not easy, since the best feature size varies with the characteristics of the query graphs and data graphs. To address this issue, we devise an adaptive feature-size tuning technique. AdaptiveTune examines a set of trial values for k, denoted by Vk, in the first few mappers, called trial mappers, and then determines a proper feature size. Note that the number of trial mappers is set to α times |Vk|; the larger α is, the more data graphs are tested to determine k. This tuning process requires information about the other mappers in order to determine the most appropriate feature size and the remaining time of each mapper. We format the information as a 4-tuple (k, T, N, O), where the items represent the feature size, the elapsed time, the number of data graphs processed, and the time overhead for feature extraction (lines 1–2 in Algorithm 1), respectively.
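As a sketch, the per-mapper record shared through the DCS might be represented as follows. The class and field names, and the per-graph-rate criterion for picking the best record, are my own assumptions; the letter specifies only the 4-tuple (k, T, N, O).

```python
from dataclasses import dataclass

@dataclass
class MapperStatus:
    k: int      # feature size the reporting mapper used
    T: float    # elapsed time, in seconds
    N: int      # number of data graphs processed
    O: float    # time overhead of extracting features from the query set

def best_status(records):
    # Hypothetical selection rule: treat the record with the lowest time
    # per processed data graph as the best feature size seen so far.
    return min(records, key=lambda r: r.T / r.N)
```

For instance, `best_status([MapperStatus(3, 10, 5, 4), MapperStatus(2, 6, 6, 3)])` picks the feature-size-2 record, since 6/6 < 10/5.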
For example, (3, 10, 5, 4) means that some mapper has processed 5 data graphs in 10 seconds using feature size 3, while it took 4 seconds to extract the features from the query graphs. We exploit a distributed coordination system (DCS) to share this information among the mappers.

Algorithm 2 describes AdaptiveTune. Note that the algorithm begins with the feature size kinit, initially stored in the DCS by a user. After the third step of Algorithm 2, the DCS stores a 4-tuple (k, TB, NB, OB), meaning that some trial mapper has finished an input split containing NB data graphs in time TB, while it took OB time to extract features from the query graph set. Tcurrent and Tbest in the algorithm are estimated as follows:

    Tcurrent = (TC / NC) × (NB − NC)        (1)
    Tbest = (TB / NB) × (NB − NC) + OB      (2)

where TC is the elapsed time with the current feature size k and TB is the elapsed time with the best k. NC is the number of data graphs processed with the current feature size. We estimate the remaining input size of a mapper as NB − NC. Note that since NB is computed when a mapper ends, it also represents the total number of data graphs in that mapper's input split. As AdaptiveTune runs each trial value α times, NB is expected to converge to the average number of data graphs in an input split. We also estimate the execution time per data graph as TC/NC or TB/NB.

[Fig. 2 An example of AdaptiveTune]

Example 1. Figure 2 shows an example of AdaptiveTune working with 4 mappers. Suppose that Vk = {k1, k2, k3} and α = 1; then the number of trial mappers is 3. Tasks T1, T2, and T3 are trial mappers and each uses one value from Vk, while T4 uses the feature size stored in the DCS, kinit. After finishing its input split, T2 updates the DCS with ‘(k2, ...)’ at time t1. The DCS notifies the other mappers of this tuple.
Then the mappers determine whether to keep or change their feature size and process the remaining data graphs in their input splits accordingly. Finally, T1 and T4 change their current feature size to k2.

6. Experiments

Experimental setup We performed our experiments with Hadoop version 1.0.3 running on a 9-node cluster. Each data node is equipped with an Intel i5-2500 3.3 GHz processor, 8 GB of memory, and a 7200 RPM HDD, and runs CentOS 6.2. All nodes are connected via a Gigabit switching hub. The number of task slots, which indicates how many tasks can run simultaneously on Hadoop, is 32. The total numbers of map and reduce tasks were set to 8 times (256) and 2 times (64) the number of task slots, respectively. We tested the two datasets introduced in [7], i.e., PubChem as a real-world dataset and synthetic.E30.D3.L50 as a synthetic dataset. In addition, we varied the number of data graphs or query graphs. Table 1 presents the statistics of our datasets, where the |V|, |E|, and |L| values are averages. We set α to 1 and Vk to {1, 2, ..., 6}. The statistics of the input splits are shown in Table 2.

We compared our method with a non-filtering approach, denoted Basic, and Luo et al.'s approach, denoted Luo [9]. All algorithms are written in C++, and a verification phase was attached to Luo for fair comparison (details are deferred to Sect. 2). In our system, we used VF2 [13] for subgraph isomorphism testing, gSpan [14] for feature extraction, and ZooKeeper [15] for distributed coordination.

Result analysis We first tested how the feature size k affects the overall performance of graph query processing. Figure 3 shows that, for the real dataset, the execution time (depicted by bars) is lowest when k is set to 4, whereas the average number of candidates (depicted by lines) is smallest when k is 6. The reason is that the cost of feature extraction and feature comparison offsets the benefit of the finer filtering effect.
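Before turning to the AdaptiveTune results, the remaining-time estimates of Eqs. (1) and (2) and the switch test of step 5b in Algorithm 2 can be sketched as follows. This is a minimal rendering of the formulas, not the authors' implementation; all names are my own.

```python
def remaining_time_estimates(T_c, N_c, T_b, N_b, O_b):
    """Eq. (1): time left at the current feature size; Eq. (2): time left if
    the mapper switched to the best feature size reported in the DCS, plus
    the one-off overhead O_b of re-extracting the query features."""
    t_current = (T_c / N_c) * (N_b - N_c)       # Eq. (1)
    t_best = (T_b / N_b) * (N_b - N_c) + O_b    # Eq. (2)
    return t_current, t_best

def should_switch(T_c, N_c, T_b, N_b, O_b):
    # Step 5b of Algorithm 2: switch only if the projected finish is sooner.
    t_current, t_best = remaining_time_estimates(T_c, N_c, T_b, N_b, O_b)
    return t_best < t_current

# A mapper that spent 50 s on 5 of ~100 graphs (10 s/graph) hears of a trial
# mapper that finished 100 graphs in 100 s (1 s/graph) with 4 s overhead:
# Eq. (1) gives 10 * 95 = 950 s, Eq. (2) gives 1 * 95 + 4 = 99 s, so it switches.
```

Note that the switch pays off only when the overhead O_b is amortized over enough remaining graphs, which is exactly what the comparison captures.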
The best value for k on the synthetic dataset is 2, as shown in Fig. 4. Note that we omitted from our figures the results that exceeded 24 hours.

Table 1  Statistics of graph datasets
  Dataset    D/Q   #     |V|    |E|    |L|   size
  Real       D     10M   23.98  25.76  78    2.31 GB
             Q     1000  14.26  14.00  11    131 KB
  Synthetic  D     10M   14.23  30.53  47    2.62 GB
             Q     1000  10.25  12.00  47    120 KB

Table 2  Statistics of input splits
                                          Real     Synthetic
  # of input splits                       256      256
  avg. # of data graphs per input split   39062.5  39062.5
  standard deviation                      6416.4   48.6

[Fig. 3 Execution time (real)]
[Fig. 4 Execution time (synthetic); some results exceed 24 hrs]
[Fig. 5 The size of feature sets]
[Fig. 6 Feature extraction time]
[Fig. 7 AdaptiveTune (real): Preset(3) 2690 s, Best(4) 1764 s, AT 1804 s]
[Fig. 8 AdaptiveTune (synthetic): Preset(3) 1209 s, Best(2) 851 s, AT 931 s]
[Fig. 9 Comparison of methods (real, 1M)]
[Fig. 10 Scalability (real)]

The results also confirm that the best k can differ according to the characteristics of the given data and query graph sets.

Figures 5 and 6 show how the feature size impacts the overall performance. As the feature size increases, both the total size of the feature sets and the average elapsed time for extracting the features of a data graph increase exponentially. This causes not only space overhead for storing the features but also a large amount of I/O during query processing on Hadoop.

We now evaluate AdaptiveTune. Figures 7 and 8 show the experimental results. In the figures, the ‘Preset’ value is the initially set feature size, whereas the ‘Best’ value is the best feature size, computed after all graph queries have finished. In both cases, AdaptiveTune (AT) achieves an execution time close to that obtained with the best feature size. With the real dataset, AdaptiveTune successfully reduced the execution time by 52.5%.

We also compared our algorithm with conventional algorithms. We used 1 million graphs from the real dataset rather than the 10 million in Table 1, since Luo exhibited limited scalability. As shown in Fig. 9, our algorithm beats the others in execution time per query except when processing a single graph query, and its advantage grows as the number of query graphs becomes larger. We also noticed that the performance of Luo was even poorer than that of our Basic algorithm; the reason is that Luo suffers from frequent accesses to the storage holding its index files and from a skewness problem, as the value distribution in its indexes is highly skewed [16].

Finally, we evaluate the scalability of MR-Graph. As shown in Fig. 10, MR-Graph scales with the size of the cluster; specifically, MR-Graph is 25.3 times faster than Basic in execution time when we use 8 data nodes.

7. Conclusions

We proposed an adaptive and parallel algorithm, MR-Graph, which efficiently processes multiple query graphs over a massive set of data graphs based on MapReduce. AdaptiveTune improves query processing by adaptively tuning the feature size at runtime, keeping the system balanced between the effectiveness of graph filtering and the cost of feature extraction. Our experiments show that AdaptiveTune performs with very little overhead in execution time.

Acknowledgement This research was partly supported by an NRF grant funded by the Korean Government (No. 2011-0016282). It was also partly funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ICT R&D program 2013.
References

[1] C. Aggarwal and H. Wang, Managing and Mining Graph Data, Springer, 2010.
[2] “The PubChem project.” http://pubchem.ncbi.nlm.nih.gov.
[3] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., New York, NY, USA, 1990.
[4] X. Yan, P. Yu, and J. Han, “Graph indexing: A frequent structure-based approach,” Proc. SIGMOD, pp.335–346, 2004.
[5] P. Zhao et al., “Graph indexing: tree + delta >= graph,” Proc. VLDB, pp.938–949, VLDB Endowment, 2007.
[6] J. Cheng et al., “Fg-index: towards verification-free query processing on graph databases,” Proc. SIGMOD, pp.857–872, 2007.
[7] W. Han et al., “iGraph: a framework for comparisons of disk-based graph indexing techniques,” PVLDB, vol.3, no.1, pp.449–459, 2010.
[8] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” Commun. ACM, vol.51, no.1, pp.107–113, 2008.
[9] Y. Luo et al., “Towards efficient subgraph search in cloud computing environments,” DASFAA Workshops, pp.2–13, Springer, 2011.
[10] J. Lin and M. Schatz, “Design patterns for efficient graph algorithms in MapReduce,” Proc. MLG, pp.78–85, ACM, 2010.
[11] G. Malewicz et al., “Pregel: a system for large-scale graph processing,” Proc. SIGMOD, pp.135–146, ACM, 2010.
[12] U. Kang et al., “Pegasus: mining peta-scale graphs,” Knowl. Inf. Syst., vol.27, no.2, pp.303–325, 2011.
[13] L. Cordella et al., “A (sub)graph isomorphism algorithm for matching large graphs,” IEEE Trans. Pattern Anal. Mach. Intell., vol.26, no.10, pp.1367–1372, 2004.
[14] X. Yan and J. Han, “gSpan: Graph-based substructure pattern mining,” Proc. ICDM, pp.721–724, 2002.
[15] “Apache ZooKeeper.” http://zookeeper.apache.org.
[16] Y. Kwon et al., “SkewTune: mitigating skew in MapReduce applications,” Proc. SIGMOD, pp.25–36, ACM, 2012.