Planted Clique Research Paper

Understanding the complexity of the planted
clique problem
Jose Andres Valdes
Department of Computer Science and Engineering
California Louis Stokes Alliance for Minority Participation
(CAMP) in Science, Engineering and Mathematics
(NSF-LSAMP)
University of California, San Diego
August 28, 2015
Faculty Mentor: Professor Shachar Lovett
Department of Computer Science and Engineering

Introduction:
In graph theory, a simple graph G(V, E) consists of a set V of vertices (or nodes) and a set
E ⊆ {(u, v) | u ∈ V, v ∈ V, u ≠ v} of edges, represented by unordered pairs of vertices, that
denotes which vertices are connected. A graph G can contain complete subgraphs called cliques:
a clique is a set S ⊆ V in which each vertex u ∈ S is connected to every other vertex v ∈ S.
Efficiently finding the clique of maximum cardinality, namely the maximum clique, in the random
graph model G(n, ½), where G contains n vertices and each possible edge is included
independently with probability ½, has long been an open problem.
Jerrum [Jerrum 1992] and Kucera [Kucera 1995] introduced the planted clique model G(n, ½, k)
as a variant of the maximum clique problem. First, create the random graph G(n, ½). Then, let K
be a “planted” clique formed by picking k vertices uniformly at random and forcing them to
become a clique by adding all edges between them. The clique K can be found with high
probability in polynomial time when k = Ω(√n). However, for k = o(√n), no polynomial time
algorithm is known that solves this problem. The focus of this research is to understand why
k = √n has become a natural boundary, by comparing different approaches that have led to that
result and by examining the relationship between those approaches and the assumption that a
graph G has a planted clique.
Interest in this problem also comes from real-life applications of algorithmic problems on
random graphs, which behave similarly to real-world scenarios. The problem arises in sociology,
for example in finding large subgroups of people in social networks; in marketing, in any study
involving clustering of objects; and in the biological sciences, as shown by the study of
pattern discovery in DNA sequences [PS 2000]. Cryptography uses the hardness of this problem
and its variants as a source of ideas for models of cryptosystems [ABW 2010].
In a random graph G(n, ½), it has been proved that the maximum clique has size (2 + o(1)) log n
(all logarithms in this paper are base 2) with probability approaching 1 as n → ∞. However,
known polynomial time algorithms find only cliques of size (1 + o(1)) log n. Karp
[Karp 1976] and Jerrum [Jerrum 1992] tried multiple natural algorithms for finding cliques of
size (1 + ε) log n for a fixed ε > 0 with no positive result, leading to the conjecture that
there is no polynomial time algorithm that can find such a clique.
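The upper bound of 2 log n on the clique size follows from a short first-moment calculation, sketched here as a supplement. The symbol X_k, the number of cliques of size k in G(n, ½), is introduced only for this illustration:

```latex
\mathbb{E}[X_k] \;=\; \binom{n}{k}\, 2^{-\binom{k}{2}}
\;\le\; n^{k}\, 2^{-k(k-1)/2}
\;=\; 2^{\,k\left(\log n \,-\, \frac{k-1}{2}\right)} .
```

For k ≥ (2 + ε) log n the exponent tends to −∞, so E[X_k] → 0 and, by Markov's inequality, with probability approaching 1 no clique of that size exists. Showing that cliques of size (2 − ε) log n do exist requires a second-moment argument, which is not reproduced here.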
Switching to the planted clique model as an easier variant of the problem, in a graph G(n, ½, k),
Kucera [Kucera 1995] observed that if k ≥ c√(n log n) for some appropriate constant c, the k
vertices of largest degree form the planted clique; therefore, an algorithm can efficiently
retrieve the planted clique of that graph. Alon, Krivelevich, and Sudakov [AKS 1998]
established the current lowest bound on k by designing a polynomial time algorithm that finds
the planted clique for k = Ω(√n) using the eigenvalues and eigenvectors of the adjacency
matrix of the graph. In later papers, Feige and Ron [Feige and Ron 2010] and Dekel,
Gurel-Gurevich, and Peres [DGP 2014] developed algorithms that handle smaller values of k and
run faster than the algorithm presented in [AKS 1998]. Nevertheless, no polynomial time
algorithm has been found for k = o(√n).
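A back-of-the-envelope version of the degree observation, given as a sketch rather than as Kucera's actual proof, compares the degrees of the two kinds of vertices. Natural logarithms are used inside the tail bound; since ln n ≤ log n, the conclusion also holds with base-2 logarithms:

```latex
\begin{aligned}
v \notin K:\quad & \deg(v) \sim \mathrm{Bin}(n-1, \tfrac12), \qquad
  \mathbb{E}[\deg(v)] = \tfrac{n-1}{2}, \\
v \in K:\quad & \deg(v) \sim (k-1) + \mathrm{Bin}(n-k, \tfrac12), \qquad
  \mathbb{E}[\deg(v)] = \tfrac{n-1}{2} + \tfrac{k-1}{2}, \\
\text{Hoeffding } (m \le n):\quad &
  \Pr\bigl[\,|\mathrm{Bin}(m,\tfrac12) - \tfrac{m}{2}| \ge \sqrt{n \ln n}\,\bigr]
  \le 2e^{-2 n \ln n / m} \le \tfrac{2}{n^{2}} .
\end{aligned}
```

A union bound over the n vertices shows that, with probability at least 1 − 2/n, every degree is within √(n ln n) of its mean. The two groups of degrees are then separated as soon as (k − 1)/2 > 2√(n ln n), which has the form k ≥ c√(n log n).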

Methodology:
As this is theoretical research, most of the work relies on the analysis of Kucera’s algorithm
and the Low Degree Removal (LDR) algorithm. Kucera’s algorithm was selected because it sets a
lower bound on how large the planted clique must be (k ≥ c√(n log n)) in order to find it with
a low running time (O(n log n)). The LDR algorithm is a good representative of the current
best algorithms for finding planted cliques with high probability when k = Ω(√n). It sets a
better lower bound on the planted clique size than Kucera’s algorithm with a small asymptotic
increase in running time (O(n²)). Using programming software, we can simulate different random
graphs with planted cliques and implement the algorithms so that we can correlate the
mathematical notions behind the problem with the experimental results. The following list of
tools is tentative and may grow as needed.
Mathematical Tools:
• Probabilistic and Statistical Methods.
  o Since this research studies random graphs, these tools are indispensable for
    understanding how the properties of the graph behave, calculating expectations of
    different properties under distributions such as the normal and binomial, and applying
    inequalities such as Markov's and Chernoff's in order to prove or disprove that results
    hold with high probability.
• Combinatorics.
  o Since the number of vertices in a graph is finite, and cliques are a way of forming
    subgraphs, combinatorics can help us answer different questions relating to the
    construction and analysis of the subgraphs that can be considered. This field is deeply
    involved with probability and graph theory.
• Linear Algebra.
  o Because graphs can be represented by n × n matrices called adjacency matrices, the
    properties and manipulation of matrices and vectors, such as the eigenvalues and
    eigenvectors used in [AKS 1998], can be exploited to reveal new properties of random
    graphs and/or develop new ways of approaching the problem.
• Set Theory.
  o The composition of a graph requires knowledge of set theory, so its concepts and proved
    properties are necessary in order to understand the objects presented in different papers
    and to construct objects that can be manipulated to prove properties and conjectures.
• Mathematical Proofs.
  o In order to present findings on conjectures and properties in this research, knowledge of
    proof techniques such as contradiction, induction, and constructive and nonconstructive
    proofs will be an indispensable tool.
Similar Problems:
• Maximum Independent Set Problem:
  o An independent set is a set of vertices no two of which are connected to each other. The
    maximum independent set is the independent set with the largest cardinality. Finding it
    is an NP-hard optimization problem. However, this problem can bring some insight to the
    planted clique problem, as it seeks the opposite structure: an independent set in G is a
    clique in the complement of G.
• Graph Partitioning Problems:
  o Graph partitioning was the main topic of [Kucera 1995], and from this angle the author
    was able to find the value k ≥ c√(n log n) that set the first boundary. Problems like the
    min-cut partition can be solved in polynomial time. However, other problems like the
    balanced cut partition are NP-hard.
Computer-Assisted Proof:
• Using C++, multiple simulations of graphs with larger values of n can be run in order to
  support or reinforce concepts previously intuited or explained by other research papers.
The results of this research can offer new angles on how to improve the boundary for the
planted clique, provide a new perspective on the maximum clique problem for random graphs, and
clarify which approaches or algorithms applied to the planted clique problem can also be
applied to the maximum clique problem and vice versa.
Experiment:
Using the GCC compiler and the C++ programming language, we developed a program that creates
an Erdős–Rényi random graph and plants a clique in it. The program stores the list of vertices
selected for the planted clique, with the sole purpose of checking the results of the
algorithms against this list.
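A minimal sketch of such a generator, assuming an adjacency-matrix representation. The names Graph, PlantedInstance, and make_planted_clique are illustrative and are not taken from the original program:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Adjacency matrix: adj[u][v] is true when u and v are connected.
using Graph = std::vector<std::vector<bool>>;

struct PlantedInstance {
    Graph adj;                // a sample of G(n, 1/2, k)
    std::vector<int> clique;  // sorted indices of the planted vertices
};

// Builds G(n, 1/2), then forces k randomly chosen vertices to form a clique.
PlantedInstance make_planted_clique(int n, int k, std::mt19937& rng) {
    assert(0 <= k && k <= n);
    PlantedInstance inst{Graph(n, std::vector<bool>(n, false)), {}};

    // Each of the n(n-1)/2 possible edges is present with probability 1/2.
    std::bernoulli_distribution coin(0.5);
    for (int u = 0; u < n; ++u) {
        for (int v = u + 1; v < n; ++v) {
            const bool edge = coin(rng);
            inst.adj[u][v] = edge;
            inst.adj[v][u] = edge;
        }
    }

    // Choose the planted vertices: shuffle all indices, keep the first k.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), rng);
    inst.clique.assign(order.begin(), order.begin() + k);
    std::sort(inst.clique.begin(), inst.clique.end());

    // Add every edge between planted vertices.
    for (int a : inst.clique)
        for (int b : inst.clique)
            if (a != b) inst.adj[a][b] = true;
    return inst;
}
```

Seeding the std::mt19937 generator with a fixed value makes a run reproducible, which helps when comparing the output of an algorithm against the stored clique list.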
Kucera’s algorithm can find planted cliques with high probability (probability that tends to 1
as the number of vertices goes to infinity) when k = Ω(√(n log n)). The algorithm does the
following:
Input: Graph G with planted clique
Output: List of candidate vertices for the planted clique
1. Sort V from highest to lowest degree, where the degree of a vertex is the number of
vertices connected to it.
2. Return the first k vertices as the candidates for the planted clique.
Once the vertex degrees are known, the running time of Kucera’s algorithm is O(n log n), as
the optimal running time of any comparison-based sorting process is O(n log n).
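The two steps above can be sketched as follows, assuming an adjacency-matrix input. The function name top_degree_vertices and the tie-breaking rule (lower index first) are assumptions made for this illustration; note that counting degrees from an adjacency matrix takes O(n²) on its own:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Adjacency matrix: adj[u][v] is true when u and v are connected.
using Graph = std::vector<std::vector<bool>>;

// Returns the k vertices of highest degree, sorted by index.
std::vector<int> top_degree_vertices(const Graph& adj, int k) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> degree(n, 0);
    for (int u = 0; u < n; ++u)
        degree[u] = static_cast<int>(
            std::count(adj[u].begin(), adj[u].end(), true));

    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    // Step 1: sort by degree, highest first (ties: lower index first).
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return degree[a] != degree[b] ? degree[a] > degree[b] : a < b;
    });
    // Step 2: keep the first k vertices as candidates.
    order.resize(std::min(k, n));
    std::sort(order.begin(), order.end());
    return order;
}
```

Returning the candidates sorted by index makes the result directly comparable with a sorted list of planted vertices.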
The LDR algorithm has a more complex process for finding the clique, but with high probability
it can find planted cliques when k = Ω(√n). The process, described in [Feige and Ron 2010], is
the following:
Input: Graph G with planted clique
Output: List of candidate vertices for the planted clique
1. Set r = 0 and G0 = G.
2. If Gr is a clique, stop and go to step 5.
3. Else, remove from Gr the vertex with the lowest degree (breaking ties in favor of vertices
with lower index) and update the degrees of the neighbors of the removed vertex.
4. Increment r by one and return to step 2.
5. Rename the vertices that are not in Gr as vr, …, v1. Set t = r and Kt to be the set of
vertices in Gr.
6. If vt is connected to all vertices of Kt, then Kt-1 = Kt ∪ {vt}. Else, Kt-1 = Kt.
7. Decrement t by one.
8. If t = 0, stop and return K0. Else, go to step 6.
The running time of the LDR algorithm is O(n²). That is because every time we remove a
vertex, we have to update its neighbors. Also, the formation of K0 has a running time of
O(n²), as we have to check every vertex not in Kt against the vertices in Kt.
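The eight steps can be sketched as below. This is an illustration under stated assumptions, not the implementation used in the experiments: the graph is an adjacency matrix, the function name low_degree_removal is hypothetical, and step 5 is read as revisiting the removed vertices starting with the one removed last:

```cpp
#include <algorithm>
#include <vector>

// Adjacency matrix: adj[u][v] is true when u and v are connected.
using Graph = std::vector<std::vector<bool>>;

std::vector<int> low_degree_removal(const Graph& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> degree(n, 0);
    std::vector<bool> alive(n, true);
    for (int u = 0; u < n; ++u)
        degree[u] = static_cast<int>(
            std::count(adj[u].begin(), adj[u].end(), true));

    // Steps 1-4: remove minimum-degree vertices until the rest is a clique.
    // The remaining graph is a clique exactly when its minimum degree
    // equals the number of remaining vertices minus one.
    std::vector<int> removed;  // in order of removal
    int remaining = n;
    while (remaining > 0) {
        int low = -1;  // strict '<' below keeps the lowest index on ties
        for (int u = 0; u < n; ++u)
            if (alive[u] && (low == -1 || degree[u] < degree[low])) low = u;
        if (degree[low] == remaining - 1) break;  // G_r is a clique
        alive[low] = false;
        --remaining;
        removed.push_back(low);
        for (int v = 0; v < n; ++v)
            if (alive[v] && adj[low][v]) --degree[v];
    }

    // Steps 5-8: revisit removed vertices, last removed first, and add back
    // each one that is adjacent to every vertex of the current clique.
    std::vector<int> clique;
    for (int u = 0; u < n; ++u)
        if (alive[u]) clique.push_back(u);
    for (auto it = removed.rbegin(); it != removed.rend(); ++it) {
        const int candidate = *it;
        const bool joins = std::all_of(
            clique.begin(), clique.end(),
            [&](int v) { return adj[candidate][v]; });
        if (joins) clique.push_back(candidate);
    }
    std::sort(clique.begin(), clique.end());
    return clique;
}
```

Scanning all vertices for the minimum degree costs O(n) per removal, so the removal phase stays within the O(n²) bound stated above.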
In order to measure the success probability of Kucera’s and the LDR algorithms and to observe
their behavior in different situations, we will run each algorithm 1000 times for each
combination of n (the number of vertices in G) and c (the leading constant in the value of k,
the planted clique size). The values of n used in this simulation are 1000, 2500, 5000, 7500,
and 10000.
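The text does not spell out how k is computed from c. One reading that is consistent with the percentages reported in the Results section is k = c√(n log n) for Kucera’s algorithm and k = c√n for the LDR algorithm. The sketch below assumes that reading and rounds up; both the rounding and the helper name planted_size are assumptions:

```cpp
#include <cmath>

// Hypothetical helper: planted clique size for n vertices and constant c.
//   with_log = true  -> k = ceil(c * sqrt(n * log2(n)))  (assumed for Kucera)
//   with_log = false -> k = ceil(c * sqrt(n))            (assumed for LDR)
int planted_size(int n, double c, bool with_log) {
    const double base =
        with_log ? n * std::log2(static_cast<double>(n)) : n;
    return static_cast<int>(std::ceil(c * std::sqrt(base)));
}
```

Under these assumptions, n = 10000 and c = 0.75 give k = 75 for the LDR algorithm, and n = 1000 and c = 1 give k = 100 for Kucera’s algorithm.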
In addition, the average number of wrong vertices returned by the algorithms will be
calculated for n = 1000 and n = 10000, in order to see by how many vertices these algorithms
fail to return the planted clique.
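One way to keep these two statistics is sketched below. It assumes that a trial counts as a success only when the returned set equals the planted clique exactly, which the text does not state explicitly; the names count_wrong and Tally are hypothetical:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Number of returned vertices that are not in the planted clique.
// Both vectors must be sorted in increasing order.
int count_wrong(const std::vector<int>& returned,
                const std::vector<int>& planted) {
    std::vector<int> wrong;
    std::set_difference(returned.begin(), returned.end(),
                        planted.begin(), planted.end(),
                        std::back_inserter(wrong));
    return static_cast<int>(wrong.size());
}

// Running totals over repeated trials (hypothetical bookkeeping type).
struct Tally {
    int trials = 0;
    int successes = 0;    // trials with exact recovery (assumed criterion)
    long long wrong = 0;  // wrong vertices summed over all trials

    void record(const std::vector<int>& returned,
                const std::vector<int>& planted) {
        ++trials;
        if (returned == planted) ++successes;
        wrong += count_wrong(returned, planted);
    }
    double success_rate() const {
        return trials ? static_cast<double>(successes) / trials : 0.0;
    }
    double mean_wrong() const {
        return trials ? static_cast<double>(wrong) / trials : 0.0;
    }
};
```

Calling record once per trial and reading success_rate and mean_wrong after the last trial yields the two quantities discussed in this section.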
Finally, to see how the presence of a planted clique influences the performance of these
algorithms, we will run them on a random graph without a planted clique when n = 5000. As
stated in the introduction, the maximum clique in a random graph has size (2 + o(1)) log n,
and, following Karp [Karp 1976] and Jerrum [Jerrum 1992], cliques of size (1 + o(1)) log n can
be found in those graphs. Since the LDR algorithm returns a clique whether or not a clique was
planted, we will measure the size of the clique it returns. For Kucera’s algorithm, we will
present two outcomes, one that takes the first 2 log n vertices and another that takes only
the first log n.
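Checking whether the vertices taken from the degree order actually form a clique can be sketched as follows. The helper names is_clique and log_prefix are hypothetical, and rounding the prefix length down is an assumption:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Adjacency matrix: adj[u][v] is true when u and v are connected.
using Graph = std::vector<std::vector<bool>>;

// True when every pair of distinct vertices in `vertices` is adjacent.
bool is_clique(const Graph& adj, const std::vector<int>& vertices) {
    for (std::size_t i = 0; i < vertices.size(); ++i)
        for (std::size_t j = i + 1; j < vertices.size(); ++j)
            if (!adj[vertices[i]][vertices[j]]) return false;
    return true;
}

// Prefix length for the test without a planted clique:
// floor(factor * log2(n)), with factor = 2 or factor = 1.
int log_prefix(int n, double factor) {
    return static_cast<int>(
        std::floor(factor * std::log2(static_cast<double>(n))));
}
```

With n = 5000, these assumptions give prefixes of 24 and 12 vertices for the two outcomes.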
Results:
The following are the results obtained after measuring Kucera’s algorithm:

Some observations about Kucera’s algorithm can be made by looking at these graphs:
• When c ≤ 1, for any value of n, the success probability goes to 0%.
• When c ≥ 2.5, for any value of n, the success probability goes to 100%.
• There is a large increase in the success probability from c = 1.75 to c = 2.
• For values of c between 1 and 2.5, the success probability decreases as n increases.
• The behavior (graph shape) of Kucera’s algorithm doesn’t change significantly between the
  different values of n.
For n = 1000 and c = 1, the average number of wrong vertices returned by the algorithm was
14.202, which is 14% of the planted clique. For n = 10000 and c = 1, the average was 49.28,
which is 13% of the planted clique. These percentages decrease as c increases and the success
probability increases.
When tested on random graphs with n = 5000, Kucera’s algorithm was able to return neither a
clique of size 2 log n nor a clique of size log n.
Here are the results of the LDR algorithm:

We can see the following aspects of the behavior of the LDR algorithm:
• For each value of c, the success probability increases as n increases.
• When c ≥ 1.1, the success probability is above 90%, reaching 100% when n ≥ 5000.
• As n increases, we can see a major increase from c = 0.75 to c = 0.85.
• The behavior of the LDR algorithm is dynamic; there are notable changes for every different
  value of n.
When n = 1000 and c = 0.75, the average number of wrong vertices returned by this algorithm
was 22, that is, 91% of the planted clique. When n = 10000 and c = 0.75, the average was 63
vertices, that is, 84% of the planted clique. This average decreases as c increases and the
success probability increases.
When tested on random graphs with n = 5000, this algorithm was able to find cliques of size
(1 + o(1)) log n.
Discussion:
The interpretation of the behavior of these algorithms can give us some conjectures about the
complexity of the planted clique problem and why we can find only planted cliques of size
Ω(√n). Kucera’s algorithm relies heavily on the degrees that the vertices acquire from the
formation of the random graph and the addition of the planted clique. That reliance on the
vertices’ degrees allows it to find planted cliques that are large enough for it to consider
only the vertices with the highest number of connections. Because this algorithm follows the
flow of the random graph formation, as we can see in the graph, its behavior remains the same
even after changing the value of n. This might also be the reason why the success probability
for values of c between 1 and 2.5 decreases as n increases, while c = 1 and c = 2.5 remain the
end points of the success probability. As n increases, the intermediate constants need to
increase in order to reach a high success probability. Further simulations need to be done for
n > 10000 in order to see if this still holds.
The LDR algorithm works more incrementally than Kucera’s algorithm, as we can observe in its
steps. Instead of relying on the vertices of highest degree, its focus is to eliminate the
vertices of low degree. Moreover, it treats the relations between vertices as fundamental
information for making the next step, which is why it can find cliques of smaller size than
Kucera’s algorithm. As we can see in the first 4 steps of the LDR algorithm, their purpose is
to reduce the graph until we have a good fragment of the planted clique. Looking at the LDR
performance for different values of n, we can see that it becomes easier to find that fragment
as n increases. That may be the reason why we observe a gap between the success probabilities
at c = 0.75 and c = 0.85 that widens as n increases. Further simulations need to be done for
n > 10000 to see if this behavior still holds.
Even though both Kucera’s and the LDR algorithms are deterministic, we can easily observe that
their outcome is heavily influenced by the formation of the random graph, the number of
vertices in the random graph, and the size of the planted clique in that graph. The behavior
of Kucera’s algorithm is constant regardless of n. Even when the success probability was 0,
for c = 0.75, it still returned 86% of the vertices correctly. This reveals that even with a
null success probability, the influence of the planted clique can still be found in the
behavior of the algorithm. However, as the results show, when there is no planted clique,
Kucera’s algorithm is useless, showing that relying only on the degrees of the vertices is not
enough even to find cliques of size (1 + o(1)) log n.
As the behavior of the LDR algorithm changes for different values of n, we can see that its
tolerance to failure is smaller than that of Kucera’s algorithm. When c = 0.75, the number of
correct vertices was less than 20%. This shows that, contrary to Kucera’s algorithm, the LDR
algorithm is much less influenced by the presence of a planted clique. In addition, its
ability to find cliques of size (1 + o(1)) log n in a random graph shows that this algorithm
doesn’t need the presence of a planted clique. However, as we can observe, this algorithm is
unable to find cliques of size (2 + o(1)) log n, the size of the maximum clique in a random
graph.
The question remains why we cannot find planted cliques with k = o(√n). As already stated,
the presence of the planted clique plays an important role in these algorithms, so when k
decreases, the influence of the planted clique begins to disappear and the graph behaves more
like a random graph. Current algorithms focus either on the vertices of highest degree as
possible candidates for the planted clique, or on the vertices of lowest degree as unlikely
candidates. While focusing on the vertices of highest degree is intuitive for finding the
maximum clique, we have already seen, in the performance of Kucera’s algorithm and of the LDR
algorithm on random graphs without a planted clique, that this intuition might be only
partially correct. As Feige and Ron [Feige and Ron 2010] explained informally, the LDR
algorithm cannot find planted cliques of size o(√n) because the extra degree gained from the
formation of the clique is not enough to keep the planted clique alive during the first 4
steps of the algorithm. This suggests that we need to find ways to consider not only vertices
of the highest degree, but also vertices of median degree.
Further research needs to be done to understand the complexity of the Erdős–Rényi random
graph. As we can see in the performance graphs, the number of vertices in the random graph is
not the only factor in the success of the algorithms. The measured success probabilities were
not always exactly 0 or 1, implying that for some realizations of the random graph the
algorithms returned the correct value and for others they did not.
In addition, as we can see in Kucera’s and the LDR algorithms, there is a tradeoff between
running time and the size of the planted clique that they can find. This raises the question:
if we were able to develop an algorithm that can find planted cliques of size Ω(√n) with a
running time of O(n), would we be able to use that fact to modify it or develop a new
algorithm that finds planted cliques of size k = o(√n) in polynomial time? In order to have
such an algorithm with a running time of O(n), we would need to use properties of the random
graph that go beyond those used by current planted clique search algorithms.

Bibliography:
[AKS 1998] N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. Random Structures and Algorithms, 13:457-466, 1998.
[ABW 2010] Benny Applebaum, Boaz Barak, and Avi Wigderson. Public-key cryptography from different assumptions. Proceedings of the forty-second ACM symposium on Theory of computing, pages 171-180. ACM, 2010.
[DGP 2014] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres. Finding hidden cliques in linear time with high probability. Combinatorics, Probability and Computing, 23(1):29-49, 2014.
[Feige and Ron 2010] U. Feige and D. Ron. Finding hidden cliques in linear time. In AOFA, 2010.
[Jerrum 1992] M. Jerrum. Large cliques elude the Metropolis process. Random Structures and Algorithms, 3:347-359, 1992.
[Karp 1976] R. M. Karp. The probabilistic analysis of some combinatorial search algorithms. Algorithms and Complexity: New Directions and Recent Results, pages 1-19, 1976.
[Kucera 1995] L. Kucera. Expected complexity of graph partitioning problems. Discrete Applied Math., 193-212, 1995.
[PS 2000] Pavel A. Pevzner, Sing-Hoi Sze, et al. Combinatorial approaches to finding subtle signals in DNA sequences. ISMB, volume 8, pages 269-278, 2000.
