Understanding the complexity of the planted
clique problem
Jose Andres Valdes
Department of Computer Science and Engineering
California Louis Stokes Alliance for Minority Participation
(CAMP) in Science, Engineering and Mathematics
(NSF-LSAMP)
University of California, San Diego
August 28, 2015
Faculty Mentor: Professor Shachar Lovett
Department of Computer Science and Engineering
Introduction:
In graph theory, a simple graph G(V,E) is a structure with a set V of vertices (or nodes) and a
set E ⊆ {(u, v) | u ∈ V, v ∈ V, u ≠ v} of edges, represented by unordered pairs of vertices, that
denotes which vertices are connected. A graph G can contain subgraphs called cliques. A clique
is a set S ⊆ V in which each vertex u ∈ S is connected to every other vertex v ∈ S, forming a
complete subgraph. Finding the clique of maximum cardinality, namely the maximum clique, in the
random graph model G(n, ½), where G contains n vertices and each possible edge between vertices
is included independently with probability ½, has been an open problem for a long time.
Jerrum [Jerrum 1992] and Kucera [Kucera 1995] introduced the planted clique model G(n, ½, k)
as a variant of the maximum clique problem. First, create the random graph G(n, ½). Then, let K
be a “planted” clique formed by picking k vertices at random and forcing them to become a
clique by adding every missing edge between them. It has been possible to find the clique K with
high probability in polynomial time when k = Ω(√n), that is, when k ≥ c√n for some constant
c > 0. However, for k = o(√n), no polynomial time algorithm is known that solves this problem.
The focus of this research is to understand why k = √n has become a natural boundary, by
comparing different approaches that have led to that result and by examining the relationship
between those approaches and the assumption that a graph G has a planted clique.
Interest in this problem also comes from real-life applications of algorithmic problems,
especially on random graphs, which behave in a similar way to real-life scenarios. The problem
appears in sociology, as finding large subgroups of people in social networks; in marketing, in
any study involving clustering of objects; and in the biological sciences, as shown in the study
of pattern discovery in DNA sequences [PS 2000]. Cryptography uses the hardness of this problem
and its variants as a source of ideas for models of cryptosystems [ABW 2010].
In a random graph G(n, ½), it has been proved that the maximum clique has size (2 + o(1)) log n
(all logarithms in this paper are base 2) with probability approaching 1 as n → ∞. However,
known polynomial time algorithms have been able to find only cliques of size (1 + o(1)) log n.
Karp [Karp 1976] and Jerrum [Jerrum 1992] tried multiple natural algorithms for finding cliques
of size (1 + ε) log n for a fixed ε > 0 with no positive result, leading to the conjecture that
there is no polynomial time algorithm that can find such a clique.
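The (2 + o(1)) log n figure can be motivated by a first-moment count: the expected number of r-vertex cliques in G(n, ½) is C(n, r) · 2^(-r(r-1)/2), and by Markov's inequality a clique of size r is unlikely once this expectation falls far below 1. The following is a minimal sketch of that calculation; the function names are hypothetical and are not taken from the program used in this research.

```cpp
#include <cmath>

// log2 of the expected number of r-vertex cliques in G(n, 1/2):
// log2 C(n, r) - r(r-1)/2, with the binomial computed through lgamma.
double log2_expected_cliques(int n, int r) {
    const double ln_binom = std::lgamma(n + 1.0) - std::lgamma(r + 1.0)
                          - std::lgamma(n - r + 1.0);
    return ln_binom / std::log(2.0) - 0.5 * r * (r - 1.0);
}

// Largest r for which the expected number of r-vertex cliques is at least 1.
int first_moment_threshold(int n) {
    int r = 1;
    while (r < n && log2_expected_cliques(n, r + 1) >= 0.0) ++r;
    return r;
}
```

For n = 1024 (log n = 10) the threshold is 15, which lies between log n and 2 log n; the threshold approaches 2 log n only slowly, because of lower-order terms.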
Switching to the planted clique model as an easier variant of the problem, in a graph G(n, ½, k),
Kucera [Kucera 1995] observed that if k ≥ c√(n log n) for some appropriate constant c, the k
vertices of largest degree form the planted clique with high probability; therefore, an algorithm
can efficiently retrieve the planted clique of that graph. Alon, Krivelevich, and Sudakov
[AKS 1998] established the current lowest boundary on k by designing a polynomial time algorithm
that finds the planted clique when k = Ω(√n), using properties of the eigenvalues and
eigenvectors of the adjacency matrix of the graph. In later papers, Feige and Ron
[Feige and Ron 2010] and Dekel, Gurel-Gurevich, and Peres [DGP 2014] developed algorithms that
find planted cliques of size k = Ω(√n) and run faster than the algorithm presented in
[AKS 1998]. Nevertheless, no polynomial time algorithm has been found for k = o(√n).
Methodology:
As this is theoretical research, most of the work relies on the analysis of Kucera’s algorithm
and the Low Degree Removal (LDR) algorithm. Kucera’s algorithm was selected because it sets the
lower bound on how large the planted clique must be (k ≥ c√(n log n)) in order to find it with
a low running time (O(n log n)). The LDR algorithm is a good representative of the current best
algorithms for finding planted cliques with high probability when k = Ω(√n). It sets a better
lower bound on the planted clique size than Kucera’s algorithm, with a small asymptotic increase
in its running time (O(n²)). With the use of programming software, we can simulate different
random graphs with planted cliques and implement the algorithms, so that we can correlate the
mathematical notions behind the problem with the experimental results. The following list of
tools is tentative and can grow as needed.
Mathematical Tools:
• Probabilistic and Statistical Methods.
o Since this research studies random graphs, these tools are indispensable in order to
understand how the properties of the graph behave, to calculate expectations of different
properties under distributions such as the normal and the binomial, and to use
inequalities such as Markov’s and Chernoff’s to prove or disprove that results hold with
high probability.
• Combinatorics.
o Since the number of vertices in a graph is finite, and cliques are a way of forming
subgraphs, combinatorics can help us answer questions about the construction and
analysis of the subgraphs under consideration. This field has long been deeply involved
with probability and graph theory.
• Linear Algebra.
o Because a graph can be represented by an n x n matrix called its adjacency matrix, the
study of the properties and manipulation of matrices and vectors, such as the
eigenvalues and eigenvectors used in [AKS 1998], can be exploited to reveal new
properties of random graphs and/or develop new ways of approaching the problem.
• Set Theory.
o The definition of a graph requires knowledge of set theory, so its concepts and proved
properties are necessary in order to understand the objects presented in different
papers and to construct objects that can be manipulated to prove properties and
conjectures.
• Mathematical Proofs.
o In order to present findings on conjectures and properties in this research, knowledge
of proof techniques such as contradiction, induction, and constructive and
nonconstructive proofs will be an indispensable tool.
Similar Problems:
• Maximum Independent Set Problem:
o An independent set is a set of vertices in which no two are connected to each other.
The maximum independent set is the independent set with the largest cardinality.
Finding it is an NP-hard optimization problem. However, this problem can bring some
insight into the planted clique problem, as it tries to achieve the opposite result.
• Graph Partitioning Problems:
o Graph partitioning was the main topic of [Kucera 1995], and from this angle the author
was able to find the value k ≥ c√(n log n) that set the first boundary. Problems like
the min-cut partition can be solved in polynomial time. However, other problems like
the balanced cut partition are NP-hard.
Computer-Assisted Proof:
• With C++ programs, multiple simulations of graphs with larger values of n can be run in
order to test or reinforce concepts previously intuited or explained in other research
papers.
Results of this research can offer new angles on how to improve the boundary for the planted
clique, provide a new perspective on the maximum clique problem for random graphs, and
clarify which approaches or algorithms applied to the planted clique problem can also be
applied to the maximum clique problem and vice versa.
Experiment:
Using the GCC compiler and the C++ programming language, a program was developed that creates
an Erdos-Renyi random graph and plants a clique in it. The program stores a list of the
vertices selected for the planted clique, with the sole purpose of checking the results of the
algorithms against this list.
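The original program is not reproduced in this paper. As an illustration only, a generator for G(n, ½, k) could look like the following minimal sketch; the adjacency-matrix representation and the names `Graph`, `PlantedInstance`, and `make_planted_graph` are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Adjacency matrix of a simple undirected graph: adj[u][v] == adj[v][u].
using Graph = std::vector<std::vector<char>>;

struct PlantedInstance {
    Graph adj;                 // the graph G(n, 1/2, k)
    std::vector<int> planted;  // sorted list of the k planted vertices
};

// Hypothetical helper: samples G(n, 1/2), then plants a clique on k
// vertices chosen uniformly at random (requires 0 <= k <= n).
PlantedInstance make_planted_graph(int n, int k, std::mt19937& rng) {
    PlantedInstance inst;
    inst.adj.assign(n, std::vector<char>(n, 0));
    std::bernoulli_distribution coin(0.5);
    for (int u = 0; u < n; ++u)
        for (int v = u + 1; v < n; ++v)
            inst.adj[u][v] = inst.adj[v][u] = coin(rng) ? 1 : 0;

    // Pick k distinct vertices by shuffling the vertex list.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), rng);
    inst.planted.assign(order.begin(), order.begin() + k);
    std::sort(inst.planted.begin(), inst.planted.end());

    // Force every pair of planted vertices to be adjacent.
    for (std::size_t i = 0; i < inst.planted.size(); ++i)
        for (std::size_t j = i + 1; j < inst.planted.size(); ++j) {
            const int u = inst.planted[i], v = inst.planted[j];
            inst.adj[u][v] = inst.adj[v][u] = 1;
        }
    return inst;
}
```

Keeping the planted list sorted makes it straightforward to compare against the sorted output of an algorithm.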
Kucera’s algorithm can find planted cliques with high probability (probability that tends to 1
as the number of vertices goes to infinity) when k = Ω(√(n log n)). The algorithm does the
following:
Input: Graph G with planted clique
Output: List of candidate vertices for the planted clique
1. Sort V from highest to lowest degree, where the degree of a vertex is the number of
vertices connected to it.
2. Return the first k vertices as the candidates for the planted clique.
The running time of Kucera’s algorithm is O(n log n) once the degrees are known, as the optimal
running time of any comparison-based sorting process is O(n log n).
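The two steps above can be sketched as follows. This is a minimal illustration, not the program used in the experiments; the adjacency-matrix type `Graph`, the function name `top_degree_candidates`, and the tie-breaking rule (lower index first) are assumptions.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

using Graph = std::vector<std::vector<char>>;  // adj[u][v] == 1 when u and v are adjacent

// Returns the k vertices of highest degree, sorted by vertex index.
// Ties in degree go to the lower index (an assumption; the description
// above does not specify a tie-breaking rule).
std::vector<int> top_degree_candidates(const Graph& adj, int k) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> degree(n, 0);
    for (int u = 0; u < n; ++u)
        degree[u] = static_cast<int>(std::count(adj[u].begin(), adj[u].end(), char(1)));

    // Step 1: order the vertices from highest to lowest degree.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return degree[a] > degree[b]; });

    // Step 2: keep the first k vertices as candidates.
    order.resize(std::min(k, n));
    std::sort(order.begin(), order.end());
    return order;
}
```

With an adjacency matrix, computing the degrees costs O(n²) on its own; the O(n log n) bound covers only the sorting step.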
The LDR algorithm has a more complex process for finding the clique, but with high probability
it can find planted cliques when k = Ω(√n). The process, described in [Feige and Ron 2010], is
the following:
Input: Graph G with planted clique
Output: List of candidate vertices for the planted clique
1. Set r = 0 and G0 = G
2. If Gr is a clique, stop and go to step 5
3. Else, remove from Gr the vertex with the lowest degree (breaking ties in favor of vertices
with lower index), update the degrees of the neighbors of the removed vertex, and call the
resulting graph Gr+1
4. Increment r by one and return to step 2
5. Rename the vertices that are not in Gr as vr … v1. Set t = r and Kt to be the set of
vertices in Gr
6. If vt is connected to all vertices of Kt, then Kt-1 = Kt ∪ {vt}. Else, Kt-1 = Kt
7. Decrement t by one
8. If t = 0, stop and return K0. Else, go to step 6
The running time of the LDR algorithm is O(n²). That is because every time we remove a vertex,
we have to update its neighbors. Also, the formation of K0 has a running time of O(n²), as we
have to check every vertex not in Kt against every vertex in Kt.
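The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the graph is an adjacency matrix, the name `low_degree_removal` is hypothetical, and removed vertices are reconsidered starting with the most recently removed one, which is one reading of step 5.

```cpp
#include <algorithm>
#include <vector>

using Graph = std::vector<std::vector<char>>;  // adj[u][v] == 1 when u and v are adjacent

std::vector<int> low_degree_removal(const Graph& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<char> alive(n, 1);
    std::vector<int> degree(n, 0);  // degree within the surviving graph
    for (int u = 0; u < n; ++u)
        for (int v = 0; v < n; ++v)
            if (adj[u][v]) ++degree[u];

    // Steps 1-4: remove a lowest-degree vertex until the survivors form a clique.
    std::vector<int> removed;  // in order of removal
    int remaining = n;
    while (remaining > 0) {
        int lowest = -1;  // strict '<' below keeps the lowest index on ties
        for (int u = 0; u < n; ++u)
            if (alive[u] && (lowest < 0 || degree[u] < degree[lowest])) lowest = u;
        // The survivors form a clique exactly when the minimum degree
        // among them equals remaining - 1.
        if (degree[lowest] == remaining - 1) break;
        alive[lowest] = 0;
        --remaining;
        removed.push_back(lowest);
        for (int v = 0; v < n; ++v)
            if (alive[v] && adj[lowest][v]) --degree[v];
    }

    std::vector<int> clique;
    for (int u = 0; u < n; ++u)
        if (alive[u]) clique.push_back(u);

    // Steps 5-8: revisit removed vertices, last removed first, and add back
    // each one that is adjacent to every vertex currently in the clique.
    for (auto it = removed.rbegin(); it != removed.rend(); ++it) {
        const int v = *it;
        const bool adjacent_to_all = std::all_of(
            clique.begin(), clique.end(), [&](int u) { return adj[v][u] != 0; });
        if (adjacent_to_all) clique.push_back(v);
    }
    std::sort(clique.begin(), clique.end());
    return clique;
}
```

Each removal round scans all vertices once and updates one row of neighbors, so the removal phase in this sketch takes O(n²) time, in line with the bound stated above.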
In order to measure the success probability of Kucera’s and the LDR algorithms and to observe
their behavior in different situations, we run each algorithm 1000 times for each combination
of n (the number of vertices in G) and c (the leading constant in the value of k, the planted
clique size). The values of n used in this simulation are 1000, 2500, 5000, 7500, and 10000.
In addition, the average number of wrong vertices returned by the algorithms is calculated for
n = 1000 and n = 10000, in order to see by how many vertices these algorithms fail to return
the planted clique.
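The bookkeeping for one trial can be sketched as follows. The sizing rules are assumptions inferred from the bounds discussed in this paper (k = c√(n log n) for Kucera’s algorithm and k = c√n for LDR, rounded to the nearest integer); the text does not state them explicitly, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <vector>

// Assumed planted clique size for Kucera's algorithm: c * sqrt(n * log2 n).
int kucera_clique_size(int n, double c) {
    const double nd = static_cast<double>(n);
    return static_cast<int>(std::lround(c * std::sqrt(nd * std::log2(nd))));
}

// Assumed planted clique size for the LDR algorithm: c * sqrt(n).
int ldr_clique_size(int n, double c) {
    return static_cast<int>(std::lround(c * std::sqrt(static_cast<double>(n))));
}

// Number of returned vertices that are not in the planted clique.
// Both lists must be sorted and free of duplicates.
int count_wrong(const std::vector<int>& returned, const std::vector<int>& planted) {
    std::vector<int> wrong;
    std::set_difference(returned.begin(), returned.end(),
                        planted.begin(), planted.end(), std::back_inserter(wrong));
    return static_cast<int>(wrong.size());
}

// A trial counts as a success only when the returned set is exactly the planted clique.
bool is_success(const std::vector<int>& returned, const std::vector<int>& planted) {
    return returned == planted;
}
```

Under these assumed rules, n = 1000 and c = 1 give k = 100 for Kucera’s algorithm, while n = 1000 and c = 0.75 give k = 24 for LDR. The success probability is then the fraction of the 1000 trials for which `is_success` holds.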
Finally, to see how the presence of a planted clique influences the performance of those
algorithms, we run them on a random graph without a planted clique when n = 5000. As already
stated in the introduction, the maximum clique in a random graph has size (2 + o(1)) log n, and
as Karp [Karp 1976] and Jerrum [Jerrum 1992] already showed, there are cliques of size
(1 + o(1)) log n in those graphs. Since the LDR algorithm returns a clique regardless of whether
a clique was planted, we measure the size of the clique it returns. For Kucera’s algorithm, we
present two outcomes: one that takes the first 2 log n vertices and one that takes only the
first log n.
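For this last experiment, the vertices returned by Kucera’s algorithm must be checked to see whether they actually form a clique. A minimal sketch of such a check follows; the name `is_clique` and the adjacency-matrix type are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<char>>;  // adj[u][v] == 1 when u and v are adjacent

// True when every pair of distinct vertices in `vertices` is adjacent.
// The empty set and single vertices count as cliques.
bool is_clique(const Graph& adj, const std::vector<int>& vertices) {
    for (std::size_t i = 0; i < vertices.size(); ++i)
        for (std::size_t j = i + 1; j < vertices.size(); ++j)
            if (!adj[vertices[i]][vertices[j]]) return false;
    return true;
}
```

For a candidate set of m vertices the check takes O(m²) time, which is negligible here since m is at most about 2 log n.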
Results:
The following are the results obtained after measuring Kucera’s algorithm:
Some observations about Kucera’s algorithm can be made by looking at these graphs:
• When c ≤ 1, for any value of n, the success probability goes to 0%.
• When c ≥ 2.5, for any value of n, the success probability goes to 100%.
• There is a large increase in the success probability from c = 1.75 to c = 2.
• For values of c between 1 and 2.5, the probability of success decreases as n increases.
• The behavior (graph shape) of Kucera’s algorithm does not change in any major way between
the different values of n.
For n = 1000, when c = 1, the average number of wrong vertices returned by the algorithm was
14.202, about 14% of the planted clique. For n = 10000, when c = 1, the average was 49.28,
about 13% of the planted clique. Those percentages decrease as c increases and the success
probability increases.
When tested on random graphs with n = 5000, Kucera’s algorithm was able to return neither a
clique of size 2 log n nor a clique of size log n.
Here are the results of the LDR algorithm:
We can see the following aspects of the behavior of the LDR algorithm:
• The success probability for each value of c increases as n increases.
• When c ≥ 1.1, the probability of success is above 90%, reaching 100% when n ≥ 5000.
• As n increases, we can see a major increase in the success probability from c = 0.75 to
c = 0.85.
• The behavior of the LDR algorithm is dynamic; there are notable changes for every
different value of n.
When n = 1000 and c = 0.75, the average number of wrong vertices returned by this algorithm was
22, that is, 91% of the planted clique. When n = 10000 and c = 0.75, the average was 63
vertices, that is, 84% of the planted clique. This average decreases as c increases and the
success probability increases.
When tested on random graphs with n = 5000, this algorithm was able to find cliques of size
(1 + o(1)) log n.
Discussion:
The interpretation of the behavior of these algorithms can suggest some conjectures about the
complexity of the planted clique problem and why we can only find planted cliques of size
Ω(√n).
Kucera’s algorithm relies heavily on the degrees that the vertices acquire through the
formation of the random graph and the addition of the planted clique. That reliance on the
vertices’ degrees allows it to find planted cliques that are large enough for it to consider
only the vertices with the highest number of connections. Because this algorithm follows the
flow of the random graph formation, as we can see in the graph, its behavior remains the same
even after changing the value of n. This might also be the reason why the success probability
for values of c between 1 and 2.5 decreases as n increases, while c = 1 and c = 2.5 remain the
end points of the success probability. As n increases, the intermediate constants need to
increase in order to reach a high success probability. Further simulations need to be done for
n > 10000 in order to see whether this still holds.
The LDR algorithm proceeds more methodically than Kucera’s algorithm, as we can observe in its
steps. Instead of relying on the vertices of highest degree, its focus is to eliminate the
vertices of low degree. However, it treats the relations between vertices as fundamental
information for making the next step, which is why it can find smaller cliques than Kucera’s
algorithm. As we can see in the first 4 steps of the LDR algorithm, the purpose of this
algorithm is to reduce the graph until we have a good fragment of the planted clique. Looking
at the LDR performance for different values of n, we can see that it becomes easier to find
that fragment as n increases. That may be the reason why we observe a gap between the success
probabilities for c = 0.75 and c = 0.85 that widens as n increases. Further simulations need to
be done for n > 10000 to see whether this behavior still holds.
Even though both Kucera’s and the LDR algorithms are deterministic, we can easily observe that
their outcome is heavily influenced by the formation of the random graph, the number of
vertices in the random graph, and the size of the planted clique in that graph. The behavior of
Kucera’s algorithm is constant regardless of the size of n. Even when the success probability
was 0, for values of c = 0.75, it still returned 86% of the vertices correctly. This reveals
that even with a null success probability, the influence of the planted clique can still be
found in the behavior of the algorithm. However, as the results show, when there is no planted
clique, Kucera’s algorithm is useless, showing that relying only on the degrees of the vertices
is not enough to find even the cliques of size (1 + o(1)) log n.
As the LDR algorithm’s behavior changes for different values of n, we can see that its
tolerance to failure is smaller than that of Kucera’s algorithm. When c = 0.75, the number of
correct vertices was less than 20%. This shows that, contrary to Kucera’s algorithm, the LDR
algorithm is much less influenced by the presence of a planted clique. In addition, its ability
to find cliques of size (1 + o(1)) log n in a random graph shows that this algorithm does not
need the presence of a planted clique. However, as we can observe, this algorithm is unable to
find cliques of size (2 + o(1)) log n, the size of the maximum clique in a random graph.
The question remains of why we cannot find planted cliques with k = o(√n). As already stated,
the presence of the planted clique plays an important role in those algorithms, so when k
decreases, the influence of the planted clique begins to disappear and the graph behaves more
like a random graph. Current algorithms focus either on the vertices of highest degree as
possible candidates for the planted clique, or on the vertices of lowest degree as unlikely
candidates. While focusing on the vertices of highest degree is an intuitive way to look for
the maximum clique, we have already seen, in the performance of Kucera’s algorithm and of the
LDR algorithm on random graphs without a planted clique, that this intuition might be only
partially correct. As Feige and Ron [Feige and Ron 2010] explained informally, the LDR
algorithm cannot find planted cliques of size o(√n) because the extra degree gained through the
formation of the clique is not enough to keep the planted clique alive during the first 4 steps
of the algorithm. This suggests that we need to find ways to consider not only the vertices of
highest degree, but also vertices of median degree.
Further research is needed to understand the complexity of the Erdos-Renyi random graph. As we
can see in the performance graphs, the number of vertices in the random graph is not the only
factor in the success of the algorithms. The success probabilities were not always exactly 0 or
1, implying that for some realizations of the random graph the algorithms returned the correct
value and for others they did not.
In addition, as we can see in Kucera’s and the LDR algorithms, there is a tradeoff between
running time and the size of the planted clique that can be found. This raises a question: if
we were able to develop an algorithm that finds planted cliques of size Ω(√n) with a running
time of O(n), would we be able to use it to modify or develop a new algorithm that finds
planted cliques of size k = o(√n) in polynomial time? To obtain such an algorithm with a
running time of O(n), we would need to use properties of the random graph that go beyond the
properties used by current planted clique search algorithms.
Bibliography:
[AKS 1998] N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden
clique in a random graph. Random Structures and Algorithms,
13:457-466, 1998.
[ABW 2010] Benny Applebaum, Boaz Barak, and Avi Wigderson. Public-key
cryptography from different assumptions. Proceedings of the
Forty-Second ACM Symposium on Theory of Computing, pages 171-180.
ACM, 2010.
[DGP 2014] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres. Finding hidden
cliques in linear time with high probability. Combinatorics,
Probability and Computing, 23(1):29-49, 2014.
[Feige and Ron 2010] U. Feige and D. Ron. Finding hidden cliques in linear time. In
AofA, 2010.
[Jerrum 1992] M. Jerrum. Large cliques elude the Metropolis process. Random
Structures and Algorithms, 3:347-359, 1992.
[Karp 1976] R. M. Karp. Probabilistic analysis of some combinatorial
search problems. Algorithms and Complexity: New Directions and
Recent Results, pages 1-19, 1976.
[Kucera 1995] L. Kucera. Expected complexity of graph partitioning problems.
Discrete Applied Mathematics, pages 193-212, 1995.
[PS 2000] Pavel A. Pevzner, Sing-Hoi Sze, et al. Combinatorial approaches to
finding subtle signals in DNA sequences. ISMB, volume 8, pages
269-278, 2000.

A PROBABILISTIC ALGORITHM OF COMPUTING THE POLYNOMIAL GREATEST COMMON DIVISOR...
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
directed-research-report
directed-research-reportdirected-research-report
directed-research-report
 
APPLICATION OF NUMERICAL METHODS IN SMALL SIZE
APPLICATION OF NUMERICAL METHODS IN SMALL SIZEAPPLICATION OF NUMERICAL METHODS IN SMALL SIZE
APPLICATION OF NUMERICAL METHODS IN SMALL SIZE
 
Scribed lec8
Scribed lec8Scribed lec8
Scribed lec8
 
cis98010
cis98010cis98010
cis98010
 

Planted Clique Research Paper

Understanding the complexity of the planted clique problem
Jose Andres Valdes
Department of Computer Science and Engineering
California Louis Stokes Alliance for Minority Participation (CAMP) in Science, Engineering and Mathematics (NSF-LSAMP)
University of California, San Diego
August 28, 2015
Faculty Mentor: Professor Shachar Lovett
Department of Computer Science and Engineering
Introduction:
In graph theory, a simple graph G(V, E) is a structure with a set V of vertices (or nodes) and a set E ⊆ {{u, v} | u ∈ V, v ∈ V, u ≠ v} of edges, unordered pairs of vertices that denote which vertices are connected. A graph G can contain subgraphs called cliques. A clique is a set S ⊆ V in which each vertex u ∈ S is connected to every other vertex v ∈ S, forming a complete subgraph. Finding the clique of maximum cardinality, namely the maximum clique, in the random graph model G(n, ½), where G contains n vertices and each possible edge is included independently with probability ½, has been an open problem for a long time.

Jerrum [Jerrum 1992] and Kucera [Kucera 1995] introduced the planted clique model G(n, ½, k) as a variant of the maximum clique problem. First, create the random graph G(n, ½). Then let K be a "planted" clique formed by picking k vertices at random and forcing them to become a clique by adding every missing edge between them. The clique K can be found with high probability in polynomial time when k ≥ c√n for a suitable constant c, that is, when k = Ω(√n). However, for k = o(√n), no polynomial-time algorithm is known to solve this problem. The focus of this research is to understand why k = √n has become a natural boundary, by comparing different approaches that have led to that result and by examining how those approaches depend on the assumption that a graph G has a planted clique.

Interest in this problem also comes from real-life applications of algorithmic problems, especially on random graphs, which behave similarly to real-world scenarios. The problem appears in sociology, as finding large subgroups of people in social networks; in marketing, in any study involving clustering of objects; and in the biological sciences, as shown in the study of pattern discovery in DNA sequences [PS 2000]. Cryptography uses the hardness of this problem and its variants as a source of ideas for models of cryptosystems [ABW 2010].

In a random graph G(n, ½), it has been proved that the maximum clique has size (2 + o(1)) log n (all logarithms in this paper are base 2) with probability approaching 1 as n → ∞. However, polynomial-time algorithms have been able to find only cliques of size (1 + o(1)) log n. Karp [Karp 1976] and Jerrum [Jerrum 1992] tried multiple natural algorithms for finding cliques of size (1 + ε) log n for a fixed ε > 0 with no positive result, leading to the conjecture that no polynomial-time algorithm can find such a clique.

Switching to the planted clique model as an easier variant of the problem, Kucera [Kucera 1995] observed that in a graph G(n, ½, k) with k ≥ c√(n log n) for an appropriate constant c, the k vertices of highest degree form the planted clique with high probability; therefore, an algorithm can efficiently retrieve the planted clique of that graph. Alon, Krivelevich, and Sudakov [AKS 1998] established the current lower boundary on k by designing a polynomial-time algorithm that finds the planted clique when k = Ω(√n), using properties of the eigenvalues and eigenvectors of the adjacency matrix of the graph. In later papers, Feige and Ron [Feige and Ron 2010] and Dekel, Gurel-Gurevich, and Peres [DGP 2014] developed algorithms that handle smaller values of k and run faster than the algorithm presented in [AKS 1998]. Nevertheless, no polynomial-time algorithm has been found for k = o(√n).
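The planted clique model and the degree-based observation attributed to Kucera above can be illustrated with a short C++ sketch. It is not the program used in this research: the adjacency-matrix type `Graph`, the helper names `make_planted_clique_graph` and `top_k_by_degree`, and the rule for breaking degree ties are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Adjacency matrix of a simple undirected graph: adj[u][v] is true iff {u, v} is an edge.
using Graph = std::vector<std::vector<bool>>;

// Hypothetical helper: samples G(n, 1/2), then plants a clique on k vertices chosen
// uniformly at random. The planted vertices are returned (sorted) through `planted`.
Graph make_planted_clique_graph(std::size_t n, std::size_t k, std::mt19937& rng,
                                std::vector<std::size_t>& planted) {
    assert(k <= n);
    Graph adj(n, std::vector<bool>(n, false));
    std::bernoulli_distribution coin(0.5);
    for (std::size_t u = 0; u < n; ++u) {
        for (std::size_t v = u + 1; v < n; ++v) {
            const bool edge = coin(rng);  // each possible edge appears with probability 1/2
            adj[u][v] = edge;
            adj[v][u] = edge;
        }
    }
    // Choose the clique as the first k entries of a random permutation of the vertices.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), rng);
    planted.assign(order.begin(), order.begin() + static_cast<std::ptrdiff_t>(k));
    std::sort(planted.begin(), planted.end());
    for (std::size_t a : planted) {
        for (std::size_t b : planted) {
            if (a != b) adj[a][b] = true;  // force every pair of planted vertices to be adjacent
        }
    }
    return adj;
}

// Degree-based selection: returns the k vertices of highest degree, sorted by index.
// Ties in degree are broken in favor of the lower index (an arbitrary choice made here).
std::vector<std::size_t> top_k_by_degree(const Graph& adj, std::size_t k) {
    const std::size_t n = adj.size();
    assert(k <= n);
    std::vector<std::size_t> degree(n, 0);
    for (std::size_t u = 0; u < n; ++u) {
        degree[u] = static_cast<std::size_t>(std::count(adj[u].begin(), adj[u].end(), true));
    }
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&degree](std::size_t a, std::size_t b) { return degree[a] > degree[b]; });
    order.resize(k);
    std::sort(order.begin(), order.end());
    return order;
}
```

Comparing the output of `top_k_by_degree(adj, k)` with the stored `planted` list is one way to check whether the clique was recovered and, if not, how many returned vertices are wrong.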
Methodology:
As this is theoretical research, most of the procedures rely on the analysis of Kucera's algorithm and the Low Degree Removal (LDR) algorithm. Kucera's algorithm was selected because it sets a lower bound on how large the planted clique must be (k ≥ c√(n log n)) in order to find it with a low running time, O(n log n). The LDR algorithm is a good representative of the current best algorithms, which find planted cliques with high probability when k = Ω(√n). It sets a better lower bound on the planted clique size than Kucera's algorithm at the cost of a modest asymptotic increase in running time, O(n²).

With programming software, we can simulate different random graphs with planted cliques and implement the algorithms, so that we can correlate the mathematical notions behind the problem with the experimental results. The following list of tools is tentative and may grow as needed.

Mathematical Tools:
• Probabilistic and Statistical Methods.
  o Since this research studies random graphs, these tools are indispensable for understanding how the properties of the graph behave, for calculating expectations of different properties under distributions such as the normal and the binomial, and for using inequalities such as Markov's and Chernoff's to prove or disprove that results hold with high probability.
• Combinatorics.
  o Since the number of vertices in a graph is finite, and cliques are a way of forming subgraphs, combinatorics can help answer questions about the construction and analysis of the different subgraphs that can be considered. This field has long been deeply involved with probability and graph theory.
• Linear Algebra.
  o Because graphs can be represented by n × n matrices called adjacency matrices, the properties and manipulation of matrices and vectors, such as the eigenvalues and eigenvectors used in [AKS 1998], can be exploited to reveal new properties of random graphs and/or develop new approaches.
• Set Theory.
  o The composition of a graph requires knowledge of set theory, so its concepts and proved properties are necessary to understand the objects presented in different papers and to construct objects that can be manipulated to prove properties and conjectures.
• Mathematical Proofs.
  o In order to present findings on conjectures and properties in this research, knowledge of proof techniques such as contradiction, induction, and constructive and nonconstructive proofs will be an indispensable tool.

Similar Problems:
• Maximum Independent Set Problem:
  o An independent set is a set of vertices in which no two vertices are connected to each other. The maximum independent set is the independent set with the largest cardinality. Finding it is currently an NP-hard optimization problem. However, this problem can bring some insight to the planted clique problem, as it tries to achieve the opposite result.
• Graph Partitioning Problems:
  o Graph partitioning was the main topic of [Kucera 1995], and from this angle the author was able to find the value k ≥ c√(n log n) that set the first boundary. Problems like the min-cut partition can be solved in polynomial time. However, other problems like the balanced cut partition are NP-hard.

Computer Assistance Proof:
• With C++ programming software, multiple simulations of graphs with large values of n can be created in order to prove or reinforce concepts previously intuited or explained by other research papers.

Results of this research can offer new angles on how to improve the boundary for the planted clique, provide a new perspective on the maximum clique problem for random graphs, and help understand which approaches or algorithms applied to the planted clique problem can also be applied to the maximum clique problem and vice versa.

Experiment:
With the GCC compiler and the C++ programming language, a program was developed that creates an Erdos-Renyi random graph and plants a clique in it. The program stores a list of the vertices selected for the planted clique, with the sole purpose of checking the results of the algorithms against this list.

Kucera's algorithm can find planted cliques with high probability (probability that tends to 1 as the number of vertices goes to infinity) when k = Ω(√(n log n)). This algorithm does the following:

Input: Graph G with planted clique
Output: List of possible vertices that compose the planted clique
1. Sort V from highest to lowest degree, where the degree of a vertex is the number of vertices connected to it.
2. Return the first k vertices as the candidates for the planted clique.

The running time of Kucera's algorithm is O(n log n) once the vertex degrees are known, as the optimal running time of any comparison-based sorting process is O(n log n).

The LDR algorithm has a more complex process for finding the clique, but with high probability it can find planted cliques when k = Ω(√n). The process, described in [Feige and Ron 2010], is the following:

Input: Graph G with planted clique
Output: List of possible vertices that compose the planted clique
1. Set r = 0 and G_0 = G.
2. If G_r is a clique, stop and go to step 5.
3. Else, remove from G_r the vertex with the lowest degree (breaking ties in favor of vertices with lower index) and update the degrees of the neighbors of the removed vertex.
4. Increment r by one and return to step 2.
5. Rename the vertices that are not in G_r as v_r, …, v_1. Set t = r and K_t to be the set of vertices in G_r.
6. If v_t is connected to all vertices of K_t, then K_{t-1} = K_t ∪ {v_t}. Else, K_{t-1} = K_t.
7. Decrement t by one.
8. If t = 0, stop and return K_0. Else, go to step 6.

The running time of the LDR algorithm is O(n²). That is because every time we remove a vertex, we have to update its neighbors. Also, the formation of K_0 has a running time of O(n²), as we have to check every vertex in K_t against every vertex not in K_t.

In order to measure the success probability of Kucera's and the LDR algorithms and to observe their behavior in different situations, we will run the algorithms 1000 times for each combination of values of n (the number of vertices in G) and c (the leading constant of the value of k, the planted clique size). The values of n used in this simulation are 1000, 2500, 5000, 7500, and 10000. In addition, the average number of wrong vertices returned by the algorithms will be calculated for n = 1000 and n = 10000, in order to see by how many vertices these algorithms failed to return the planted clique.

Finally, to see how the presence of a planted clique influences the performance of those algorithms, we will run them on a random graph without a planted clique when n = 5000. As already stated in the introduction, the maximum clique in a random graph has size (2 + o(1)) log n, and as Karp [Karp 1976] and Jerrum [Jerrum 1992] already showed, there are cliques of size (1 + o(1)) log n in those graphs. Since the LDR algorithm will return a clique regardless of whether a clique was planted, we will measure the size of the clique it returns.
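The eight LDR steps listed above can be sketched in C++ as follows. This is an illustrative reading of the steps, not the implementation used in the experiments: the function name `low_degree_removal` and the adjacency-matrix type `Graph` are hypothetical, and the second phase is assumed to revisit the removed vertices in reverse order of removal (last removed first), which the renaming in step 5 does not fully pin down.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Adjacency matrix of a simple undirected graph (no self-loops assumed).
using Graph = std::vector<std::vector<bool>>;

// Sketch of Low Degree Removal: returns the vertices of the clique it ends with, sorted.
std::vector<std::size_t> low_degree_removal(const Graph& adj) {
    const std::size_t n = adj.size();
    std::vector<bool> alive(n, true);
    std::vector<std::size_t> degree(n, 0);
    for (std::size_t u = 0; u < n; ++u) {
        assert(adj[u].size() == n);  // the matrix must be square
        degree[u] = static_cast<std::size_t>(std::count(adj[u].begin(), adj[u].end(), true));
    }

    // Steps 1-4: repeatedly remove the lowest-degree vertex until a clique remains.
    std::vector<std::size_t> removed;  // vertices in order of removal
    std::size_t alive_count = n;
    while (alive_count > 0) {
        std::size_t lowest = n;  // n means "not found yet"
        for (std::size_t u = 0; u < n; ++u) {
            // Strict comparison keeps the lower index on ties.
            if (alive[u] && (lowest == n || degree[u] < degree[lowest])) lowest = u;
        }
        // The remaining graph is a clique iff its minimum degree is alive_count - 1.
        if (degree[lowest] + 1 == alive_count) break;
        alive[lowest] = false;
        --alive_count;
        removed.push_back(lowest);
        for (std::size_t v = 0; v < n; ++v) {
            if (alive[v] && adj[lowest][v]) --degree[v];  // update the neighbors' degrees
        }
    }

    // Steps 5-8: start from the surviving clique and try to re-add removed vertices.
    std::vector<std::size_t> clique;
    for (std::size_t u = 0; u < n; ++u) {
        if (alive[u]) clique.push_back(u);
    }
    // Assumption: removed vertices are revisited in reverse order of removal.
    for (auto it = removed.rbegin(); it != removed.rend(); ++it) {
        const std::size_t v = *it;
        const bool adjacent_to_all =
            std::all_of(clique.begin(), clique.end(),
                        [&adj, v](std::size_t w) { return adj[v][w]; });
        if (adjacent_to_all) clique.push_back(v);
    }
    std::sort(clique.begin(), clique.end());
    return clique;
}
```

Each removal round scans all n vertices to find the minimum degree, so the first phase in this sketch costs O(n²), in line with the running time stated above. The returned set is always a clique, because a vertex is only added when it is adjacent to every vertex already kept.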
For Kucera's algorithm on the random graphs without a planted clique, we will present two outcomes: one that takes the first 2 log n vertices and another that takes only the first log n.

Results:
The following are the results obtained after measuring Kucera's algorithm:

[Figures: results of Kucera's algorithm; the plots were not preserved in this transcript]

Some observations about Kucera's algorithm can be made by looking at these graphs:
• When c ≤ 1, for any value of n, the success probability goes to 0%.
• When c ≥ 2.5, for any value of n, the success probability goes to 100%.
• There is a large increase in the success probability from c = 1.75 to c = 2.
• For values of c between 1 and 2.5, the success probability decreases as n increases.
• The behavior (graph shape) of Kucera's algorithm shows no major change across the different values of n.

For n = 1000, when c = 1, the average number of wrong vertices returned by the algorithm was 14.202, that is, 14% of the planted clique. For n = 10000, when c = 1, the average was 49.28, that is, 13% of the planted clique. Those percentages decrease as c increases, while the success probability increases. When tested on random graphs with n = 5000, Kucera's algorithm was able to return neither a clique of size 2 log n nor a clique of size log n.

Here are the results of the LDR algorithm:

[Figures: results of the LDR algorithm; the plots were not preserved in this transcript]
We can see the following aspects of the behavior of the LDR algorithm:
• The success probability for the different values of c increases as n increases.
• When c ≥ 1.1, the success probability is above 90%, reaching 100% when n ≥ 5000.
• As n increases, we can see a major increase from c = 0.75 to c = 0.85.
• The behavior of the LDR algorithm is dynamic; there are notable changes for every different value of n.

When n = 1000 and c = 0.75, the average number of wrong vertices returned by this algorithm was 22, that is, 91% of the planted clique. When n = 10000 and c = 0.75, the average was 63 vertices, that is, 84% of the planted clique. This average decreases as c increases and the success probability increases. When tested on random graphs with n = 5000, this algorithm was able to find cliques of size (1 + o(1)) log n.

Discussion:
Interpreting the behavior of these algorithms can suggest some conjectures about the complexity of the planted clique problem and about why we can find only planted cliques of size Ω(√n).

Kucera's algorithm relies heavily on the degrees that the vertices acquire through the formation of the random graph and the addition of the planted clique. That reliance on the vertices' degrees allows it to find planted cliques that are large enough for it to consider only the vertices with the highest number of connections. Because this algorithm follows the flow of the random graph formation, as we can see in the graph, its behavior remains the same even after changing the value of n. This might also be why the success probability for values of c between 1 and 2.5 decreases as n increases, while c = 1 and c = 2.5 remain the end points of the success probability. As n increases, the intermediate constants need to increase in order to reach a high success probability. Further simulations need to be done for n > 10000 to see whether this still holds.

The LDR algorithm proceeds in a more step-by-step, deterministic fashion than Kucera's algorithm, as we can observe from its steps. Instead of relying on the vertices of highest degree, its focus is to eliminate the vertices of low degree. However, it treats the relations between vertices as fundamental information for making the next step, which is why it can find cliques of smaller size than Kucera's algorithm. As we can see in the first 4 steps of the LDR algorithm, its purpose is to reduce the graph until we have a good fragment of the planted clique. Looking at the LDR performance for different values of n, we can see that it becomes easier to find that fragment as n keeps increasing. That may be why the gap between the results for c = 0.75 and c = 0.85 widens as n increases. Further simulations need to be done for n > 10000 to see whether this behavior still holds.

Even though both Kucera's and the LDR algorithms are deterministic, we can easily observe that their outcome is heavily influenced by the formation of the random graph, the number of vertices in the random graph, and the size of the planted clique in that graph. The behavior of Kucera's algorithm is constant regardless of the size of n. Even when the success probability was 0, for c = 0.75, it still returned 86% of the vertices correctly. This reveals that even with a null success probability, the influence of the planted clique can still be found in the behavior of the algorithm. However, as the results show, when there is no planted clique, Kucera's algorithm is useless, showing that relying only on the degrees of the vertices is not enough even to find cliques of size (1 + o(1)) log n.

As the behavior of the LDR algorithm changes for different values of n, we can see that its tolerance to failure is smaller than that of Kucera's algorithm. When c = 0.75, the number of correct vertices was less than 20%. This shows that, contrary to Kucera's algorithm, the LDR algorithm is far less influenced by the presence of a planted clique. In addition, its ability to find cliques of size (1 + o(1)) log n in a random graph shows that this algorithm does not need the presence of a planted clique. However, as we can observe, this algorithm is unable to find cliques of size (2 + o(1)) log n, the size of the maximum clique in a random graph.

The question remains of why we cannot find planted cliques with k = o(√n). As already stated, the presence of the planted clique plays an important role in those algorithms, so when k starts to decrease, the influence of the planted clique begins to disappear and the graph behaves more like a random graph. Current algorithms focus either on the vertices of highest degree as possible candidates for the planted clique, or on the vertices of lowest degree as unlikely candidates. While focusing on the vertices of highest degree is entirely intuitive for finding the maximum clique, we have already seen, in the performance of Kucera's algorithm and of the LDR algorithm on random graphs without a planted clique, that this intuition might be only partially correct. As Feige and Ron [Feige and Ron 2010] explained informally, the LDR algorithm cannot find planted cliques of size o(√n) because the extra degree received through the formation of the clique is not enough to keep the planted clique alive during the first 4 steps of the algorithm. This suggests that we need to find ways to consider not only vertices of the highest degree, but also vertices of median degree.

Further research needs to be done to understand the complexity of the Erdos-Renyi random graph. As we can see in the performance graphs, the number of vertices in the random graph is not the only factor in the success of the algorithms. The success probabilities were not always exactly 0 or 1, implying that for some formations of the random graph the algorithms returned the correct answer and for others they did not.

In addition, as we can see in Kucera's and the LDR algorithms, there is a tradeoff between their running time and the size of the planted clique that they can find. This raises a question: if we were able to develop an algorithm that finds planted cliques of size Ω(√n) with a running time of O(n), would we be able to use that fact to modify it, or to develop a new algorithm, that finds planted cliques of size k = o(√n) in polynomial time? In order to have such an algorithm with a running time of O(n), we will need to use properties of the random graph that go beyond the properties used by current planted clique search algorithms.
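The role of the "extra degree" in the discussion above can be made concrete with a standard concentration argument. The following is a sketch under stated assumptions, not a derivation taken from the cited papers: it uses Hoeffding's inequality with natural logarithms (which changes only the constant relative to the base-2 logarithms used elsewhere), and the symbols u, v, w, and t are introduced here for illustration.

```latex
% Degrees in G(n, 1/2, k): v is outside the planted clique K, u is inside it.
\deg(v) \sim \mathrm{Bin}\!\left(n-1, \tfrac12\right),
\qquad \mathbb{E}[\deg(v)] = \frac{n-1}{2}

\deg(u) = (k-1) + \mathrm{Bin}\!\left(n-k, \tfrac12\right),
\qquad \mathbb{E}[\deg(u)] = \frac{n-1}{2} + \frac{k-1}{2}

% Hoeffding's inequality with t = \sqrt{n \ln n}, for any single vertex w:
\Pr\bigl[\,\lvert \deg(w) - \mathbb{E}[\deg(w)] \rvert \ge t \,\bigr]
  \le 2\exp\!\left(-\frac{2t^2}{n-1}\right) \le \frac{2}{n^2}

% Union bound over all n vertices:
\Pr\bigl[\,\exists\, w : \lvert \deg(w) - \mathbb{E}[\deg(w)] \rvert \ge t \,\bigr]
  \le \frac{2}{n}

% If every degree is within t of its mean, clique vertices outrank all others when
\frac{k-1}{2} > 2t
\quad\Longleftrightarrow\quad
k > 4\sqrt{n \ln n} + 1
```

Under these assumptions, sorting by degree succeeds with probability at least 1 − 2/n once k exceeds a constant multiple of √(n log n). When k = c√n, the expected extra degree (k − 1)/2 is only about c times the standard deviation √(n − 1)/2 of a single degree, while the largest fluctuation among n vertices is roughly √(2 ln n) standard deviations, so degrees alone no longer separate the planted clique from the rest of the graph.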
Bibliography:
[AKS 1998] N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. Random Structures and Algorithms, 13:457-466, 1998.
[ABW 2010] B. Applebaum, B. Barak, and A. Wigderson. Public-key cryptography from different assumptions. Proceedings of the forty-second ACM symposium on Theory of computing, pages 171-180. ACM, 2010.
[DGP 2014] Y. Dekel, O. Gurel-Gurevich, and Y. Peres. Finding hidden cliques in linear time with high probability. Combinatorics, Probability and Computing, 23:29-49, 2014.
[Feige and Ron 2010] U. Feige and D. Ron. Finding hidden cliques in linear time. In AOFA, 2010.
[Jerrum 1992] M. Jerrum. Large cliques elude the Metropolis process. Random Structures and Algorithms, 3:347-359, 1992.
[Karp 1976] R. M. Karp. The probabilistic analysis of some combinatorial search algorithms. Algorithms and Complexity: New Directions and Recent Results, pages 1-19, 1976.
[Kucera 1995] L. Kucera. Expected complexity in graph partitioning problems. Discrete Applied Math., 193-212, 1995.
[PS 2000] P. A. Pevzner, S.-H. Sze, et al. Combinatorial approaches to finding subtle signals in DNA sequences. ISMB, volume 8, pages 269-278, 2000.