The Problem
Imagine wanting to empirically compute a statistic on connected graphs. The set of non-isomorphic connected graphs on a given number of nodes becomes huge very quickly. This is visualized in the following plot with logarithmic y-axis:
This is why computing your statistic for every possible graph becomes quickly infeasible and you might want to resort to random sampling instead. Unfortunately, at least my favorite python package for graphs, networkx, does not provide any method to uniformly sample from the set of connected graphs with $n$ nodes.
A Potential Solution
In order to draw from the set of non-isomorphic simple graphs on $n$ vertices uniformly, I am going to use the Metropolis-Hastings Algorithm as described in this book by Roger D. Peng for instance.
The Metropolis-Hastings Algorithm is more like a template which one can use to approximate a desired target distribution given an available proposal distribution. Specifically, the algorithm needs access to the conditional proposal distribution $q(y | x)$ and a ratio $\frac{p(y)}{p(x)}$ where $p(\cdot)$ is the desired target distribution. Its output converges to a sample-set from the target distribution $p(\cdot)$, if you let the algorithm run for enough iterations. Finding out when we have run enough iterations is out of scope for this post, however.
In the discussed problem setting, the target distribution $p(\cdot)$ is the uniform distribution on the set of all non-isomorphic, connected simple graphs on $n$ vertices. I am going to call this set $\mathcal{G}_n$ in the following. Since $p(G) = p(G’)$ for all graphs $G, G’ \in \mathcal{G}_n$, the ratio $\frac{p(G’)}{p(G)}$ just equals $1$.
Now, we need to choose a conditional proposal distribution $q(G’ | G)$. This is basically a transition rule to randomly choose a neighbouring graph $G’$ for a given graph $G$. The transition rule needs to fulfill the condition that we can reach any graph $H$ from any graph $G$ with a series of intermediate graphs $G_1, \dots, G_k$ such that $q(G_1 | G), \dots q(G_{i+1} | G_i), \dots, q(H | G_k) > 0$.
I chose a straightforward conditional proposal distribution $q(G’ | G)$ by creating a list $\mathcal{N}(G)$ of neighbouring graphs of $G$ and taking a uniformly random $G’$ from this list. The neighbor list, $\mathcal{N}(G)$, is defined by iterating through all unordered pairs of vertices $e = { v, w }$ and checking if they are an edge of $G$ or not. If ${ v, w }$ is not an edge of $G$, we just add $G + e$ to the list $\mathcal{N}(G)$. If $e$ is an edge of $G$ however, we first check if $G - e$ is connected. If $G - e$ is connected, we add it to $\mathcal{N}(G)$. If $G - e$ is not connected, we just ignore it. Here we used the shorthand $G + e$ to denote the graph $G$ with edge $e$ added to it and $G - e$ with edge $e$ removed from it. In a last step, we filter $\mathcal{N}(G)$ such that it contains only non-isomorphic graphs.
With these choices, the Metropolis-Hastings Algorithm looks as follows:
- Initialize the list of samples $S \leftarrow \emptyset$
- Initialize $G$.
- Uniformly draw a graph $G’$ from $\mathcal{N}(G)$
- Compute the acceptance ratio $\alpha \leftarrow \min( \frac{|\mathcal{N}(G)|}{|\mathcal{N}(G’)|}, 1)$
- Uniformly draw a rational number $u$ from the interval $[0, 1]$
- If $u \leq \alpha$ set $G \leftarrow G'$
- Add $G$ to the list $S$
- After enough iterations return $S$ - otherwise go to step 3.
Next Steps
So far we have seen an idea for an instantiation of the Metropolis-Hastings Method to sample uniformly from the connected graphs on a given number of vertices. From here, I see two ways to test the idea. The first one would be to somewhat become an expert on MCMC and produce formal proofs that the method is working well, i.e. that my proposal distribution is indeed usable for Metropolis-Hastings and then give asymptotic convergence rates. The second way to test the idea would be to just code up the idea and see if it gives promising experimental results on small graphs. In the spirit of testing ideas quickly, I want to explore my experimental results in a follow-up post.