Statistics
| Branch: | Revision:

## root / thesis / ch3_computation_of_bc.tex @ c6e2ce5a

 1 %!TEX root = main.tex  \chapter{Computation of Betweenness Centrality} \label{sec:computation_betweenness}   Recall the definition of betweenness centrality:   \begin{equation*}   c_B(v) = \sum_{s \neq t \neq v \in V} \delta_{st}(v) = \sum_{s \neq t \neq v \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}   \end{equation*}   Traditionally, just by looking at the equation, we normally divide the task of computing betweenness centrality into 2 steps:   \begin{itemize}   \item First we need to compute tables with length and number of shortest paths between all pairs of vertices in the network;   \item Second for each vertex $v$, we need to consider all the pair $s, t$, we use the tables calculated above to determine the fraction of shortest paths between $s, t$ that are passing through $v$. Summing all the fraction for every possible $s, t$ and we arrive at the $c_B(v)$. This way of summing is referred to by the term \emph{naive summation of pair-dependencies}.   \end{itemize}   The chapter starts with presenting some important concept in \cref{sec:computation_important_concepts}. Then \cref{sec:computation_counting_shortest_paths} gives an overview on classic ways to find the shortest paths. And \cref{sec:computation_summing_pair_dependencies} introduces 2 ways to sum all the pair-dependencies: (i) the naive way, (ii) the faster approach to accumulate pair-dependencies gradually by \cite{Brandes01afaster}. Up until now, the approach in \cite{Brandes01afaster} is the state-of-the-art in calculating the exact betweenness centrality, and it is implemented in many standard graph libraries such as   \emph{networkx}\footnote{https://networkx.github.io/},   \emph{Boost Graph Library}\footnote{www.boost.org/libs/graph/doc/}   The \cref{sec:computation_summary} intends to summarize the result of two previous section by recapping the complexity of some algorithms used in determining betweenness centrality.   After having a throughout background, \cref{sec:computation_heuristic} presents the heuristic way to calculate also the exact betweenness centrality by \cite{PuzisZEDB12}. However, the complexity of the Heuristic Betweenness Centrality is not guaranteed to perform better than the algorithm of \cite{Brandes01afaster}, its performance depends on the topology.  \section{Important concepts} \label{sec:computation_important_concepts}   Define the set of \emph{predecessors} of a vertex $v$ on the shortest path from $s$ as:   \label{eq:predecessor}   P_s(v) = \{ u \in V : {u, v} \in E, d(s, v) = d(s, u) + \omega(u, v) \}     where $d(s,u)$ is the minimum length of any path connecting $s$ and $u$. And $\omega(u, v)$ is the weight of the edge ${u, v}$.   Given $P_s(v)$, then we have the following relation:     \sigma_{st} = \sum_{u \in P_s(v)}{\sigma_{su}}    \section{Counting the number of shortest paths} \label{sec:computation_counting_shortest_paths}   Before we begin, there are 2 types of shortest paths problem. The first one is called Single Source Shortest Paths (SSSP), where we compute the shortest paths between one given source vertex $s$ to all other vertices. The second one is All-Pairs Shortest Paths (APSP) where we compute the shortest paths between all vertex pairs.   The SSSP is a classical problem in Graph Theory, and it is usually solved by Breadth-First Search (BFS) for unweighted graphs in $\mathcal{O}(m + n)$, and Dijkstra's algorithm for the weighted graphs in $\mathcal{O}(m + n \log n)$.   The APSP can be computed straightforwardly by applying algorithm for SSSP $n$ times, for all vertex $s \in V$. The resulting time complexity thus will be $\mathcal{O}(mn + n^2)$, and $\mathcal{O}(mn + n^2 \log n)$ respectively. In the sparse graph, it is a good practice to solving the SSSP multiple times. However, in non-sparse graph, where the number of edges is close to the maximal number of edges, i.e. $m \simeq n(n-1) \simeq n^2$, then it is better to use Floyd/Warshall algorithm with the complexity of $\mathcal{O}(n^3)$.  \section{Summing all the pair-dependencies} \label{sec:computation_summing_pair_dependencies}   \subsubsection*{Naive ways}   In order to calculate the betweenness centrality for vertex $v$, we have to sum all the pair dependencies $\delta_{st}(v)$ for all possible source $s$ and target $t$ that are different from $v$. The first part of the algorithm calculates all the distance $d$ between all pair of vertices. Vertex $v$ lies on the shortest path between a pair $(s, t)$ if $d(s, t) = d(s, v) + d(v, t)$. And the number of shortest path from $s$ to $t$ passing through $v$ is calculated as $\sigma_{st}(v) = \sigma_{sv} \cdot \sigma_{vt}$.   Therefore, summing of all possible $\delta_{st}(v)$ yields $\mathcal{O}(n^2)$, and we have the betweenness centrality per vertex $v$: $c_B(v)$. Doing for all other vertices in the network, and it yields $\mathcal{O}(n^3)$.   \subsubsection*{Gradually accumulation of pair-dependencies} \label{sec:computation_accumulation_pair_dependencies}   \citeauthor{Brandes01afaster} noticed that there is a better way to sum all the pair dependencies \cite{Brandes01afaster}. The basic way is to finish completely the first part of finding shortest paths and distance between all pair of vertices, and then start with the summation of pair dependencies. The better way is achieved by exploiting the recursive relation of the \emph{dependency} $\delta_{s\LargerCdot}(v)$ of a vertex $s \in V$ on a single vertex $v \in V$, defined as:     \label{eq:dependency}   \delta_{s\LargerCdot}(v) = \sum_{t \in V} \delta_{st}(v)     \cite{Brandes01afaster} proved that the \emph{dependency} of $s \in V$ on any $v \in V$ obeys:     \label{eq:partial_sum_dependency}   \delta_{s\LargerCdot}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\LargerCdot}(w))     The set of predecessor of vertex $w$ on the shortest path from $s$: $P_s(w)$ can be obtained from the first part of the algorithm. And the dependencies of $s$ on all vertices can be computed in $\mathcal{O}(m)$. Therefore, instead of solving the APSP algorithm from start to end, we solve the SSSP for each source vertex $s \in V$. After completing the SSSP algorithm, we calculate the dependencies of $s$ as laid out in \cref{eq:partial_sum_dependency}.   Next, we look at the relation between the betweenness centrality for $v \in V$ $c_B(v)$ and dependency $\delta_{s\LargerCdot}(v)$. Summing all the dependencies of $s \ in V$ on a single vertex $v$, and we arrive at the betweenness centrality $c_B(v)$ as in the following equation:     \label{eq:accummulate_betweenness_centrality}   c_B(v) = \sum_{s \neq v } \delta_{s\LargerCdot}(v)     Therefore, we can then gradually accummulate the betweenness centrality score $c_B(v)$ by summing up the dependency $\delta_{s\LargerCdot}(v)$ to $c_B(v)$.   Next section is intended to provide the high-level on the algorithm to calculate betweenness centrality  \section{Complexity of determining betweenness centrality as a whole} \label{sec:computation_summary}   This section summarizes the complexity for naive way, and for Brandes' algorithm to calculate betweenness centrality. The \cref{algo:basic_bc} run in $\mathcal{O}(n^3)$ for both unweighted and weighted graphs. And the Brandes's algorithm in \cref{algo:brandes_bc} can run in $\mathcal{O}(mn)$ for unweighted graph, and $\mathcal{O}(mn + n^2 \log n)$. The breaking down of the computational time of those two algorithms are given below.   For \cref{algo:basic_bc}, the overall computation is dominated by line 2 to sum all the pair dependencies.   \begin{algorithm}   \caption{Basic way to calculate betweenness centrality} \label{algo:basic_bc}   \begin{algorithmic}[1]   \Procedure{Basic Betweenness Centrality}{}   \State Solving APSP problem   \State Summing all pair-dependencies   \EndProcedure   \end{algorithmic}   \end{algorithm}   For \cref{algo:brandes_bc}, within the \texttt{for} loop, for unweighted graph, each execution from line 2 - 3 executes in $\mathcal{O}(m)$. For weighted graph, solving SSSP problem dominates the computation with $O(m + n \log n)$ for unweighted graph. The \texttt{for} loop is executed $n$ times. Thus, we have the running time stated in the beginning of the section.   \begin{algorithm}   \caption{Brandes betweenness centrality} \label{algo:brandes_bc}   \begin{algorithmic}[1]   \Procedure{Brandes Betweenness Centrality}{}   \For{ $s \in V$ do }   \State Solving SSSP problem for source $s$ (with BFS or Dijkstra's algorithm)   \State Finding all the dependencies of $s$ for all other vertices $v$ \cref{eq:partial_sum_dependency}   \State Accumulate the betweenness centrality for vertices $v$ \cref{eq:accummulate_betweenness_centrality}   \EndFor   \EndProcedure   \end{algorithmic}   \end{algorithm}  \section{Heuristic algorithm for the exact betweenness centrality} \label{sec:computation_heuristic}   We start the section with the high-level description of steps in the heuristic algorithm. We also provides definition for some term that deem to be important. For the details and all the proofs, reader can always refer to \cite{PuzisZEDB12}   \citeauthor{PuzisZEDB12} uses the insight that computing betweenness centrality on tree can be done in linear time in \cite{PuzisZEDB12}. From now on this algorithm is denoted as HBC. He defined a heuristic algorithm to speed up the betweenness centrality calculation. The original graph is partitioned into one or many bi-connected components (BCCs). Then the bi-connected components tree is formed, and several analysis is performed on the BCC tree. The summary of HBC's steps are as follow:   \begin{enumerate}   \item Divide the graphs into bi-connected components and form BCC tree;   \item Compute the \emph{Component Tree Weight}. This step involves the knowledge of the whole graph;   \item Form the matrix with the \emph{communication intensity} value for each pair of vertices in a BCC;   \item Calculate the betweenness centrality for each vertices in each BCC using \cref{eq:BC_for_v_heuristic};   \item Finalize the betweenness centrality for the original graph.   \end{enumerate}   \subsection*{Partitioning a graph into bi-connected components}   Given a graph $G=(V,E)$. If the removal of any $v\in V$ disconnects $G$, then $v$ is said to be a \emph{cut-point} (or \emph{articulation point}). A graph is said to be \emph{bi-connected }if at least 2 vertices must be removed to disconnect the graph. \cref{fig:puzis_bcc} from \cite{PuzisZEDB12} gives an example of partitioning the graph into BCCs.   \begin{figure}   \centering   \includegraphics[scale=0.5]{images/puzis_bcc.png}   \caption[BCC partition and cut-points]{BCC partition and cut-points. Reproduced from \cite{PuzisZEDB12}.}   \label{fig:puzis_bcc}   \end{figure}   \subsection*{Component Tree Weight}   The Component Tree Weight is intended to keep track of how many number of nodes can be reach via a cut-point of a specific BCC when that BCC is separated from the original graph.Let $B$ be a bi-connected component and $v \in B$ be a cut-point. Let $B^{+}(v)$ be the \emph{connected component} in G which includes $B$ after the removal of $v$. In another term, $B^{+}(v)$ include all the vertices in G that are connected to $v$ via $B$. An example is for a bi-connected component $BCC_B$ in \cref{fig:puzis_bcc}. After removing vertex $v$, the connected component ${BCC_B}^{+}(v) = \{5, 6, u, 1, 2, 3, 4\}$. The \emph{cut with} value is defined to be the size of $D_B^{v+} = |B^{+}(v)|$.   Define $B^{-}(v)$ as a set of vertices that can only be reach through $v$ from $B$. Therefore, when $v$ is cut out from the graph, those vertices in $B^{+}(v)$ (a connected component including $B$ without the vertex $v$) cannot reach $B^{-}(v)$. The cut without'' value is defined to be $D_B^{v-} = |B^{-}(v)|$. For example, when removing vertex $v$ from $BCC_B$, ${BCC_B}^{-}(v) = \{7, 8, 9\}$. Those vertices cannot be reach from $B^{+}(v)$ anymore.   The relation between cut with'' and cut without'' is as follow: $|B^{+}(v)| + 1 + |B^{+}(v)| = |V|$. It can be thought as the number of vertices.   \cref{tab:component_tree_weight} shows the value assigned to Component Tree Weight after the Step 2.   \begin{table}[h]   \caption{Component Tree Weight calculation for \cref{fig:puzis_bcc}}   \label{tab:component_tree_weight}   \centering   \begin{tabular}{l l l l}   \toprule   Component & Cut-point & Cut with'' value & Cut without'' value \\   \midrule   $BCC_A$ & u & 2 & 8 \\   $BCC_B$ & u & 8 & 2 \\   $BCC_C$ & v & 7 & 3 \\   $BCC_D$ & v & 1 & 9 \\   \bottomrule   \end{tabular}   \end{table}   \subsection*{Communication intensity}   For each bi-connected component $B_i$, we calculate the traffic matrix $h^{i}_{st}$ representing the \emph{communication intensity} between a pair of vertex $s$, $t$. The traffic matrix is symmetric. There are 3 cases:   \begin{itemize}   \item If $x$ and $y$ are not cut-points, then $h_{xy} = 0$ for $x = y$ and $h_{xy} = 1$ for $x \neq y$   \item If $x$ is not a cut-point, and $y$ is a cut-point, then $h_{xy} = D_B^{y-} + 1$   \item If both $x$ and $y$ are cut-points, then $h_{xy} = (D_B^{x-} + 1) \cdot (D_B^{y-} + 1)$   \end{itemize}   \subsection*{Betweenness computation for bi-connected component}   The betweenness centrality calculated   \label{eq:BC_for_v_heuristic}   c_B(v)_{v \in B_i} = \sum_{s, t \in B_i} \delta_{st}(v) \dot h_{st}^{i}     And the heuristic algorithm also use the faster Brandes's algorithm to calculate betweenness centrality. Recall the equation for summing the dependency of $s \in V$ on any $v \in V$   \begin{equation*}   \delta_{s\LargerCdot}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\LargerCdot}(w))   \end{equation*}   For any bi-connected component $B_i$, the dependency of $s \in B_i$ on any $v \in B_i$ can be written as follow:     \label{eq:dependency_heuristic_partial_sum}   \delta_{s\LargerCdot}(v) = h^{i}_{sv} + \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \delta_{s\LargerCdot}(w)     \subsection*{Betweenness computation for original graph $G$}   It was proven in the paper that for non-cut-point, the betweenness value calculated locally in each $B_i$ is equal to the betweenness value calculated on the whole graph.     c_B(v) = c_B(v)_{v \in B}     For the cut-point betweenness value, we first have to figure out the inter-component communication, denoted as $BC^{inter}(v)$:     BC^{inter}(v) = \sum_{B \ni v} D^{v-}_{B} \cdot D^{v+}_{B}     For cut-point $v$, the betweenness value on the whole graph is defined as:     c_B(v) = \sum_{B \ni v} c_B(v)_{v \in B} - BC^{inter}(v)