Revision c6e2ce5a thesis/ch3_computation_of_bc.tex
thesis/ch3_computation_of_bc.tex  

4  4 
Recall the definition of betweenness centrality: 
5  5  
6  6 
\begin{equation*} 
7 
\label{eq:betweenness_centrality} 

8  7 
c_B(v) = \sum_{s \neq t \neq v \in V} \delta_{st}(v) = \sum_{s \neq t \neq v \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} 
9  8 
\end{equation*} 
10  9  
11  10 
Traditionally, just by looking at the equation, we normally divide the task of computing betweenness centrality into 2 steps: 
12  11  
13  12 
\begin{itemize} 
14 
\item First we need to compute tables with length and number of shortest paths between all pairs of vertices in the network.


15 
\item Second for each vertex $v$, we need to consider all the pair $s, t$, we use the tables calculated above to determine the fraction of shortest paths between $s, t$ that are passing through $v$. Summing all the fraction for every possible $s, t$ and we arrive at the $c_B(v)$. This way of summing is referred to by the term \emph{naive summation of pairdependencies} 

13 
\item First we need to compute tables with length and number of shortest paths between all pairs of vertices in the network;


14 
\item Second for each vertex $v$, we need to consider all the pair $s, t$, we use the tables calculated above to determine the fraction of shortest paths between $s, t$ that are passing through $v$. Summing all the fraction for every possible $s, t$ and we arrive at the $c_B(v)$. This way of summing is referred to by the term \emph{naive summation of pairdependencies}.


16  15 
\end{itemize} 
17  16  
18 
The chapter starts with presenting some important concept in \cref{sec:computation_important_concepts}. Then \cref{sec:computation_counting_shortest_paths} gives an overview on classic ways to find the shortest paths. And \cref{sec:computation_summing_pair_dependencies} introduces 2 ways to sum all the pairdependencies: (i) the naive way, (ii) the faster approach to accummulate pairdependencies gradually by \cite{Brandes01afaster}. Up until now, the approach in \cite{Brandes01afaster} is the stateoftheart in calculating the exact betweenness centrality, and it is implemented in many standard graph libraries such as 

19 
\emph{networkx} \footnote{https://networkx.github.io/}, 

20 
\emph{Boost Graph Library} \footnote{www.boost.org/libs/graph/doc/} 

21 
The \cref{sec:computation_summary} intends to summarize the result of two previous section with \cref{tab:complexity_of_algorithms} recaping the complexity of some algorithms used in determining betweenness centrality. 

17 
The chapter starts with presenting some important concept in \cref{sec:computation_important_concepts}. Then \cref{sec:computation_counting_shortest_paths} gives an overview on classic ways to find the shortest paths. And \cref{sec:computation_summing_pair_dependencies} introduces 2 ways to sum all the pairdependencies: (i) the naive way, (ii) the faster approach to accumulate pairdependencies gradually by \cite{Brandes01afaster}. Up until now, the approach in \cite{Brandes01afaster} is the stateoftheart in calculating the exact betweenness centrality, and it is implemented in many standard graph libraries such as 

18 
\emph{networkx}\footnote{https://networkx.github.io/}, 

19 
\emph{Boost Graph Library}\footnote{www.boost.org/libs/graph/doc/} 

20  
21 
The \cref{sec:computation_summary} intends to summarize the result of two previous section by recapping the complexity of some algorithms used in determining betweenness centrality. 

22  22  
23  23 
After having a throughout background, \cref{sec:computation_heuristic} presents the heuristic way to calculate also the exact betweenness centrality by \cite{PuzisZEDB12}. However, the complexity of the Heuristic Betweenness Centrality is not guaranteed to perform better than the algorithm of \cite{Brandes01afaster}, its performance depends on the topology. 
24  24  
25  25 
\section{Important concepts} \label{sec:computation_important_concepts} 
26  
26  27 
Define the set of \emph{predecessors} of a vertex $v$ on the shortest path from $s$ as: 
27  28  
28  29 
\begin{equation} \label{eq:predecessor} 
...  ...  
46  47 
The APSP can be computed straightforwardly by applying algorithm for SSSP $n$ times, for all vertex $s \in V$. The resulting time complexity thus will be $\mathcal{O}(mn + n^2)$, and $\mathcal{O}(mn + n^2 \log n)$ respectively. In the sparse graph, it is a good practice to solving the SSSP multiple times. However, in nonsparse graph, where the number of edges is close to the maximal number of edges, i.e. $m \simeq n(n1) \simeq n^2$, then it is better to use Floyd/Warshall algorithm with the complexity of $\mathcal{O}(n^3)$. 
47  48  
48  49  
49 
\textbf{XXX What is the input}: the source vertex $s$ 

50 
\textbf{XXX What is the output?}: 

50 
\section{Summing all the pairdependencies} \label{sec:computation_summing_pair_dependencies} 

51 
\subsubsection*{Naive ways} 

52 
In order to calculate the betweenness centrality for vertex $v$, we have to sum all the pair dependencies $\delta_{st}(v)$ for all possible source $s$ and target $t$ that are different from $v$. The first part of the algorithm calculates all the distance $d$ between all pair of vertices. Vertex $v$ lies on the shortest path between a pair $(s, t)$ if $d(s, t) = d(s, v) + d(v, t)$. And the number of shortest path from $s$ to $t$ passing through $v$ is calculated as $\sigma_{st}(v) = \sigma_{sv} \cdot \sigma_{vt}$. 

53  
54 
Therefore, summing of all possible $\delta_{st}(v)$ yields $\mathcal{O}(n^2)$, and we have the betweenness centrality per vertex $v$: $c_B(v)$. Doing for all other vertices in the network, and it yields $\mathcal{O}(n^3)$. 

51  55  
52 
\subsection{BreadthFirst Search for unweighted graph}


53 
I understood the algorithm, but how can I explain it to another person?


56 
\subsubsection*{Gradually accumulation of pairdependencies} \label{sec:computation_accumulation_pair_dependencies}


57 
\citeauthor{Brandes01afaster} noticed that there is a better way to sum all the pair dependencies \cite{Brandes01afaster}. The basic way is to finish completely the first part of finding shortest paths and distance between all pair of vertices, and then start with the summation of pair dependencies. The better way is achieved by exploiting the recursive relation of the \emph{dependency} $\delta_{s\LargerCdot}(v)$ of a vertex $s \in V$ on a single vertex $v \in V$, defined as:


54  58  
59 
\begin{equation} 

60 
\label{eq:dependency} 

61 
\delta_{s\LargerCdot}(v) = \sum_{t \in V} \delta_{st}(v) 

62 
\end{equation} 

55  63  
64 
\cite{Brandes01afaster} proved that the \emph{dependency} of $s \in V$ on any $v \in V$ obeys: 

56  65  
57 
\subsection{Dijkstra's algorithm for weighted graph} 

58 
The Dijkstra's algorithm can be modified so that when given a source vertex $s$, the output will be the set of predecessors and the number of shortest paths from $s$ to other $t \neq s \in V$ 

66 
\begin{equation} 

67 
\label{eq:partial_sum_dependency} 

68 
\delta_{s\LargerCdot}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\LargerCdot}(w)) 

69 
\end{equation} 

59  70  
60 
\begin{algorithm} 

61 
\caption{Dijkstra's algorithm} \label{algo:dijkstra} 

62 
\begin{algorithmic}[1] 

63 
\Procedure{Dijkstra for SSSP with source vertex s}{} 

64 
\State $S \leftarrow$ empty stack 

65 
\State $P[w] \leftarrow$ empty list, $w \in V$ 

66 
\State $\sigma[t] \leftarrow 0, t \in V$; $\sigma[s] \leftarrow 1$ 

67 
\State $d[t] \leftarrow 1, t \in V$; $d[s] \leftarrow 0$ 

68 
\State $Q \leftarrow $ empty queue 

69 
\State enqueue $s \rightarrow Q$ 

70 
\While{$Q$ \emph{not empty}} 

71 
dequeue $v \leftarrow Q$ 

72 
push $v \rightarrow S$ 

71 
The set of predecessor of vertex $w$ on the shortest path from $s$: $P_s(w)$ can be obtained from the first part of the algorithm. And the dependencies of $s$ on all vertices can be computed in $\mathcal{O}(m)$. Therefore, instead of solving the APSP algorithm from start to end, we solve the SSSP for each source vertex $s \in V$. After completing the SSSP algorithm, we calculate the dependencies of $s$ as laid out in \cref{eq:partial_sum_dependency}. 

73  72  
74 
\EndWhile 

75 
\EndProcedure 

76 
\end{algorithmic} 

77 
\end{algorithm} 

73 
Next, we look at the relation between the betweenness centrality for $v \in V$ $c_B(v)$ and dependency $\delta_{s\LargerCdot}(v)$. Summing all the dependencies of $s \ in V$ on a single vertex $v$, and we arrive at the betweenness centrality $c_B(v)$ as in the following equation: 

78  74  
75 
\begin{equation} 

76 
\label{eq:accummulate_betweenness_centrality} 

77 
c_B(v) = \sum_{s \neq v } \delta_{s\LargerCdot}(v) 

78 
\end{equation} 

79  79  
80 
\section{Summing all the pairdependencies} \label{sec:computation_summing_pair_dependencies}


81 
\subsubsection{Naive ways} 

82 
\subsubsection{Gradually accummulation of pairdependencies} \label{sec:computation_accumulation_pair_dependencies}


80 
Therefore, we can then gradually accummulate the betweenness centrality score $c_B(v)$ by summing up the dependency $\delta_{s\LargerCdot}(v)$ to $c_B(v)$.


81  
82 
Next section is intended to provide the highlevel on the algorithm to calculate betweenness centrality


83  83  
84  84  
85  85 
\section{Complexity of determining betweenness centrality as a whole} \label{sec:computation_summary} 
86  86  
87 
\begin{table} [h] 

88 
\caption{Complexity for the 1st step: counting the number and length of shortest paths} 

89 
\label{tab:complexity_of_algorithms} 

90 
\begin{tabular}{lll} 

91 
\toprule 

92 
Algorithm name & Unweighted graphs & Weighted graph \\ 

93 
\midrule 

94 
BreadthFirst Search & $\mathcal{O}(mn + n^2)$ & not applicable\\ 

95 
Dijkstra & & $\mathcal{O}(mn + n^2 \log n)$ \\ 

96 
Floyd/Warshall & & $\mathcal{O}(n^3)$ \\ 

97 
\bottomrule 

98 
\end{tabular} 

99 
\end{table} 

100  
101 
\begin{table} [h] 

102 
\caption{Complexity for the 2nd step: summing all pairdependencies} 

103 
\label{tab:complexity_of_algorithms} 

104 
\begin{tabular}{lll} 

105 
\toprule 

106 
Algorithm name & Complexity \\ 

107 
\midrule 

108 
Naive summation & $\mathcal{O}(n^3)$ \\ 

109 
\cite{Brandes01afaster} & $\mathcal{O}(mn)$\\ 

110 
\bottomrule 

111 
\end{tabular} 

112 
\end{table} 

87 
This section summarizes the complexity for naive way, and for Brandes' algorithm to calculate betweenness centrality. The \cref{algo:basic_bc} run in $\mathcal{O}(n^3)$ for both unweighted and weighted graphs. And the Brandes's algorithm in \cref{algo:brandes_bc} can run in $\mathcal{O}(mn)$ for unweighted graph, and $\mathcal{O}(mn + n^2 \log n)$. The breaking down of the computational time of those two algorithms are given below. 

88  
89 
For \cref{algo:basic_bc}, the overall computation is dominated by line 2 to sum all the pair dependencies. 

90  
91 
\begin{algorithm} 

92 
\caption{Basic way to calculate betweenness centrality} \label{algo:basic_bc} 

93 
\begin{algorithmic}[1] 

94 
\Procedure{Basic Betweenness Centrality}{} 

95 
\State Solving APSP problem 

96 
\State Summing all pairdependencies 

97 
\EndProcedure 

98 
\end{algorithmic} 

99 
\end{algorithm} 

100  
101 
For \cref{algo:brandes_bc}, within the \texttt{for} loop, for unweighted graph, each execution from line 2  3 executes in $\mathcal{O}(m)$. For weighted graph, solving SSSP problem dominates the computation with $O(m + n \log n)$ for unweighted graph. The \texttt{for} loop is executed $n$ times. Thus, we have the running time stated in the beginning of the section. 

102  
103 
\begin{algorithm} 

104 
\caption{Brandes betweenness centrality} \label{algo:brandes_bc} 

105 
\begin{algorithmic}[1] 

106 
\Procedure{Brandes Betweenness Centrality}{} 

107 
\For{ $s \in V$ do } 

108 
\State Solving SSSP problem for source $s$ (with BFS or Dijkstra's algorithm) 

109 
\State Finding all the dependencies of $s$ for all other vertices $v$ \cref{eq:partial_sum_dependency} 

110 
\State Accumulate the betweenness centrality for vertices $v$ \cref{eq:accummulate_betweenness_centrality} 

111 
\EndFor 

112 
\EndProcedure 

113 
\end{algorithmic} 

114 
\end{algorithm} 

115  
113  116  
114  117 
\section{Heuristic algorithm for the exact betweenness centrality} \label{sec:computation_heuristic} 
118 
We start the section with the highlevel description of steps in the heuristic algorithm. We also provides definition for some term that deem to be important. For the details and all the proofs, reader can always refer to \cite{PuzisZEDB12} 

119  
120 
\citeauthor{PuzisZEDB12} uses the insight that computing betweenness centrality on tree can be done in linear time in \cite{PuzisZEDB12}. From now on this algorithm is denoted as HBC. He defined a heuristic algorithm to speed up the betweenness centrality calculation. The original graph is partitioned into one or many biconnected components (BCCs). Then the biconnected components tree is formed, and several analysis is performed on the BCC tree. The summary of HBC's steps are as follow: 

121  
122 
\begin{enumerate} 

123 
\item Divide the graphs into biconnected components and form BCC tree; 

124 
\item Compute the \emph{Component Tree Weight}. This step involves the knowledge of the whole graph; 

125 
\item Form the matrix with the \emph{communication intensity} value for each pair of vertices in a BCC; 

126 
\item Calculate the betweenness centrality for each vertices in each BCC using \cref{eq:BC_for_v_heuristic}; 

127 
\item Finalize the betweenness centrality for the original graph. 

128 
\end{enumerate} 

129  
130 
\subsection*{Partitioning a graph into biconnected components} 

131  
132 
Given a graph $G=(V,E)$. If the removal of any $v\in V$ disconnects $G$, then $v$ is said to be a \emph{cutpoint} (or \emph{articulation point}). A graph is said to be \emph{biconnected }if at least 2 vertices must be removed to disconnect the graph. \cref{fig:puzis_bcc} from \cite{PuzisZEDB12} gives an example of partitioning the graph into BCCs. 

133  
134 
\begin{figure} 

135 
\centering 

136 
\includegraphics[scale=0.5]{images/puzis_bcc.png} 

137 
\caption[BCC partition and cutpoints]{BCC partition and cutpoints. Reproduced from \cite{PuzisZEDB12}.} 

138 
\label{fig:puzis_bcc} 

139 
\end{figure} 

140  
141  
142 
\subsection*{Component Tree Weight} 

143 
The Component Tree Weight is intended to keep track of how many number of nodes can be reach via a cutpoint of a specific BCC when that BCC is separated from the original graph.Let $B$ be a biconnected component and $v \in B$ be a cutpoint. Let $B^{+}(v)$ be the \emph{connected component} in G which includes $B$ after the removal of $v$. In another term, $B^{+}(v)$ include all the vertices in G that are connected to $v$ via $B$. An example is for a biconnected component $BCC_B$ in \cref{fig:puzis_bcc}. After removing vertex $v$, the connected component ${BCC_B}^{+}(v) = \{5, 6, u, 1, 2, 3, 4\}$. The \emph{cut with} value is defined to be the size of $D_B^{v+} = B^{+}(v)$. 

144  
145 
Define $B^{}(v)$ as a set of vertices that can only be reach through $v$ from $B$. Therefore, when $v$ is cut out from the graph, those vertices in $B^{+}(v)$ (a connected component including $B$ without the vertex $v$) cannot reach $B^{}(v)$. The ``cut without'' value is defined to be $D_B^{v} = B^{}(v)$. For example, when removing vertex $v$ from $BCC_B$, ${BCC_B}^{}(v) = \{7, 8, 9\}$. Those vertices cannot be reach from $B^{+}(v)$ anymore. 

146  
147 
The relation between ``cut with'' and ``cut without'' is as follow: $B^{+}(v) + 1 + B^{+}(v) = V$. It can be thought as the number of vertices. 

148  
149 
\cref{tab:component_tree_weight} shows the value assigned to Component Tree Weight after the Step 2. 

150  
151 
\begin{table}[h] 

152 
\caption{Component Tree Weight calculation for \cref{fig:puzis_bcc}} 

153 
\label{tab:component_tree_weight} 

154 
\centering 

155 
\begin{tabular}{l l l l} 

156 
\toprule 

157 
Component & Cutpoint & ``Cut with'' value & ``Cut without'' value \\ 

158 
\midrule 

159 
$BCC_A$ & u & 2 & 8 \\ 

160 
$BCC_B$ & u & 8 & 2 \\ 

161 
$BCC_C$ & v & 7 & 3 \\ 

162 
$BCC_D$ & v & 1 & 9 \\ 

163 
\bottomrule 

164 
\end{tabular} 

165 
\end{table} 

166  
167  
168 
\subsection*{Communication intensity} 

169 
For each biconnected component $B_i$, we calculate the traffic matrix $h^{i}_{st}$ representing the \emph{communication intensity} between a pair of vertex $s$, $t$. The traffic matrix is symmetric. There are 3 cases: 

170  
171 
\begin{itemize} 

172 
\item If $x$ and $y$ are not cutpoints, then $h_{xy} = 0$ for $x = y$ and $h_{xy} = 1$ for $x \neq y$ 

173 
\item If $x$ is not a cutpoint, and $y$ is a cutpoint, then $h_{xy} = D_B^{y} + 1$ 

174 
\item If both $x$ and $y$ are cutpoints, then $h_{xy} = (D_B^{x} + 1) \cdot (D_B^{y} + 1)$ 

175 
\end{itemize} 

176  
177 
\subsection*{Betweenness computation for biconnected component} 

178 
The betweenness centrality calculated 

179 
\begin{equation} \label{eq:BC_for_v_heuristic} 

180 
c_B(v)_{v \in B_i} = \sum_{s, t \in B_i} \delta_{st}(v) \dot h_{st}^{i} 

181 
\end{equation} 

182 
And the heuristic algorithm also use the faster Brandes's algorithm to calculate betweenness centrality. Recall the equation for summing the dependency of $s \in V$ on any $v \in V$ 

183  
184 
\begin{equation*} 

185 
\delta_{s\LargerCdot}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\LargerCdot}(w)) 

186 
\end{equation*} 

187 
For any biconnected component $B_i$, the dependency of $s \in B_i$ on any $v \in B_i$ can be written as follow: 

188  
189 
\begin{equation} 

190 
\label{eq:dependency_heuristic_partial_sum} 

191 
\delta_{s\LargerCdot}(v) = h^{i}_{sv} + \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \delta_{s\LargerCdot}(w) 

192 
\end{equation} 

193  
194 
\subsection*{Betweenness computation for original graph $G$} 

195 
It was proven in the paper that for noncutpoint, the betweenness value calculated locally in each $B_i$ is equal to the betweenness value calculated on the whole graph. 

196  
197 
\begin{equation} 

198 
c_B(v) = c_B(v)_{v \in B} 

199 
\end{equation} 

200 
For the cutpoint betweenness value, we first have to figure out the intercomponent communication, denoted as $BC^{inter}(v)$: 

201  
202 
\begin{equation} 

203 
BC^{inter}(v) = \sum_{B \ni v} D^{v}_{B} \cdot D^{v+}_{B} 

204 
\end{equation} 

205 
For cutpoint $v$, the betweenness value on the whole graph is defined as: 

206  
207 
\begin{equation} 

208 
c_B(v) = \sum_{B \ni v} c_B(v)_{v \in B}  BC^{inter}(v) 

209 
\end{equation} 
Also available in: Unified diff