Revision c6e2ce5a thesis/ch3_computation_of_bc.tex

View differences:

thesis/ch3_computation_of_bc.tex
4 4
    Recall the definition of betweenness centrality:
5 5

  
6 6
    \begin{equation*}
7
        \label{eq:betweenness_centrality}
8 7
        c_B(v) = \sum_{s \neq t \neq v \in V} \delta_{st}(v) = \sum_{s \neq t \neq v \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}
9 8
    \end{equation*}
10 9

  
11 10
    Traditionally, just by looking at the equation, we normally divide the task of computing betweenness centrality into 2 steps:
12 11

  
13 12
    \begin{itemize}
14
        \item First we need to compute tables with length and number of shortest paths between all pairs of vertices in the network.
15
        \item Second for each vertex $v$, we need to consider all the pair $s, t$, we use the tables calculated above to determine the fraction of shortest paths between $s, t$ that are passing through $v$. Summing all the fraction for every possible $s, t$ and we arrive at the $c_B(v)$. This way of summing is referred to by the term \emph{naive summation of pair-dependencies}
13
        \item First we need to compute tables with length and number of shortest paths between all pairs of vertices in the network;
14
        \item Second for each vertex $v$, we need to consider all the pair $s, t$, we use the tables calculated above to determine the fraction of shortest paths between $s, t$ that are passing through $v$. Summing all the fraction for every possible $s, t$ and we arrive at the $c_B(v)$. This way of summing is referred to by the term \emph{naive summation of pair-dependencies}.
16 15
    \end{itemize}
17 16

  
18
    The chapter starts with presenting some important concept in \cref{sec:computation_important_concepts}. Then \cref{sec:computation_counting_shortest_paths} gives an overview on classic ways to find the shortest paths. And \cref{sec:computation_summing_pair_dependencies} introduces 2 ways to sum all the pair-dependencies: (i) the naive way, (ii) the faster approach to accummulate pair-dependencies gradually by \cite{Brandes01afaster}. Up until now, the approach in \cite{Brandes01afaster} is the state-of-the-art in calculating the exact betweenness centrality, and it is implemented in many standard graph libraries such as
19
        \emph{networkx} \footnote{https://networkx.github.io/},
20
        \emph{Boost Graph Library} \footnote{www.boost.org/libs/graph/doc/}
21
    The \cref{sec:computation_summary} intends to summarize the result of two previous section with \cref{tab:complexity_of_algorithms} recaping the complexity of some algorithms used in determining betweenness centrality.
17
    The chapter starts with presenting some important concept in \cref{sec:computation_important_concepts}. Then \cref{sec:computation_counting_shortest_paths} gives an overview on classic ways to find the shortest paths. And \cref{sec:computation_summing_pair_dependencies} introduces 2 ways to sum all the pair-dependencies: (i) the naive way, (ii) the faster approach to accumulate pair-dependencies gradually by \cite{Brandes01afaster}. Up until now, the approach in \cite{Brandes01afaster} is the state-of-the-art in calculating the exact betweenness centrality, and it is implemented in many standard graph libraries such as
18
        \emph{networkx}\footnote{https://networkx.github.io/},
19
        \emph{Boost Graph Library}\footnote{www.boost.org/libs/graph/doc/}
20

  
21
    The \cref{sec:computation_summary} intends to summarize the result of two previous section by recapping the complexity of some algorithms used in determining betweenness centrality.
22 22

  
23 23
    After having a throughout background, \cref{sec:computation_heuristic} presents the heuristic way to calculate also the exact betweenness centrality by \cite{PuzisZEDB12}. However, the complexity of the Heuristic Betweenness Centrality is not guaranteed to perform better than the algorithm of \cite{Brandes01afaster}, its performance depends on the topology.
24 24

  
25 25
\section{Important concepts} \label{sec:computation_important_concepts}
26

  
26 27
    Define the set of \emph{predecessors} of a vertex $v$ on the shortest path from $s$ as:
27 28

  
28 29
    \begin{equation} \label{eq:predecessor}
......
46 47
    The APSP can be computed straightforwardly by applying algorithm for SSSP $n$ times, for all vertex $s \in V$. The resulting time complexity thus will be $\mathcal{O}(mn + n^2)$, and $\mathcal{O}(mn + n^2 \log n)$ respectively. In the sparse graph, it is a good practice to solving the SSSP multiple times. However, in non-sparse graph, where the number of edges is close to the maximal number of edges, i.e. $m \simeq n(n-1) \simeq n^2$, then it is better to use Floyd/Warshall algorithm with the complexity of $\mathcal{O}(n^3)$.
47 48

  
48 49

  
49
    \textbf{XXX What is the input}: the source vertex $s$
50
    \textbf{XXX What is the output?}:
50
\section{Summing all the pair-dependencies} \label{sec:computation_summing_pair_dependencies}
51
    \subsubsection*{Naive ways}
52
        In order to calculate the betweenness centrality for vertex $v$, we have to sum all the pair dependencies $\delta_{st}(v)$ for all possible source $s$ and target $t$ that are different from $v$. The first part of the algorithm calculates all the distance $d$ between all pair of vertices. Vertex $v$ lies on the shortest path between a pair $(s, t)$ if $d(s, t) = d(s, v) + d(v, t)$. And the number of shortest path from $s$ to $t$ passing through $v$ is calculated as $\sigma_{st}(v) = \sigma_{sv} \cdot \sigma_{vt}$.
53

  
54
        Therefore, summing of all possible $\delta_{st}(v)$ yields $\mathcal{O}(n^2)$, and we have the betweenness centrality per vertex $v$: $c_B(v)$. Doing for all other vertices in the network, and it yields $\mathcal{O}(n^3)$.
51 55

  
52
    \subsection{Breadth-First Search for unweighted graph}
53
        I understood the algorithm, but how can I explain it to another person?
56
    \subsubsection*{Gradually accumulation of pair-dependencies} \label{sec:computation_accumulation_pair_dependencies}
57
        \citeauthor{Brandes01afaster} noticed that there is a better way to sum all the pair dependencies \cite{Brandes01afaster}. The basic way is to finish completely the first part of finding shortest paths and distance between all pair of vertices, and then start with the summation of pair dependencies. The better way is achieved by exploiting the recursive relation of the \emph{dependency} $\delta_{s\LargerCdot}(v)$ of a vertex $s \in V$ on a single vertex $v \in V$, defined as:
54 58

  
59
        \begin{equation}
60
            \label{eq:dependency}
61
            \delta_{s\LargerCdot}(v) = \sum_{t \in V} \delta_{st}(v)
62
        \end{equation}
55 63

  
64
        \cite{Brandes01afaster} proved that the \emph{dependency} of $s \in V$ on any $v \in V$ obeys:
56 65

  
57
    \subsection{Dijkstra's algorithm for weighted graph}
58
        The Dijkstra's algorithm can be modified so that when given a source vertex $s$, the output will be the set of predecessors and the number of shortest paths from $s$ to other $t \neq s \in V$
66
        \begin{equation}
67
            \label{eq:partial_sum_dependency}
68
            \delta_{s\LargerCdot}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\LargerCdot}(w))
69
        \end{equation}
59 70

  
60
        \begin{algorithm}
61
        \caption{Dijkstra's algorithm} \label{algo:dijkstra}
62
        \begin{algorithmic}[1]
63
            \Procedure{Dijkstra for SSSP with source vertex s}{}
64
                \State $S \leftarrow$ empty stack
65
                \State $P[w] \leftarrow$ empty list, $w \in V$
66
                \State $\sigma[t] \leftarrow 0, t \in V$; $\sigma[s] \leftarrow 1$
67
                \State $d[t] \leftarrow -1, t \in V$; $d[s] \leftarrow 0$
68
                \State $Q \leftarrow $ empty queue
69
                \State enqueue $s \rightarrow Q$
70
                \While{$Q$ \emph{not empty}}
71
                    dequeue $v \leftarrow Q$
72
                    push $v \rightarrow S$
71
        The set of predecessor of vertex $w$ on the shortest path from $s$: $P_s(w)$ can be obtained from the first part of the algorithm. And the dependencies of $s$ on all vertices can be computed in $\mathcal{O}(m)$. Therefore, instead of solving the APSP algorithm from start to end, we solve the SSSP for each source vertex $s \in V$. After completing the SSSP algorithm, we calculate the dependencies of $s$ as laid out in \cref{eq:partial_sum_dependency}.
73 72

  
74
                \EndWhile
75
            \EndProcedure
76
        \end{algorithmic}
77
        \end{algorithm}
73
        Next, we look at the relation between the betweenness centrality for $v \in V$ $c_B(v)$ and dependency $\delta_{s\LargerCdot}(v)$. Summing all the dependencies of $s \ in V$ on a single vertex $v$, and we arrive at the betweenness centrality $c_B(v)$ as in the following equation:
78 74

  
75
        \begin{equation}
76
            \label{eq:accummulate_betweenness_centrality}
77
            c_B(v) = \sum_{s \neq v } \delta_{s\LargerCdot}(v)
78
        \end{equation}
79 79

  
80
\section{Summing all the pair-dependencies} \label{sec:computation_summing_pair_dependencies}
81
    \subsubsection{Naive ways}
82
    \subsubsection{Gradually accummulation of pair-dependencies} \label{sec:computation_accumulation_pair_dependencies}
80
        Therefore, we can then gradually accummulate the betweenness centrality score $c_B(v)$ by summing up the dependency $\delta_{s\LargerCdot}(v)$ to $c_B(v)$.
81

  
82
        Next section is intended to provide the high-level on the algorithm to calculate betweenness centrality
83 83

  
84 84

  
85 85
\section{Complexity of determining betweenness centrality as a whole} \label{sec:computation_summary}
86 86

  
87
     \begin{table} [h]
88
        \caption{Complexity for the 1st step: counting the number and length of shortest paths}
89
        \label{tab:complexity_of_algorithms}
90
        \begin{tabular}{l|l|l}
91
            \toprule
92
            Algorithm name & Unweighted graphs & Weighted graph \\
93
            \midrule
94
            Breadth-First Search & $\mathcal{O}(mn + n^2)$ & not applicable\\
95
            Dijkstra &  & $\mathcal{O}(mn + n^2 \log n)$ \\
96
            Floyd/Warshall & & $\mathcal{O}(n^3)$ \\
97
            \bottomrule
98
        \end{tabular}
99
    \end{table}
100

  
101
    \begin{table} [h]
102
        \caption{Complexity for the 2nd step: summing all pair-dependencies}
103
        \label{tab:complexity_of_algorithms}
104
        \begin{tabular}{l|l|l}
105
            \toprule
106
            Algorithm name & Complexity \\
107
            \midrule
108
            Naive summation & $\mathcal{O}(n^3)$ \\
109
            \cite{Brandes01afaster} &  $\mathcal{O}(mn)$\\
110
            \bottomrule
111
        \end{tabular}
112
    \end{table}
87
    This section summarizes the complexity for naive way, and for Brandes' algorithm to calculate betweenness centrality. The \cref{algo:basic_bc} run in $\mathcal{O}(n^3)$ for both unweighted and weighted graphs. And the Brandes's algorithm in \cref{algo:brandes_bc} can run in $\mathcal{O}(mn)$ for unweighted graph, and $\mathcal{O}(mn + n^2 \log n)$. The breaking down of the computational time of those two algorithms are given below.
88

  
89
    For \cref{algo:basic_bc}, the overall computation is dominated by line 2 to sum all the pair dependencies.
90

  
91
    \begin{algorithm}
92
    \caption{Basic way to calculate betweenness centrality} \label{algo:basic_bc}
93
    \begin{algorithmic}[1]
94
        \Procedure{Basic Betweenness Centrality}{}
95
            \State Solving APSP problem
96
            \State Summing all pair-dependencies
97
        \EndProcedure
98
    \end{algorithmic}
99
    \end{algorithm}
100

  
101
    For \cref{algo:brandes_bc}, within the \texttt{for} loop, for unweighted graph, each execution from line 2 - 3 executes in $\mathcal{O}(m)$. For weighted graph, solving SSSP problem dominates the computation with $O(m + n \log n)$ for unweighted graph. The \texttt{for} loop is executed $n$ times. Thus, we have the running time stated in the beginning of the section.
102

  
103
    \begin{algorithm}
104
    \caption{Brandes betweenness centrality} \label{algo:brandes_bc}
105
    \begin{algorithmic}[1]
106
        \Procedure{Brandes Betweenness Centrality}{}
107
            \For{ $s \in V$ do }
108
                \State Solving SSSP problem for source $s$ (with BFS or Dijkstra's algorithm)
109
                \State Finding all the dependencies of $s$ for all other vertices $v$ \cref{eq:partial_sum_dependency}
110
                \State Accumulate the betweenness centrality for vertices $v$ \cref{eq:accummulate_betweenness_centrality}
111
            \EndFor
112
        \EndProcedure
113
    \end{algorithmic}
114
    \end{algorithm}
115

  
113 116

  
114 117
\section{Heuristic algorithm for the exact betweenness centrality} \label{sec:computation_heuristic}
118
    We start the section with the high-level description of steps in the heuristic algorithm. We also provides definition for some term that deem to be important. For the details and all the proofs, reader can always refer to \cite{PuzisZEDB12}
119

  
120
    \citeauthor{PuzisZEDB12} uses the insight that computing betweenness centrality on tree can be done in linear time in \cite{PuzisZEDB12}. From now on this algorithm is denoted as HBC. He defined a heuristic algorithm to speed up the betweenness centrality calculation. The original graph is partitioned into one or many bi-connected components (BCCs). Then the bi-connected components tree is formed, and several analysis is performed on the BCC tree. The summary of HBC's steps are as follow:
121

  
122
    \begin{enumerate}
123
        \item Divide the graphs into bi-connected components and form BCC tree;
124
        \item Compute the \emph{Component Tree Weight}. This step involves the knowledge of the whole graph;
125
        \item Form the matrix with the \emph{communication intensity} value for each pair of vertices in a BCC;
126
        \item Calculate the betweenness centrality for each vertices in each BCC using \cref{eq:BC_for_v_heuristic};
127
        \item Finalize the betweenness centrality for the original graph.
128
    \end{enumerate}
129

  
130
    \subsection*{Partitioning a graph into bi-connected components}
131

  
132
        Given a graph $G=(V,E)$. If the removal of any $v\in V$ disconnects $G$, then $v$ is said to be a \emph{cut-point} (or \emph{articulation point}). A graph is said to be \emph{bi-connected }if at least 2 vertices must be removed to disconnect the graph. \cref{fig:puzis_bcc} from \cite{PuzisZEDB12} gives an example of partitioning the graph into BCCs.
133

  
134
        \begin{figure}
135
            \centering
136
            \includegraphics[scale=0.5]{images/puzis_bcc.png}
137
            \caption[BCC partition and cut-points]{BCC partition and cut-points. Reproduced from \cite{PuzisZEDB12}.}
138
            \label{fig:puzis_bcc}
139
        \end{figure}
140

  
141

  
142
    \subsection*{Component Tree Weight}
143
        The Component Tree Weight is intended to keep track of how many number of nodes can be reach via a cut-point of a specific BCC when that BCC is separated from the original graph.Let $B$ be a bi-connected component and $v \in B$ be a cut-point. Let $B^{+}(v)$ be the \emph{connected component} in G which includes $B$ after the removal of $v$. In another term, $B^{+}(v)$ include all the vertices in G that are connected to $v$ via $B$. An example is for a bi-connected component $BCC_B$ in \cref{fig:puzis_bcc}. After removing vertex $v$, the connected component ${BCC_B}^{+}(v) = \{5, 6, u, 1, 2, 3, 4\}$. The \emph{cut with} value is defined to be the size of $D_B^{v+} = |B^{+}(v)|$.
144

  
145
        Define $B^{-}(v)$ as a set of vertices that can only be reach through $v$ from $B$. Therefore, when $v$ is cut out from the graph, those vertices in $B^{+}(v)$ (a connected component including $B$ without the vertex $v$) cannot reach $B^{-}(v)$. The ``cut without'' value is defined to be $D_B^{v-} = |B^{-}(v)|$. For example, when removing vertex $v$ from $BCC_B$, ${BCC_B}^{-}(v) = \{7, 8, 9\}$. Those vertices cannot be reach from $B^{+}(v)$ anymore.
146

  
147
        The relation between ``cut with'' and ``cut without'' is as follow: $|B^{+}(v)| + 1 + |B^{+}(v)| = |V|$. It can be thought as the number of vertices.
148

  
149
        \cref{tab:component_tree_weight} shows the value assigned to Component Tree Weight after the Step 2.
150

  
151
        \begin{table}[h]
152
            \caption{Component Tree Weight calculation for \cref{fig:puzis_bcc}}
153
            \label{tab:component_tree_weight}
154
            \centering
155
            \begin{tabular}{l l l l}
156
                \toprule
157
                Component & Cut-point & ``Cut with'' value & ``Cut without'' value \\
158
                \midrule
159
                $BCC_A$ & u & 2 & 8 \\
160
                $BCC_B$ & u & 8 & 2 \\
161
                $BCC_C$ & v & 7 & 3 \\
162
                $BCC_D$ & v & 1 & 9 \\
163
                \bottomrule
164
            \end{tabular}
165
        \end{table}
166

  
167

  
168
    \subsection*{Communication intensity}
169
        For each bi-connected component $B_i$, we calculate the traffic matrix $h^{i}_{st}$ representing the \emph{communication intensity} between a pair of vertex $s$, $t$. The traffic matrix is symmetric. There are 3 cases:
170

  
171
        \begin{itemize}
172
            \item If $x$ and $y$ are not cut-points, then $h_{xy} = 0$ for $x = y$ and $h_{xy} = 1$ for $x \neq y$
173
            \item If $x$ is not a cut-point, and $y$ is a cut-point, then $h_{xy} = D_B^{y-} + 1$
174
            \item If both $x$ and $y$ are cut-points, then $h_{xy} = (D_B^{x-} + 1) \cdot (D_B^{y-} + 1)$
175
        \end{itemize}
176

  
177
    \subsection*{Betweenness computation for bi-connected component}
178
        The betweenness centrality calculated
179
        \begin{equation} \label{eq:BC_for_v_heuristic}
180
            c_B(v)_{v \in B_i} = \sum_{s, t \in B_i} \delta_{st}(v) \dot h_{st}^{i}
181
        \end{equation}
182
        And the heuristic algorithm also use the faster Brandes's algorithm to calculate betweenness centrality. Recall the equation for summing the dependency of $s \in V$ on any $v \in V$
183

  
184
        \begin{equation*}
185
            \delta_{s\LargerCdot}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\LargerCdot}(w))
186
        \end{equation*}
187
        For any bi-connected component $B_i$, the dependency of $s \in B_i$ on any $v \in B_i$ can be written as follow:
188

  
189
        \begin{equation}
190
            \label{eq:dependency_heuristic_partial_sum}
191
            \delta_{s\LargerCdot}(v) = h^{i}_{sv} + \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \delta_{s\LargerCdot}(w)
192
        \end{equation}
193

  
194
    \subsection*{Betweenness computation for original graph $G$}
195
        It was proven in the paper that for non-cut-point, the betweenness value calculated locally in each $B_i$ is equal to the betweenness value calculated on the whole graph.
196

  
197
        \begin{equation}
198
            c_B(v) = c_B(v)_{v \in B}
199
        \end{equation}
200
        For the cut-point betweenness value, we first have to figure out the inter-component communication, denoted as $BC^{inter}(v)$:
201

  
202
        \begin{equation}
203
            BC^{inter}(v) = \sum_{B \ni v} D^{v-}_{B} \cdot D^{v+}_{B}
204
        \end{equation}
205
        For cut-point $v$, the betweenness value on the whole graph is defined as:
206

  
207
        \begin{equation}
208
            c_B(v) = \sum_{B \ni v} c_B(v)_{v \in B} - BC^{inter}(v)
209
        \end{equation}

Also available in: Unified diff