commit c6e2ce5aa5f9d0997081d855b59065926550e3c2
Author: Quynh PX Nguyen
Date: Fri Apr 1 12:36:16 2016 +0200
[Thesis] Final LaTeX version
diff --git a/thesis/ch2_centrality.tex b/thesis/ch2_centrality.tex
index dca23a4..d96ef35 100644
--- a/thesis/ch2_centrality.tex
+++ b/thesis/ch2_centrality.tex
@@ -1,122 +1,115 @@
%!TEX root = main.tex
\chapter{Centrality} \label{sec:centrality}
 Centrality\footnote{it is also refered to by other terms: centrality indices or centrality measures} is one the fundamental and useful measure in social network analysis to identify the influential vertices or edges in the network. However, how the vertices (or edges) are considered to be important depend on different centrality measures. In this thesis, we are concerned mainly with betweenness centrality, which belongs to the shortestpath based centrality according to the classification by \cite{Brandes:2005:NAM:1062400}.
+ Centrality\footnote{It is also referred to as centrality indices or centrality measures.} is one of the fundamental measures in social network analysis, used to identify the influential vertices or edges in a network. However, which vertices (or edges) are considered important depends on the centrality measure used. In this thesis, we are concerned mainly with betweenness centrality, which belongs to the shortest-path based centralities according to the classification by \cite{Brandes:2005:NAM:1062400}.
 Since \cite{Bavelas1948} first described the concept of centrality, many centrality indices were defined and used for different purposes. Since there is no single centrality indices that are supreme over all others, we should consider of which centrality indices to use in certain application.
+ Since \cite{Bavelas1948} first described the concept of centrality, many centrality indices have been defined and used for different purposes. Because no single centrality index is superior to all others, we should consider which index to use in a given application.
 In \cref{sec:centrality_definition} we will begin with the definition of centrality. Then in \cref{sec:centrality_shortest_path_based} we will go deeper on the family of shortestpath based centrality, starting from the stress centrality, moving on to the betweenness centrality, and finally the relative betweenness centrality.
+ In \cref{sec:centrality_definition} we begin with the definition of centrality. Then in \cref{sec:centrality_shortest_path_based} we go deeper into the family of shortest-path based centralities, starting from stress centrality, moving on to betweenness centrality, and finally relative betweenness centrality.
For readers who want a broader overview of centrality, \citet{Brandes:2005:NAM:1062400} classified centrality indices for vertices and edges into nine families and summarized them in great detail.
 In different paper, different type of notations are used for the same type of concepts. Hence, to keep this report coherent, I will use the same of notation for the same concept, and this might different from the notation used in the original papers.
+ Different papers use different notation for the same concepts. Hence, to keep this report coherent, I use the same notation for the same concept throughout, and this might differ from the notation used in the original papers.
\section{Definition of centrality} \label{sec:centrality_definition}
 Centrality index is defined by assigning the score representing the importance of nodes or edges in the network. When changing the way of assigning the score, we arrive at different centrality indices. But the underlying concept is that centrality indices must depend only on the structure of the network. \citet{Brandes:2005:NAM:1062400} pointed out the fact that even though there is no strict definition for centrality, we can still find the common ground shared by different centrality indices, and he coined the term \emph{Structure Index}
+ A centrality index can be roughly defined as a function that assigns to each node or edge in the network a score representing its importance. Changing the way the score is assigned yields different centrality indices. However, we cannot take an arbitrary function that assigns random values to the nodes of a graph and call those values a centrality.
+
+ \citet{Brandes:2005:NAM:1062400} pointed out that even though there is no strict definition of centrality, we can still find common ground shared by different centrality indices. The underlying concept is that a centrality index must depend only on the structure of the network. To put it simply, a centrality index is a way of assigning a score to each vertex (or edge) in the network, based entirely on the structure of the graph. Formally, he coined the term \emph{Structural Index}, and stated that a centrality index $c$ is required to be a structural index.
\theoremstyle{definition}
 \begin{definition}{Structure Index}
+ \begin{definition}{\textbf{Structural Index}}
Let $G=(V,E)$ be a weighted, directed or undirected multigraph and let $X$ represent the set of vertices or edges of $G$, respectively. A real-valued function $s$ is called a structural index if and only if the following condition is satisfied: $\forall x \in X: G \simeq H \Rightarrow s_G(x) = s_H(\phi(x)),$ where $s_G(x)$ denotes the value of $s(x)$ in $G$ and $\phi$ denotes an isomorphism from $G$ to $H$.
\end{definition}
+ Before the main section describing betweenness centrality, we present a simple centrality measure called \emph{degree centrality}, so that the reader has an easy-to-understand example of what a centrality is. The degree centrality score of each node is equal to the number of connections that node has; see \cref{fig:degree_centrality}. Later, we move on to more complex centrality measures based on shortest paths.
+
+ \begin{figure}[hbp]
+ \centering
+ \includegraphics{images/line_graph_degree_horizontal.png}
+ \caption[Example for degree centrality]{Degree centrality score for each node}
+ \label{fig:degree_centrality}
+ \end{figure}
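+
+ As a small illustration (a hypothetical graph, not necessarily the one drawn in \cref{fig:degree_centrality}), consider a path of four vertices $a - b - c - d$. Writing $c_D(v) = \deg(v)$ for the degree centrality, a symbol introduced here only for this example, we get
+ \[
+ c_D(a) = c_D(d) = 1, \qquad c_D(b) = c_D(c) = 2,
+ \]
+ so the two inner vertices score higher than the two end vertices.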
+
+
\section{Shortest-path based centrality} \label{sec:centrality_shortest_path_based}
 This section presents two types of shortestpath based centrality. Shortest paths are defined on both vertices and edges. Basically those two centrality indices are based on the number of shortest paths between a pair of vertices or edges. \cref{sec:centrality_stress_centrality} and \cref{sec:centrality_bc} will present those two measures in deep.
+ This section presents two types of shortest-path based centrality: \emph{stress centrality} and \emph{betweenness centrality}. Both indices are based on the set of shortest paths between pairs of vertices or edges. \cref{sec:centrality_stress_centrality} and \cref{sec:centrality_bc} present those two measures in depth.
 In communication, information does not only flow in the shortest path, where the shortest path might represent the path where the total traverse time is shortest. However, in a lot of application, shortest path are chosen to forward information through the network. For example, in \textbf{XXXX is OLSR use shortest path to forward packets in the network}.
+ In communication, information does not flow only along the shortest path, where the shortest path might represent the path with the shortest total traversal time. However, in many applications, shortest paths are chosen to forward information through the network. For example, the OLSR routing protocol provides shortest-path routes for all nodes in the network. OLSR is widely used in Wireless Community Networks, the networks whose existing routing protocols we want to improve by applying centrality.
\subsubsection{Stress Centrality} \label{sec:centrality_stress_centrality}
 The concept was by \cite{Shimblel1953}.
+ The concept was presented by \cite{Shimblel1953}; the idea is to quantify how much information flows through a vertex in a network. Under that definition, if a vertex $v$ lies on more shortest paths, then more information passes through $v$ and it must handle a higher workload, or ``stress''. Formally, stress centrality is defined in \cref{eq:stress_centrality_long} as the sum over all possible sources $s$ and all possible targets $t$ that are different from $v$.
+
+ \begin{equation}
+ \label{eq:stress_centrality_long}
+ c_S(v) = \sum_{s \neq v} \sum_{t \neq v} \sigma_{st}(v)
+ \end{equation}
+ where $\sigma_{st}(v)$ denotes the number of shortest paths from $s$ to $t$ containing $v$. To simplify the notation of \cref{eq:stress_centrality_long}, we might write it as follows:
+
+ \begin{equation}
+ \label{eq:stress_centrality}
+ c_S(v) = \sum_{s \neq t \neq v} \sigma_{st}(v)
+ \end{equation}
+ However, for clarity, I will stick with the notation in \cref{eq:stress_centrality_long}.
 Different type of network (such as in wireless community network, in social networks) calls for different centrality indices.
+ Note that the formula for stress centrality $c_S(v)$ does not include the shortest paths that start or end in $v$. The stress centrality is calculated over all shortest paths between every pair of vertices.
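+
+ As a quick check of \cref{eq:stress_centrality_long}, consider a hypothetical path graph $a - b - c$, chosen here only for illustration. The only ordered pairs $(s, t)$ with $s \neq b$, $t \neq b$ and a shortest path through $b$ are $(a, c)$ and $(c, a)$, each with exactly one shortest path, so
+ \[
+ c_S(b) = \sigma_{ac}(b) + \sigma_{ca}(b) = 1 + 1 = 2,
+ \]
+ while $c_S(a) = c_S(c) = 0$. If each undirected pair is counted only once, which is a common convention for undirected graphs, the value for $b$ is $1$ instead.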
 \textbf{XXX Find out the application of stress centrality}
 Formally:
+ \subsubsection{Betweenness Centrality} \label{sec:centrality_bc}
+ The concept of \emph{betweenness centrality} was introduced independently by \citet{Freeman1977} and \citet{Anthonisse1971}. It can be viewed as a relative version of stress centrality. Here we first define betweenness centrality, then continue with its motivation and application.
+
+ We define $\delta_{st}(v)$, the
+ \emph{pair-dependency}\footnote{In \cite{Freeman1978215}, the term pair-dependency is equivalent to the term \emph{dependency} used in \cite{Brandes01afaster}. For consistency, the definitions of pair-dependency and dependency in this thesis follow \cite{Brandes01afaster}.}
+ of a pair $s, t \in V$ on an intermediary $v \in V$, as:
\begin{equation}
 \label{eq:stress_centrality_long}
 c_S(v) = \sum_{s \neq v \in V} \sum_{t \neq v \in V} \sigma_{st}(v)
+ \label{eq:pair_dependency}
+ \delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}
\end{equation}
+ where $\sigma_{st}(v)$ is the number of shortest paths from $s$ to $t$ that pass through $v$, and $\sigma_{st}$ is the total number of shortest paths from $s$ to $t$. $\delta_{st}(v)$ represents the probability that vertex $v$ lies on a randomly chosen shortest path connecting $s$ and $t$. Assuming that communication in the network follows shortest paths, $\delta_{st}(v)$ can be interpreted as the probability that vertex $v$ can be involved in, intercept, enhance or inhibit the communication between $s$ and $t$.
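+
+ For instance, assuming a hypothetical cycle of four vertices $s - v - t - w - s$ with unit edge weights, there are two shortest paths from $s$ to $t$, one through $v$ and one through $w$. Hence $\sigma_{st} = 2$ and $\sigma_{st}(v) = 1$, which gives
+ \[
+ \delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}} = \frac{1}{2}.
+ \]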
 where $\sigma_{st}(v)$ denotes the number of shortest paths containing $v$.
+ The betweenness centrality of a single vertex $c_B(v)$ is defined as:
+
+ \begin{equation}
+ \label{eq:betweenness_centrality}
+ c_B(v) = \sum_{s \neq v} \sum_{t \neq v} \delta_{st}(v) = \sum_{s \neq v} \sum_{t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}
+ \end{equation}
 To simplify the notation for \cref{eq:stress_centrality_long}, we might write it as follow:
+ From the original formula for betweenness centrality in \cref{eq:betweenness_centrality}, several variants were introduced in \cite{Brandes2008136}, such as edge betweenness centrality and group betweenness centrality. The one we are mostly interested in is betweenness centrality with endpoint vertices included. This means that $c_B(v)$ takes into account even the shortest paths starting or ending at $v$, so the source $s$ and the target $t$ are not required to be different from $v$. \cref{eq:betweenness_centrality_endpoints_inclusion} shows the formula for betweenness centrality with source and target included.
\begin{equation}
 \label{eq:stress_centrality}
 c_S(v) = \sum_{s \neq t \neq v \in V} \sigma_{st}(v)
+ \label{eq:betweenness_centrality_endpoints_inclusion}
+ c_B(v) = \sum_{s \in V} \sum_{t \in V} \delta_{st}(v) = \sum_{s \in V} \sum_{t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}
\end{equation}
 \subsubsection{Betweenness Centrality} \label{sec:centrality_bc}
+ \subsubsection{Relative Betweenness Centrality}
+ After introducing betweenness centrality, \citeauthor{Freeman1977} presented another measure \cite{Freeman1977}, called \emph{relative betweenness centrality}, arguing that betweenness centrality cannot be used to compare the influence of vertices from networks of different sizes (e.g. different numbers of nodes).
+
+ He argued that vertices $v, u$ having the same betweenness centrality only means that they have the same potential for control in absolute terms, that is, they can facilitate or inhibit the same number of communications. Note that we implicitly assume that all communications are conducted along shortest paths. However, $c_B$ does not show the relative potential for control within the network. \cref{fig:bc_vs_bc_relative} illustrates that even though $c_B(v) = c_B(u_i) = 3, i = 1, 2, 3, 4$, the potential for control of vertex $v$ is much larger than that of vertex $u_i$. For example, removing vertex $v$ disconnects the network, and no communication can happen between the remaining vertices. Therefore, $v$ has total control of the network. Meanwhile, removing any $u_i$ does not have the same disastrous effect, since each $u_i$ only controls part of the communications between pairs of vertices.
+
+ \begin{figure}[h]
+ \centering
+ \begin{tabular}{c c}
+ \subfloat[fig 1][BC]{\includegraphics[scale=0.3]{images/star_3_bc.png}} &
+ \subfloat[fig 2][BC]{\includegraphics[scale=0.3]{images/star_6_bc.png}} \\
+ \subfloat[fig 1][Relative BC]{\includegraphics[scale=0.3]{images/star_3_bcrelative.png}} &
+ \subfloat[fig 2][Relative BC]{\includegraphics[scale=0.3]{images/star_6_bcrelative.png}} \\
+ \end{tabular}
+ \caption[Comparison of Betweenness Centrality and Relative Betweenness Centrality]{
+ Comparison of Betweenness Centrality and Relative Betweenness Centrality: even though the red nodes have the same betweenness centrality score, it is clear that the red node of the left graph has more control over the traffic flow within the network. For example, when the red node of the left graph is removed, the graph is disconnected, and no information can flow between the blue nodes. On the other hand, if any red node of the right graph is cut out from the graph, the graph is still connected. The relative betweenness centrality reflects the importance of the red nodes in both graphs better, being $1$ for the left graph and $0.2$ for the right graph.
+ }
+ \label{fig:bc_vs_bc_relative}
+ \end{figure}
+
+ When the vertex of interest $v$ is the center node of a star graph, as in \cref{fig:bc_vs_bc_relative} (a), its betweenness centrality score is the largest score that a vertex in a graph with $n$ nodes can have. This maximum betweenness centrality score is:
+ \begin{equation}
+ \label{eq:max_bc_score}
+ \max c_B(v) = \frac{n^2 - 3n + 2}{2}
+ \end{equation}
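+ One way to see where this value comes from, assuming an undirected star graph in which every pair of vertices is counted once: the center $v$ lies on the unique shortest path between every pair of the $n - 1$ leaves, so each such pair contributes $\delta_{st}(v) = 1$. Counting the pairs of leaves gives
+ \[
+ \max c_B(v) = \frac{(n-1)(n-2)}{2} = \frac{n^2 - 3n + 2}{2}.
+ \]
+ For $n = 4$ this evaluates to $3$.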
 The concept of \emph{betweenness centrality} was introduced independently by \citet{Freeman1977} and \citet{Anthonisse1971}. It can be viewed as a relative version of stress centrality. Here we will first define the betweenness centrality, then we will go on with the motivation for betweenness centrality and its application.

 We define $\delta_{st}(v)$  the
 \emph{pairdependency}
 \footnote{In \cite{Freeman1978215}, the term pairdependency is equivalent with the term \emph{dependency} used in \cite{Brandes01afaster}. To keep consistency, in this thesis, the definition of pairdependency and dependency follow the \cite{Brandes01afaster} }
 of a pair $s, t \in V$ on an intermediary $v \in V$.

 \begin{equation}
 \label{eq:pair_dependency}
 \delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}
 \end{equation}

 where $\sigma_{st}(v)$) is the number of shortest paths from $s$ to $t$ that are passing through $v$. And $\sigma_{st}$ is the number of shortest paths from $s$ to $t$. $\delta_{st}(v)$ represents the probability that vertex $v$ falls randomly on any of the shortest paths connecting $s$ and $t$. Assume that the communication in the network follows the shortest path, then $\delta_{st}(v)$ can be interpreted as the probability that vertex $v$ can involve, intercept, enhance or inhibit the communication between $s$ and $t$.

 The betweenness centrality of a single vertex $c_B(v)$ is defined as:

 \begin{equation}
 \label{eq:betweenness_centrality}
 c_B(v) = \sum_{s \neq t \neq v \in V} \delta_{st}(v) = \sum_{s \neq t \neq v \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}
 \end{equation}

 introduced \cite{Freeman1977} \textbf{XXX} the betweenness centrality to address the problem of being unable to compare the potential for control for vertices in different networks with stress centrality. For example, in \cref{fig:stress_vs_bc_centrality}, the stress centrality for vertices $v, u_i$ are the same. But when removing one vertex $v$, all vertices in top row of the graph are disconnected from all the vertices in the bottom row. For the network with $u_i$, for example, removing one or two vertices $u_1, u_2$ does not disconnect the network. The information can still flow freely from the top row to the bottom row following the path passing through the remaining vertex $u_3$. Therefore, using stress centrality alone, we cannot decide on whether some node are more critical for the network than the others, and betweenness centrality came up as a measure to capture the potential for control a vertex has on the network.

 \begin{figure}[h]
 \caption{
 $c_S(u_i) = 16$ and $c_B(u_i) = \frac{1}{3}, i = 1, 2, 3$ and $c_S(v) = 16$ but $c_B(v) = 1$. It shows that stress centrality cannot be used to determine to potential for control a vertex has. This example is taken from \cite{Brandes:2005:NAM:1062400}
 }
 \label{fig:stress_vs_bc_centrality}
 \centering
 \includegraphics[scale=1.0]{images/stress_vs_bc_centrality.png}
 \end{figure}

 \begin{equation}
 \label{eq:max_stress_centrality}
 \max c_B(v) = \frac{n^2 - 3n + 2}{2}
 \end{equation}

 \textbf{XXX The application of BC}

 \subsubsection{Relative Betweenness Centrality}
 The socalled relative betweenness centrality measure $c'_B(v)$ was also introduced by \cite{Freeman1977}:

 \begin{equation}
 \label{eq:bc_relative}
 c'_B(v) = \frac{2 c_B(v)}{n^2 - 3n + 2}
 \end{equation}

 where $c_B(v)$ is the betweenness centrality defined above, and $n$ is the number of vertices in the network.

 \citeauthor{Freeman1977} argued that for vertices $v, u$ to have the same betweenness centrality only mean that they have the same potential for control in absolute terms. That means they can facilitate or inhibit the same number of communitions. Note, we implicitly assume that all communications are conducted along shortest paths. However, the $c_B$ does not show the relative potential for control within the network. \cref{fig:bc_vs_bc_relative} illustrate that even though $c_B(v) = c_B(u_i) = 3, i = 1, 2, 3$, the potential for control of vertex $v$ is much larger than vertex $u_i$. For example, removing vertex $v$ and the network will be disconnected and no communication can happen between vertices. Therefore, $v$ have a total control of the network. Meanwhile, removing any $u_i$ does not have that same disastrous effect since each $u_i$ only control part of the communications between pair of vertices.

 \begin{figure}[h]
 \caption{
 $c_B(v) = 3$ and $c'_B(v) = 1$, and $c_B(u_i) = 3, i = 1, 2, 3$ but $c'_B(u_i) = 0.2$. It shows that the same betweenness centrality for vertices $v, u_i$ for 2 different networks does not equal to the same potential for control for their respective networks.
 }
 \label{fig:bc_vs_bc_relative}
 \centering
 \begin{subfigure}{.45\textwidth}
 % \centering
 \includegraphics[scale=0.3]{images/star_3.png}
 \caption{A subfigure}
 \label{fig:sub1}
 \end{subfigure}%
 \begin{subfigure}{.45\textwidth}
 \centering
 \includegraphics[scale=0.3]{images/star_6.png}
 \caption{A subfigure}
 \label{fig:sub2}
 \end{subfigure}
 \end{figure}
+ The so-called \emph{relative betweenness centrality} measure $c'_B(v)$ is the betweenness centrality $c_B(v)$ in \cref{eq:betweenness_centrality} divided by its maximum possible score. See \cref{eq:bc_relative} for the formula:
+ \begin{equation}
+ \label{eq:bc_relative}
+ c'_B(v) = \frac{2 c_B(v)}{n^2 - 3n + 2}
+ \end{equation}
+ where $c_B(v)$ is the betweenness centrality defined in \cref{eq:betweenness_centrality}, and $n$ is the number of vertices in the network.
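+
+ As a worked check of \cref{eq:bc_relative}, take $c_B = 3$ together with $n = 4$ and $n = 7$. These values of $n$ are inferred here from the scores $1$ and $0.2$ reported in \cref{fig:bc_vs_bc_relative}; they are not stated explicitly in the text:
+ \[
+ c'_B = \frac{2 \cdot 3}{4^2 - 3 \cdot 4 + 2} = \frac{6}{6} = 1, \qquad c'_B = \frac{2 \cdot 3}{7^2 - 3 \cdot 7 + 2} = \frac{6}{30} = 0.2.
+ \]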
\ No newline at end of file