Statistics
| Branch: | Revision:

## root / thesis / ch2_centrality.tex @ df1b1a42

 1 %!TEX root = main.tex  \chapter{Centrality} \label{sec:centrality}   Centrality\footnote{it is also refered to by other terms: centrality indices or centrality measures} is one the fundamental and useful measure in social network analysis to identify the influential vertices or edges in the network. However, how the vertices (or edges) are considered to be important depend on different centrality measures. In this thesis, we are concerned mainly with betweenness centrality, which belongs to the shortest-path based centrality according to the classification by \cite{Brandes:2005:NAM:1062400}.   Since \cite{Bavelas1948} first described the concept of centrality, many centrality indices were defined and used for different purposes. Since there is no single centrality indices that are supreme over all others, we should consider of which centrality indices to use in certain application.   In \cref{sec:centrality_definition} we will begin with the definition of centrality. Then in \cref{sec:centrality_shortest_path_based} we will go deeper on the family of shortest-path based centrality, starting from the stress centrality, moving on to the betweenness centrality, and finally the relative betweenness centrality.   For those who wants to have a broader knowledge on the centrality, then \citep{Brandes:2005:NAM:1062400} classified centrality for vertices and edges into nine families, and summarized them in great detail.   In different paper, different type of notations are used for the same type of concepts. Hence, to keep this report coherent, I will use the same of notation for the same concept, and this might different from the notation used in the original papers.  \section{Definition of centrality} \label{sec:centrality_definition}   Centrality index is defined by assigning the score representing the importance of nodes or edges in the network. When changing the way of assigning the score, we arrive at different centrality indices. But the underlying concept is that centrality indices must depend only on the structure of the network. \citet{Brandes:2005:NAM:1062400} pointed out the fact that even though there is no strict definition for centrality, we can still find the common ground shared by different centrality indices, and he coined the term \emph{Structure Index}   \theoremstyle{definition}   \begin{definition}{Structure Index}   Let $G=(V,E)$ be a weighted, directed or undirected multigraph and let $X$ represent the set of vertices or edges of $G$, respectively. A real-valued function $s$ is called a structural index if and only if the following condition is satisfied: $\forall x \in X: G \simeq H \Rightarrow s_G(x) = s_H(\phi(x)),$ where $s_G(x)$ denotes the value of $s(x)$ in G   \end{definition}  \section{Shortest-path based centrality} \label{sec:centrality_shortest_path_based}   This section presents two types of shortest-path based centrality. Shortest paths are defined on both vertices and edges. Basically those two centrality indices are based on the number of shortest paths between a pair of vertices or edges. \cref{sec:centrality_stress_centrality} and \cref{sec:centrality_bc} will present those two measures in deep.   In communication, information does not only flow in the shortest path, where the shortest path might represent the path where the total traverse time is shortest. However, in a lot of application, shortest path are chosen to forward information through the network. For example, in \textbf{XXXX is OLSR use shortest path to forward packets in the network}.   \subsubsection{Stress Centrality} \label{sec:centrality_stress_centrality}   The concept was by \cite{Shimblel1953}.   Different type of network (such as in wireless community network, in social networks) calls for different centrality indices.   \textbf{XXX Find out the application of stress centrality}   Formally:     \label{eq:stress_centrality_long}   c_S(v) = \sum_{s \neq v \in V} \sum_{t \neq v \in V} \sigma_{st}(v)     where $\sigma_{st}(v)$ denotes the number of shortest paths containing $v$.   To simplify the notation for \cref{eq:stress_centrality_long}, we might write it as follow:     \label{eq:stress_centrality}   c_S(v) = \sum_{s \neq t \neq v \in V} \sigma_{st}(v)     \subsubsection{Betweenness Centrality} \label{sec:centrality_bc}   The concept of \emph{betweenness centrality} was introduced independently by \citet{Freeman1977} and \citet{Anthonisse1971}. It can be viewed as a relative version of stress centrality. Here we will first define the betweenness centrality, then we will go on with the motivation for betweenness centrality and its application.   We define $\delta_{st}(v)$ - the   \emph{pair-dependency}   \footnote{In \cite{Freeman1978215}, the term pair-dependency is equivalent with the term \emph{dependency} used in \cite{Brandes01afaster}. To keep consistency, in this thesis, the definition of pair-dependency and dependency follow the \cite{Brandes01afaster} }   of a pair $s, t \in V$ on an intermediary $v \in V$.     \label{eq:pair_dependency}   \delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}     where $\sigma_{st}(v)$) is the number of shortest paths from $s$ to $t$ that are passing through $v$. And $\sigma_{st}$ is the number of shortest paths from $s$ to $t$. $\delta_{st}(v)$ represents the probability that vertex $v$ falls randomly on any of the shortest paths connecting $s$ and $t$. Assume that the communication in the network follows the shortest path, then $\delta_{st}(v)$ can be interpreted as the probability that vertex $v$ can involve, intercept, enhance or inhibit the communication between $s$ and $t$.   The betweenness centrality of a single vertex $c_B(v)$ is defined as:     \label{eq:betweenness_centrality}   c_B(v) = \sum_{s \neq t \neq v \in V} \delta_{st}(v) = \sum_{s \neq t \neq v \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}     introduced \cite{Freeman1977} \textbf{XXX} the betweenness centrality to address the problem of being unable to compare the potential for control for vertices in different networks with stress centrality. For example, in \cref{fig:stress_vs_bc_centrality}, the stress centrality for vertices $v, u_i$ are the same. But when removing one vertex $v$, all vertices in top row of the graph are disconnected from all the vertices in the bottom row. For the network with $u_i$, for example, removing one or two vertices $u_1, u_2$ does not disconnect the network. The information can still flow freely from the top row to the bottom row following the path passing through the remaining vertex $u_3$. Therefore, using stress centrality alone, we cannot decide on whether some node are more critical for the network than the others, and betweenness centrality came up as a measure to capture the potential for control a vertex has on the network.   \begin{figure}[h]   \caption{   $c_S(u_i) = 16$ and $c_B(u_i) = \frac{1}{3}, i = 1, 2, 3$ and $c_S(v) = 16$ but $c_B(v) = 1$. It shows that stress centrality cannot be used to determine to potential for control a vertex has. This example is taken from \cite{Brandes:2005:NAM:1062400}   }   \label{fig:stress_vs_bc_centrality}   \centering   \includegraphics[scale=1.0]{images/stress_vs_bc_centrality.png}   \end{figure}     \label{eq:max_stress_centrality}   \max c_B(v) = \frac{n^2 - 3n + 2}{2}     \textbf{XXX The application of BC}   \subsubsection{Relative Betweenness Centrality}   The so-called relative betweenness centrality measure $c'_B(v)$ was also introduced by \cite{Freeman1977}:     \label{eq:bc_relative}   c'_B(v) = \frac{2 c_B(v)}{n^2 - 3n + 2}     where $c_B(v)$ is the betweenness centrality defined above, and $n$ is the number of vertices in the network.   \citeauthor{Freeman1977} argued that for vertices $v, u$ to have the same betweenness centrality only mean that they have the same potential for control in absolute terms. That means they can facilitate or inhibit the same number of communitions. Note, we implicitly assume that all communications are conducted along shortest paths. However, the $c_B$ does not show the relative potential for control within the network. \cref{fig:bc_vs_bc_relative} illustrate that even though $c_B(v) = c_B(u_i) = 3, i = 1, 2, 3$, the potential for control of vertex $v$ is much larger than vertex $u_i$. For example, removing vertex $v$ and the network will be disconnected and no communication can happen between vertices. Therefore, $v$ have a total control of the network. Meanwhile, removing any $u_i$ does not have that same disastrous effect since each $u_i$ only control part of the communications between pair of vertices.   \begin{figure}[h]   \caption{   $c_B(v) = 3$ and $c'_B(v) = 1$, and $c_B(u_i) = 3, i = 1, 2, 3$ but $c'_B(u_i) = 0.2$. It shows that the same betweenness centrality for vertices $v, u_i$ for 2 different networks does not equal to the same potential for control for their respective networks.   }   \label{fig:bc_vs_bc_relative}   \centering   \begin{subfigure}{.45\textwidth}   % \centering   \includegraphics[scale=0.3]{images/star_3.png}   \caption{A subfigure}   \label{fig:sub1}   \end{subfigure}%   \begin{subfigure}{.45\textwidth}   \centering   \includegraphics[scale=0.3]{images/star_6.png}   \caption{A subfigure}   \label{fig:sub2}   \end{subfigure}   \end{figure}