Statistics
| Branch: | Revision:

## root / thesis / ch2_centrality.tex @ c6e2ce5a

 1 %!TEX root = main.tex  \chapter{Centrality} \label{sec:centrality}   Centrality\footnote{it is also referred to by other terms: centrality indices or centrality measures} is one the fundamental and useful measure in social network analysis to identify the influential vertices or edges in the network. However, how the vertices (or edges) are considered to be important depend on different centrality measures. In this thesis, we are concerned mainly with betweenness centrality, which belongs to the shortest-path based centrality according to the classification by \cite{Brandes:2005:NAM:1062400}.   Since \cite{Bavelas1948} first described the concept of centrality, many centrality indices were defined and used for different purposes. Since there is no single centrality indices that are supreme over all others, we should consider which centrality indices to use in certain application.   In \cref{sec:centrality_definition} we begin with the definition of centrality. Then in \cref{sec:centrality_shortest_path_based} we go deeper on the family of shortest-path based centrality, starting from the stress centrality, moving on to the betweenness centrality, and finally the relative betweenness centrality.   For those who wants to have a broader knowledge on the centrality, then \citep{Brandes:2005:NAM:1062400} classified centrality for vertices and edges into nine families, and summarized them in great detail.   In different paper, different type of notations are used for the same type of concepts. Hence, to keep this report coherent, I use the same of notation for the same concept, and this might different from the notation used in the original papers.  \section{Definition of centrality} \label{sec:centrality_definition}   Centrality index can be roughly defined as a function assigning score representing the importance for nodes or edges in the network. When changing the way of assigning the score for each vertex, we arrive at different centrality indices. However, we cannot have arbitrary function assigning random values for nodes in graph, and call those values the centrality.   \citet{Brandes:2005:NAM:1062400} pointed out that even though there is no strict definition for centrality, we can still find the common ground shared by different centrality indices. The underlying concept is that centrality indices must depend only on the structure of the network. To put it simply, centrality indices a way of assigning a score for each vertex (or edge) in the network, based completely on structure of the graph. Formally, he coined the term \emph{Structural Index}, and stated that a centrality index $c$ is required to be a structural index.   \theoremstyle{definition}   \begin{definition}{\textbf{Structural Index}}   Let $G=(V,E)$ be a weighted, directed or undirected multigraph and let $X$ represent the set of vertices or edges of $G$, respectively. A real-valued function $s$ is called a structural index if and only if the following condition is satisfied: $\forall x \in X: G \simeq H \Rightarrow s_G(x) = s_H(\phi(x)),$ where $s_G(x)$ denotes the value of $s(x)$ in G   \end{definition}   Before beginning with the main section describing betweenness centrality, a simple centrality measure called \emph{degree centrality} is going to be presented, so that reader can have an easy-to-understand example on what is centrality. The score of degree centrality for each node is equal to the number of connection that node has. See \cref{fig:degree_centrality}. Then later, we move on to more complex centrality measurements based on shortest-paths.   \begin{figure}[hbp]   \centering   \includegraphics{images/line_graph_degree_horizontal.png}   \caption[Example for degree centrality]{Degree centrality score for each node}   \label{fig:degree_centrality}   \end{figure}  \section{Shortest-path based centrality} \label{sec:centrality_shortest_path_based}   This section presents two types of shortest-path based centrality: \emph{stress centrality}, and \emph{betweenness centrality}. Basically those two centrality indices are based on the set of shortest paths between a pair of vertices or edges. \cref{sec:centrality_stress_centrality} and \cref{sec:centrality_bc} presents those two measures in deep.   In communication, information does not only flow in the shortest path, where the shortest path might represent the path where the total traversing time is shortest. However, in a lot of application, shortest path are chosen to forward information through the network. For example, OLSR routing protocol provides shortest path routes for all nodes in the network. And OLSR routing protocol is used widely in Wireless Community Networks -- the network that we want to apply centrality to improve their existing routing protocol.   \subsubsection{Stress Centrality} \label{sec:centrality_stress_centrality}   The concept was presented by \cite{Shimblel1953}, the idea is the know quantify how much information is flowing through a vertex in a network. With that definition, if a vertex $v$ lies on more shortest paths, then more information is passed through $v$ and it must handle a higher workload, or stress''. Formally, it is defined in \cref{eq:stress_centrality_long}. It is the summation over all possible source $s$, and all possible target $t$ that are different than $v$.     \label{eq:stress_centrality_long}   c_S(v) = \sum_{s \neq v} \sum_{t \neq v} \sigma_{st}(v)     where $\sigma_{st}(v)$ denotes the number of shortest paths containing $v$. To simplify the notation for \cref{eq:stress_centrality_long}, we might write it as follow:     \label{eq:stress_centrality}   c_S(v) = \sum_{s \neq t \neq v} \sigma_{st}(v)     However for the clarity, I will stick with the notation in \cref{eq:stress_centrality_long}.   Note that the formula for stress centrality $c_S(v)$ does not include the shortest paths that start or end in $v$. And the stress centrality is calculated for all shortest paths between any pair of vertices.   \subsubsection{Betweenness Centrality} \label{sec:centrality_bc}   The concept of \emph{betweenness centrality} was introduced independently by \citet{Freeman1977} and \citet{Anthonisse1971}. It can be viewed as a relative version of stress centrality. Here we first define the betweenness centrality, then continue with the motivation for betweenness centrality and its application.   We define $\delta_{st}(v)$ - the   \emph{pair-dependency}\footnote{In \cite{Freeman1978215}, the term pair-dependency is equivalent with the term \emph{dependency} used in \cite{Brandes01afaster}. To keep consistency, in this thesis, the definition of pair-dependency and dependency follow the \cite{Brandes01afaster} }   of a pair $s, t \in V$ on an intermediary $v \in V$.     \label{eq:pair_dependency}   \delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}     where $\sigma_{st}(v)$) is the number of shortest paths from $s$ to $t$ that are passing through $v$. And $\sigma_{st}$ is the number of shortest paths from $s$ to $t$. $\delta_{st}(v)$ represents the probability that vertex $v$ falls randomly on any of the shortest paths connecting $s$ and $t$. Assume that the communication in the network follows the shortest path, then $\delta_{st}(v)$ can be interpreted as the probability that vertex $v$ can involve, intercept, enhance or inhibit the communication between $s$ and $t$.   The betweenness centrality of a single vertex $c_B(v)$ is defined as:     \label{eq:betweenness_centrality}   c_B(v) = \sum_{s \neq v} \sum_{t \neq v} \delta_{st}(v) = \sum_{s \neq v} \sum_{t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}     From the original formula for betweenness centrality in \cref{eq:betweenness_centrality}, several variants for betweenness centrality was introduced in \cite{Brandes2008136}, such as edge betweenness centrality, group betweenness centrality, etc. The one we are mostly interested in is the betweenness centrality with endpoints vertices included. It means that the $c_S(v)$ take into account even the shortest paths starting or ending with $v$, and it doesn't require the source $s$ or the target vertex $t$ to be different from $v$. \cref{eq:betweenness_centrality_endpoints_inclusion} shows the formula for betweenness centrality with source and target included.     \label{eq:betweenness_centrality_endpoints_inclusion}   c_B(v) = \sum_{s \in V} \sum_{t \in V} \delta_{st}(v) = \sum_{s \in V} \sum_{t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}     \subsubsection{Relative Betweenness Centrality}   After introducing betweenness centrality, \citeauthor{Freeman1977} presented another measurement \cite{Freeman1977}, called \emph{relative betweenness centrality} under the argument that betweenness centrality cannot be used to compare the influential of vertices from network of different size (e.g. different number of nodes).   He argued that for vertices $v, u$ to have the same betweenness centrality only mean that they have the same potential for control in absolute terms. That means they can facilitate or inhibit the same number of communications. Note, we implicitly assume that all communications are conducted along shortest paths. However, the $c_B$ does not show the relative potential for control within the network. \cref{fig:bc_vs_bc_relative} illustrate that even though $c_B(v) = c_B(u_i) = 3, i = 1, 2, 3, 4$, the potential for control of vertex $v$ is much larger than vertex $u_i$. For example, removing vertex $v$ and the network would be disconnected and no communication can happen between vertices. Therefore, $v$ have a total control of the network. Meanwhile, removing any $u_i$ does not have that same disastrous effect since each $u_i$ only control part of the communications between pair of vertices.   \begin{figure}[h]   \centering   \begin{tabular}{c c}   \subfloat[fig 1][BC]{\includegraphics[scale=0.3]{images/star_3_bc.png}} &   \subfloat[fig 2][BC]{\includegraphics[scale=0.3]{images/star_6_bc.png}} \\   \subfloat[fig 1][Relative BC]{\includegraphics[scale=0.3]{images/star_3_bcrelative.png}} &   \subfloat[fig 2][Relative BC]{\includegraphics[scale=0.3]{images/star_6_bcrelative.png}} \\   \end{tabular}   \caption[Comparision for Betweenness Centrality and Relative Betweenness Centrality]{   Comparision for Betweenness Centrality and Relative Betweenness Centrality: We see that even though the red nodes have the same betweenness centrality score, it is clearly that the red node of the left graph has more control over the traffic flow within the network. For example, when the red node of the left graph is destroyed, the graph is disconnected, and no information can flow between blue nodes. On the other hands, if any red node of the right graph is cut out from the graph, the graph is still connected. The relative betweenness centrality can reflect the important of the red node for both graphs better, by setting it to $1$ for the left graph, and $0.2$ for the right graph.   }   \label{fig:bc_vs_bc_relative}   \end{figure}   When a vertex of interest $v$ is the center node of the star graph, just as \cref{fig:bc_vs_bc_relative} (a), its betweenness centrality score is the biggest score that a graph with $n$ nodes can have. And the maximum betweenness centrality score that a vertex can achieve is shown be to:     \label{eq:max_bc_score}   \max c_B(v) = \frac{n^2 - 3n + 2}{2}     The so-called \emph{relative betweenness centrality} measure $c'_B(v)$ is the fraction between thet betweenness centrality $c_B(v)$ in \cref{eq:betweenness_centrality} divided by its possible maximum score. See \cref{eq:bc_relative} for the formula of relative betweenness centrality $c'_B(v)$:     \label{eq:bc_relative}   c'_B(v) = \frac{2 c_B(v)}{n^2 - 3n + 2}     where $c_B(v)$ is the betweenness centrality defined in \cref{eq:betweenness_centrality}, and $n$ is the number of vertices in the network.