Revision c6e2ce5a
thesis/acknowledgements.tex  

%!TEX root = main.tex

\vspace*{\fill}

\chapter*{Acknowledgements}
This work was done under the supervision of Prof. Renato Lo Cigno Antonio and co-supervisor Leonardo Maccari. I would like to express my sincere gratitude for their thorough guidance. They have been immensely patient with my progress and have given me a little push every now and then to keep me on track.

I would like to thank Opera Universita, the University of Trento, and the Erasmus+ program for funding my Master study, giving me the chance to study at a world-class university and to spend an exchange period at the University of Edinburgh. Thank you for giving me an education.

Thank you, my friends, for your support and encouragement.

And finally, I want to dedicate these last lines to my family in Vietnam. 

[Translated from Vietnamese] Thank you, Mom and Dad, for believing in me, for not putting pressure on my studies, and for giving me the freedom to do everything I like. And, most importantly, for always standing behind me to support me.

\vspace*{\fill} 

\clearpage 

thesis/appendix.tex  

%!TEX root = main.tex

\begin{appendices}

\chapter{Raw Data from the Experiment}
This chapter reports the raw data collected from the experiments.
\begin{table}[hbp]
\centering
\begin{tabular}{l l l l}
\toprule
Number of nodes in a graph & CN & PL & ER\\
\midrule
100 & 87.5 & 1.1 & 2.4\\
200 & 181.9 & 1.0 & 3.4\\
300 & 273.0 & 1.1 & 5.5\\
400 & 366.1 & 1.1 & 6.5\\
500 & 477.0 & 1.0 & 8.9\\
600 & 572.3 & 1.1 & 9.8\\
700 & 666.8 & 1.0 & 12.4\\
800 & 763.7 & 1.0 & 12.6\\
900 & 859.2 & 1.0 & 15.9\\
1000 & 983.1 & 1.0 & 15.5\\
\bottomrule
\end{tabular}
\caption{Average number of BCCs found for each category of synthetic graph}
\label{tab:num_of_bcc_for_graphs}
\end{table}

\begin{table}[hbp]
\centering
\begin{tabular}{l l l l}
\toprule
Number of nodes & CN & PL & ER\\
\midrule
100 & 13.3 & 99.9 & 98.6\\
200 & 18.9 & 200.0 & 197.6\\
300 & 27.6 & 299.9 & 295.5\\
400 & 34.6 & 399.9 & 394.5\\
500 & 23.9 & 500.0 & 492.1\\
600 & 28.5 & 599.9 & 591.2\\
700 & 34.1 & 700.0 & 688.6\\
800 & 37.2 & 800.0 & 788.4\\
900 & 41.7 & 900.0 & 885.1\\
1000 & 17.7 & 1000.0 & 985.5\\
\bottomrule
\end{tabular}
\caption{Average number of nodes in the largest found BCC for each category of synthetic graph}
\label{tab:max_nodes_in_graph}
\end{table}

\begin{table}[hbp]
\centering
\begin{tabular}{l l l}
\toprule
Wireless Community Network & HBC & BBC\\
\midrule
Ninux & 4.8 & 1.1 \\
FFW & 22.1 & 7 \\
FFG & 5.3 & 0.97 \\
\bottomrule
\end{tabular}
\caption{Running time of HBC and BBC on the router}
\label{tab:CN_real_router_BC_HBC}
\end{table}

\begin{table}[hbp]
\centering
\begin{tabular}{l l l}
\toprule
Wireless Community Network & HBC & BBC\\
\midrule
Ninux & 0.15 & 0.03 \\
FFW & 0.7 & 0.19 \\
FFG & 0.17 & 0.03 \\
\bottomrule
\end{tabular}
\caption{Running time of HBC and BBC on the computer}
\label{tab:CN_real_server_BC_HBC}
\end{table}

\end{appendices}
thesis/ch1_introduction.tex  

%!TEX root = main.tex

\chapter{Introduction}
People have a strong need for acquiring information, and they want faster, more reliable, more secure, and more affordable means to communicate. During the 90s we saw the boom of the Internet; at that time most devices were connected to the Internet through a cable, with an ISP standing in between to provide services. In the late 2000s, connecting to the Internet without plugging anything in became the norm. Most of the wireless networks that we are familiar with now, such as 3G, LTE, 4G, and WiFi, are centralized networks, which means those networks rely on base stations to broadcast the signal, or on access points in a managed wireless network, to provide the service. In recent years, Wireless Mesh Networks (WMNs) have been emerging as a key technology for next-generation wireless networking. They do not have any centralized control to route and forward packets; rather, all nodes cooperate to relay each other's packets. Efficient protocols are more important than ever in WMNs since they largely affect performance, and several papers \cite{VazquezRodas:2013:TCW:2507248.2507257,Maccari2014670} have proposed the use of centrality to enhance routing in WMNs.


\emph{Centrality}\footnote{It is also referred to by other terms: centrality indices or centrality measures.} is a measurement to quantify how influential a node is, given its position in the network. We are concerned with centrality measurements since, in WMNs, the ability to find central nodes can serve many purposes, such as adding more routers around the most central ones to balance the load and to lessen the impact when one of the most central routers fails. Information about the central nodes can also help with routing, allowing information to be spread faster through the network.



In this thesis, the main focus is on one specific centrality measurement called \emph{betweenness centrality} $c_B(v)$. Intuitively, a node with high betweenness centrality has large control over the traffic flowing within the network, given that the information is transferred along shortest paths. Betweenness centrality is defined as the sum, over all pairs of nodes, of the fraction of shortest paths that pass through that node. The best known algorithm \cite{Brandes01afaster} to calculate $c_B(v)$ runs in $\mathcal{O}(mn + n^2 \log n)$ time, making the calculation for a large graph impractical. Another algorithm in focus is heuristic-based \cite{PuzisZEDB12}; it demonstrated its effectiveness on several input graphs of choice. The main work of this thesis is implementing the aforementioned algorithms and carrying out a performance evaluation of both of them on a computer and on a low-cost wireless router. We want to know whether using the heuristic-based algorithm to calculate the betweenness centrality for every node in the topology of WCNs would scale better than using the original algorithm.
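Using standard notation (assumed here; the thesis may introduce its own symbols later), where $\sigma_{st}$ is the number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$, the definition above can be written as
\[
  c_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} .
\]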

We begin with a short overview of WMNs in \cref{sec:introduction_wmns}. Then we present one real-world application of WMNs: Wireless Community Networks (WCNs). We cover the definition, plus some challenging problems of WCNs, in \cref{sec:introduction_wcns}. After laying out the background on WMNs and WCNs, the contribution of this thesis is given in \cref{sec:introduction_contribution}. The outline of the subsequent chapters can be found in \cref{sec:introduction_outline}.

\section{Wireless Mesh Networks} \label{sec:introduction_wmns} 

A WMN is any wireless network having a mesh topology\footnote{Mesh topology denotes any network in which every node participates in the distribution of data.}. In a full mesh topology, every pair of nodes is directly connected, while in a partial mesh topology not all nodes are directly interconnected. Even though the definition of a WMN mentions the full mesh topology, in practice a WMN is usually built over a partial mesh topology. This is due to the limited signal coverage of wireless devices, which makes it impossible for each node to directly connect with all other nodes.

For classification, WMNs belong to the class of multi-hop wireless networks, where information traverses from a source to a destination using two or more wireless hops.


The main advantage of a WMN is its fault tolerance. Since a WMN is a decentralized wireless network, when several nodes fail the network can find new paths to keep itself connected. The term \emph{autonomic system} is often used to describe the properties of WMNs.

WMNs find interesting applications in many areas \citep{Akyildiz2005445}, such as broadband home networking, enterprise networking, community networking, citywide wireless coverage, spontaneous mesh networks, and transportation systems. All the different areas of application can be classified into two categories. First, the WMN acts as a backbone network to extend Internet connectivity to nodes that are not within the coverage of Internet access. This might be because installing a wired infrastructure is too expensive for rural areas or areas with challenging terrain. Another scenario is a disaster-struck area where the existing communication facilities (cables, base stations) were destroyed; there, WMNs can provide a temporary solution for communication. There are many real examples of this usage. A WMN has been used in Dharamsala \citep{jardin2006} to provide Internet access for refugee monks living in the mountainous area of Dharamsala. \cite{Suzuki2006} introduced \emph{SKYMESH} to provide Internet access in a disaster area: helium-filled balloons carry WiFi transceivers, and together they form the mesh network.


The second application of WMNs focuses on communication within the network. For example, in transportation management, \cite{Ahmed2008} proposed a prototype using wireless mesh networks to help with monitoring traffic: cars have sensors to collect data, and the real-time data is gathered using the WMN. Wireless Community Networks are another application making use of the mesh topology. A WCN consists of a group of local users setting up and managing a network for themselves. WCNs can have a gateway to the Internet, and a WCN can act as a backbone to deliver Internet connectivity; in reality, however, most WCNs were set up to promote an alternative communication channel that offers a higher degree of privacy and neutrality \cite{Maccari2015175}. WCNs are discussed further in \cref{sec:introduction_wcns}.


\section{Wireless Community Networks} \label{sec:introduction_wcns} 
A Wireless Community Network (WCN) \cite{Jain2003} is one application of Wireless Mesh Networks. A WCN is created by a group of local users to have another channel of communication that is self-managed and offers a higher degree of privacy. WCNs have a presence on all continents, but most of them are in Europe and North America. The largest WCN in the world is currently Guifi\footnote{https://guifi.net} in Spain, with over 30000 nodes in 2015.


In a WCN, a new router can join the existing network independently by following the technical information listed on the Web. Every node in a WCN must have a way to learn of the existence of a new router, or of the unavailability of another router when it leaves the network. Several popular routing protocols used in WCNs take the ever-changing topology into account to deal with this problem: Optimized Link State Routing (OLSR), Babel, and B.A.T.M.A.N. OLSR is probably the most widespread routing protocol in use. A more thorough discussion of these protocols can be found in \cite{Pisa2013}.


In this thesis, we focus on three WCNs: Funk Feuer Wien (FFW)\footnote{https://www.funkfeuer.at/} and Funk Feuer Graz (FFG)\footnote{graz.funkfeuer.at/en/index.html} in Austria, and Ninux (NNX)\footnote{wiki.ninux.org/} in Italy. They all use the Optimized Link State Routing (OLSR) protocol, in which every node in the WCN has the topology information and computes the next-hop destinations for all nodes in the network using shortest-hop forwarding paths.


\section{Contributions of this thesis} \label{sec:introduction_contribution} 
In graph theory, centrality metrics (explained in depth in \cref{sec:centrality}) are used to determine how influential a node is in a network. The authors in \cite{Katsaros2010} presented two types of centrality metrics that they considered to have significant applications in analysing wireless networks: degree-based centrality and shortest-path based centrality.

The authors in \cite{Maccari2015175} proposed using \emph{closeness group centrality} $C_C(\gamma)$, one type of degree-based centrality, to identify the central group in a WCN, where $\gamma$ is a group of nodes. Knowing the central group, i.e. the group with the lowest $C_C(\gamma)$, can be useful, for example, in determining where to place the Internet gateway. Suppose that the WCN is designed to provide Internet access for its clients; then the most central group will give the best average Internet connectivity to the nodes in the network.


The type of centrality metric to choose depends on the purpose of the analysis. \cite{Maccari2014670} also mentioned that shortest-path based centrality $C_{sp}(k)$ should be employed when we want to choose a set of nodes that controls a fraction $\eta$ of the entire traffic of the network. There are three scenarios in which betweenness centrality would be employed. Given that a WCN is not a static network, nodes can join and leave the network as they wish. Moreover, there is no single machine managing the whole network; rather, every router has the whole topology information, and the routers cooperate to keep the information flowing freely in the network. In sum, betweenness centrality is a useful measure for WCNs, and computing it must not be too demanding for the routers, since routers do not have enough computational power to execute as quickly as a normal computer. Therefore, to make it feasible for WCNs to detect the set of most influential nodes, the algorithm must run fast enough on the computationally limited physical device.


In this thesis, I test the performance of shortest-path based betweenness centrality (from now on simply called \emph{betweenness centrality}) algorithms on a real router typically used in WCNs. The state-of-the-art algorithm for calculating betweenness centrality runs in $\mathcal{O}(mn + n^2 \log n)$ \citep{Brandes01afaster}. Even though this is a huge improvement compared to $\mathcal{O}(n^3)$, it is still not feasible to compute centrality in large graphs. The second algorithm is the heuristic-based one presented in \cite{PuzisZEDB12}. This algorithm follows a divide-and-conquer design paradigm: the network is divided into biconnected components, betweenness centrality is calculated for each biconnected component with a slight modification, and the partial results are summed up to achieve the same result as the state-of-the-art algorithm. The presentation of both algorithms can be found in \cref{sec:computation_betweenness}.
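As an informal illustration of the decomposition step (a sketch using the networkx library, not the thesis implementation; the per-component correction terms of the heuristic are omitted here):

```python
import networkx as nx

# A small graph with an obvious block structure: two cliques of five
# nodes joined by a two-node path (the bridge carries all cross traffic).
G = nx.barbell_graph(5, 2)

# Exact betweenness centrality via Brandes' algorithm.
bc = nx.betweenness_centrality(G, normalized=False)

# First step of the heuristic: split the graph into biconnected
# components (BCCs); cut vertices belong to more than one BCC.
bccs = list(nx.biconnected_components(G))
cut_vertices = set(nx.articulation_points(G))

print(len(bccs))           # the two cliques plus the three bridge edges
print(sorted(cut_vertices))
```

On such a graph the bridge nodes concentrate the betweenness, while each clique and each bridge edge forms its own small BCC, which is exactly the structure the heuristic exploits.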

Since the algorithm of \cite{PuzisZEDB12} is heuristic-based, there is no guarantee on the running time. I therefore carry out extensive experiments to compare the two algorithms. The experiments are run on synthetic graphs: Erd\H{o}s--R\'enyi, Power Law, and Community Network\footnote{These are not real wireless community networks, but the generation algorithm was designed so that the topology is similar to topologies found in WCNs.} graphs. They are also run on three real wireless community networks: Funk Feuer Wien (FFW)\footnote{https://www.funkfeuer.at/} and Funk Feuer Graz (FFG)\footnote{graz.funkfeuer.at/en/index.html} in Austria, and Ninux\footnote{wiki.ninux.org/} in Italy. The performance is measured in a virtualized environment with QEMU\footnote{http://wiki.qemu.org} and on the real router. More details can be found in \cref{sec:experiment_input}.
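A minimal sketch of one such run (assuming the networkx library; the graph size and edge probability below are illustrative, not the thesis parameters):

```python
import time
import networkx as nx

# Generate one synthetic Erdos-Renyi graph and time the exact
# betweenness centrality computation (Brandes' algorithm).
n, p = 200, 0.05
G = nx.gnp_random_graph(n, p, seed=1)

t0 = time.perf_counter()
bc = nx.betweenness_centrality(G)
elapsed = time.perf_counter() - t0

print(f"n={n}, m={G.number_of_edges()}, time={elapsed:.3f}s")
```

In the real experiments each configuration would be repeated over many random instances and the running times averaged, on both the computer and the router.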


This is the first time that both algorithms for calculating betweenness centrality have been tested on a real device. Both have the same polynomial time bound, but we show later that the heuristic modification can greatly improve the running time for the Community Network graphs.

\section{Outline of the thesis} \label{sec:introduction_outline} 

The outline of this thesis is as follows. The first two chapters cover the necessary background information. \Cref{sec:centrality} examines the concept of centrality; since there are numerous centrality measures, only the most relevant ones are presented, and betweenness centrality is discussed in depth since it is the measure whose performance we evaluate in this thesis. \Cref{sec:computation_betweenness} presents two algorithms to calculate the exact betweenness: the state-of-the-art algorithm, with a mathematically proven upper bound of $\mathcal{O}(mn + n^2 \log n)$, and an algorithm based on a heuristic for speeding up the betweenness centrality computation. Since there is no proven upper bound for the heuristic algorithm, we compare the performance of both empirically.

The next two chapters deal with the experiments evaluating the performance of the two aforementioned algorithms. \Cref{sec:experiment} gives details on the experimental setup, including the description of the synthetic graphs and the real WCN topologies that we use in the experiments. \Cref{sec:experiment_result} summarises the results of the experiments and gives the justification for the results.


thesis/ch2_centrality.tex  

%!TEX root = main.tex

\chapter{Centrality} \label{sec:centrality}
Centrality\footnote{It is also referred to by other terms: centrality indices or centrality measures.} is one of the fundamental and useful measures in social network analysis to identify the influential vertices or edges in a network. However, which vertices (or edges) are considered important depends on the centrality measure. In this thesis, we are concerned mainly with betweenness centrality, which belongs to the shortest-path based centralities according to the classification by \cite{Brandes:2005:NAM:1062400}.


Since \cite{Bavelas1948} first described the concept of centrality, many centrality indices have been defined and used for different purposes. Since there is no single centrality index that is superior to all others, we should consider which centrality index to use in a given application.

In \cref{sec:centrality_definition} we begin with the definition of centrality. Then in \cref{sec:centrality_shortest_path_based} we go deeper into the family of shortest-path based centralities, starting from stress centrality, moving on to betweenness centrality, and finally relative betweenness centrality.


For readers who want a broader knowledge of centrality, \cite{Brandes:2005:NAM:1062400} classified centrality measures for vertices and edges into nine families and summarized them in great detail.
Different papers use different notations for the same concepts. Hence, to keep this thesis coherent, I use the same notation for each concept, which might differ from the notation used in the original papers.

\section{Definition of centrality} \label{sec:centrality_definition} 
A centrality index can be roughly defined as a function assigning a score representing the importance of each node or edge in the network. By changing the way the score is assigned to each vertex, we arrive at different centrality indices. However, we cannot take an arbitrary function assigning random values to the nodes of a graph and call those values centrality.

\citet{Brandes:2005:NAM:1062400} pointed out that even though there is no strict definition of centrality, we can still find the common ground shared by different centrality indices. The underlying concept is that a centrality index must depend only on the structure of the network. Put simply, a centrality index is a way of assigning a score to each vertex (or edge) of a network based entirely on the structure of the graph. Formally, he coined the term \emph{Structural Index} and stated that a centrality index $c$ is required to be a structural index.

\theoremstyle{definition} 
\begin{definition}{\textbf{Structural Index}}


Let $G=(V,E)$ be a weighted, directed or undirected multigraph and let $X$ represent the set of vertices or edges of $G$, respectively. A real-valued function $s$ is called a structural index if and only if the following condition is satisfied: $\forall x \in X: G \simeq H \Rightarrow s_G(x) = s_H(\phi(x))$, where $s_G(x)$ denotes the value of $s(x)$ in $G$.
\end{definition} 
Before the main section describing betweenness centrality, we present a simple centrality measure called \emph{degree centrality}, so that the reader has an easy-to-understand example of what a centrality is. The degree centrality score of a node is simply the number of connections that node has; see \cref{fig:degree_centrality}. Later, we move on to more complex centrality measures based on shortest paths.

\begin{figure}[hbp]
\centering
\includegraphics{images/line_graph_degree_horizontal.png}
\caption[Example for degree centrality]{Degree centrality score for each node.}
\label{fig:degree_centrality}
\end{figure}
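To make this concrete, degree centrality can be computed in a couple of lines. The following Python sketch uses a plain adjacency list; the node names and the small path graph are invented for illustration, in the spirit of \cref{fig:degree_centrality}:

```python
def degree_centrality(adj):
    """Degree centrality: each node's score is its number of connections."""
    return {v: len(neighbours) for v, neighbours in adj.items()}

# A small line (path) graph a - b - c - d.
line_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}

scores = degree_centrality(line_graph)
# The two endpoints get score 1, the two inner nodes get score 2.
```

Note that the score depends only on the adjacency structure, so it is a structural index in the sense defined above.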

\section{Shortest-path based centrality} \label{sec:centrality_shortest_path_based}
This section presents two types of shortest-path based centrality: \emph{stress centrality} and \emph{betweenness centrality}. Both indices are based on the set of shortest paths between pairs of vertices, and both can be defined on vertices and on edges. \cref{sec:centrality_stress_centrality} and \cref{sec:centrality_bc} present those two measures in depth.


In communication networks, information does not flow only along shortest paths, where a shortest path might represent, for instance, the path with the lowest total traversal time. Still, in many applications shortest paths are chosen to forward information through the network. For example, the OLSR routing protocol provides shortest-path routes for all nodes in the network, and OLSR is widely used in Wireless Community Networks, the networks whose existing routing protocols we want to improve by applying centrality.


\subsubsection{Stress Centrality} \label{sec:centrality_stress_centrality} 
The concept was introduced by \cite{Shimblel1953}; the idea is to quantify how much information flows through a vertex of the network. Under that definition, if a vertex $v$ lies on more shortest paths, then more information passes through $v$ and it must handle a higher workload, or ``stress''. Formally, stress centrality is defined in \cref{eq:stress_centrality_long} as a summation over all possible sources $s$ and all possible targets $t$ different from $v$.

\begin{equation}
\label{eq:stress_centrality_long}
c_S(v) = \sum_{s \neq v} \sum_{t \neq v} \sigma_{st}(v)
\end{equation}
where $\sigma_{st}(v)$ denotes the number of shortest paths between $s$ and $t$ that contain $v$. To simplify the notation of \cref{eq:stress_centrality_long}, we might write it as follows:

\begin{equation}
\label{eq:stress_centrality}
c_S(v) = \sum_{s \neq t \neq v} \sigma_{st}(v)
\end{equation}

However, for clarity, I stick with the notation of \cref{eq:stress_centrality_long}.

Note that the formula for the stress centrality $c_S(v)$ does not include the shortest paths that start or end in $v$, and that the stress centrality is computed over the shortest paths between all pairs of vertices.
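The definition above can be checked mechanically on a small unweighted graph. The following Python sketch (graph and vertex names are invented for illustration) counts shortest paths with one BFS per vertex; since the double summation ranges over ordered pairs, each path of an undirected graph is counted once per direction:

```python
from collections import deque

def bfs_counts(adj, s):
    """Distances from s and numbers of shortest paths sigma_{sv} (unweighted)."""
    dist = {s: 0}
    sigma = {v: 0 for v in adj}
    sigma[s] = 1
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma

def stress_centrality(adj, v):
    """c_S(v): shortest s-t paths through v, over ordered pairs s, t != v."""
    dist_v, sigma_v = bfs_counts(adj, v)
    total = 0
    for s in adj:
        if s == v:
            continue
        dist_s, sigma_s = bfs_counts(adj, s)
        for t in adj:
            if t in (s, v) or t not in dist_s or v not in dist_s or t not in dist_v:
                continue
            # v lies on a shortest s-t path iff the distances add up,
            # and then sigma_st(v) = sigma_sv * sigma_vt.
            if dist_s[t] == dist_s[v] + dist_v[t]:
                total += sigma_s[v] * sigma_v[t]
    return total

# On the path a - b - c, vertex b is crossed by the pairs (a, c) and (c, a).
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
```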


\subsubsection{Betweenness Centrality} \label{sec:centrality_bc} 

The concept of \emph{betweenness centrality} was introduced independently by \citet{Freeman1977} and \citet{Anthonisse1971}. It can be viewed as a relative version of stress centrality. Here we first define the betweenness centrality, then continue with its motivation and applications.

We define $\delta_{st}(v)$, the \emph{pair-dependency}\footnote{In \cite{Freeman1978215}, the term pair-dependency is equivalent to the term \emph{dependency} used in \cite{Brandes01afaster}. For consistency, in this thesis the definitions of pair-dependency and dependency follow \cite{Brandes01afaster}.} of a pair $s, t \in V$ on an intermediary $v \in V$, as:

\begin{equation}
\label{eq:pair_dependency}
\delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}
\end{equation}
where $\sigma_{st}(v)$ is the number of shortest paths from $s$ to $t$ passing through $v$, and $\sigma_{st}$ is the number of shortest paths from $s$ to $t$. $\delta_{st}(v)$ represents the probability that vertex $v$ falls on a randomly chosen shortest path connecting $s$ and $t$. Assuming that communication in the network follows shortest paths, $\delta_{st}(v)$ can be interpreted as the probability that vertex $v$ can take part in, intercept, enhance or inhibit the communication between $s$ and $t$.

The betweenness centrality of a single vertex $c_B(v)$ is defined as:

\begin{equation}
\label{eq:betweenness_centrality}
c_B(v) = \sum_{s \neq v} \sum_{t \neq v} \delta_{st}(v) = \sum_{s \neq v} \sum_{t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}
\end{equation}

Starting from the original formula for betweenness centrality in \cref{eq:betweenness_centrality}, several variants of betweenness centrality were introduced in \cite{Brandes2008136}, such as edge betweenness centrality, group betweenness centrality, etc. The one we are most interested in is the betweenness centrality with endpoint vertices included. It means that $c_B(v)$ takes into account even the shortest paths starting or ending in $v$, and it does not require the source $s$ or the target $t$ to be different from $v$. \cref{eq:betweenness_centrality_endpoints_inclusion} shows the formula for betweenness centrality with source and target included.


\begin{equation}
\label{eq:betweenness_centrality_endpoints_inclusion}
c_B(v) = \sum_{s \in V} \sum_{t \in V} \delta_{st}(v) = \sum_{s \in V} \sum_{t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}
\end{equation}
\subsubsection{Relative Betweenness Centrality}

After introducing betweenness centrality, \citeauthor{Freeman1977} presented another measure \cite{Freeman1977}, called \emph{relative betweenness centrality}, under the argument that betweenness centrality cannot be used to compare the influence of vertices from networks of different sizes (e.g. different numbers of nodes).

He argued that for vertices $v, u$ to have the same betweenness centrality only means that they have the same potential for control in absolute terms, i.e. they can facilitate or inhibit the same number of communications (implicitly assuming that all communications are conducted along shortest paths). However, $c_B$ does not show the relative potential for control within the network. \cref{fig:bc_vs_bc_relative} illustrates that even though $c_B(v) = c_B(u_i) = 3, i = 1, 2, 3, 4$, the potential for control of vertex $v$ is much larger than that of any vertex $u_i$. For example, removing vertex $v$ disconnects the network, and no communication can happen between the remaining vertices; therefore, $v$ has total control over the network. Meanwhile, removing any $u_i$ does not have the same disastrous effect, since each $u_i$ only controls part of the communications between pairs of vertices.

\begin{figure}[h]
\centering
\begin{tabular}{c c}
\subfloat[fig 1][BC]{\includegraphics[scale=0.3]{images/star_3_bc.png}} &
\subfloat[fig 2][BC]{\includegraphics[scale=0.3]{images/star_6_bc.png}} \\
\subfloat[fig 1][Relative BC]{\includegraphics[scale=0.3]{images/star_3_bcrelative.png}} &
\subfloat[fig 2][Relative BC]{\includegraphics[scale=0.3]{images/star_6_bcrelative.png}} \\
\end{tabular}
\caption[Comparison of Betweenness Centrality and Relative Betweenness Centrality]{
Comparison of Betweenness Centrality and Relative Betweenness Centrality: even though the red nodes have the same betweenness centrality score, the red node of the left graph clearly has more control over the traffic flow within the network. For example, when the red node of the left graph is destroyed, the graph is disconnected and no information can flow between the blue nodes. On the other hand, if any red node of the right graph is cut out, the graph remains connected. The relative betweenness centrality reflects the importance of the red nodes of both graphs better, by setting it to $1$ for the left graph and $0.2$ for the right graph.
}
\label{fig:bc_vs_bc_relative}
\end{figure}

When a vertex of interest $v$ is the center node of a star graph, as in \cref{fig:bc_vs_bc_relative} (a), its betweenness centrality score is the largest that any vertex of a graph with $n$ nodes can have. The maximum betweenness centrality score that a vertex can achieve is shown to be:
\begin{equation}
\label{eq:max_bc_score}
\max c_B(v) = \frac{n^2 - 3n + 2}{2}
\end{equation}

The so-called \emph{relative betweenness centrality} measure $c'_B(v)$ is the ratio of the betweenness centrality $c_B(v)$ in \cref{eq:betweenness_centrality} to its maximum possible score. \cref{eq:bc_relative} gives the formula for the relative betweenness centrality $c'_B(v)$:

\begin{equation}
\label{eq:bc_relative}
c'_B(v) = \frac{2 c_B(v)}{n^2 - 3n + 2}
\end{equation}

where $c_B(v)$ is the betweenness centrality defined in \cref{eq:betweenness_centrality}, and $n$ is the number of vertices in the network.
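The normalization can be verified on a star graph, whose centre attains the maximum of \cref{eq:max_bc_score}. The Python sketch below (vertex names invented for illustration) computes $c_B(v)$ by brute force over unordered pairs of distinct endpoints, so that the star centre attains the denominator exactly, and then applies \cref{eq:bc_relative}:

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, s):
    """Distances from s and numbers of shortest paths (unweighted BFS)."""
    dist = {s: 0}
    sigma = {v: 0 for v in adj}
    sigma[s] = 1
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma

def betweenness(adj, v):
    """c_B(v): sum of pair-dependencies over unordered pairs {s, t}, s, t != v."""
    dist_v, sigma_v = bfs_counts(adj, v)
    total = 0.0
    for s, t in combinations([u for u in adj if u != v], 2):
        dist_s, sigma_s = bfs_counts(adj, s)
        if t in dist_s and v in dist_s and t in dist_v \
                and dist_s[t] == dist_s[v] + dist_v[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total

def relative_betweenness(adj, v):
    """c'_B(v) = 2 c_B(v) / (n^2 - 3n + 2)."""
    n = len(adj)
    return 2.0 * betweenness(adj, v) / (n * n - 3 * n + 2)

# Star with centre "c" and four leaves: the centre lies on every
# leaf-to-leaf shortest path, so its relative betweenness is 1.
star = {"c": ["1", "2", "3", "4"],
        "1": ["c"], "2": ["c"], "3": ["c"], "4": ["c"]}
```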
thesis/ch3_computation_of_bc.tex  

Recall the definition of betweenness centrality: 
\begin{equation*}
c_B(v) = \sum_{s \neq t \neq v \in V} \delta_{st}(v) = \sum_{s \neq t \neq v \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}
\end{equation*}
Traditionally, just by looking at the equation, we divide the task of computing betweenness centrality into two steps:
\begin{itemize} 
\item First, we compute tables with the length and the number of shortest paths between all pairs of vertices in the network;
\item Second, for each vertex $v$ and each pair $s, t$, we use the tables computed above to determine the fraction of shortest paths between $s$ and $t$ that pass through $v$. Summing this fraction over every possible pair $s, t$, we arrive at $c_B(v)$. This way of summing is referred to as the \emph{naive summation of pair-dependencies}.


\end{itemize} 
The chapter starts by presenting some important concepts in \cref{sec:computation_important_concepts}. Then \cref{sec:computation_counting_shortest_paths} gives an overview of classic ways to find the shortest paths, and \cref{sec:computation_summing_pair_dependencies} introduces two ways to sum all the pair-dependencies: (i) the naive way, and (ii) the faster approach of \cite{Brandes01afaster}, which accumulates pair-dependencies gradually. To date, the approach of \cite{Brandes01afaster} is the state of the art in calculating the exact betweenness centrality, and it is implemented in many standard graph libraries such as \emph{networkx}\footnote{https://networkx.github.io/} and the \emph{Boost Graph Library}\footnote{www.boost.org/libs/graph/doc/}.

\cref{sec:computation_summary} summarizes the results of the two previous sections by recapping the complexity of the algorithms used in determining betweenness centrality.

After this thorough background, \cref{sec:computation_heuristic} presents a heuristic way, due to \cite{PuzisZEDB12}, to calculate the exact betweenness centrality. However, the heuristic is not guaranteed to perform better than the algorithm of \cite{Brandes01afaster}; its performance depends on the topology.
\section{Important concepts} \label{sec:computation_important_concepts} 
Define the set of \emph{predecessors} of a vertex $v$ on shortest paths from $s$ as:
\begin{equation} \label{eq:predecessor} 
...  ...  
The APSP problem can be solved straightforwardly by applying an SSSP algorithm $n$ times, once for every source vertex $s \in V$. The resulting time complexities are thus $\mathcal{O}(mn + n^2)$ and $\mathcal{O}(mn + n^2 \log n)$, respectively. On sparse graphs it is good practice to solve the SSSP problem multiple times. However, on dense graphs, where the number of edges is close to the maximum, i.e. $m \simeq n(n-1) \simeq n^2$, it is better to use the Floyd/Warshall algorithm with complexity $\mathcal{O}(n^3)$.
\section{Summing all the pair-dependencies} \label{sec:computation_summing_pair_dependencies}

\subsubsection*{Naive way}

In order to calculate the betweenness centrality of a vertex $v$, we have to sum the pair-dependencies $\delta_{st}(v)$ for all possible sources $s$ and targets $t$ different from $v$. The first part of the algorithm calculates the distances $d$ between all pairs of vertices. Vertex $v$ lies on a shortest path between a pair $(s, t)$ if and only if $d(s, t) = d(s, v) + d(v, t)$, and in that case the number of shortest paths from $s$ to $t$ passing through $v$ is $\sigma_{st}(v) = \sigma_{sv} \cdot \sigma_{vt}$.

Therefore, summing all pair-dependencies $\delta_{st}(v)$ takes $\mathcal{O}(n^2)$ and yields the betweenness centrality of a single vertex, $c_B(v)$. Doing the same for all other vertices of the network yields $\mathcal{O}(n^3)$.

\subsubsection*{Gradual accumulation of pair-dependencies} \label{sec:computation_accumulation_pair_dependencies}


\citeauthor{Brandes01afaster} noticed that there is a better way to sum all the pair-dependencies \cite{Brandes01afaster}. The basic approach completely finishes the first part, finding the shortest paths and distances between all pairs of vertices, and only then starts the summation of pair-dependencies. The better way exploits a recursive relation of the \emph{dependency} $\delta_{s\LargerCdot}(v)$ of a vertex $s \in V$ on a single vertex $v \in V$, defined as:


\begin{equation}
\label{eq:dependency}
\delta_{s\LargerCdot}(v) = \sum_{t \in V} \delta_{st}(v)
\end{equation}

\citet{Brandes01afaster} proved that the \emph{dependency} of $s \in V$ on any $v \in V$ obeys:


\begin{equation}
\label{eq:partial_sum_dependency}
\delta_{s\LargerCdot}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\LargerCdot}(w))
\end{equation}

The set of predecessors $P_s(w)$ of a vertex $w$ on shortest paths from $s$ can be obtained from the first part of the algorithm, and the dependencies of $s$ on all vertices can then be computed in $\mathcal{O}(m)$. Therefore, instead of solving the APSP problem from start to end, we solve the SSSP problem for each source vertex $s \in V$; after each SSSP computation, we calculate the dependencies of $s$ as laid out in \cref{eq:partial_sum_dependency}.
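Putting \cref{eq:partial_sum_dependency} to work, the following Python sketch implements the unweighted (BFS) variant of Brandes' accumulation scheme; the vertex names are illustrative. One BFS per source records predecessors and path counts, then the dependencies are accumulated in order of non-increasing distance, without ever building a table of pair-dependencies:

```python
from collections import deque

def brandes_betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs."""
    c = {v: 0.0 for v in adj}
    for s in adj:
        # --- SSSP phase (BFS): predecessors and shortest-path counts ---
        stack = []                         # vertices in non-decreasing distance
        pred = {v: [] for v in adj}        # P_s(w)
        sigma = {v: 0 for v in adj}        # sigma_sw
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:            # w found for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1: # v is a predecessor of w
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # --- accumulation phase: pop vertices farthest from s first ---
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                c[w] += delta[w]
    return c

# Path a - b - c: only b is an intermediary, crossed once per direction.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
bc = brandes_betweenness(path)
```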

Next, we look at the relation between the betweenness centrality $c_B(v)$ of a vertex $v \in V$ and the dependencies $\delta_{s\LargerCdot}(v)$. Summing the dependencies of all $s \in V$ on a single vertex $v$, we arrive at the betweenness centrality $c_B(v)$, as in the following equation:

\begin{equation}
\label{eq:accummulate_betweenness_centrality}
c_B(v) = \sum_{s \neq v} \delta_{s\LargerCdot}(v)
\end{equation}

Therefore, we can gradually accumulate the betweenness centrality score $c_B(v)$ by adding each dependency $\delta_{s\LargerCdot}(v)$ to $c_B(v)$.


The next section provides a high-level view of the algorithms to calculate betweenness centrality.


\section{Complexity of determining betweenness centrality as a whole} \label{sec:computation_summary} 
This section summarizes the complexity of the naive way and of Brandes' algorithm to calculate betweenness centrality. \cref{algo:basic_bc} runs in $\mathcal{O}(n^3)$ for both unweighted and weighted graphs, while Brandes' algorithm in \cref{algo:brandes_bc} runs in $\mathcal{O}(mn)$ for unweighted graphs and $\mathcal{O}(mn + n^2 \log n)$ for weighted graphs. The breakdown of the computational time of those two algorithms is given below.

For \cref{algo:basic_bc}, the overall computation is dominated by line 2, which sums all the pair-dependencies.

\begin{algorithm}
\caption{Basic way to calculate betweenness centrality} \label{algo:basic_bc}
\begin{algorithmic}[1]
\Procedure{Basic Betweenness Centrality}{}
\State Solve the APSP problem
\State Sum all pair-dependencies
\EndProcedure
\end{algorithmic}
\end{algorithm}

For \cref{algo:brandes_bc}, within the \texttt{for} loop, for unweighted graphs each execution of lines 2--3 takes $\mathcal{O}(m)$. For weighted graphs, solving the SSSP problem dominates the computation with $\mathcal{O}(m + n \log n)$ per source. The \texttt{for} loop is executed $n$ times; thus we obtain the running times stated at the beginning of the section.

\begin{algorithm}
\caption{Brandes' betweenness centrality} \label{algo:brandes_bc}
\begin{algorithmic}[1]
\Procedure{Brandes Betweenness Centrality}{}
\For{$s \in V$}
\State Solve the SSSP problem for source $s$ (with BFS or Dijkstra's algorithm)
\State Find the dependencies of $s$ on all other vertices $v$ with \cref{eq:partial_sum_dependency}
\State Accumulate the betweenness centrality of each vertex $v$ with \cref{eq:accummulate_betweenness_centrality}
\EndFor
\EndProcedure
\end{algorithmic}
\end{algorithm}

\section{Heuristic algorithm for the exact betweenness centrality} \label{sec:computation_heuristic} 
We start the section with a high-level description of the steps of the heuristic algorithm, and we also provide definitions of the most important terms. For the details and all the proofs, the reader can refer to \cite{PuzisZEDB12}.

\citeauthor{PuzisZEDB12} exploit the insight that computing betweenness centrality on a tree can be done in linear time \cite{PuzisZEDB12}. From now on, this algorithm is denoted as HBC. It is a heuristic algorithm that speeds up the betweenness centrality calculation: the original graph is partitioned into one or more biconnected components (BCCs), the tree of biconnected components is formed, and several analyses are performed on this BCC tree. The summary of HBC's steps is as follows:

\begin{enumerate}
\item Divide the graph into biconnected components and form the BCC tree;
\item Compute the \emph{Component Tree Weight}. This step involves knowledge of the whole graph;
\item Form the matrix with the \emph{communication intensity} value for each pair of vertices in a BCC;
\item Calculate the betweenness centrality of each vertex in each BCC using \cref{eq:BC_for_v_heuristic};
\item Finalize the betweenness centrality for the original graph.
\end{enumerate}

\subsection*{Partitioning a graph into biconnected components}

Given a graph $G=(V,E)$, if the removal of a vertex $v \in V$ disconnects $G$, then $v$ is said to be a \emph{cutpoint} (or \emph{articulation point}). A graph is said to be \emph{biconnected} if at least two vertices must be removed to disconnect it. \cref{fig:puzis_bcc} from \cite{PuzisZEDB12} gives an example of partitioning a graph into BCCs.

\begin{figure}
\centering
\includegraphics[scale=0.5]{images/puzis_bcc.png}
\caption[BCC partition and cutpoints]{BCC partition and cutpoints. Reproduced from \cite{PuzisZEDB12}.}
\label{fig:puzis_bcc}
\end{figure}
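Cutpoints can be found naively by removing each vertex in turn and checking whether the rest of the graph stays connected. The Python sketch below does exactly that (the example graph is invented, not the one in the figure); linear-time DFS-based algorithms (Hopcroft--Tarjan) exist and would be used in practice:

```python
def reachable(adj, start, removed):
    """Vertices reachable from `start` once `removed` is deleted from the graph."""
    seen = {start}
    frontier = [start]
    while frontier:
        v = frontier.pop()
        for w in adj[v]:
            if w != removed and w not in seen:
                seen.add(w)
                frontier.append(w)
    return seen

def cutpoints(adj):
    """Vertices whose removal disconnects the graph (articulation points).
    Brute force: one connectivity check per candidate vertex."""
    points = set()
    for v in adj:
        others = [u for u in adj if u != v]
        if others and reachable(adj, others[0], v) != set(others):
            points.add(v)
    return points

# Two triangles glued at the single vertex "u": removing "u" splits them.
g = {"a": ["b", "u"], "b": ["a", "u"],
     "c": ["d", "u"], "d": ["c", "u"],
     "u": ["a", "b", "c", "d"]}
```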

\subsection*{Component Tree Weight}

The Component Tree Weight keeps track of how many nodes can be reached via a cutpoint of a specific BCC when that BCC is separated from the original graph. Let $B$ be a biconnected component and $v \in B$ be a cutpoint. Let $B^{+}(v)$ be the \emph{connected component} of $G$ which includes $B$ after the removal of $v$; in other terms, $B^{+}(v)$ includes all the vertices of $G$ that are connected to $v$ via $B$. Consider for example the biconnected component $BCC_B$ in \cref{fig:puzis_bcc}: after removing vertex $v$, the connected component is ${BCC_B}^{+}(v) = \{5, 6, u, 1, 2, 3, 4\}$. The ``cut with'' value is defined as the size $D_B^{v+} = |B^{+}(v)|$.

Define $B^{-}(v)$ as the set of vertices that can only be reached through $v$ from $B$. Therefore, when $v$ is cut out of the graph, the vertices in $B^{+}(v)$ (the connected component including $B$ without the vertex $v$) cannot reach $B^{-}(v)$. The ``cut without'' value is defined as $D_B^{v-} = |B^{-}(v)|$. For example, when removing vertex $v$ from $BCC_B$, ${BCC_B}^{-}(v) = \{7, 8, 9\}$: those vertices cannot be reached from $B^{+}(v)$ anymore.

The relation between ``cut with'' and ``cut without'' is $D_B^{v+} + 1 + D_B^{v-} = |V|$: the vertices on either side of the cutpoint, plus the cutpoint itself, account for all the vertices of the graph.
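Using the rows of \cref{tab:component_tree_weight} (the example graph has $|V| = 11$), the relation can be checked mechanically:

```python
# (component, cutpoint, cut_with D+, cut_without D-) rows of the
# Component Tree Weight table for the example graph, where |V| = 11.
rows = [
    ("BCC_A", "u", 2, 8),
    ("BCC_B", "u", 8, 2),
    ("BCC_C", "v", 7, 3),
    ("BCC_D", "v", 1, 9),
]

# cut_with + 1 (the cutpoint itself) + cut_without == |V| for every row
for _, _, d_plus, d_minus in rows:
    assert d_plus + 1 + d_minus == 11
```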

\cref{tab:component_tree_weight} shows the values assigned by the Component Tree Weight computation after Step 2.

\begin{table}[h]
\caption{Component Tree Weight calculation for \cref{fig:puzis_bcc}}
\label{tab:component_tree_weight}
\centering
\begin{tabular}{l l l l}
\toprule
Component & Cutpoint & ``Cut with'' value & ``Cut without'' value \\
\midrule
$BCC_A$ & $u$ & 2 & 8 \\
$BCC_B$ & $u$ & 8 & 2 \\
$BCC_C$ & $v$ & 7 & 3 \\
$BCC_D$ & $v$ & 1 & 9 \\
\bottomrule
\end{tabular}
\end{table}

\subsection*{Communication intensity} 

For each biconnected component $B_i$, we calculate the traffic matrix $h^{i}_{st}$ representing the \emph{communication intensity} between each pair of vertices $s$ and $t$. The traffic matrix is symmetric. For two vertices $x, y \in B_i$, there are 3 cases:

\begin{itemize}
\item If $x$ and $y$ are not cutpoints, then $h_{xy} = 0$ for $x = y$ and $h_{xy} = 1$ for $x \neq y$;
\item If $x$ is not a cutpoint and $y$ is a cutpoint, then $h_{xy} = D_B^{y-} + 1$;
\item If both $x$ and $y$ are cutpoints, then $h_{xy} = (D_B^{x-} + 1) \cdot (D_B^{y-} + 1)$.
\end{itemize}
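The three cases can be sketched as follows. This is an illustrative helper (names are ours), assuming the ``cut without'' values of the cutpoints of the component are known:

```python
def intensity(x, y, cut_without):
    """Communication intensity h_xy inside one BCC.
    cut_without maps each cutpoint to its ``cut without'' value D^-;
    non-cutpoints are absent from the map."""
    if x == y:
        return 0
    dx = cut_without.get(x)
    dy = cut_without.get(y)
    if dx is None and dy is None:      # neither endpoint is a cutpoint
        return 1
    if dx is None:                     # only y is a cutpoint
        return dy + 1
    if dy is None:                     # only x is a cutpoint
        return dx + 1
    return (dx + 1) * (dy + 1)         # both endpoints are cutpoints

# Illustration with a single cutpoint u whose cut-without value is 2:
cut_without = {"u": 2}
print(intensity("5", "6", cut_without))  # 1: two non-cutpoints
print(intensity("5", "u", cut_without))  # 3: u stands for itself plus 2 vertices behind it
```

A cutpoint thus acts as a proxy for every vertex that can only be reached through it, which is what lets each BCC be processed in isolation.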

\subsection*{Betweenness computation for biconnected component} 

The betweenness centrality of each vertex $v$ in a biconnected component $B_i$ is calculated as:
\begin{equation} \label{eq:BC_for_v_heuristic}
c_B(v)_{v \in B_i} = \sum_{s, t \in B_i} \delta_{st}(v) \cdot h_{st}^{i}
\end{equation}

The heuristic algorithm also builds on Brandes's faster algorithm to calculate betweenness centrality. Recall the equation for summing the dependency of $s \in V$ on any $v \in V$:

\begin{equation*}
\delta_{s\LargerCdot}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\LargerCdot}(w))
\end{equation*}

For any biconnected component $B_i$, the dependency of $s \in B_i$ on any $v \in B_i$ can be written as follows:

\begin{equation}
\label{eq:dependency_heuristic_partial_sum}
\delta_{s\LargerCdot}(v) = h^{i}_{sv} + \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \delta_{s\LargerCdot}(w)
\end{equation}

\subsection*{Betweenness computation for original graph $G$} 

It was proven in the paper that for a non-cutpoint $v$, the betweenness value calculated locally in each $B_i$ is equal to the betweenness value calculated on the whole graph:

\begin{equation}
c_B(v) = c_B(v)_{v \in B}
\end{equation}

For the betweenness value of a cutpoint, we first have to compute the inter-component communication, denoted $BC^{inter}(v)$:

\begin{equation}
BC^{inter}(v) = \sum_{B \ni v} D^{v-}_{B} \cdot D^{v+}_{B}
\end{equation}

For a cutpoint $v$, the betweenness value on the whole graph is then defined as:

\begin{equation}
c_B(v) = \sum_{B \ni v} c_B(v)_{v \in B} - BC^{inter}(v)
\end{equation}
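As a worked example, the inter-component term $BC^{inter}(v) = \sum_{B \ni v} D^{v-}_{B} \cdot D^{v+}_{B}$ can be evaluated directly from the cut values in \cref{tab:component_tree_weight}; the snippet below is purely illustrative:

```python
# Cut values for cutpoint v, taken from the Component Tree Weight table:
# (component, cut_with D+, cut_without D-)
components_containing_v = [("BCC_C", 7, 3), ("BCC_D", 1, 9)]

# BC^inter(v): sum over the components B containing v of D^{v-}_B * D^{v+}_B
bc_inter_v = sum(d_minus * d_plus
                 for _, d_plus, d_minus in components_containing_v)
print(bc_inter_v)  # 3*7 + 9*1 = 30
```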
% thesis/ch4_experiment_setup.tex
%!TEX root = main.tex 
\chapter{Experiment} \label{sec:experiment} 

The previous chapter presented two algorithms to calculate Betweenness Centrality (BC). One was proposed by \citeauthor{Brandes01afaster} in \cite{Brandes01afaster}; for ease of notation, it is referred to as \emph{BBC} from now on. The other, heuristic algorithm to calculate betweenness centrality was devised by \citeauthor{PuzisZEDB12} \cite{PuzisZEDB12}, and its short-form name from this point on is \emph{HBC}.


This chapter presents all the setup information needed to compare the running times of the two algorithms. To analyze the running time, we first need to address all the factors that can influence it: the implementation, the type and size of the graph, and the system that runs the algorithms. The following sections address those factors in depth. \cref{sec:experiment_implementation} presents the implementations of BBC and HBC that were run on the physical router and on the server. \cref{sec:experiment_hardware} provides the hardware specifications of the systems we use. \cref{sec:experiment_openwrt} concerns the technical side of porting the program to an embedded device, in our case the Ubiquiti router; it contains many technical notes, which may serve as a guideline on how to make a custom package for an embedded system. \cref{sec:experiment_input} presents the two types of input that we use in the experiment: randomly generated graphs, and the real topologies of WCNs. Lastly, since there are fluctuations when measuring the running time of the algorithms, \cref{sec:experiment_design} presents the method used to obtain the average running time and how the experiment was carried out, e.g. how many graphs were generated and how many times each algorithm was run for one specific input.

\section{Implementation} \label{sec:experiment_implementation} 

BBC is implemented in the Boost Graph Library\footnote{http://www.boost.org/doc/libs/1_60_0/libs/graph/doc/index.html} (BGL), a generic C++ library with implementations of many classical graph algorithms.

From \cref{sec:computation_heuristic}, the implementation of HBC contains four main steps. The first step divides the main graph into biconnected components $B_i$; then, for each articulation point in every $B_i$, the component tree weight is computed, which requires information from the original graph. Second, for each $B_i$, the traffic matrix $h^i$ is created to represent the \emph{communication intensity} between all pairs of vertices in that $B_i$.

The third and final part is the procedure to calculate betweenness for each $B_i$. Since the original graph is divided into biconnected components $B_i$, running the original BBC on each $B_i$ would not produce the correct result. Instead, the accumulation step, as shown in \cref{eq:BC_for_v_heuristic}, multiplies the pair-dependency $\delta_{st}(v)$ by its corresponding communication intensity value $h_{st}^{i}$. For ease of comparison, the pseudocode of the accumulation step for BBC and HBC is presented in \cref{algo:accummulation_brandes} and \cref{algo:accumulation_heuristic}, respectively.

\begin{algorithm}[h]
\caption{Accumulation procedure for BBC} \label{algo:accummulation_brandes}
\begin{algorithmic}[1]
\Procedure{accumulation}{}
\State $c_B[s] \leftarrow c_B[s] + (|S| - 1)$
\For{$v \in V$}
\State $\delta[v] \leftarrow 0$
\EndFor
\While{$S$ not empty}
\State pop $w \leftarrow S$
\For{$v \in Pred[w]$}
\State $\delta[v] \leftarrow \delta[v] + \frac{\sigma[v]}{\sigma[w]} \cdot (1 + \delta[w])$
\EndFor
\If{$w \neq s$}
\State $c_B[w] \leftarrow c_B[w] + \delta[w] + 1$ \Comment{$w$ is target of $s$ once}
\EndIf
\EndWhile
\EndProcedure
\end{algorithmic}
\end{algorithm}

\begin{algorithm}
\caption{Accumulation procedure for HBC}\label{algo:accumulation_heuristic}
\begin{algorithmic}[1]
\Procedure{accumulation}{}
\For{$v \in V$}
\State $\delta[v] \leftarrow 0$
\EndFor
\While{$S$ not empty}
\State pop $w \leftarrow S$
\State $h \leftarrow h^{i}[w][s]$ \Comment{communication intensity between $w$ and $s$}
\State $c_B[s] \leftarrow c_B[s] + h$
\For{$v \in Pred[w]$}
\State $\delta[v] \leftarrow \delta[v] + \frac{\sigma[v]}{\sigma[w]} \cdot (h + \delta[w])$
\EndFor
\If{$w \neq s$}
\State $c_B[w] \leftarrow c_B[w] + \delta[w] + h$
\EndIf
\EndWhile
\EndProcedure
\end{algorithmic}
\end{algorithm}
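As a concrete illustration of the BBC accumulation in \cref{algo:accummulation_brandes}, the following self-contained Python sketch (not the thesis's C++/BGL implementation; graph and function names are ours) runs the single-source phase plus the accumulation on a small unweighted graph. Note that, as in the pseudocode, the endpoints of each shortest path are counted:

```python
from collections import deque

def bbc_endpoint_betweenness(adj):
    """Brandes's algorithm with the accumulation step of the BBC
    pseudocode (endpoints of shortest paths are counted).
    adj[v] lists the neighbours of vertex v (vertices are 0..n-1)."""
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        # --- single-source shortest-path phase (unweighted: BFS) ---
        sigma = [0] * n                  # number of shortest paths from s
        dist = [-1] * n
        pred = [[] for _ in range(n)]    # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        stack, queue = [], deque([s])
        while queue:
            w = queue.popleft()
            stack.append(w)
            for v in adj[w]:
                if dist[v] < 0:
                    dist[v] = dist[w] + 1
                    queue.append(v)
                if dist[v] == dist[w] + 1:
                    sigma[v] += sigma[w]
                    pred[v].append(w)
        # --- accumulation phase, as in the BBC pseudocode ---
        delta = [0.0] * n
        bc[s] += len(stack) - 1              # s is a source |S| - 1 times
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w] + 1        # w is a target of s once
    return bc

# Path graph 0 - 1 - 2: the middle vertex lies on every shortest path.
print(bbc_endpoint_betweenness([[1], [0, 2], [1]]))  # [4.0, 6.0, 4.0]
```

The HBC variant of \cref{algo:accumulation_heuristic} differs only in replacing the constants 1 with the communication intensity $h$ of the current source-target pair.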

\section{Hardware specifications} \label{sec:experiment_hardware} 

\subsection*{Server} 

The experiment is run on a machine provided by the Advanced Networking Systems (ANS) lab\footnote{https://ans.disi.unitn.it/}, with the hardware components listed in \cref{tab:specification_computer}.

\begin{table}
\caption{Specification of the server}
\label{tab:specification_computer}
\centering
\begin{tabular}{l l}
\toprule
CPUs: & 2 $\times$ Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz \\
Memory: & 4 $\times$ 16GB @ 1866 MHz \\
\bottomrule
\end{tabular}
\end{table}

The machine was running GNU/Linux with kernel 3.16. 

\subsection*{WiFi Router} 

We use the Ubiquiti NanoStation M2 router to test the performance; its specification is given in \cref{tab:specification_router}. The router is flashed with a firmware image obtained from OpenWrt, as described in the next section.

\begin{table}
\caption{Specification of the Ubiquiti NanoStation M2 router}
\label{tab:specification_router}
\centering
\begin{tabular}{l l}
\toprule
Platform: & Atheros AR7240 \\
CPU MHz: & 390 \\
Flash MB: & 8 \\
RAM MB: & 32 \\
\bottomrule
\end{tabular}
\end{table}

\section{Porting a program to OpenWrt} \label{sec:experiment_openwrt} 

OpenWrt\footnote{https://openwrt.org/ - the most widely used open-source operating system for wireless routers.} is described as ``a Linux distribution for embedded devices''. It allows people to build custom firmware for many supported devices, and it aids with cross-compiling packages so they can be installed on devices running that custom firmware.

Although we could build the firmware image ourselves with the OpenWrt toolchain, a prebuilt image for the Ubiquiti NanoStation M2 router is available on the download page\footnote{https://wiki.openwrt.org/toh/hwdata/ubiquiti/ubiquiti_nanostationm2}, so we flash the device with that firmware.

The source code of the program running on the GNU/Linux machine (kernel 3.16) and of the program running on the Ubiquiti router is the same. To make it possible to run the program on the router, the OpenWrt toolchain cross-compiles the source code into a package that can be installed on the device.

\section{Input data} \label{sec:experiment_input} 

The experiment involves two main sources of input: artificially generated graphs, and real topologies from Wireless Community Networks.

\subsection{Synthetic graphs} 

For randomly generated graphs, we are interested in three types: Erdos, Power Law, and Community Network graphs. All graphs were generated with the open-source code made available by \citeauthor{Baldesi2016} \cite{Baldesi2016} in a framework called \emph{NePA TesT}, which can generate synthetic graphs of different types such as Erdos, Power Law, and Community Network.

The reason we test on synthetic graphs is that we want to know how BBC and HBC perform with different graph structures and sizes. As will be seen in \cref{sec:experiment_result}, the performance of BBC and HBC depends on the structure and properties of each graph type.

\begin{table}
\caption{All synthetic graphs generated for the experiment}
\label{tab:graph_generator}
\centering
\begin{tabular}{lll}
\toprule
Graph type & Number of nodes & Edges / Nodes Ratio \\
\midrule
CN & from 100 to 1000 with step 100 & 1.2 \\
ER & from 100 to 1000 with step 100 & 3 \\
PL & from 100 to 1000 with step 100 & 1.2 \\
\bottomrule
\end{tabular}
\end{table}

For every line in \cref{tab:graph_generator}, 30 different graphs were generated per size. So for CN graphs we have 10 different sizes multiplied by 30 graphs, i.e. 300 CN graphs in total; for all three graph types, we have 900 graphs. We use the notation \emph{CN100}, \emph{PL100}, \emph{ER100} to denote the Community Network, Power Law, and Erdos graphs with 100 nodes, respectively. The same notation is used for graphs with a larger number of nodes.

\subsection{Real topologies of WCNs} 

In conjunction with the synthetic networks, we also want to evaluate BBC and HBC on real networks, since in the end what matters is which algorithm works better in reality. In this thesis we focus on three WCNs: Funk Feuer Wien (FFW)\footnote{https://www.funkfeuer.at/} and Funk Feuer Graz (FFG)\footnote{graz.funkfeuer.at/en/index.html} in Austria, and Ninux (NNX)\footnote{wiki.ninux.org/} in Italy.

WCNs are ever evolving, with some current nodes leaving the network while new nodes join it. \citeauthor{Maccari2015175} provides an in-depth and thorough analysis of these networks over one week \cite{Maccari2015175}. He found that the fluctuations in the number of nodes and the number of links are small; according to him, those small changes come from a small subset of leaf nodes leaving and entering the network sporadically. Since the fluctuations happen at peripheral nodes, the shortest paths between pairs of nodes did not vary much across time periods: peripheral leaf nodes do not lie on the shortest paths used to route between other nodes. Hence the shortest-path based betweenness measurement varies only slightly. Other measurements were taken as well, but the conclusion of the paper is that over a week the network features of these real WCNs are stable. With the above analysis, we limit the performance testing of BBC and HBC to one snapshot for each WCN in consideration.

\section{Experiment Design} \label{sec:experiment_design} 

The previous sections presented all the components of the experiment: the input data, composed of synthetic and real graphs; the two algorithms to evaluate, BBC and HBC; and the target systems on which the running time of the two algorithms is evaluated. We start this section with the definition of running time. The execution of the experiment is presented afterwards, i.e. how many times a single algorithm (BBC or HBC) is evaluated for each input graph.

Whenever we mention the running time, or execution time, of the BBC or HBC algorithm, we mean the time that a processor spent computing the result of that algorithm. In C++, there is a function in the standard library to help measure the number of CPU clock ticks consumed by the program.

\begin{figure} 

\begin{lstlisting}
#include <time.h>

clock_t begin = clock();

// Run BBC algorithm / HBC algorithm for one graph

clock_t end = clock();

double running_time = double(end - begin) / CLOCKS_PER_SEC;
\end{lstlisting}

\caption{Code to calculate the CPU time} 

\end{figure} 
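Note that \texttt{clock()} measures CPU time consumed by the process, not wall-clock time, so time spent by other processes on the machine does not inflate the measurement. The same measurement pattern, sketched in Python with \texttt{time.process\_time} (illustrative only; the busy loop stands in for one BBC/HBC run):

```python
import time

def cpu_time(fn):
    """Measure the CPU time (not wall-clock time) spent in fn,
    analogous to the clock()-based C snippet."""
    begin = time.process_time()
    fn()
    end = time.process_time()
    return end - begin

# Busy work stands in for one BBC/HBC run on a graph.
elapsed = cpu_time(lambda: sum(i * i for i in range(100000)))
assert elapsed >= 0.0
```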

The experiment consists of running each algorithm 10 times for each input graph. The measurement we want is the running time of each algorithm, and we want it to be as close to the true value as possible. A single run can be affected by various factors, so we repeat the measurement multiple times to make sure the value we obtain is not a coincidence.

\textbf{Note:} Although we run the program on a machine whose CPUs contain 8 cores and support hyper-threading, the program is not parallel: it runs on a single thread from beginning to end.

For example, for the Ninux graph we run BBC 10 times, and then we run HBC 10 times. All running times are recorded for the analysis.

\begin{table}
\centering
\begin{tabular}{lll}
\toprule
Graph type & On the computer & On Ubiquiti router \\
\midrule
CN with nodes in the range [100, 1000] & Yes & Yes \\
ER with nodes in the range [100, 1000] & Yes & No \\
PL with nodes in the range [100, 1000] & Yes & No \\
Ninux & Yes & Yes \\
FFG & Yes & Yes \\
FFW & Yes & Yes \\
\bottomrule
\end{tabular}
\caption{Summary of the platforms on which each input graph was run}
\label{tab:experiment_design}
\end{table}
% thesis/ch5_experiment_result.tex

%!TEX root = main.tex 
\chapter{Results} \label{sec:experiment_result} 

The chapter begins with the results for the synthetic graphs in \cref{sec:result_synthetic}. Since HBC only improves performance for one type of synthetic graph, the Community Network graphs, \cref{sec:result_synthetic_analysis} provides commentary on why HBC works for CN but not for the other topology types, ER and PL. We then move on to the three real WCN topologies (Ninux, FFG, FFW) in \cref{sec:result_wcn}. The chapter ends with the running time analysis for all those WCNs on both the Ubiquiti router and the computer.

\section{Running time for synthetic graphs} \label{sec:result_synthetic} 

As shown in \cref{tab:experiment_design}, only the Community Network synthetic graphs are run on both the computer and the router, while the remaining two synthetic graph types, Erdos and Power Law, are only tested on the computer.

The value on the y-axis for the running time of HBC and BBC in \cref{fig:CN_both_BC_HBC}, \cref{fig:ER_PL_server_BC_HBC} and \cref{fig:CN_real_BC_HBC} is an average of averages.

Let $r_{ij}^{(100)}$, $i \in [1, 10]$ (each measurement is repeated 10 times), denote the running time of the BBC algorithm on a CN graph with 100 nodes $g_j^{(100)}$, $j \in [1, 30]$ (for each graph type and size, 30 graphs are generated). Then the value on the y-axis corresponding to graphs with 100 nodes is the average running time over all repetitions and all CN graphs with 100 nodes, as in \cref{eq:avg_running_time}:

\begin{equation}
\label{eq:avg_running_time}
v^{(100)} = \frac{1}{300} \sum_{j=1}^{30} \sum_{i = 1}^{10} r_{ij}^{(100)}
\end{equation}
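The averaging of \cref{eq:avg_running_time} can be sketched as follows; the running times here are synthetic stand-ins (drawn around 1 second), not measured data:

```python
import random

random.seed(0)
# Hypothetical running times: 30 CN graphs with 100 nodes,
# 10 repetitions each -> a 30 x 10 matrix r[j][i].
r = [[random.uniform(0.9, 1.1) for _ in range(10)] for _ in range(30)]

# v^(100): the average over all 300 measurements.
v_100 = sum(sum(reps) for reps in r) / 300

assert abs(v_100 - 1.0) < 0.1  # times were drawn around 1 second
```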

For the CN graphs, \cref{fig:CN_both_BC_HBC} summarizes the running time observed for graphs of different sizes. We see a huge improvement of HBC over BBC on both the computer and the router; the difference between the two plots is the scale of the y-axis. The program executes much faster on the computer than on the router, but this is simply due to the higher processing power of the computer. What is interesting in \cref{fig:CN_both_BC_HBC} is how quickly HBC completes the calculation compared to BBC. For CN graphs with 1000 nodes, HBC finishes the computation 100 times faster than BBC on the computer, and 60 times faster on the router. Also for CN graphs with 1000 nodes, on the router, BBC takes more than 6 minutes to compute betweenness centrality, while HBC finishes in less than 7 seconds. This is a stark contrast in performance: given that the router has little processing power and other tasks to carry out, such as forwarding packets, a faster execution time can result in better overall performance. From the plots, we can see that the HBC running time increases roughly linearly, while the BBC running time quickly soars.

\begin{figure}[hpt]
\begin{tabular}{cc}
\subfloat[fig 1][]{\includegraphics[width=0.45\textwidth]{images/CN_server_BC_HBC.png}} &
\subfloat[fig 2][]{\includegraphics[width=0.45\textwidth]{images/CN_server_speedup_ratio.png}} \\
\subfloat[fig 3][]{\includegraphics[width=0.45\textwidth]{images/CN_artificial_router_BC_HBC.png}} &
\subfloat[fig 4][]{\includegraphics[width=0.45\textwidth]{images/CN_artificial_router_speedup_ratio.png}}
\end{tabular}
\caption[Performance for CN graphs on the computer and the Ubiquiti router]{Performance for CN graphs on the computer and the Ubiquiti router. Left column shows the running time for both algorithms. Right column shows the speedup ratio between HBC and BBC.}
\label{fig:CN_both_BC_HBC}
\end{figure}

We have just looked at one type of graph, the CN graphs, where HBC outperforms BBC. Next we look at two other types of graph where using HBC worsens the overall performance. \cref{fig:ER_PL_server_BC_HBC} shows the running time on the server for Power Law and Erdos graphs. Both follow the same pattern: the running time of BBC is slightly smaller than that of HBC.

\begin{figure}
\subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/ER_server_BC_HBC.png}}
\subfloat[fig 2][]{\includegraphics[width=0.5\textwidth]{images/PL_server_BC_HBC.png}}
\caption{Performance for Erdos and Power Law graphs on the computer.}
\label{fig:ER_PL_server_BC_HBC}
\end{figure}

\begin{figure}
\subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/ER_server_speedup_ratio.png}}
\subfloat[fig 2][]{\includegraphics[width=0.5\textwidth]{images/PL_server_speedup_ratio.png}}
\caption{Speed up ratio of HBC over BBC}
\label{fig:ER_PL_speed_up_ratio_BC_HBC}
\end{figure}

The question that arises is for which kinds of topology HBC offers an improvement over BBC, i.e. when we should use HBC to calculate betweenness centrality.

\section{Synthetic graphs analysis} \label{sec:result_synthetic_analysis} 

In the previous section, we noticed that HBC improves the performance for Community Network graphs, while for Erdos and Power Law graphs HBC takes longer to run. This section analyzes all 900 synthetic graphs to explain why HBC improves performance for some graphs but not for others.

The section has two parts: \cref{sec:result_analysis_num_of_bcc} asks whether the number of biconnected components (BCCs) found in a single graph affects the performance, and \cref{sec:result_analysis_num_of_nodes} shows that the number of vertices (or nodes) in a single biconnected component is the main factor affecting the performance.

\textbf{Notation:} For each graph type [CN, PL, ER] and for each graph size [100, 200, \ldots, 1000], we generated 30 graphs. We call those graphs $G^{(i)}, i \in [1, 30]$. For each $G^{(i)}$, suppose that we find $n^{(i)}$ biconnected components $B_j^{(i)}, j \in [1, n^{(i)}]$.

\subsection*{The number of biconnected components} \label{sec:result_analysis_num_of_bcc} 

The average number of BCCs for each type of graphs is summarized in \cref{tab:num_of_bcc_for_graphs}, and visualized in \cref{fig:num_of_bcc_for_graphs}. 

For each graph type and each graph size, the average number of BCCs is:

\begin{equation}
N_{BCC} = \frac{1}{30} \sum_{i = 1}^{30} n^{(i)}
\end{equation}

\begin{figure}[hpt]
\centering
\includegraphics[width=0.6\textwidth]{images/num_of_bcc_for_graphs.png}
\caption[Average number of BCCs for synthetic graphs]{Average number of BCCs for synthetic graphs. Note that the y-axis is drawn in log-scale with base 2.}
\label{fig:num_of_bcc_for_graphs}
\end{figure}

For CN graphs, the number of BCCs is almost equal to the number of nodes in the graph, while for PL graphs we generally find only 1 BCC, and this BCC is the original PL graph. The case of ER graphs is a bit different: several BCCs are found, and their number increases with the number of nodes, e.g. from 2.4 BCCs on average for graphs with 100 nodes up to almost 16 BCCs for graphs with 1000 nodes.

Our hypothesis was that if the original graph can be divided into many BCCs, then HBC will save running time. However, the data shows that even though ER graphs can be divided into many BCCs, HBC does not perform better for ER graphs (see \cref{fig:ER_PL_server_BC_HBC}). Hence, the number of BCCs cannot be used to predict whether HBC will decrease the time needed to calculate betweenness centrality.

\subsection*{The number of nodes in each biconnected component} \label{sec:result_analysis_num_of_nodes}

In \cref{sec:result_analysis_num_of_bcc}, we showed that the number of BCCs found in each graph cannot predict whether HBC runs faster than BBC. This subsection takes a different approach: it analyzes the biconnected components in terms of their size, i.e. the number of nodes in a biconnected component. More specifically, we focus on the biggest BCC.

After the preprocessing step, HBC uses a modified version (see \cref{algo:accumulation_heuristic}) of the original Brandes Betweenness Centrality to accumulate the betweenness centrality scores. Therefore, if some BCC is almost as big as the original graph, HBC does not save time when calculating betweenness, while still adding the overhead of preprocessing the original graph.

\begin{figure}[hpt]
\centering
\includegraphics[width=0.6\textwidth]{images/max_nodes_in_graph.png}
\caption{The biggest biconnected component for each synthetic graph}
\label{fig:max_nodes_in_graph}
\end{figure}

Let $MaxB^{(i)}$ denote the maximum number of nodes in a biconnected component of graph $G^{(i)}$, where $|B_j^{(i)}|$ is the number of nodes in the biconnected component $B_j^{(i)}$:

\begin{equation}
MaxB^{(i)} = \max_{j \in [1, n^{(i)}]} |B_j^{(i)}|
\end{equation}

Then the y-value $N_{max}$ in \cref{fig:max_nodes_in_graph} is defined as follows:

\begin{equation}
\label{eq:avg_max_num_of_node_in_bcc}
N_{max} = \frac{1}{30} \sum_{i = 1}^{30} MaxB^{(i)}
\end{equation}

For each graph type [CN, PL, ER] and each graph size [100, 200, \ldots, 1000], we compute the average size of the biggest biconnected component following \cref{eq:avg_max_num_of_node_in_bcc}, and plot the result in \cref{fig:max_nodes_in_graph}. The plot shows that for PL and ER, the largest BCC contains almost all the nodes of the original graph (the scatter points for PL and ER closely follow the diagonal line bisecting the first quadrant).

In the case of CN graphs, even for a graph size of 1000 nodes, the largest BCC has on average only 17 vertices. Computing the betweenness centrality on such small BCCs improves the running time of HBC significantly.

For the raw data, see \cref{tab:max_nodes_in_graph}.

\section{Three real WCNs} \label{sec:result_wcn}

\cref{tab:real_wcn_analysis} shows the high-level features of FFG, FFW, and Ninux, such as the number of vertices and edges, together with an analysis of the biconnected components. The number of nodes in the largest biconnected component accounts for $27\%$, $46\%$, and $35\%$ of the total nodes in the FFG, FFW, and Ninux networks, respectively.

\begin{table}[hpt]
\centering
\begin{tabular}{l c c c }
\toprule
& FFG & FFW & Ninux \\
\midrule
number of vertices & 126 & 235 & 126 \\
number of edges & 181 & 370 & 143 \\
edges/nodes ratio & 1.44 & 1.57 & 1.13 \\
density & 0.022 & 0.013 & 0.018 \\
number of biconnected components & 67 & 120 & 80 \\
max number of nodes in BCC & 35 & 107 & 44 \\
\bottomrule
\end{tabular}
\caption{Features of the networks under analysis}
\label{tab:real_wcn_analysis}
\end{table}

For the real wireless community networks, HBC consistently outperforms BBC in running time. For FFG, where the largest biconnected component includes 27\% of the total nodes, we also observe the largest speedup of HBC over BBC: more than 6 times when running on the router. On the other hand, for FFW, where the biggest biconnected component contains 46\% of the total nodes, we only observe a speedup ratio of around 3. But in all cases, HBC reduces the time to calculate betweenness centrality for all the real topologies. For the running times of HBC and BBC see \cref{fig:CN_real_BC_HBC}; for the speedup ratios see \cref{fig:CN_real_speedup_ratio_BC_HBC}. The raw data is provided in \cref{tab:CN_real_router_BC_HBC} and \cref{tab:CN_real_server_BC_HBC}.

\begin{figure}[hpt]
\subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_router_BC_HBC.png}}
\subfloat[fig 2][]{\includegraphics[width=0.5\textwidth]{images/CN_real_server_BC_HBC.png}}
\caption[Running time of HBC and BBC for real WCNs]{Running time of HBC and BBC for real WCNs. The x-axis denotes the name of the Wireless Community Network.}
\label{fig:CN_real_BC_HBC}
\end{figure}

\begin{figure}[hpt]
\subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_router_speedup_ratio.png}}
\subfloat[fig 2][]{\includegraphics[width=0.5\textwidth]{images/CN_real_server_speedup_ratio.png}}
\caption[Speed up ratio of HBC over BBC for real WCNs]{Speed up ratio of HBC over BBC for real WCNs. The x-axis denotes the name of the Wireless Community Network.}
\label{fig:CN_real_speedup_ratio_BC_HBC}
\end{figure}


% thesis/ch6_conclusion.tex

%!TEX root = main.tex 

\chapter{Conclusion} \label{sec:conclusion} 

In this work, we presented a performance evaluation of two algorithms to calculate betweenness centrality on a Ubiquiti router. One is the state-of-the-art algorithm proposed by Brandes in \cite{Brandes01afaster} (called BBC). The other is a heuristic version aimed at improving Brandes's algorithm (denoted HBC). On the router, we tested three real topologies of Wireless Community Networks (WCNs), and for those topologies HBC was able to compute betweenness 3 to 7 times faster than BBC, as shown in \cref{fig:CN_real_speedup_ratio_BC_HBC}.

We also evaluated the performance on synthetic Community Network graphs. Those graphs were generated algorithmically using \emph{NePa TesT} \cite{Baldesi2016}, and the results are promising. When running on the router with synthetic CN graphs of 1000 nodes, HBC achieves a speedup of 60 times over BBC: on average, such a graph takes almost 7 minutes with BBC and less than 7 seconds with HBC. This is a huge improvement given the limited capacity of the router.

Besides CN graphs (both the real topologies and the randomly generated ones), we also performed the analysis on synthetic Erdos and Power Law graphs, for which HBC takes longer than BBC to finish the calculation: for an Erdos graph with 1000 nodes, BBC takes 24 seconds while HBC takes 26.4 seconds. Even though HBC does not guarantee improved performance over BBC, its worst-case complexity is the same as BBC's ($\mathcal{O}(mn)$ for an unweighted graph, and $\mathcal{O}(mn + n^2 \log n)$ for a weighted graph).

We also carried out an analysis of CN, Erd\H{o}s--R\'enyi and Power Law graphs to understand why HBC works well for CN graphs but not for the others. We concluded that the size of the biconnected components (BCCs) is the main factor affecting the performance of HBC. For FunkFeuer Wien, a WCN in Vienna, the experiment shows that even though the biggest biconnected component found contains almost half of the nodes in the network (46\% of the total), HBC still runs 3 times faster than BBC.

So far, we have shown that HBC is the algorithm of choice when computing the betweenness centrality of real Wireless Community Network topologies. There are ways we believe the performance can be improved further, which may be implemented in the future. One property of WCNs is that the topology evolves over time, since any node can leave and join the network; in general, however, the newly arrived or recently departed nodes lie at the periphery of the network \cite{Maccari2015175}. We could thus keep track of the betweenness centrality within each biconnected component, and recompute it only when that BCC is modified.

For this thesis, both algorithms were run on a set of predefined graphs. In the future, the functionality of the program could be extended so that it extracts the real-time topology of the WCN of interest and performs all the necessary computations. Moreover, in our testing environment the router's sole task was to run the experiment; it was not busy forwarding packets or finding routes for them, as it has to do in reality. It will be interesting to see how HBC performs on a constantly evolving topology and on a busy router.
thesis/ideas.tex  

feel of 
\section{Experiment setup} 

Cores is a hardware term that describes the number of independent central processing units in a single computing component (die or chip). 

A Thread, or thread of execution, is a software term for the basic ordered sequence of instructions that can be passed through or processed by a single CPU core. 

Intel® Hyper-Threading Technology (Intel® HT Technology) delivers two processing threads per physical core. Highly threaded applications can get more work done in parallel, completing tasks sooner.
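As a quick sanity check during experiment setup, the number of logical CPUs visible to the operating system can be queried from Python's standard library. This is an illustrative convenience sketch, not part of the thesis code; on a machine with Hyper-Threading enabled, the value is typically twice the number of physical cores.

```python
import os

def logical_cpus() -> int:
    """Number of logical CPUs visible to the OS.

    With Hyper-Threading, this usually equals 2x the number of
    physical cores (two hardware threads per core).
    """
    count = os.cpu_count()  # may return None on exotic platforms
    return count if count is not None else 1
```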

\section{Questions} 
...
Then, 1 chapter about the result 
Bibliography: 20  3.0 
\section{Scope} 

The scope of the paper:
\begin{itemize}
\item no self-loops
\item no multigraphs: no parallel edges allowed
\item unweighted and weighted graphs
\item applies to both connected and disconnected graphs. In a disconnected graph, certain vertices are not reachable from vertices in a different component. This yields zero shortest-path counts, and as a result those two endpoints contribute 0 toward the betweenness.
\end{itemize}

connected graph: betweenness centrality is calculated from shortest paths. In a disconnected graph, certain vertices are not reachable from vertices in a different component, which yields zero shortest-path counts for betweenness centrality.

\section{Notation}
A graph $G=(V,E)$ contains a set of vertices (nodes) $V$ and a set of edges (links) $E$. We assume that all graphs are undirected and connected. \textbf{XXX check the modification for directed graph}

$m = |E|$ is the number of edges in graph $G$.

$n = |V|$ is the number of vertices in graph $G$.

Let $\omega$ be a weight function on the edges. We assume that $\omega(e) > 0$ for all $e \in E$ in weighted graphs, and we define $\omega(e) = 1$ for all $e \in E$ in unweighted graphs. If $s, t \in V$ and $\{s, t\} \in E$, then we can also use $\omega(s, t)$ to denote the weight of the edge between $s$ and $t$.

The \emph{length} of a path is the sum of weights of all edges along the path. We use $d(s, t)$ to denote the \emph{distance} between vertices $s$ and $t$, i.e. the minimum length of any path connecting $s$ and $t$. 

\section{Result}

\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{images/histogram_num_of_nodes_CN200.png}
\caption{}
\label{fig:histogram_num_of_nodes_CN200}
\end{figure}

\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{images/histogram_num_of_nodes_ER200.png}
\caption{}
\label{fig:histogram_num_of_nodes_ER200}
\end{figure}

\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{images/histogram_num_of_nodes_PL200.png}
\caption{}
\label{fig:histogram_num_of_nodes_PL200}
\end{figure}

\section{Counting shortest paths}
In the end, I chose not to describe these algorithms in the final version of my thesis.

\textbf{XXX What is the input}: the source vertex $s$

\textbf{XXX What is the output?}:

\subsection{Breadth-First Search for unweighted graphs}
I understood the algorithm, but how can I explain it to another person?
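One way to explain it: a single BFS pass from $s$ can simultaneously compute, for every vertex, its distance from $s$, the number of shortest paths reaching it, and its shortest-path predecessors. A minimal Python sketch of this idea (illustrative only, not the thesis implementation):

```python
from collections import deque

def bfs_count_shortest_paths(graph, s):
    """BFS from s on an unweighted graph given as an adjacency-list dict.

    Returns (d, sigma, P): distance from s, number of shortest paths
    from s, and list of shortest-path predecessors, for every vertex.
    """
    d = {v: -1 for v in graph}       # -1 marks "not yet reached"
    sigma = {v: 0 for v in graph}
    P = {v: [] for v in graph}
    d[s], sigma[s] = 0, 1
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if d[w] < 0:             # w discovered for the first time
                d[w] = d[v] + 1
                queue.append(w)
            if d[w] == d[v] + 1:     # edge (v, w) lies on a shortest path
                sigma[w] += sigma[v]
                P[w].append(v)
    return d, sigma, P
```

For the 4-cycle $a\!-\!b\!-\!d\!-\!c\!-\!a$, starting from $a$, vertex $d$ ends up with distance 2 and two shortest paths (via $b$ and via $c$).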

\subsection{Dijkstra's algorithm for weighted graphs}
Dijkstra's algorithm can be modified so that, given a source vertex $s$, it outputs the set of predecessors and the number of shortest paths from $s$ to every vertex $t \in V$, $t \neq s$.

\begin{algorithm}
\caption{Dijkstra's algorithm with shortest-path counting} \label{algo:dijkstra}
\begin{algorithmic}[1]
\Procedure{Dijkstra}{$G$, $s$}
\State $S \leftarrow$ empty stack
\State $P[w] \leftarrow$ empty list, $w \in V$
\State $\sigma[t] \leftarrow 0, t \in V$; $\sigma[s] \leftarrow 1$
\State $d[t] \leftarrow \infty, t \in V$; $d[s] \leftarrow 0$
\State $Q \leftarrow$ priority queue ordered by $d$
\State insert $s \rightarrow Q$
\While{$Q$ \emph{not empty}}
\State extract $v$ with minimum $d[v]$ from $Q$
\State push $v \rightarrow S$
\For{each neighbor $w$ of $v$}
\If{$d[v] + \omega(v, w) < d[w]$}
\State $d[w] \leftarrow d[v] + \omega(v, w)$; insert or update $w$ in $Q$
\State $\sigma[w] \leftarrow \sigma[v]$; $P[w] \leftarrow$ list containing only $v$
\ElsIf{$d[v] + \omega(v, w) = d[w]$}
\State $\sigma[w] \leftarrow \sigma[w] + \sigma[v]$; append $v \rightarrow P[w]$
\EndIf
\EndFor
\EndWhile
\EndProcedure
\end{algorithmic}
\end{algorithm}
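A minimal Python sketch of this modified Dijkstra (illustrative only, not the thesis implementation; it uses a lazy-deletion binary heap instead of a priority queue with decrease-key):

```python
import heapq

def dijkstra_count_shortest_paths(graph, s):
    """Dijkstra from s on a weighted graph.

    graph: dict mapping each vertex to a dict {neighbor: positive weight}.
    Returns (d, sigma, P): distance, number of shortest paths, and
    shortest-path predecessors for every vertex reachable from s.
    """
    INF = float("inf")
    d = {v: INF for v in graph}
    sigma = {v: 0 for v in graph}
    P = {v: [] for v in graph}
    d[s], sigma[s] = 0, 1
    heap = [(0, s)]                   # lazy-deletion priority queue
    while heap:
        dist_v, v = heapq.heappop(heap)
        if dist_v > d[v]:             # stale entry: v was already settled
            continue
        for w, weight in graph[v].items():
            new_dist = dist_v + weight
            if new_dist < d[w]:       # strictly shorter path found
                d[w] = new_dist
                sigma[w] = sigma[v]
                P[w] = [v]
                heapq.heappush(heap, (new_dist, w))
            elif new_dist == d[w]:    # another shortest path, via v
                sigma[w] += sigma[v]
                P[w].append(v)
    return d, sigma, P
```

With strictly positive weights, every vertex is settled with its final $\sigma$ before it relaxes its neighbors, so the counts are exact.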

\clearpage
\bibliography{references}{}
\bibliographystyle{plain}
thesis/main.tex  

% between \documentclass and \begin{document}.

\usepackage[margin=1in]{geometry} % set the margins to 1in on all sides

\usepackage{emptypage}
\usepackage{float} % For inserting figures side-by-side
\usepackage{subfig}

\usepackage{graphicx} % to include figures
\graphicspath{{report_images/}{images/}} % trailing '/' is needed
\newcommand*{\LargerCdot}{\raisebox{1.5ex}{\scalebox{3}{$\cdot$}}}

\usepackage{caption}
\usepackage{subcaption}
\usepackage{booktabs} % for beautiful tables 
\usepackage{epstopdf} % for .eps images
\epstopdfsetup{update}
\DeclareGraphicsExtensions{.eps}
\epstopdfDeclareGraphicsRule{.eps}{pdf}{.pdf}{ps2pdf -dEPSCrop -dNOSAFER #1 \OutputFile}

\usepackage{amsmath} % great math stuff
\usepackage{amsfonts} % for blackboard bold, etc
...
\newtheorem{definition}{Definition}[section]

\usepackage{listings} % to add code
\lstset{language=Java}
\renewcommand{\lstlistingname}{Code} % change "Listing 1.1" to "Code 1.1"

\usepackage{color}

\usepackage{cite}
...
\catcode`_=12
}

\newcommand{\HRule}{\rule{\linewidth}{0.5mm}} % New command to make the lines in the title page
\usepackage{booktabs} % To thicken table lines

\usepackage{algorithm}
\usepackage[noend]{algpseudocode} % For pseudocode

\usepackage[toc]{appendix}
% \usepackage[lofdepth,lotdepth]{subfig}

\usepackage[noabbrev,capitalise]{cleveref}
% Font for Unicode
\usepackage[utf8]{inputenc}
% \usepackage[vietnam,english]{babel}

\newcommand*{\blankpage}{%
\vspace*{\fill}
\begin{center}
% This page is intentionally left blank.
\end{center}
\vspace{\fill}}
\begin{document} 
\pagenumbering{gobble}
\include{titlepage}
\include{acknowledgements}
\blankpage

\pagenumbering{roman}
\include{toc}

\pagenumbering{arabic}
\include{ch1_introduction}
\blankpage
\include{ch2_centrality}
\blankpage
\include{ch3_computation_of_bc}
\include{ch4_experiment_setup}
\blankpage
\include{ch5_experiment_result}
\blankpage
\include{ch6_conclusion}
\include{appendix}

\addcontentsline{toc}{chapter}{\listfigurename}
\listoffigures
\addcontentsline{toc}{chapter}{\listtablename}
\listoftables

\bibliography{references}{}
% \bibliographystyle{IEEEtranSN}
\bibliographystyle{IEEEtranN}

\cleardoublepage
\end{document} 
thesis/references.bib  

Ulrik Brandes},
title = {Heuristics for Speeding Up Betweenness Centrality Computation},
booktitle = {2012 International Conference on Privacy, Security, Risk and Trust,
{PASSAT} 2012},
pages = {302--311},
year = {2012},
crossref = {DBLP:conf/socialcom/2012},
doi = {10.1109/SocialCom-PASSAT.2012.66},
timestamp = {Mon, 14 Jan 2013 18:11:28 +0100},
bibsource = {dblp computer science bibliography, http://dblp.org}
}
...
location = {Esbjerg, Denmark},
pages = {191--203},
numpages = {13},
doi = {10.1007/978-3-540-89900-6_20},
acmid = {1485471},
publisher = {Springer-Verlag},
...
note = "",
issn = "0378-8733",
doi = "http://dx.doi.org/10.1016/j.socnet.2007.11.001",
author = "Ulrik Brandes",
keywords = "Betweenness centrality",
keywords = "Algorithms",
...
note = "Special Issue on Wireless Network Intrusion",
issn = "0022-0000",
doi = "http://dx.doi.org/10.1016/j.jcss.2013.06.018",
author = "Leonardo Maccari and Renato {Lo Cigno}",
keywords = "Shortest path betweenness",
keywords = "OLSR",
keywords = "Multi-hop networks",
...
note = "Modeling and Performance Evaluation of Wireless Ad-Hoc Networks",
issn = "1570-8705",
doi = "http://dx.doi.org/10.1016/j.adhoc.2014.07.016",
author = "Leonardo Maccari and Renato {Lo Cigno}",
keywords = "Mesh networks",
keywords = "Community networks",
keywords = "Network topology",
...
priority = {2},
publisher = {American Sociological Association},
title = {{A Set of Measures of Centrality Based on Betweenness}},
volume = {40},
year = {1977}
}
...
note = "",
issn = "0378-8733",
doi = "http://dx.doi.org/10.1016/0378-8733(78)90021-7",
author = "Linton C. Freeman",
abstract = "The intuitive background for measures of structural centrality in social networks is reviewed and existing measures are evaluated in terms of their consistency with intuitions and their interpretability. Three distinct intuitive conceptions of centrality are uncovered and existing measures are refined to embody these conceptions. Three measures are developed for each concept, one absolute and one relative measure of the centrality of positions in a network, and one reflecting the degree of centralization of the entire network. The implications of these measures for the experimental study of small groups is examined."
}
...
abstract="This is a note to introduce a new measure of a kind of structural centrality called pair-dependency. Pair-dependency explicates the centrality-related notion of the gatekeeper. Moreover, it turns out to be a fundamental structural property of communication networks that provides the basis for the derivation of two standard measures of structural centrality.",
issn="1573-7845",
doi="10.1007/BF00184720",
year="1980"
}
...
note = "",
issn = "0378-8733",
doi = "http://dx.doi.org/10.1016/0378-8733(91)90017-N",
author = {Linton C. Freeman and Stephen P. Borgatti and Douglas R. White},
abstract = "A new measure of centrality, CF, is introduced. It is based on the concept of network flows. While conceptually similar to Freeman's original measure, CB, the new measure differs from the original in two important ways. First, CF is defined for both valued and non-valued graphs. This makes CF applicable to a wider variety of network datasets. Second, the computation of CF is not based on geodesic paths as is CB but on all the independent paths between all pairs of points in the network."
}
...
issn = {0146-4833},
pages = {68--73},
numpages = {6},
doi = {10.1145/2500098.2500108},
acmid = {2500108},