## Revision c6e2ce5a

View differences:

thesis/acknowledgements.tex
1
%!TEX root = main.tex

2
\vspace*{\fill}

3

4
\chapter*{Acknowledgements}

5
This work is done under the supervision of Prof. Renato Lo Cigno Antonio and co-supervior Leonardo Maccari. I would like to express my sincere gratitude for their throughout guidance. They have been immensely patient with my progress, given me a little push every now and then to keep me on track.

6

7
I would like to thank Opera Universita, University of Trento and Erasmus+ program for funding my Master study, giving me a chance to study in a world-class university, and also in University of Edinburgh in an exchange program. Thank you for giving me education.

8

9
Thank you friends, for your supports and encouragement.

10

11
And finally, I want to dedicate these last lines to my family in Vietnam.

12

13
[Written in Vietnamese] Cam on Ba Me da tin tuong con, khong tao ap luc len chuyen hoc hanh cua con, cho con tu do lam moi thu con thich. Va quan trong la luon dung o dang sau de giup do con.

14

15
\vspace*{\fill}

16

17
\clearpage

18

thesis/appendix.tex
1
%!TEX root = main.tex

2
\begin{appendices}

3

4
\chapter{Raw Data from the experiment}

5
This chapter shows the raw analysis from the experiment.

6
\begin{table}[hbp]

7
            \centering

8
            \begin{tabular}{l l l l}

9
                \toprule

10
                Number of nodes in a graph&CN&PL&ER\\

11
                \midrule

12
                100&87.5&1.1&2.4\\

13
                200&181.9&1.0&3.4\\

14
                300&273.0&1.1&5.5\\

15
                400&366.1&1.1&6.5\\

16
                500&477.0&1.0&8.9\\

17
                600&572.3&1.1&9.8\\

18
                700&666.8&1.0&12.4\\

19
                800&763.7&1.0&12.6\\

20
                900&859.2&1.0&15.9\\

21
                1000&983.1&1.0&15.5\\

22
                \bottomrule

23
            \end{tabular}

24
            \caption{Average number of BCCs found for each category of synthetic graph}

25
            \label{tab:num_of_bcc_for_graphs}

26
        \end{table}

27

28

29
        \begin{table}[hbp]

30
            \centering

31
            \begin{tabular}{l l l l}

32
                \toprule

33
                Number of nodes&CN&PL&ER\\

34
                \midrule

35
                100&13.3&99.9&98.6\\

36
                200&18.9&200.0&197.6\\

37
                300&27.6&299.9&295.5\\

38
                400&34.6&399.9&394.5\\

39
                500&23.9&500.0&492.1\\

40
                600&28.5&599.9&591.2\\

41
                700&34.1&700.0&688.6\\

42
                800&37.2&800.0&788.4\\

43
                900&41.7&900.0&885.1\\

44
                1000&17.7&1000.0&985.5\\

45
                \bottomrule

46
            \end{tabular}

47
            \caption{Average number of node in the largest found BCC for each category of synthetic graph}

48
            \label{tab:max_nodes_in_graph}

49
        \end{table}

50

51
    \begin{table}[hbp]

52
        \centering

53
        \begin{tabular}{l l l}

54
            \toprule

55
            Wireless Community Network & HBC & BBC\\

56
            \midrule

57
            Ninux & 4.8 & 1.1 \\

58
            FFW & 22.1 & 7 \\

59
            FFG & 5.3 & 0.97 \\

60
            \bottomrule

61
        \end{tabular}

62
        \caption{Running time of HBC and BBC on the router}

63
        \label{tab:CN_real_router_BC_HBC}

64
    \end{table}

65

66
    \begin{table}[hbp]

67
        \centering

68
        \begin{tabular}{l l l}

69
            \toprule

70
            Wireless Community Network & HBC & BBC\\

71
            \midrule

72
            Ninux & 0.15 & 0.03 \\

73
            FFW & 0.7 & 0.19 \\

74
            FFG & 0.17 & 0.03 \\

75
            \bottomrule

76
        \end{tabular}

77
        \caption{Running time of HBC and BBC on the computer}

78
        \label{tab:CN_real_server_BC_HBC}

79
    \end{table}

80
\end{appendices}

thesis/ch1_introduction.tex
1 1
%!TEX root = main.tex

2 2

3 3
\chapter{Introduction}

4
    People have a strong need for acquiring information. The faster, the more reliable, the more secured way, the more affordable mean to communicate, the better that communication technique is. During the 90s we have seen the boom of the Internet, at that time most devices was connected to the Internet through a cable with an ISP standing in between to provide services. In the late 2000s, connecting to the Internet without plugging anything in had became a norm. Most of the wireless networks that we are familiar with now such as 3G, LTE, 4G, Wi-Fi, they are centralized network, which means those networks rely on the base stations to broadcast signal, or access points in a managed wireless network to provide the service. And then, \textbf{XXX any year?}, Wireless Mesh Networks (WMNs) have merged as a key technology for next-generation wireless networking.

4
    People have a strong need for acquiring information, and they want faster, more reliable, more secured, and more affordable mean to communicate. During the 90s we have seen the boom of the Internet, at that time most devices was connected to the Internet through a cable with an ISP standing in between to provide services. In the late 2000s, connecting to the Internet without plugging anything in became a norm. Most of the wireless networks that we are familiar with now, such as 3G, LTE, 4G, Wi-Fi are centralized networks, which means those networks rely on base stations to broadcast signal, or access points in a managed wireless network to give the service. Wireless Mesh Networks (WMNs) in recent years are emerging as a key technology for next-generation wireless networking. They do not have any centralized control to route and forward packets, but rather all nodes cooperate together to relay each others' packets. The efficient protocol is important than ever in WMNs since it would affect the performance largely. And several papers \cite{Vazquez-Rodas:2013:TCW:2507248.2507257}, \cite{Maccari2014670} have proposed the use of centrality to enhance routing in WMNs.

5 5

6
    We begin, in this introductory chapter with the short overview of WMNs in \cref{sec:introduction_wmns}. Then we will present one application of WMNs in reality - the Wireless Community Networks (WCNs). We will cover the definition plus some challenging problems of WCNs in \cref{sec:introduction_wcns}. The chapter ends with the contribution, and the outline of the thesis.

6
    \emph{Centrality}\footnote{it is also referred to by other terms: centrality indices or centrality measures} is a measurement to quantify how influential a node is, given its position in the network. We concerns with centrality measurements since in WMNs, the ability to find central node can serve many great purposes such as adding more routers around those most central one to balance the load, and to lessen the impact when one of the most central routers fail. Or information about the central node can help with routing, to allow information to be spreaded faster in the network.

7 7

8
\section{Wireless Mesh Networks} \label{sec:introduction_wmns}

9
        Wireless Mesh Network (WMN) is any wireless network having a network topology of either a partial or full mesh topology \cite{zhang2006wireless}. In full mesh topology, every pair of nodes is directly connected with each other, while in partial mesh topology, not all nodes are directly interconnected with each other by other nodes. Eventhough the definition of WMN mentions the full mesh topology, in practice, WMN is usually built over a partial mesh topology. This is due to the limited coverage of signal from wireless devices making it impossible for each node to directly conect with all other nodes.

8
    In this thesis, the main focus is on one specific centrality measurement called \emph{betweenness centrality} $c_B(v)$. Intuitively, a node with high betweenness centrality has a large control over the traffic flowing within the network, given that the information is transfered using the shotest paths. And betweenness centrality defined as the sum of the fraction of total number of shortest paths that pass through that node. The best known algorithm \cite{Brandes01afaster} to calculate $c_B(v)$ runs in quadratic time, making the calculation for a large graph impractical. Another algorithm in focus is heuristic-based \cite{PuzisZEDB12} which demonstrated its effectiveness in several input graphs of choice. The main work of thesis is implementing those forementioned algorithms, and carrying out the performance evaluation for both of them on a computer, and on a low-cost wireless router. We want to know whether using the heuristic-based algorithm to calculate the betweenness centrality for every nodes in topology of WCNs would scale better than using the original algorithm.

9

10
    We begin with a short overview of WMNs in \cref{sec:introduction_wmns}. Then we present one application of WMNs in reality -- the Wireless Community Networks (WCNs). We cover the definition plus some challenging problems of WCNs in \cref{sec:introduction_wcns}. After having all the background knowledge for WMNs and WCNs, the contribution of this thesis for WCNs is laid out in \cref{sec:introduction_contribution}. The outline for the subsequent chapters can be found in \cref{sec:introduction_outline}.

10 11

11
        For classification, WMNs belong to a category of multi-hop wireless network, where information traverses from a source to a destination using two or more wireless hops.

12
\section{Wireless Mesh Networks} \label{sec:introduction_wmns}

13
        A WMN is any wireless network having a mesh topology\footnote{Mesh topology is used to denote any network that every nodes in it participating in the distribution of data}. In full mesh topology, every pair of nodes is directly connected with each other, while in partial mesh topology, not all nodes are directly interconnected with each other. Even though the definition of WMN mentions the full mesh topology, in practice, WMN is usually built over a partial mesh topology. This is due to the limited coverage of signal from wireless devices making it impossible for each node to directly connect with all other nodes.

12 14

13
        The main advantages of a WMN is its capability of fault intolerance. Since WMN is a decentralized wireless network, when several nodes is taking down, the network can find a new path way to link the network together. The term \emph{autonomic system} is normally used to describe properties of WMN.

15
        For classification, WMNs belong to a class of multi-hop wireless network, where information traverses from a source to a destination using two or more wireless hops.

14 16

15
        WMN has several properties:

16
        \begin{itemize}

17
            \item Self-forming

18
            \item Self healing

19
            \item Self optimization

20
            \item Multi hop

21
        \end{itemize}

17
        The main advantages of a WMN is its ability of fault intolerance. Since WMN is a decentralized wireless network, when several nodes fail, the network can find a new path way to link the network together. The term \emph{autonomic system} is often used to describe properties of WMN.

22 18

23
        WMNs find a great applications in many areas \citep{Akyildiz2005445} such as: broadband home networking, enterprise networking, community networking, city-wide wireless coverage, spontaneous mesh network, transporation systems. However, all different areas for application can be classified into 2 categories. Firstly, WMN acts as a backbone network to extend the Internet connectivity for nodes that are not in the coverage of Internet access. This might due to the fact that installing wired infrastructure is too expensive for rural areas, or areas with challenging terrains. Another scenario is in a diaster-stroke area where the current communication facilities (cables, base stations) were destroyed, then WMNs can provide temporary solution for communication. Secondly, WMNs is used to enable the communication between nodes within the mesh network.

19
        WMNs find interesting applications in many areas \citep{Akyildiz2005445} such as: broadband home networking, enterprise networking, community networking, city-wide wireless coverage, spontaneous mesh network, transportation systems. All the different areas of application can be classified into two categories. First, the WMN acts as a backbone network to extend the Internet connectivity for nodes that are not in the coverage of Internet access. This might due to the fact that installing wired infrastructure is too expensive for rural areas, or areas with challenging terrains. Another scenario is in a disaster-stroke area where the current communication facilities (cables, base stations) were destroyed, then WMNs can provide temporary solution for communication. There are many examples for this usage in reality. It has been used in Dharamsala \citep{jardin2006} to provide internet access for refugee monks living in the mountainous area in Dharamsala. \cite{Suzuki2006} introduced \emph{SKYMESH} to provide Internet access in a disaster area. It use helium-filled balloons carrying WiFi transceivers, and together they form the mesh network.

24 20

25
        For the first category - extending the Internet connectivity, there are many examples on its usage in reality. It has been used in Dharamsala \citep{jardin2006} to provide internet access for refugee monks living in the mountainous area in Dharamsala. \cite{Suzuki2006} introduced \emph{SKYMESH} to provide Internet access in a disaster area. It use helium-filled balloons carrying WiFi transceivers, and together they form the mesh network.

21
        The second application of WMNs focus on the communication within the network. For example, in transportation management, \cite{Ahmed2008} proposed a prototype using wireless mesh networks to help with monitoring traffic. The car has sensors to collect data, and the real-time data is gathered using the WMN. Wireless Community Networks are another applications making use of mesh topology. The WCN contains a group of local users setting up and managing a network for themselves. Actually, WCNs can have a gateway to the Internet, and it can act as a backbone to deliver Internet connectivity. However, in reality, most WCNs were set up to promote an alternative communication channel that offers higher degree of privacy and neutrality \cite{Maccari2015175}.WCNs is discussed more in \cref{sec:introduction_wcns}.

26 22

27
        The second usage of WMNs focus on the communication within the network. For example, in transportation management, \cite{Ahmed2008} proposed a prototype using wireless mesh networks to help with monitoring traffic. The car has sensors to collect data, and the real-time data is gathered using the WMN. Wireless Community Networks are another applications of WMNs, where a group of local users set up and manage a network for themselves. WCNs can have a gateway (or not) to the Internet. This WCNs will be discussed more in \cref{sec:introduction_wcns}.

28 23

29 24
\section{Wireless Community Networks} \label{sec:introduction_wcns}

30 25
Application:

31
    Wireless Community Network (WCN) \cite{Jain2003} belongs to wireless mesh network. It also servers 2 purposes as WMN: to provide internet access for those nodes which do not have it, and to create an environment for communication that offer higher privacy.

26
    Wireless Community Network (WCN) \cite{Jain2003} is one application of Wireless Mesh Network. WCN is created by a group of local users to have another channel of communication that is self-managed and offer higher degree of privacy. WCNs have their presents on all continents, but most of them are in Europe and North America. The largest WCN all over the world now is Guifi\footnote{https://guifi.net} in Spain with over 30000 nodes in 2015.

32 27

33
    What

28
    For WCNs, new router can join the existing WCNs independently by following the technical information listed on the Web. Every nodes in WCNs must have a way to know the existence of the new router, or the unavailability of some other router when it leaves the network. There are several popular routing protocols used in WCNs, taking into account the ever-changing topology of WCNs to deal with that problem. They are Optimized Link State Routing (OLSR), Babel, and B.A.T.M.A.N. But probably OLSR is the most widespread routing protocol in used. More thorough discussion for the mentioned protocols can be found in \cite{Pisa2013}.

34 29

35
http://www.npr.org/2006/08/10/5631353/a-wireless-network-for-little-lhasa

30
    In this thesis, we focus on three WCNs: Funk Feuer Wien (FFW)\footnote{https://www.funkfeuer.at/} and Funk Feuer Graz (FFG)\footnote{graz.funkfeuer.at/en/index.html} in Austria and Ninux (NNX)\footnote{wiki.ninux.org/} in Italy. And they all use Optimized Link State Routing Protocol (OLSR), where every nodes in the WCNs have the topology information, and compute the next hop destinations for all nodes in the network using shortest hop forwarding paths.

36 31

37
    \cite{Maccari2015175} presented an argument that closeness centrality can be used

38 32

39 33
\section{Contributions of this thesis} \label{sec:introduction_contribution}

40
    In graph theory, centrality metrics (which will be explored in depth in \cref{sec:centrality}) is used to determine how influential a node is in the network. \cite{Katsaros2010} presented two types of centrality metrics that he considered them to have significant applications in analysing wireless networks. They are degree based centrality, and shortest-path based centrality.

41

42
    \cite{Maccari2015175} proposed to use \emph{closeness group centrality} $C_C(\gamma)$ - one type of degree based centrality - to identify the central group in the WCNs. $\gamma$ is a group of nodes. Knowing the central group, the group with the lowest $C_C(\gamma)$ can be useful in, for example, determining where to place the Internet gateway. Suppose that this WCNs is designed to provide Internet access for its client, then the most central group will give the best average Internet connectivity to the nodes in the network.

43

44
    The type of centrality metrics to choose will depend on the purpose of analysis. \cite{Maccari2014670} also mentioned shortest-path based centrality $C_sp(k)$ should be employed when we want choose a set of nodes that have control a fraction $\eta$ of the entire traffic of the network. \citeauthor{Maccari2014670} gave 3 scenarios on why knowing betweenness centrality would help in resolving those 3 problems. Given that WCNs is not a static network, nodes can come and leave the network as they wish. Therefore, the betweenness centrality would change over time. Moreover, routers do not have enough computational power to excecute complex algorithm. Therefore, to make it feasible for WCNs to detect the set of most influential nodes, the algorithm must run fast enough \textbf{XXX how fast?} on the computationally limited physical device.

45

46
    In this thesis, I test out the performance of shortest-path based betweenness centrality (from now on it will only be called \emph{betweenness centrality} algorithm on the real router. The state-of-the-art for calculating betweenness centrality is $\mathcal{O}(mn + n^2 \log n)$ \citep{Brandes01afaster}. Eventhough it is a huge improvement comparing to $\mathcal{O}(n^3)$, it is still not feasible to compute that centrality for large graph. The second algorithm is the heuristic-based by \cite{PuzisZEDB12}. This algorithm follow a divide-and-conquer design paradigm, the network is divided into bi-connected components. Betweenness centrality is calculated for each bi-connected components with a little bit of modification, and then they are summed up to achieve the same result as the state-of-the-art algorithm. The presentation of both algorithms can be found in \cref{sec:computation_betweenness}

47

48
    Since the algorithm of \citeauthor{PuzisZEDB12} is a heuristic-based, there is no guarantee on the running time. Therefore, I carry out an extensive experiments to compare 2 algorithms. The experiment is run on random graphs: Erdos, Power Law, Community Network \footnote{This is not the real wireless community networks, but the generation algorithm was designed such that the topology is similar to the topology found in WCNs}. And the performance is measured in the virtualized environment with QEMU \footnote{http://wiki.qemu.org}, and on the real router.

49

50
    This is the first time that both algorithms of betweenness centrality has been tested on the real device.

51

52
\section{Outline of this thesis} \label{sec:introduction_outline}

53
    The outline of this paper is as follows. The first two chapters cover all the necessasry background information. Section \cref{sec:centrality} examines the concept of centrality. Since there are numereous measures for centrality, only those centrality measures that are most relevant  will be presented. And betweenness centrality will be discussed in deep. Section \cref{sec:computation_betweenness} presents 2 algorithms to calculate the exact betweenness.

34
    In graph theory, centrality metrics (which is explained in depth in \cref{sec:centrality}) is used to determine how influential a node is in the network. The authors in \cite{Katsaros2010} presented two types of centrality metrics that he considered them to have significant applications in analysing wireless networks. They are degree based centrality, and shortest-path based centrality.

54 35

55
    The next two chapters deals with the setting up and results of the simulation. Details on the dataset, repetition will be provided in \cref{sec:sim_scenario}. \cref{sec:sim_experiment} summarises the result of the simulation.

36
    The authors in \cite{Maccari2015175} proposed to use \emph{closeness group centrality} $C_C(\gamma)$ -- one type of degree based centrality -- to identify the central group in the WCNs. $\gamma$ is a group of nodes. Knowing the central group, the group with the lowest $C_C(\gamma)$ can be useful in, such as, determining where to place the Internet gateway. Suppose that this WCNs is designed to provide Internet access for its client, then the most central group will give the best average Internet connectivity to the nodes in the network.

56 37

57
\clearpage

38
    The type of centrality metrics to choose depends on the purpose of analysis. \cite{Maccari2014670} also mentioned shortest-path based centrality $C_{sp}(k)$ should be employed when we want choose a set of nodes that have control a fraction $\eta$ of the entire traffic of the network. There are 3 scenarios in which betweenness centrality would be employed. Given that WCNs is not a static network, nodes can join and leave the network as they wish. Moreover, there is no single machine managing the whole network, but rather every router has the whole information of the topology and cooperates together to keep the information flowing freely in the network. In sum, we know that betweenness centrality is a useful measure for WCN, and that computing betweenness centrality must not be too demanding for the routers since routers do not have enough computational power to execute as quick as the normal computer. Therefore, to make it feasible for WCNs to detect the set of most influential nodes, the algorithm must run fast enough on the computationally limited physical device.

58 39

59
\section{Scope}

60
        The scope of the paper:

61
        \begin{itemize}

62
            \item no self-loop

63
            \item no multigraph: graph with no parallel edges

64
            \item unweighted and weighted graph

65
            \item can apply to connected or disconnected graph. In the disconnected graph, certain vertices are not reachable from vertices in a different component. This yeilds the zero shortest paths counts, and as a result those 2 endpoints contributes 0 toward the betweenness.

66
        \end{itemize}

40
    In this thesis, I test the performance of shortest-path based betweenness centrality (from now on it is called \emph{betweenness centrality} algorithm on a real router typically used in WCNs. The state-of-the-art for calculating betweenness centrality is $\mathcal{O}(mn + n^2 \log n)$ \citep{Brandes01afaster}. Even though it is a huge improvement comparing to $\mathcal{O}(n^3)$, it is still not feasible to compute centrality in large graphs. The second algorithm is the heuristic-based presented in \cite{PuzisZEDB12}. This algorithm follow a divide-and-conquer design paradigm, the network is divided into bi-connected components. Betweenness centrality is calculated for each bi-connected components with a bit of modification, and then they are summed up to achieve the same result as the latest algorithm. The presentation of both algorithms can be found in \cref{sec:computation_betweenness}.

67 41

68
    connected graph: since the betweenness centrality is calculated based on shortest paths. In the disconnected graph, certain vertices are not reachable from vertices in a different component. This yields the zero shortest paths counts for betweenness centrality.

42
    Since the algorithm of \cite{PuzisZEDB12} is heuristic-based, there is no guarantee on the running time.  I carry out an extensive experiments to compare the two algorithms. The experiment is run on synthetic graphs: Erdos, Power Law, Community Network\footnote{This is not the real wireless community networks, but the generation algorithm was designed such that the topology is similar to the topology found in WCNs}. And it is also run on three real wireless community networks: Funk Feuer Wien (FFW)\footnote{https://www.funkfeuer.at/} and Funk Feuer Graz (FFG)\footnote{graz.funkfeuer.at/en/index.html} in Austria, and Ninux\footnote{wiki.ninux.org/} in Italy. And the performance is measured in the virtualized environment with QEMU\footnote{http://wiki.qemu.org}, and on the real router. More details can be found in \cref{sec:experiment_input}.

69 43

44
    This is the first time that both algorithms for calculating betweenness centrality has been tested on the real device. They both have the same polynomial time-bound, but we show later that heuristic modification can greatly improve the running time for the Community Network graphs.

70 45

71
\section{Notation}

72
    A graph $G=(V,E)$ contains a set of vertices (nodes) $V$ and a set of edges (links) $E$. We assume we assume that al graphs are undirected and connected. \textbf{XXX check the modification for directed graph}

73 46

74
    $m = |E|$ is the number of edges in graph $G$

47
\section{Outline of the thesis} \label{sec:introduction_outline}

48
    The outline of this paper is as follows. The first two chapters cover all the necessary background information. Section \cref{sec:centrality} examines the concept of centrality. Since there are numerous measures for centrality, only those centrality measures that are most relevant is presented. And betweenness centrality is going to be discussed in deep since it is the measure that we we choose to evaluate the performance for this thesis. Section \cref{sec:computation_betweenness} presents two algorithms to calculate the exact betweenness. One is the state-of-the-art algorithm, with the mathematically proven upper bound at $\mathcal{O}(mn + n^2 \log n)$. Another algorithm is based on heuristic for speeding up the betweenness centrality computation. And since there is no upper bound for the heuristic algorithm, we want to compare both algorithms to see the performance of both.

75 49

76
    $n = |V|$ is the number of vertices in graph $G$

50
    The next two chapters deals with the experimental to evaluate performance of these two forementioned algorithms. \cref{sec:experiment} gives detail on the experiment setup. This includes the description of the synthetic graphs, and real WCNs topology that we use in the experiment. \cref{sec:experiment_result} summarises the results of the experiment and gives the justification for the results.

77 51

78
    Let $\omega$ be a weight function on the edges. We assume that $\omega(e) > 0, e \in E$ for weighted graphs. And we define $\omega(e) = 1, e \in E$ for unweighted graphs. If for $s, t \in V$ and ${s, t} \in E$, then we can also use $\omega(s, t)$ to denote the weight of the edge between $s$ and $t$.

79 52

80
    The \emph{length} of a path is the sum of weights of all edges along the path. We use $d(s, t)$ to denote the \emph{distance} between vertices $s$ and $t$, i.e. the minimum length of any path connecting $s$ and $t$.

thesis/ch2_centrality.tex
1 1
%!TEX root = main.tex

2 2

3 3
\chapter{Centrality} \label{sec:centrality}

4
    Centrality\footnote{it is also refered to by other terms: centrality indices or centrality measures} is one the fundamental and useful measure in social network analysis to identify the influential vertices or edges in the network. However, how the vertices (or edges) are considered to be important depend on different centrality measures. In this thesis, we are concerned mainly with betweenness centrality, which belongs to the shortest-path based centrality according to the classification by \cite{Brandes:2005:NAM:1062400}.

4
    Centrality\footnote{it is also referred to by other terms: centrality indices or centrality measures} is one the fundamental and useful measure in social network analysis to identify the influential vertices or edges in the network. However, how the vertices (or edges) are considered to be important depend on different centrality measures. In this thesis, we are concerned mainly with betweenness centrality, which belongs to the shortest-path based centrality according to the classification by \cite{Brandes:2005:NAM:1062400}.

5 5

6
    Since \cite{Bavelas1948} first described the concept of centrality, many centrality indices were defined and used for different purposes. Since there is no single centrality indices that are supreme over all others, we should consider of which centrality indices to use in certain application.

6
    Since \cite{Bavelas1948} first described the concept of centrality, many centrality indices were defined and used for different purposes. Since there is no single centrality indices that are supreme over all others, we should consider  which centrality indices to use in certain application.

7 7

8
    In \cref{sec:centrality_definition} we will begin with the definition of centrality. Then in \cref{sec:centrality_shortest_path_based} we will go deeper on the family of shortest-path based centrality, starting from the stress centrality, moving on to the betweenness centrality, and finally the relative betweenness centrality.

8
    In \cref{sec:centrality_definition} we begin with the definition of centrality. Then in \cref{sec:centrality_shortest_path_based} we go deeper on the family of shortest-path based centrality, starting from the stress centrality, moving on to the betweenness centrality, and finally the relative betweenness centrality.

9 9

10 10
    For those who wants to have a broader knowledge on the centrality, then \citep{Brandes:2005:NAM:1062400} classified centrality for vertices and edges into nine families, and summarized them in great detail.

11 11

12
    In different paper, different type of notations are used for the same type of concepts. Hence, to keep this report coherent, I will use the same of notation for the same concept, and this might different from the notation used in the original papers.

12
    In different paper, different type of notations are used for the same type of concepts. Hence, to keep this report coherent, I use the same of notation for the same concept, and this might different from the notation used in the original papers.

13 13

14 14
\section{Definition of centrality} \label{sec:centrality_definition}

15
    Centrality index is defined by assigning the score representing the importance of nodes or edges in the network. When changing the way of assigning the score, we arrive at different centrality indices. But the underlying concept is that centrality indices must depend only on the structure of the network. \citet{Brandes:2005:NAM:1062400} pointed out the fact that even though there is no strict definition for centrality, we can still find the common ground shared by different centrality indices, and he coined the term \emph{Structure Index}

15
    Centrality index can be roughly defined as a function assigning score representing the importance for nodes or edges in the network. When changing the way of assigning the score for each vertex, we arrive at different centrality indices. However, we cannot have arbitrary function assigning random values for nodes in graph, and call those values the centrality.

16

17
    \citet{Brandes:2005:NAM:1062400} pointed out that even though there is no strict definition for centrality, we can still find the common ground shared by different centrality indices. The underlying concept is that centrality indices must depend only on the structure of the network. To put it simply, centrality indices a way of assigning a score for each vertex (or edge) in the network, based completely on structure of the graph. Formally, he coined the term \emph{Structural Index}, and stated that a centrality index $c$ is required to be a structural index.

16 18

17 19
    \theoremstyle{definition}

18
    \begin{definition}{Structure Index}

20
    \begin{definition}{\textbf{Structural Index}}

19 21
        Let $G=(V,E)$ be a weighted, directed or undirected multigraph and let $X$ represent the set of vertices or edges of $G$, respectively. A real-valued function $s$ is called a structural index if and only if the following condition is satisfied: $\forall x \in X: G \simeq H \Rightarrow s_G(x) = s_H(\phi(x)),$ where $s_G(x)$ denotes the value of $s(x)$ in G

20 22
    \end{definition}

21 23

24
    Before beginning with the main section describing betweenness centrality, a simple centrality measure called \emph{degree centrality} is going to be presented, so that reader can have an easy-to-understand example on what is centrality. The score of degree centrality for each node is equal to the number of connection that node has. See \cref{fig:degree_centrality}. Then later, we move on to more complex centrality measurements based on shortest-paths.

25

26
    \begin{figure}[hbp]

27
        \centering

28
        \includegraphics{images/line_graph_degree_horizontal.png}

29
        \caption[Example for degree centrality]{Degree centrality score for each node}

30
        \label{fig:degree_centrality}

31
    \end{figure}

32

33

22 34
\section{Shortest-path based centrality} \label{sec:centrality_shortest_path_based}

23
    This section presents two types of shortest-path based centrality. Shortest paths are defined on both vertices and edges. Basically those two centrality indices are based on the number of shortest paths between a pair of vertices or edges. \cref{sec:centrality_stress_centrality} and \cref{sec:centrality_bc} will present those two measures in deep.

35
    This section presents two types of shortest-path based centrality: \emph{stress centrality}, and \emph{betweenness centrality}. Basically those two centrality indices are based on the set of shortest paths between a pair of vertices or edges. \cref{sec:centrality_stress_centrality} and \cref{sec:centrality_bc} presents those two measures in deep.

24 36

25
    In communication, information does not only flow in the shortest path, where the shortest path might represent the path where the total traverse time is shortest. However, in a lot of application, shortest path are chosen to forward information through the network. For example, in \textbf{XXXX is OLSR use shortest path to forward packets in the network}.

37
    In communication, information does not only flow in the shortest path, where the shortest path might represent the path where the total traversing time is shortest. However, in a lot of application, shortest path are chosen to forward information through the network. For example, OLSR routing protocol provides shortest path routes for all nodes in the network. And OLSR routing protocol is used widely in Wireless Community Networks -- the network that we want to apply centrality to improve their existing routing protocol.

26 38

27 39
    \subsubsection{Stress Centrality} \label{sec:centrality_stress_centrality}

28
        The concept was  by \cite{Shimblel1953}.

40
        The concept was presented by \cite{Shimblel1953}, the idea is the know quantify how much information is flowing through a vertex in a network. With that definition, if a vertex $v$ lies on more shortest paths, then more information is passed through $v$ and it must handle a higher workload, or stress''. Formally, it is defined in \cref{eq:stress_centrality_long}. It is the summation over all possible source $s$, and all possible target $t$ that are different than $v$.

41

42
        \begin{equation}

43
            \label{eq:stress_centrality_long}

44
            c_S(v) = \sum_{s \neq v} \sum_{t \neq v} \sigma_{st}(v)

45
        \end{equation}

46
        where $\sigma_{st}(v)$ denotes the number of shortest paths containing $v$. To simplify the notation for \cref{eq:stress_centrality_long}, we might write it as follow:

47

48
        \begin{equation}

49
            \label{eq:stress_centrality}

50
            c_S(v) = \sum_{s \neq t \neq v} \sigma_{st}(v)

51
        \end{equation}

52
        However for the clarity, I will stick with the notation in \cref{eq:stress_centrality_long}.

29 53

30
        Different type of network (such as in wireless community network, in social networks) calls for different centrality indices.

54
        Note that the formula for stress centrality $c_S(v)$ does not include the shortest paths that start or end in $v$. And the stress centrality is calculated for all shortest paths between any pair of vertices.

31 55

32
        \textbf{XXX Find out the application of stress centrality}

33 56

34
        Formally:

57
    \subsubsection{Betweenness Centrality} \label{sec:centrality_bc}

58
        The concept of \emph{betweenness centrality} was introduced independently by \citet{Freeman1977} and \citet{Anthonisse1971}. It can be viewed as a relative version of stress centrality. Here we first define the betweenness centrality, then continue with the motivation for betweenness centrality and its application.

59

60
        We define $\delta_{st}(v)$ - the

61
            \emph{pair-dependency}\footnote{In \cite{Freeman1978215}, the term pair-dependency is equivalent with the term \emph{dependency} used in \cite{Brandes01afaster}. To keep consistency, in this thesis, the definition of pair-dependency and dependency follow the \cite{Brandes01afaster} }

62
        of a pair $s, t \in V$ on an intermediary $v \in V$.

35 63

36 64
        \begin{equation}

37
            \label{eq:stress_centrality_long}

38
            c_S(v) = \sum_{s \neq v \in V} \sum_{t \neq v \in V} \sigma_{st}(v)

65
            \label{eq:pair_dependency}

66
            \delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}

39 67
        \end{equation}

68
        where $\sigma_{st}(v)$) is the number of shortest paths from $s$ to $t$ that are passing through $v$. And $\sigma_{st}$ is the number of shortest paths from $s$ to $t$. $\delta_{st}(v)$ represents the probability that vertex $v$ falls randomly on any of the shortest paths connecting $s$ and $t$. Assume that the communication in the network follows the shortest path, then $\delta_{st}(v)$ can be interpreted as the probability that vertex $v$ can involve, intercept, enhance or inhibit the communication between $s$ and $t$.

40 69

41
        where $\sigma_{st}(v)$ denotes the number of shortest paths containing $v$.

70
        The betweenness centrality of a single vertex $c_B(v)$ is defined as:

71

72
        \begin{equation}

73
            \label{eq:betweenness_centrality}

74
            c_B(v) = \sum_{s \neq v} \sum_{t \neq v} \delta_{st}(v) = \sum_{s \neq v} \sum_{t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}

75
        \end{equation}

42 76

43
        To simplify the notation for \cref{eq:stress_centrality_long}, we might write it as follow:

77
        From the original formula for betweenness centrality in \cref{eq:betweenness_centrality}, several variants for betweenness centrality was introduced in \cite{Brandes2008136}, such as edge betweenness centrality, group betweenness centrality, etc. The one we are mostly interested in is the betweenness centrality with endpoints vertices included. It means that the $c_S(v)$ take into account even the shortest paths starting or ending with $v$, and it doesn't require the source $s$ or the target vertex $t$ to be different from $v$. \cref{eq:betweenness_centrality_endpoints_inclusion} shows the formula for betweenness centrality with source and target included.

44 78

45 79
        \begin{equation}

46
            \label{eq:stress_centrality}

47
            c_S(v) = \sum_{s \neq t \neq v \in V} \sigma_{st}(v)

80
            \label{eq:betweenness_centrality_endpoints_inclusion}

81
            c_B(v) = \sum_{s \in V} \sum_{t \in V} \delta_{st}(v) = \sum_{s \in V} \sum_{t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}

48 82
        \end{equation}

49 83

50
    \subsubsection{Betweenness Centrality} \label{sec:centrality_bc}

84
    \subsubsection{Relative Betweenness Centrality}

85
        After introducing betweenness centrality, \citeauthor{Freeman1977} presented another measurement \cite{Freeman1977}, called \emph{relative betweenness centrality} under the argument that betweenness centrality cannot be used to compare the influential of vertices from network of different size (e.g. different number of nodes).

86

87
        He argued that for vertices $v, u$ to have the same betweenness centrality only mean that they have the same potential for control in absolute terms. That means they can facilitate or inhibit the same number of communications. Note, we implicitly assume that all communications are conducted along shortest paths. However, the $c_B$ does not show the relative potential for control within the network. \cref{fig:bc_vs_bc_relative} illustrate that even though $c_B(v) = c_B(u_i) = 3, i = 1, 2, 3, 4$, the potential for control of vertex $v$ is much larger than vertex $u_i$. For example, removing vertex $v$ and the network would be disconnected and no communication can happen between vertices. Therefore, $v$ have a total control of the network. Meanwhile, removing any $u_i$ does not have that same disastrous effect since each $u_i$ only control part of the communications between pair of vertices.

88

89
        \begin{figure}[h]

90
            \centering

91
                \begin{tabular}{c c}

92
                    \subfloat[fig 1][BC]{\includegraphics[scale=0.3]{images/star_3_bc.png}} &

93
                    \subfloat[fig 2][BC]{\includegraphics[scale=0.3]{images/star_6_bc.png}} \\

94
                    \subfloat[fig 1][Relative BC]{\includegraphics[scale=0.3]{images/star_3_bcrelative.png}} &

95
                    \subfloat[fig 2][Relative BC]{\includegraphics[scale=0.3]{images/star_6_bcrelative.png}} \\

96
                \end{tabular}

97
            \caption[Comparision for Betweenness Centrality and Relative Betweenness Centrality]{

98
                Comparision for Betweenness Centrality and Relative Betweenness Centrality: We see that even though the red nodes have the same betweenness centrality score, it is clearly that the red node of the left graph has more control over the traffic flow within the network. For example, when the red node of the left graph is destroyed, the graph is disconnected, and no information can flow between blue nodes. On the other hands, if any red node of the right graph is cut out from the graph, the graph is still connected. The relative betweenness centrality can reflect the important of the red node for both graphs better, by setting it to $1$ for the left graph, and $0.2$ for the right graph.

99
            }

100
            \label{fig:bc_vs_bc_relative}

101
        \end{figure}

102

103
        When a vertex of interest $v$ is the center node of the star graph, just as \cref{fig:bc_vs_bc_relative} (a), its betweenness centrality score is the biggest score that a graph with $n$ nodes can have. And the maximum betweenness centrality score that a vertex can achieve is shown be to:

104
        \begin{equation}

105
            \label{eq:max_bc_score}

106
            \max c_B(v) = \frac{n^2 - 3n + 2}{2}

107
        \end{equation}

51 108

52
            The concept of \emph{betweenness centrality} was introduced independently by \citet{Freeman1977} and \citet{Anthonisse1971}. It can be viewed as a relative version of stress centrality. Here we will first define the betweenness centrality, then we will go on with the motivation for betweenness centrality and its application.

53

54
            We define $\delta_{st}(v)$ - the

55
                \emph{pair-dependency}

56
                \footnote{In \cite{Freeman1978215}, the term pair-dependency is equivalent with the term \emph{dependency} used in \cite{Brandes01afaster}. To keep consistency, in this thesis, the definition of pair-dependency and dependency follow the \cite{Brandes01afaster} }

57
            of a pair $s, t \in V$ on an intermediary $v \in V$.

58

59
            \begin{equation}

60
                \label{eq:pair_dependency}

61
                \delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}

62
            \end{equation}

63

64
            where $\sigma_{st}(v)$) is the number of shortest paths from $s$ to $t$ that are passing through $v$. And $\sigma_{st}$ is the number of shortest paths from $s$ to $t$. $\delta_{st}(v)$ represents the probability that vertex $v$ falls randomly on any of the shortest paths connecting $s$ and $t$. Assume that the communication in the network follows the shortest path, then $\delta_{st}(v)$ can be interpreted as the probability that vertex $v$ can involve, intercept, enhance or inhibit the communication between $s$ and $t$.

65

66
            The betweenness centrality of a single vertex $c_B(v)$ is defined as:

67

68
            \begin{equation}

69
                \label{eq:betweenness_centrality}

70
                c_B(v) = \sum_{s \neq t \neq v \in V} \delta_{st}(v) = \sum_{s \neq t \neq v \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}

71
            \end{equation}

72

73
             introduced  \cite{Freeman1977} \textbf{XXX} the betweenness centrality to address the problem of being unable to compare the potential for control for vertices in different networks with stress centrality. For example, in \cref{fig:stress_vs_bc_centrality}, the stress centrality for vertices $v, u_i$ are the same. But when removing one vertex $v$, all vertices in top row of the graph are disconnected from all the vertices in the bottom row. For the network with $u_i$, for example, removing one or two vertices $u_1, u_2$ does not disconnect the network. The information can still flow freely from the top row to the bottom row following the path passing through the remaining vertex $u_3$. Therefore, using stress centrality alone, we cannot decide on whether some node are more critical for the network than the others, and betweenness centrality came up as a measure to capture the potential for control a vertex has on the network.

74

75
            \begin{figure}[h]

76
                \caption{

77
                    $c_S(u_i) = 16$ and $c_B(u_i) = \frac{1}{3}, i = 1, 2, 3$ and $c_S(v) = 16$ but $c_B(v) = 1$. It shows that stress centrality cannot be used to determine to potential for control a vertex has. This example is taken from \cite{Brandes:2005:NAM:1062400}

78
                }

79
                \label{fig:stress_vs_bc_centrality}

80
                \centering

81
                \includegraphics[scale=1.0]{images/stress_vs_bc_centrality.png}

82
            \end{figure}

83

84
            \begin{equation}

85
                \label{eq:max_stress_centrality}

86
                \max c_B(v) = \frac{n^2 - 3n + 2}{2}

87
            \end{equation}

88

89
            \textbf{XXX The application of BC}

90

91
        \subsubsection{Relative Betweenness Centrality}

92
            The so-called relative betweenness centrality measure $c'_B(v)$ was also introduced by \cite{Freeman1977}:

93

94
            \begin{equation}

95
                \label{eq:bc_relative}

96
                c'_B(v) = \frac{2 c_B(v)}{n^2 - 3n + 2}

97
            \end{equation}

98

99
            where $c_B(v)$ is the betweenness centrality defined above, and $n$ is the number of vertices in the network.

100

101
            \citeauthor{Freeman1977} argued that for vertices $v, u$ to have the same betweenness centrality only mean that they have the same potential for control in absolute terms. That means they can facilitate or inhibit the same number of communitions. Note, we implicitly assume that all communications are conducted along shortest paths. However, the $c_B$ does not show the relative potential for control within the network. \cref{fig:bc_vs_bc_relative} illustrate that even though $c_B(v) = c_B(u_i) = 3, i = 1, 2, 3$, the potential for control of vertex $v$ is much larger than vertex $u_i$. For example, removing vertex $v$ and the network will be disconnected and no communication can happen between vertices. Therefore, $v$ have a total control of the network. Meanwhile, removing any $u_i$ does not have that same disastrous effect since each $u_i$ only control part of the communications between pair of vertices.

102

103
            \begin{figure}[h]

104
                \caption{

105
                    $c_B(v) = 3$ and $c'_B(v) = 1$, and $c_B(u_i) = 3, i = 1, 2, 3$ but $c'_B(u_i) = 0.2$. It shows that the same betweenness centrality for vertices $v, u_i$ for 2 different networks does not equal to the same potential for control for their respective networks.

106
                }

107
                \label{fig:bc_vs_bc_relative}

108
                \centering

109
                \begin{subfigure}{.45\textwidth}

110
                  % \centering

111
                  \includegraphics[scale=0.3]{images/star_3.png}

112
                  \caption{A subfigure}

113
                  \label{fig:sub1}

114
                \end{subfigure}%

115
                \begin{subfigure}{.45\textwidth}

116
                  \centering

117
                  \includegraphics[scale=0.3]{images/star_6.png}

118
                  \caption{A subfigure}

119
                  \label{fig:sub2}

120
                \end{subfigure}

121
            \end{figure}

109
        The so-called \emph{relative betweenness centrality} measure $c'_B(v)$ is the fraction between thet betweenness centrality $c_B(v)$ in \cref{eq:betweenness_centrality} divided by its possible maximum score. See \cref{eq:bc_relative} for the formula of relative betweenness centrality $c'_B(v)$:

122 110

111
        \begin{equation}

112
            \label{eq:bc_relative}

113
            c'_B(v) = \frac{2 c_B(v)}{n^2 - 3n + 2}

114
        \end{equation}

115
        where $c_B(v)$ is the betweenness centrality defined in \cref{eq:betweenness_centrality}, and $n$ is the number of vertices in the network.

thesis/ch3_computation_of_bc.tex
4 4
    Recall the definition of betweenness centrality:

5 5

6 6
    \begin{equation*}

7
        \label{eq:betweenness_centrality}

8 7
        c_B(v) = \sum_{s \neq t \neq v \in V} \delta_{st}(v) = \sum_{s \neq t \neq v \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}

9 8
    \end{equation*}

10 9

11 10
    Traditionally, just by looking at the equation, we normally divide the task of computing betweenness centrality into 2 steps:

12 11

13 12
    \begin{itemize}

14
        \item First we need to compute tables with length and number of shortest paths between all pairs of vertices in the network.

15
        \item Second for each vertex $v$, we need to consider all the pair $s, t$, we use the tables calculated above to determine the fraction of shortest paths between $s, t$ that are passing through $v$. Summing all the fraction for every possible $s, t$ and we arrive at the $c_B(v)$. This way of summing is referred to by the term \emph{naive summation of pair-dependencies}

13
        \item First we need to compute tables with length and number of shortest paths between all pairs of vertices in the network;

14
        \item Second for each vertex $v$, we need to consider all the pair $s, t$, we use the tables calculated above to determine the fraction of shortest paths between $s, t$ that are passing through $v$. Summing all the fraction for every possible $s, t$ and we arrive at the $c_B(v)$. This way of summing is referred to by the term \emph{naive summation of pair-dependencies}.

16 15
    \end{itemize}

17 16

18
    The chapter starts with presenting some important concept in \cref{sec:computation_important_concepts}. Then \cref{sec:computation_counting_shortest_paths} gives an overview on classic ways to find the shortest paths. And \cref{sec:computation_summing_pair_dependencies} introduces 2 ways to sum all the pair-dependencies: (i) the naive way, (ii) the faster approach to accummulate pair-dependencies gradually by \cite{Brandes01afaster}. Up until now, the approach in \cite{Brandes01afaster} is the state-of-the-art in calculating the exact betweenness centrality, and it is implemented in many standard graph libraries such as

19
        \emph{networkx} \footnote{https://networkx.github.io/},

20
        \emph{Boost Graph Library} \footnote{www.boost.org/libs/graph/doc/}

21
    The \cref{sec:computation_summary} intends to summarize the result of two previous section with \cref{tab:complexity_of_algorithms} recaping the complexity of some algorithms used in determining betweenness centrality.

17
    The chapter starts with presenting some important concept in \cref{sec:computation_important_concepts}. Then \cref{sec:computation_counting_shortest_paths} gives an overview on classic ways to find the shortest paths. And \cref{sec:computation_summing_pair_dependencies} introduces 2 ways to sum all the pair-dependencies: (i) the naive way, (ii) the faster approach to accumulate pair-dependencies gradually by \cite{Brandes01afaster}. Up until now, the approach in \cite{Brandes01afaster} is the state-of-the-art in calculating the exact betweenness centrality, and it is implemented in many standard graph libraries such as

18
        \emph{networkx}\footnote{https://networkx.github.io/},

19
        \emph{Boost Graph Library}\footnote{www.boost.org/libs/graph/doc/}

20

21
    The \cref{sec:computation_summary} intends to summarize the result of two previous section by recapping the complexity of some algorithms used in determining betweenness centrality.

22 22

23 23
    After having a throughout background, \cref{sec:computation_heuristic} presents the heuristic way to calculate also the exact betweenness centrality by \cite{PuzisZEDB12}. However, the complexity of the Heuristic Betweenness Centrality is not guaranteed to perform better than the algorithm of \cite{Brandes01afaster}, its performance depends on the topology.

24 24

25 25
\section{Important concepts} \label{sec:computation_important_concepts}

26

26 27
    Define the set of \emph{predecessors} of a vertex $v$ on the shortest path from $s$ as:

27 28

28 29
    \begin{equation} \label{eq:predecessor}

......
46 47
    The APSP can be computed straightforwardly by applying algorithm for SSSP $n$ times, for all vertex $s \in V$. The resulting time complexity thus will be $\mathcal{O}(mn + n^2)$, and $\mathcal{O}(mn + n^2 \log n)$ respectively. In the sparse graph, it is a good practice to solving the SSSP multiple times. However, in non-sparse graph, where the number of edges is close to the maximal number of edges, i.e. $m \simeq n(n-1) \simeq n^2$, then it is better to use Floyd/Warshall algorithm with the complexity of $\mathcal{O}(n^3)$.

47 48

48 49

49
    \textbf{XXX What is the input}: the source vertex $s$

50
    \textbf{XXX What is the output?}:

50
\section{Summing all the pair-dependencies} \label{sec:computation_summing_pair_dependencies}

51
    \subsubsection*{Naive ways}

52
        In order to calculate the betweenness centrality for vertex $v$, we have to sum all the pair dependencies $\delta_{st}(v)$ for all possible source $s$ and target $t$ that are different from $v$. The first part of the algorithm calculates all the distance $d$ between all pair of vertices. Vertex $v$ lies on the shortest path between a pair $(s, t)$ if $d(s, t) = d(s, v) + d(v, t)$. And the number of shortest path from $s$ to $t$ passing through $v$ is calculated as $\sigma_{st}(v) = \sigma_{sv} \cdot \sigma_{vt}$.

53

54
        Therefore, summing of all possible $\delta_{st}(v)$ yields $\mathcal{O}(n^2)$, and we have the betweenness centrality per vertex $v$: $c_B(v)$. Doing for all other vertices in the network, and it yields $\mathcal{O}(n^3)$.

51 55

52
    \subsection{Breadth-First Search for unweighted graph}

53
        I understood the algorithm, but how can I explain it to another person?

56
    \subsubsection*{Gradually accumulation of pair-dependencies} \label{sec:computation_accumulation_pair_dependencies}

57
        \citeauthor{Brandes01afaster} noticed that there is a better way to sum all the pair dependencies \cite{Brandes01afaster}. The basic way is to finish completely the first part of finding shortest paths and distance between all pair of vertices, and then start with the summation of pair dependencies. The better way is achieved by exploiting the recursive relation of the \emph{dependency} $\delta_{s\LargerCdot}(v)$ of a vertex $s \in V$ on a single vertex $v \in V$, defined as:

54 58

59
        \begin{equation}

60
            \label{eq:dependency}

61
            \delta_{s\LargerCdot}(v) = \sum_{t \in V} \delta_{st}(v)

62
        \end{equation}

55 63

64
        \cite{Brandes01afaster} proved that the \emph{dependency} of $s \in V$ on any $v \in V$ obeys:

56 65

57
    \subsection{Dijkstra's algorithm for weighted graph}

58
        The Dijkstra's algorithm can be modified so that when given a source vertex $s$, the output will be the set of predecessors and the number of shortest paths from $s$ to other $t \neq s \in V$

66
        \begin{equation}

67
            \label{eq:partial_sum_dependency}

68
            \delta_{s\LargerCdot}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\LargerCdot}(w))

69
        \end{equation}

59 70

60
        \begin{algorithm}

61
        \caption{Dijkstra's algorithm} \label{algo:dijkstra}

62
        \begin{algorithmic}

63
            \Procedure{Dijkstra for SSSP with source vertex s}{}

64
                \State $S \leftarrow$ empty stack

65
                \State $P[w] \leftarrow$ empty list, $w \in V$

66
                \State $\sigma[t] \leftarrow 0, t \in V$; $\sigma[s] \leftarrow 1$

67
                \State $d[t] \leftarrow -1, t \in V$; $d[s] \leftarrow 0$

68
                \State $Q \leftarrow$ empty queue

69
                \State enqueue $s \rightarrow Q$

70
                \While{$Q$ \emph{not empty}}

71
                    dequeue $v \leftarrow Q$

72
                    push $v \rightarrow S$

71
        The set of predecessor of vertex $w$ on the shortest path from $s$: $P_s(w)$ can be obtained from the first part of the algorithm. And the dependencies of $s$ on all vertices can be computed in $\mathcal{O}(m)$. Therefore, instead of solving the APSP algorithm from start to end, we solve the SSSP for each source vertex $s \in V$. After completing the SSSP algorithm, we calculate the dependencies of $s$ as laid out in \cref{eq:partial_sum_dependency}.

73 72

74
                \EndWhile

75
            \EndProcedure

76
        \end{algorithmic}

77
        \end{algorithm}

73
        Next, we look at the relation between the betweenness centrality for $v \in V$ $c_B(v)$ and dependency $\delta_{s\LargerCdot}(v)$. Summing all the dependencies of $s \ in V$ on a single vertex $v$, and we arrive at the betweenness centrality $c_B(v)$ as in the following equation:

78 74

75
        \begin{equation}

76
            \label{eq:accummulate_betweenness_centrality}

77
            c_B(v) = \sum_{s \neq v } \delta_{s\LargerCdot}(v)

78
        \end{equation}

79 79

80
\section{Summing all the pair-dependencies} \label{sec:computation_summing_pair_dependencies}

81
    \subsubsection{Naive ways}

82
    \subsubsection{Gradually accummulation of pair-dependencies} \label{sec:computation_accumulation_pair_dependencies}

80
        Therefore, we can then gradually accummulate the betweenness centrality score $c_B(v)$ by summing up the dependency $\delta_{s\LargerCdot}(v)$ to $c_B(v)$.

81

82
        Next section is intended to provide the high-level on the algorithm to calculate betweenness centrality

83 83

84 84

85 85
\section{Complexity of determining betweenness centrality as a whole} \label{sec:computation_summary}

86 86

87
     \begin{table} [h]

88
        \caption{Complexity for the 1st step: counting the number and length of shortest paths}

89
        \label{tab:complexity_of_algorithms}

90
        \begin{tabular}{l|l|l}

91
            \toprule

92
            Algorithm name & Unweighted graphs & Weighted graph \\

93
            \midrule

94
            Breadth-First Search & $\mathcal{O}(mn + n^2)$ & not applicable\\

95
            Dijkstra &  & $\mathcal{O}(mn + n^2 \log n)$ \\

96
            Floyd/Warshall & & $\mathcal{O}(n^3)$ \\

97
            \bottomrule

98
        \end{tabular}

99
    \end{table}

100

101
    \begin{table} [h]

102
        \caption{Complexity for the 2nd step: summing all pair-dependencies}

103
        \label{tab:complexity_of_algorithms}

104
        \begin{tabular}{l|l|l}

105
            \toprule

106
            Algorithm name & Complexity \\

107
            \midrule

108
            Naive summation & $\mathcal{O}(n^3)$ \\

109
            \cite{Brandes01afaster} &  $\mathcal{O}(mn)$\\

110
            \bottomrule

111
        \end{tabular}

112
    \end{table}

87
    This section summarizes the complexity for naive way, and for Brandes' algorithm to calculate betweenness centrality. The \cref{algo:basic_bc} run in $\mathcal{O}(n^3)$ for both unweighted and weighted graphs. And the Brandes's algorithm in \cref{algo:brandes_bc} can run in $\mathcal{O}(mn)$ for unweighted graph, and $\mathcal{O}(mn + n^2 \log n)$. The breaking down of the computational time of those two algorithms are given below.

88

89
    For \cref{algo:basic_bc}, the overall computation is dominated by line 2 to sum all the pair dependencies.

90

91
    \begin{algorithm}

92
    \caption{Basic way to calculate betweenness centrality} \label{algo:basic_bc}

93
    \begin{algorithmic}

94
        \Procedure{Basic Betweenness Centrality}{}

95
            \State Solving APSP problem

96
            \State Summing all pair-dependencies

97
        \EndProcedure

98
    \end{algorithmic}

99
    \end{algorithm}

100

101
    For \cref{algo:brandes_bc}, within the \texttt{for} loop, for unweighted graph, each execution from line 2 - 3 executes in $\mathcal{O}(m)$. For weighted graph, solving SSSP problem dominates the computation with $O(m + n \log n)$ for unweighted graph. The \texttt{for} loop is executed $n$ times. Thus, we have the running time stated in the beginning of the section.

102

103
    \begin{algorithm}

104
    \caption{Brandes betweenness centrality} \label{algo:brandes_bc}

105
    \begin{algorithmic}

106
        \Procedure{Brandes Betweenness Centrality}{}

107
            \For{ $s \in V$ do }

108
                \State Solving SSSP problem for source $s$ (with BFS or Dijkstra's algorithm)

109
                \State Finding all the dependencies of $s$ for all other vertices $v$ \cref{eq:partial_sum_dependency}

110
                \State Accumulate the betweenness centrality for vertices $v$ \cref{eq:accummulate_betweenness_centrality}

111
            \EndFor

112
        \EndProcedure

113
    \end{algorithmic}

114
    \end{algorithm}

115

113 116

114 117
\section{Heuristic algorithm for the exact betweenness centrality} \label{sec:computation_heuristic}

118
    We start the section with the high-level description of steps in the heuristic algorithm. We also provides definition for some term that deem to be important. For the details and all the proofs, reader can always refer to \cite{PuzisZEDB12}

119

120
    \citeauthor{PuzisZEDB12} uses the insight that computing betweenness centrality on tree can be done in linear time in \cite{PuzisZEDB12}. From now on this algorithm is denoted as HBC. He defined a heuristic algorithm to speed up the betweenness centrality calculation. The original graph is partitioned into one or many bi-connected components (BCCs). Then the bi-connected components tree is formed, and several analysis is performed on the BCC tree. The summary of HBC's steps are as follow:

121

122
    \begin{enumerate}

123
        \item Divide the graphs into bi-connected components and form BCC tree;

124
        \item Compute the \emph{Component Tree Weight}. This step involves the knowledge of the whole graph;

125
        \item Form the matrix with the \emph{communication intensity} value for each pair of vertices in a BCC;

126
        \item Calculate the betweenness centrality for each vertices in each BCC using \cref{eq:BC_for_v_heuristic};

127
        \item Finalize the betweenness centrality for the original graph.

128
    \end{enumerate}

129

130
    \subsection*{Partitioning a graph into bi-connected components}

131

132
        Given a graph $G=(V,E)$. If the removal of any $v\in V$ disconnects $G$, then $v$ is said to be a \emph{cut-point} (or \emph{articulation point}). A graph is said to be \emph{bi-connected }if at least 2 vertices must be removed to disconnect the graph. \cref{fig:puzis_bcc} from \cite{PuzisZEDB12} gives an example of partitioning the graph into BCCs.

133

134
        \begin{figure}

135
            \centering

136
            \includegraphics[scale=0.5]{images/puzis_bcc.png}

137
            \caption[BCC partition and cut-points]{BCC partition and cut-points. Reproduced from \cite{PuzisZEDB12}.}

138
            \label{fig:puzis_bcc}

139
        \end{figure}

140

141

142
    \subsection*{Component Tree Weight}

143
        The Component Tree Weight is intended to keep track of how many number of nodes can be reach via a cut-point of a specific BCC when that BCC is separated from the original graph.Let $B$ be a bi-connected component and $v \in B$ be a cut-point. Let $B^{+}(v)$ be the \emph{connected component} in G which includes $B$ after the removal of $v$. In another term, $B^{+}(v)$ include all the vertices in G that are connected to $v$ via $B$. An example is for a bi-connected component $BCC_B$ in \cref{fig:puzis_bcc}. After removing vertex $v$, the connected component ${BCC_B}^{+}(v) = \{5, 6, u, 1, 2, 3, 4\}$. The \emph{cut with} value is defined to be the size of $D_B^{v+} = |B^{+}(v)|$.

144

145
        Define $B^{-}(v)$ as a set of vertices that can only be reach through $v$ from $B$. Therefore, when $v$ is cut out from the graph, those vertices in $B^{+}(v)$ (a connected component including $B$ without the vertex $v$) cannot reach $B^{-}(v)$. The cut without'' value is defined to be $D_B^{v-} = |B^{-}(v)|$. For example, when removing vertex $v$ from $BCC_B$, ${BCC_B}^{-}(v) = \{7, 8, 9\}$. Those vertices cannot be reach from $B^{+}(v)$ anymore.

146

147
        The relation between cut with'' and cut without'' is as follow: $|B^{+}(v)| + 1 + |B^{+}(v)| = |V|$. It can be thought as the number of vertices.

148

149
        \cref{tab:component_tree_weight} shows the value assigned to Component Tree Weight after the Step 2.

150

151
        \begin{table}[h]

152
            \caption{Component Tree Weight calculation for \cref{fig:puzis_bcc}}

153
            \label{tab:component_tree_weight}

154
            \centering

155
            \begin{tabular}{l l l l}

156
                \toprule

157
                Component & Cut-point & Cut with'' value & Cut without'' value \\

158
                \midrule

159
                $BCC_A$ & u & 2 & 8 \\

160
                $BCC_B$ & u & 8 & 2 \\

161
                $BCC_C$ & v & 7 & 3 \\

162
                $BCC_D$ & v & 1 & 9 \\

163
                \bottomrule

164
            \end{tabular}

165
        \end{table}

166

167

168
    \subsection*{Communication intensity}

169
        For each bi-connected component $B_i$, we calculate the traffic matrix $h^{i}_{st}$ representing the \emph{communication intensity} between a pair of vertex $s$, $t$. The traffic matrix is symmetric. There are 3 cases:

170

171
        \begin{itemize}

172
            \item If $x$ and $y$ are not cut-points, then $h_{xy} = 0$ for $x = y$ and $h_{xy} = 1$ for $x \neq y$

173
            \item If $x$ is not a cut-point, and $y$ is a cut-point, then $h_{xy} = D_B^{y-} + 1$

174
            \item If both $x$ and $y$ are cut-points, then $h_{xy} = (D_B^{x-} + 1) \cdot (D_B^{y-} + 1)$

175
        \end{itemize}

176

177
    \subsection*{Betweenness computation for bi-connected component}

178
        The betweenness centrality calculated

179
        \begin{equation} \label{eq:BC_for_v_heuristic}

180
            c_B(v)_{v \in B_i} = \sum_{s, t \in B_i} \delta_{st}(v) \dot h_{st}^{i}

181
        \end{equation}

182
        And the heuristic algorithm also use the faster Brandes's algorithm to calculate betweenness centrality. Recall the equation for summing the dependency of $s \in V$ on any $v \in V$

183

184
        \begin{equation*}

185
            \delta_{s\LargerCdot}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\LargerCdot}(w))

186
        \end{equation*}

187
        For any bi-connected component $B_i$, the dependency of $s \in B_i$ on any $v \in B_i$ can be written as follow:

188

189
        \begin{equation}

190
            \label{eq:dependency_heuristic_partial_sum}

191
            \delta_{s\LargerCdot}(v) = h^{i}_{sv} + \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \delta_{s\LargerCdot}(w)

192
        \end{equation}

193

194
    \subsection*{Betweenness computation for original graph $G$}

195
        It was proven in the paper that for non-cut-point, the betweenness value calculated locally in each $B_i$ is equal to the betweenness value calculated on the whole graph.

196

197
        \begin{equation}

198
            c_B(v) = c_B(v)_{v \in B}

199
        \end{equation}

200
        For the cut-point betweenness value, we first have to figure out the inter-component communication, denoted as $BC^{inter}(v)$:

201

202
        \begin{equation}

203
            BC^{inter}(v) = \sum_{B \ni v} D^{v-}_{B} \cdot D^{v+}_{B}

204
        \end{equation}

205
        For cut-point $v$, the betweenness value on the whole graph is defined as:

206

207
        \begin{equation}

208
            c_B(v) = \sum_{B \ni v} c_B(v)_{v \in B} - BC^{inter}(v)

209
        \end{equation}

thesis/ch4_experiment_setup.tex
1 1
%!TEX root = main.tex

2 2

3
\chapter{Simulation scenario} \label{sec:sim_scenario}

3
\chapter{Experiment} \label{sec:experiment}

4
    The previous chapter presents two algorithms to calculate Betweenness Centrality (BC). One algorithm was found out by \citeauthor{Brandes01afaster} in \cite{Brandes01afaster}. For the ease of notation, from now on it is called \emph{BBC}. The other heuristic version to calculate betweenness centrality was devised by \citeauthor{PuzisZEDB12} \cite{PuzisZEDB12}, and its short-form name from this point is \emph{HBC}.

4 5

5
http://stackoverflow.com/questions/10455905/clocks-per-sec-not-actually-clocks-per-sec

6
    This chapter presents all the setup information to compare the running time of two algorithms. In order to perform the analysis, there are few things needed to address first. To analyze the running time of the algorithm, we need to know about all the factors that can influence the running time. Those factors are the implementation, the type and size of graph, the system that runs the algorithms. The following section addresses those factors in depth. \cref{sec:experiment_implementation} presents the implementation of BBC and HBC were run on the physical router and server. \cref{sec:experiment_hardware} provides the hardware specifications for the system we use. \cref{sec:experiment_openwrt} concerns with the technical side of porting the program to embedded device, in our case is the Ubiquiti router. This section contains a lot of technical note, which might be served as a guideline on how to make a custom package for embedded system. \cref{sec:experiment_input} presents about two types of input that we use in the experiment: the randomly generated graphs, and the real topologies of the WCNs. Lastly, there are fluctuation when measuring the running time of the algorithms. \cref{sec:experiment_design} presents the method to get the average running time, how the experiment was carried out, e.g. how many graphs was generated, how many times the algorithm was run for one specific input.

7

8
\section{Implementation} \label{sec:experiment_implementation}

9
    The BBC is implemented in the Boost Graph Library\footnote{http://www.boost.org/doc/libs/1_60_0/libs/graph/doc/index.html} (BGL). That is the generic library with implementation for many classical graph algorithms.

10

11
    From \cref{sec:computation_heuristic}, the implementation for HBC contains four main steps. The first step involves dividing the main graph into some bi-connected components $B_i$. And then for each articulation points in all $B_i$, the link weight is computed. To calculate the link weight, we need to use the information from the original graph. Second, for each $B_i$, create the traffic matrix $h^i$ to represent the \emph{communication intensity} between all pair of vertices in each $B_i$.

12

13
    The third, and the final part is the procedure to calculate betweenness for each $B_i$. Since the original graph is divided into some bi-connected components $B_i$, performing the original BBC for each $B_i$ will not generate the correct result. Actually, the accumulation step, as shown in \cref{eq:BC_for_v_heuristic} involves multiplying the pair-dependency $\delta_{st}(v)$ with its corresponding communication intensity value $h_{st}^{i}$. For an ease of comparison, the pseudo code of the accumulation step for BBC and HBC are presented in \cref{algo:accummulation_brandes} and \cref{algo:accumulation_heuristic}, respectively.

14

15
    \begin{algorithm}[h]

16
        \caption{Accumulation procedure for BBC} \label{algo:accummulation_brandes}

17
        \begin{algorithmic}

18
            \Procedure{accumulation}{}

19
                \State $c_B[s] \leftarrow c_B[s] + (|S| - 1)$

20
                \For{$v \in V$}

21
                    $\delta[v] \leftarrow 0$

22
                \EndFor

23

24
                \While{$S$ not empty}

25
                    \State pop $w \leftarrow S$

26
                    \For{$v \in Pred[w]$}

27
                        $\delta[v] \leftarrow \delta[v] + \frac{\sigma[v]}{\sigma[w]} (1 + \delta[w])$

28
                    \EndFor

29
                    \If{$w \neq s$}

30
                        $c_B[w] \leftarrow c_B[w] + \delta[w] + 1$ \Comment{$w$ is target of $s$ once}

31
                    \EndIf

32
                \EndWhile

33
            \EndProcedure

34
        \end{algorithmic}

35
    \end{algorithm}

36

37
    \begin{algorithm}

38
    \caption{Accumulation procedure for HBC}\label{algo:accumulation_heuristic}

39
        \begin{algorithmic}

40
            \Procedure{accumulation}{}

41
                \For{$v \in V$}

42
                    $\delta[v] \leftarrow 0$

43
                \EndFor

44
                \While{$S$ not empty}

45
                    \State pop $w \leftarrow S$

46
                    \State $h \leftarrow$ get_the_communication_intensity_between $(w, s)$

47
                    \State $c_B[s] += h$

48
                    \For{$v \in Pred[w]$}

49
                        $\delta[v] \leftarrow \delta[v] + \frac{\sigma[v]}{\sigma[w]} (h + \delta[w])$

50
                    \EndFor

51
                    \If{$w \neq s$}

52
                        $c_B[w] \leftarrow c_B[w] + \delta[w] + h$

53
                    \EndIf

54
                \EndWhile

55
            \EndProcedure

56
        \end{algorithmic}

57
    \end{algorithm}

58

59

60
\section{Hardware specifications} \label{sec:experiment_hardware}

61
    \subsection*{Server}

62
        The experiment is run on the machine provided by Advanced Networking Systems (ANS) lab\footnote{https://ans.disi.unitn.it/}, with the hardware components as in \cref{tab:specification_computer}

63

64
        \begin{table}

65
            \caption{Specification of a computer}

66
            \label{tab:specification_computer}

67
            \centering

68
            \begin{tabular}{l l}

69
                \toprule

70
                CPUs: & 2 $\times$ Intel(R) Xeon(R) CPU E5-2630 v3 \@ 2.40GHz \\

71
                Memory: & 4 $\times$ 16GB \@ 1866 MHz \\

72
                \bottomrule

73
            \end{tabular}

74
        \end{table}

75

76
        The machine was running GNU/Linux with kernel 3.16.

77

78
    \subsection*{WiFi Router}

79
        We use Ubiquiti NanoStation M2 router to test the performance. The specification is in \cref{tab:specification_router}. Ubiquiti router is flashed with a firmware obtained from OpenWrt as described in the next section.

80

81
        \begin{table}

82
            \caption{Specification of Ubiquiti NanoStation M2 router}

83
            \label{tab:specification_router}

84
            \centering

85
            \begin{tabular}{l l}

86
                \toprule

87
                Platform: & Atheros AR7240 \\

88
                CPU MHz: & 390 \\

89
                Flash MB: & 8 \\

90
                RAM MB: & 32 \\

91
                \bottomrule

92
            \end{tabular}

93
        \end{table}

94

95

96
\section{Porting a program to OpenWrt} \label{sec:experiment_openwrt}

97
    OpenWrt\footnote{https://openwrt.org/ -- the most widely used open-souce operating system for wireless routers.} is described as a Linux distribution for embedded devices.''. It allows people to build custom firmware to run on many supported devices. And it aids with cross compiling the package which can be installed on those devices with custom firmware.

98

99
    Even though we can build the firmware image with OpenWrt toolchain. However, there is the firmware image available on the Download pageootnote{https://wiki.openwrt.org/toh/hwdata/ubiquiti/ubiquiti_nanostationm2} for Ubiquiti NanaStation M2 router, so we use it and flash the device with that firmware.

100

101
    The source code of the program running on the machine with GNU/Linux kernel 3.16, and the program running on Ubiquiti router are the same. To make it possible to run a program in the Ubiquiti router, OpenWrt helps with compiling the source code into a package that can be installed on Ubiquiti router.

102

103

104
\section{Input data} \label{sec:experiment_input}

105
    The experiment involves testing with 2 main sources of input: the artificially generated graphs, and the real topologies from the Wireless Community Networks.

106

107
    \subsection{Synthetic graphs}

108
        For randomly generated graphs, we are interested in three types: Erdos, Power Law, and Community Graph. All the graphs were generated following the open-source code made available by \citeauthor{Baldesi2016} \cite{Baldesi2016} in a framework called \emph{NePA TesT}. NePA TesT provides a way to generate synthetic graph for different graph type such as Erdos, Power Law, and Community Network.

109

110
        The reason why we wanted to test on synthetic graphs is because we want to know how BBC and HBC would perform with different graph structures, and graph sizes. And as will see later in \cref{sec:experiment_result}, performance of BBC and HBC depends on the structure and property of each graph type.

111

112
        \begin{table}

113
            \caption{All synthetic graphs generated for the experiment}

114
            \label{tab:graph_generator}

115
            \centering

116
            \begin{tabular}{l|l|l}

117
                \toprule

118
                Graph type & Number of nodes & Edges / Nodes Ratio \\

119
                \midrule

120
                CN & from [100, 1000] with the step 100 & 1.2 \\

121
                ER & from [100, 1000] with the step 100 & 3 \\

122
                PL & from [100, 1000] with the step 100 & 1.2 \\

123
                \bottomrule

124
            \end{tabular}

125
        \end{table}

126

127
        For every line in \cref{tab:graph_generator}, 30 different graphs were generated per  size. So for CN graphs, we have 10 different sizes multiplies by 30 graphs, and we have 300 CN graphs in total. For all three graph types, we have 900 graphs. We use the notation \emph{CN100}, \emph{PL100}, \emph{ER100} to denotes the Community Network, Power Law and Erdos graphs with 100 nodes, respectively. The same format of notation is used for graphs with larger number of nodes.

128

129

130
    \subsection{Real topologies of WCNs}

131
        In conjunction with the synthetic networks, we also want to evaluate BBC and HBC on the real networks. Since in the end, what matters is which algorithm would work better in reality. In this thesis, we focus on three WCNs: Funk Feuer Wien (FFW)\footnote{https://www.funkfeuer.at/} and Funk Feuer Graz (FFG)\footnote{graz.funkfeuer.at/en/index.html} in Austria and Ninux (NNX)\footnote{wiki.ninux.org/} in Italy.

132

133
        The WCNs are ever evolving with some current node leaving the network, while some new nodes might join it. \cite{Maccari2015175} provides a depth and throughout analysis on these networks for a week \cite{Maccari2015175}. He found out that the fluctuation in the number of nodes and the number of links are small. According to him, those small changes are from a small subset of leaf nodes leaving and entering network sporadically. Since that fluctuations happen at peripheral nodes, the shortest paths for pair of nodes did not vary much with other time-periods. The reason is that those peripheral leaf nodes does not lies in the shortest paths when routing for other nodes. Hence the performance of shortest-path based betweenness measurement varies slightly. There are other measurements taken as well, but the conclusion from the paper is that for a week, the network features of these real WCNs are stable. With the above analysis, we limit the performance testing of BBC and HBC on one snapshot for each WCNs in consideration.

134

135

136
\section{Experiment Design} \label{sec:experiment_design}

137
    Those previous subsections present all the component of the experiment. We have the input data composing of synthetic and real graphs. We have 2 algorithms needed to evaluate: BBC and HBC. We specified the target system to evaluate the running time of the 2 mentioned algorithms. We start the section with definition of running time. The execution step of the experiment is presented after, aka how many times one single algorithm (BBC or HBC) is evaluated for each input graph.

138

139
    Every time we mention about the running time, or execution time of BBC or HBC algorithm, we are talking about the time that a processor spent on calculating result for each algorithm. In C++, there is the function call in standard library to help measure the number of CPU clocks consumped by the program.

140

141
    \begin{figure}

142
        \begin{lstlisting}

143
            #include<time.h>

144

145
            clock_t begin = clock()

146

147
            // Run BBC algorithm / HBC algorithm for one graph

148

149
            clock_t end = clock()

150

151
            double running_time = double (end - begin) / CLOCKS_PER_SECOND

152
        \end{lstlisting}

153
        \caption{Code to calculate the CPU time}

154
    \end{figure}

155

156

157
    The experiment consists of running each algorithm 10 times for each input graph. In this experiment, the measurement we want to get is the running time for each algorithm. And we want that measure to be as close to the actual value as possible. When running for one time, there are various things which might affect the result. And we need to repeat the experiment multiple times to make sure the measurement we get is not a coincidence.

158

159
    \textbf{Note:} Though we run the program on the machine with CPU containing 8 cores and supporting hyper threading, the program does not support parallelism. One program only run on one single thread from the beginning to the end.

160

161
    For example, for the Ninux graph, we will run BBC 10 times with it, and then we will run HBC 10 times. The running time were all noted down for the analysis.

162

163
    \begin{table}

164
        \centering

165
        \begin{tabular}{l|l|l}

166
            \toprule

167
            Graph type & On the computer & On Ubiquiti router \\

168
            \midrule

169
            CN with nodes in the range [100, 1000] & Yes & Yes \\

170
            ER with nodes in the range [100, 1000] & Yes & No \\

171
            PL with nodes in the range [100, 1000] & Yes & No \\

172
            Ninux & Yes & Yes \\

173
            FFG & Yes & Yes \\

174
            FFW & Yes & Yes \\

175
            \bottomrule

176
        \end{tabular}

177
        \caption{Summary of the number of runs (repetitions) for different input graph}

178
        \label{tab:experiment_design}

179
    \end{table}

thesis/ch5_experiment_result.tex
1 1
%!TEX root = main.tex

2 2

3
\chapter{Simulation experiments} \label{sec:sim_experiment}

3
\chapter{Results} \label{sec:experiment_result}

4
    The chapter begins with the result from the synthetic graphs in \cref{sec:result_synthetic}. Since HBC only improve performance for one type of synthetic graphs -- the Community Network graphs, then in \cref{sec:result_synthetic_analysis} we provides commentary on the reason HBC works for CN, but not the other topology types ER and PL. Then we moves on to the next section presenting about the three real topologies of WCNs -- Ninux, FFG, FFW in \cref{sec:result_wcn}. The chapter ends with the running time analysis for all those WCNs in both the Ubiquity router and a computer.

5

6
\section{Running time for synthetic graphs} \label{sec:result_synthetic}

7
    As mentioned in \cref{tab:experiment_design}, only the Community Network synthetic graphs are run on both the computer and router. While the remaining two synthetic graphs -- Erdos and Power Law -- are only tested for performance with the computer.

8

9
    The value on the y-axis for the running time of HBC and BBC for those figures \ref{fig:CN_both_BC_HBC}, \ref{fig:ER_PL_server_BC_HBC}, \ref{fig:CN_real_BC_HBC} is the average of average.

10

11
    Let $r_{ij}^{(100)}, i \in [1, 10]$ (because we repeat the same measurement 10 times) denote the running time of BBC algorithm for a CN graph with 100 nodes $g_j^{(100)}, j \in [1, 30]$ since for each type of graphs, we generate 30 graphs in total. Then the value on the y-axis corresponding to graph with 100 nodes is the average running time of all the repetitions for all CN graphs with 100 nodes. Formally, check out \cref{eq:avg_running_time}

12

13
    \begin{equation}

14
        \label{eq:avg_running_time}

15
        v^{(100)} = \frac{1}{300} \sum_{j=1}^{30} \sum_{i = 1}^{10} r_{ij}^{(100)}

16
    \end{equation}

17

18
    For the CN graphs, \cref{fig:CN_both_BC_HBC} summarizes the running time observed for graphs with different node types. We see a huge improvement of time of HBC over BBC when performing the calculation on both the computer and the router. The difference between two graphs is the scale of the y-axis. The program executes much faster in the computer than in the router, but this is simply due to the higher processing power of the computer. What is interesting from \cref{fig:CN_both_BC_HBC} is how fast HBC complete the calculation comparing to BBC. We see that for CN graphs with 1000 nodes, HBC finishes the computation 100 faster than BBC with the computer, and 60 times fater with the router. Also for CN graphs with 1000 nodes, with the router, BBC takes more than 6 minutes to compute betweenness centrality, while HBC can finish in less than 7 seconds. This is the stark constrast on the performance. Given the router has small processing power, and it has other tasks to do such as forwarding packets, faster execution time can result in better performance overall. From looking at a graph, we can see that HBC running time increase linearly, while BBC running time quickly soars.

19

20
    \begin{figure}[hpt]

21
        \begin{tabular}{cc}

22
            \subfloat[fig 1][]{\includegraphics[width=0.45\textwidth]{images/CN_server_BC_HBC.png}} &

23
            \subfloat[fig 2][]{\includegraphics[width=0.45\textwidth]{images/CN_server_speedup_ratio.png}} \\

24
            \subfloat[fig 3][]{\includegraphics[width=0.45\textwidth]{images/CN_artificial_router_BC_HBC.png}} &

25
            \subfloat[fig 4][]{\includegraphics[width=0.45\textwidth]{images/CN_artificial_router_speedup_ratio.png}}

26
        \end{tabular}

27
        \caption[Performance for CN graphs on the computer and the Ubiquiti router]{Performance for CN graphs on the computer and the Ubiquiti router. Left column shows the running time for both algorithms. Right column shows the speed-up ratio between HBC and BBC.}

28
        \label{fig:CN_both_BC_HBC}

29
    \end{figure}

30

31
    We have just looked at one type of graph where HBC outperforms BBC - the CN graphs. Next we are going to look at two other types of graph where using HBC worsens the overall performance. \cref{fig:ER_PL_server_BC_HBC} shows the running time on server for Power Law and Erdos graphs. Both lines follow the same pattern: the running time of BBC is a bit smaller than HBC's running time.

32

33

34
    \begin{figure}

35
        \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/ER_server_BC_HBC.png}}

36
        \subfloat[fig 2][]{\includegraphics[width=0.5\textwidth]{images/PL_server_BC_HBC.png}}

37
        \caption{Performance for Erdos and Power Law graphs on the computer.}

38
        \label{fig:ER_PL_server_BC_HBC}

39
    \end{figure}

40

41
    \begin{figure}

42
        \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/ER_server_speedup_ratio.png}}

43
        \subfloat[fig 2][]{\includegraphics[width=0.5\textwidth]{images/PL_server_speedup_ratio.png}}

44
        \caption{Speed up ratio of HBC over BBC}

45
        \label{fig:ER_PL_speed_up_ratio_BC_HBC}

46
    \end{figure}

47

48
    The question then arises is when HBC can offer improvement over BBC, what kind of topologies should we use HBC to calculate betweenness centrality.

49

50
\section{Synthetic graphs analysis} \label{sec:result_synthetic_analysis}

51
    In the previous section, we noticed that HBC helps improving the performance of Community Network graphs. While for Erdos and Power Lab graphs, HBC takes longer time to run. This section analyzes all the 900 synthetic graphs to give an explanation on why HBC improves performance for some graphs, but not for the others.

52

53
    The section has two parts, the \cref{sec:result_analysis_num_of_bcc} seeks answer for whether the number of bi-connected components (BCCs) found in a single graph affect the performance. \cref{sec:result_analysis_num_of_nodes} justifies that the number of vertices (or nodes) in any single bi-connected components is the main factor affecting the performance.

54

55
    \textbf{Notations:} For each graph type [CN, PL, ER] and for each graph size [100, 200, ..., 1000], we generated 30 graphs in total. We call those graphs $G^{(i)}, i \in [1, 30]$. For each $G^{(i)}$, suppose that we can find $n^{(i)}$ bi-connected components, $B_j^{(i)}, j \in [1, n^{(i)}]$.

56

57
    \subsection*{The number of bi-connected components} \label{sec:result_analysis_num_of_bcc}

58
        The average number of BCCs for each type of graphs is summarized in \cref{tab:num_of_bcc_for_graphs}, and visualized in \cref{fig:num_of_bcc_for_graphs}.

59

60
        For each graph type, and for each graph size:

61
        \begin{equation}

62
            N_{BCC} = \frac{1}{30} \sum_{i = 1}^{30} n^{(i)}

63
        \end{equation}

64

65

66
        \begin{figure}[hpt]

67
        \centering

68
            \includegraphics[width=0.6\textwidth]{images/num_of_bcc_for_graphs.png}

69
            \caption[Average number of BCCs for synthetic graphs]{Average number of BCCs for synthetic graphs. Note that the y-axis is drawn in log-scale with base 2.}

70
            \label{fig:num_of_bcc_for_graphs}

71
        \end{figure}

72

73
        For CN graphs, the number of BCCs is almost equal to the number of nodes in the graph. While for PL graphs, in general we find only 1 BCC, and this BCC is the original PL graph. The case for ER graphs is a bit different, there are several BCCs found, and the number of BCCs found increase with the number of nodes in ER graphs, e.g from 2.4 average number of BCCs for graph with 100 nodes, up to almost 16 BCCs for graph with 1000 nodes.

74

75
        Our hypothesis is that if the original graph can be divided into many BCCs, then HBC will save the running time. However, the data shows us that even though ER graphs can be divided into many BCCs, HBC does not perform better for ER graph, see more in \cref{fig:ER_PL_server_BC_HBC}. Hence, the number of BCCs can not be used as a prediction for whether or not HBC would decrease the calculating time for betweenness centrality.

76

77

78
    \subsection*{The number of node in each bi-connected component} \label{sec:result_analysis_num_of_nodes}

79
        In the previous \cref{sec:result_analysis_num_of_bcc}, we showed that the number of BCCs found in each graph cannot predict whether HBC would run faster comparing to BBC. This subsection takes a different approach, it analyzes the bi-connected components in term of the its size -- the number of nodes in a bi-connected component. More specifically, we focus on the biggest BCC.

80

81
        After the preprocessing step done in HBC algorithm, then it uses a modified version (see \cref{algo:accumulation_heuristic} of the original Brandes Betweenness Centrality to accumulate the betweenness centrality score. Therefore, if the size of each BCC is as big as the original graphs, then HBC does not save time when calculating betweenness. Not to mention the fact that HBC adds overhead to the overall running time when it pre-processes the original graph.

82

83
        \begin{figure}[hpt]

84
            \centering

85
            \includegraphics[width=0.6\textwidth]{images/max_nodes_in_graph.png}

86
            \caption{The biggest bi-connected components for each synthetic graph}

87
            \label{fig:max_nodes_in_graph}

88
        \end{figure}

89

90
        Let $MB^{(i)}$ denotes the maximum number of nodes for a bi-connected component in a graph $G^{(i)}$:

91

92
        \begin{equation}

93
            MaxB^{(i)} = \max(B_j^{(i)}), j \in [1, n^{(i)}]

94
        \end{equation}

95

96
        Let $|B_j^{(i)}|$ denotes the number of nodes within each bi-connected component $B_j^{(i)}$. Then the y-value $N_{max}$ in the \cref{fig:max_nodes_in_graph} is defined as follow:

97

98
        \begin{equation}

99
            \label{eq:avg_max_num_of_node_in_bcc}

100
            N_{max} = \frac{1}{30} \sum_{i = 1}^{30}  MaxB^{(i)}

101
        \end{equation}

102

103
        For each graph type [CN, PL, ER] and for each graph size [100, 200, ..., 1000], we calculate the average size of the biggest bi-connected component for each graph following the \cref{eq:avg_max_num_of_node_in_bcc}. And the result is plotted in \cref{fig:max_nodes_in_graph}. The graph shows that for PL and ER, the largest BCC contains almost all the nodes in the original graph (see the scatter points for PL and ER seems to follow the straight line dividing the Q1 into two halves).

104

105
        In the case of CN graphs, even for the graph size of 1000 nodes, the largest BCC on average only have 17 vertices. Therefore, computing the betweenness centrality for such the small BCC will improve the running time of HBC significantly.

106

107
        For the raw data, see \cref{tab:max_nodes_in_graph}

108

109

110

111
\section{Three real WCN} \label{sec:result_wcn}

112
     \cref{tab:real_wcn_analysis} shows the high-level features of the FFG, FFW, Ninux such as the number of vertices and nodes, and the analysis on the bi-connected components. It shows that the number of nodes in the largest bi-connected component accounted for $27\%$, $46\%$, $35\%$ the total nodes in the FFG, FFW and Ninux network, respectively.

113

114
    \begin{table}[hpt]

115
        \centering

116
        \begin{tabular}{l c c c }

117
            \toprule

118
            & FFG & FFW & Ninux \\

119
            \midrule

120
            number of vertices & 126 & 235 & 126 \\

121
            number of edges & 181 & 370 & 143 \\

122
            edges/nodes ratio & 1.44 & 1.57 & 1.13 \\

123
            density & 0.022 & 0.013 & 0.018 \\

124
            number of bi-connected components & 67 & 120 & 80 \\

125
            max number of nodes in BCC & 35 & 107 & 44 \\

126
            \bottomrule

127
        \end{tabular}

128
        \caption{Features of the network under analysis}

129
        \label{tab:real_wcn_analysis}

130
    \end{table}

131

132
    For the real wireless community networks, HBC consistently outperform BBC in the running time. In the case of FFG, where the largest bi-connected component includes 27\% of total nodes, we also observe the largest speed-up ratio between HBC and BBC -- more than 6 times when running on the router. On another hand, for FFW, where the biggest bi-connected component has 46\% of total nodes, we only observe the speed-up ratio around 3. But in all cases, HBC reduces the time to calculate betweenness centrality for all real topologies. For the running time of HBC and BBC, see more in \cref{fig:CN_real_BC_HBC}; for the speed-up ratio, check out \cref{fig:CN_real_speedup_ratio_BC_HBC}. The raw data is provided in \cref{tab:CN_real_router_BC_HBC} and \cref{tab:CN_real_server_BC_HBC}

133

134
    \begin{figure}[hpt]

135
        \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_router_BC_HBC.png}}

136
        \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_server_BC_HBC.png}}

137
        \caption[Running time of HBC and BBC for real WCNs]{Running time of HBC and BBC for real WCNs. The X-axis denotes name of Wireless Community Networks.}

138
        \label{fig:CN_real_BC_HBC}

139
    \end{figure}

140

141
    \begin{figure}[hpt]

142
        \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_router_speedup_ratio.png}}

143
        \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_server_speedup_ratio.png}}

144
        \caption[Speed up ratio of HBC over BBC for real WCNs]{Speed up ratio of HBC over BBC for real WCNs. The X-axis is the name of Wireless Community Networks}

145
        \label{fig:CN_real_speedup_ratio_BC_HBC}

146
    \end{figure}

147

148
    The raw data is also provided in \cref{tab:CN_real_server_BC_HBC} and \cref{tab:CN_real_router_BC_HBC}

149

150

151

152

153

154

thesis/ch6_conclusion.tex
1
%!TEX root = main.tex

2

3
\chapter{Conclusion} \label{sec:conclusion}

4

5
In this work, we presented the performance evaluation for two algorithms to calculate betweenness centrality on Ubiquiti router. One state of the art algorithm was found out by Brandes in \cite{Brandes01afaster} (called BBC). Another algorithm is the heuristic version aimed to improve the Brandes's algorithm (denoted HBC). On the router, we tested with three real topology of Wireless Community Networks (WCNs), and it was shown that those topology, HBC were able to calculate faster than BBC from 3 to 7 times in \cref{fig:CN_real_speedup_ratio_BC_HBC}.

6

7
We also evaluate the performance for synthetic Community Networks graphs as well. Those graphs were generated algorithmically using \emph{NePa TesT} \cite{Baldesi2016}. The result for the synthetic CN networks is promising. When running on routers for synthetic CN graphs with 1000 nodes, HBC achieves a speed up ratio of 60 times over BBC. On average, when running on the Ubiquiti router, a CN network with 1000 nodes take almost 7 minutes with BBC, and less than 7 seconds with HBC. This is the huge improvement given the limited capacity of the router. Testing with synthetic CN graphs, the result also showed that HBC improved the performance over BBC.

8

9
Besides CN graphs (both the real topology and the randomly generated ones), we also performed the analysis with synthetic Erdos and Power Law graph. HBC takes longer time than BBC to finish the calculation. For example, for Erdos graph with 1000 nodes, BBC takes 24 seconds, and HBC takes 26.4 seconds. HBC even though does not guarantee that the performance would be improved over the BCC, its complexity in worst case is also the same as BBC ($\mathcal{O}(mn)$ for an unweighted graph, and $\mathcal{O}(mn + n^2 \log n)$ for a weighted graph).

10

11
We also carried out an analysis for CN, Erdos and Power Law graphs to see the reason why HBC works for CN graphs, but not for the others. We concluded that the size of the bi-connected components (BCCs) is the main factor affecting the performance of HBC. For Funk Feuer Wien -- a WCN in Vienna, the experiment shows that even though the biggest bi-connected components found contains almost half of nodes in the networks (46\% of total nodes), HBC can still run 3 times faster than BBC.

12

13
So far, we have shown that HBC is an algorithm to choose when we need to calculate betweenness centrality of the Wireless Community Networks real topologies. There are ways that we think can improve the performance further, which may be implemented in the future. One property of WCN is that the topology evolves over time since any nodes can leave and join the network. However, in general, the new incoming or recently-left nodes are those in peripheral of the network \cite{Maccari2015175}. We can thus keep track of the betweenness centrality in each bi-connected component, and we would only calculate the betweenness centrality again when there are modification in that BCC.

14

15
For this thesis, both algorithms were run on the set of predefined graph. In the future, we can extend the functionality of the program, so that it can extract the real-time topology of the WCN of interest, and perform all the necessary calculation. Also for the router in the testing environment, its sole task was to run the experiment. The router was not busy with forwarding packets or finding routes for packets as it has to do in the reality. It will be interesting to see how the HBC work with the constantly evolving topology, and with the busy router.

thesis/ideas.tex
187 187

188 188
        feel of

189 189

190
\section{Experiment setup}

191
 Cores is a hardware term that describes the number of independent central processing units in a single computing component (die or chip).

192

193
       A Thread, or thread of execution, is a software term for the basic ordered sequence of instructions that can be passed through or processed by a single CPU core.

194

195
       Intel® Hyper-Threading Technology (Intel® HT Technology) delivers two processing threads per physical core. Highly threaded applications can get more work done in parallel, completing tasks sooner.

190 196

191 197

192 198
\section{Questions}

......
230 236
    Then, 1 chapter about the result

231 237

232 238
    Bibliography: 20 - 3.0

239

240

241
\section{Scope}

242
        The scope of the paper:

243
        \begin{itemize}

244
            \item no self-loop

245
            \item no multigraph: graph with no parallel edges

246
            \item unweighted and weighted graph

247
            \item can apply to connected or disconnected graph. In the disconnected graph, certain vertices are not reachable from vertices in a different component. This yeilds the zero shortest paths counts, and as a result those 2 endpoints contributes 0 toward the betweenness.

248
        \end{itemize}

249

250
    connected graph: since the betweenness centrality is calculated based on shortest paths. In the disconnected graph, certain vertices are not reachable from vertices in a different component. This yields the zero shortest paths counts for betweenness centrality.

251

252

253
\section{Notation}

254
    A graph $G=(V,E)$ contains a set of vertices (nodes) $V$ and a set of edges (links) $E$. We assume we assume that al graphs are undirected and connected. \textbf{XXX check the modification for directed graph}

255

256
    $m = |E|$ is the number of edges in graph $G$

257

258
    $n = |V|$ is the number of vertices in graph $G$

259

260
    Let $\omega$ be a weight function on the edges. We assume that $\omega(e) > 0, e \in E$ for weighted graphs. And we define $\omega(e) = 1, e \in E$ for unweighted graphs. If for $s, t \in V$ and ${s, t} \in E$, then we can also use $\omega(s, t)$ to denote the weight of the edge between $s$ and $t$.

261

262
    The \emph{length} of a path is the sum of weights of all edges along the path. We use $d(s, t)$ to denote the \emph{distance} between vertices $s$ and $t$, i.e. the minimum length of any path connecting $s$ and $t$.

263

264
\section{Result}

265

266
        \begin{figure}[h]

267
        \caption{}

268
        \label{fig:histogram_num_of_nodes_CN200}

269
        \centering

270
        \includegraphics[scale=0.6]{images/histogram_num_of_nodes_CN200.png}

271
        \end{figure}

272

273
        \begin{figure}[h]

274
        \caption{}

275
        \label{fig:histogram_num_of_nodes_ER200}

276
        \centering

277
        \includegraphics[scale=0.6]{images/histogram_num_of_nodes_ER200.png}

278
        \end{figure}

279

280
        \begin{figure}[h]

281
        \caption{}

282
        \label{fig:histogram_num_of_nodes_PL200}

283
        \centering

284
        \includegraphics[scale=0.6]{images/histogram_num_of_nodes_PL200.png}

285
        \end{figure}

286

287

288
\section{Counting shortest paths}

289
    In the end, I choose not to describe those algorithms in the final version of my thesis

290

291
    \textbf{XXX What is the input}: the source vertex $s$

292
        \textbf{XXX What is the output?}:

293

294
        \subsection{Breadth-First Search for unweighted graph}

295
            I understood the algorithm, but how can I explain it to another person?

296

297

298

299
        \subsection{Dijkstra's algorithm for weighted graph}

300
            The Dijkstra's algorithm can be modified so that when given a source vertex $s$, the output will be the set of predecessors and the number of shortest paths from $s$ to other $t \neq s \in V$

301

302
            \begin{algorithm}

303
            \caption{Dijkstra's algorithm} \label{algo:dijkstra}

304
            \begin{algorithmic}

305
                \Procedure{Dijkstra for SSSP with source vertex s}{}

306
                    \State $S \leftarrow$ empty stack

307
                    \State $P[w] \leftarrow$ empty list, $w \in V$

308
                    \State $\sigma[t] \leftarrow 0, t \in V$; $\sigma[s] \leftarrow 1$

309
                    \State $d[t] \leftarrow -1, t \in V$; $d[s] \leftarrow 0$

310
                    \State $Q \leftarrow$ empty queue

311
                    \State enqueue $s \rightarrow Q$

312
                    \While{$Q$ \emph{not empty}}

313
                        dequeue $v \leftarrow Q$

314
                        push $v \rightarrow S$

315

316
                    \EndWhile

317
                \EndProcedure

318
            \end{algorithmic}

319
            \end{algorithm}

320

233 321
\clearpage

234 322
\bibliography{references}{}

235 323
\bibliographystyle{plain}

thesis/main.tex
4 4
% between \documentclass and \begin{document}.

5 5

6 6
\usepackage[margin=1in]{geometry}  % set the margins to 1in on all sides

7
\usepackage{emptypage}

8
\usepackage{float}             % For inserting figures side-by-side

9
\usepackage{subfig}

10

7 11
\usepackage{graphicx}              % to include figures

8 12
\graphicspath{ {report_images/ images/} }       % '/' is needed

13
\newcommand*{\LargerCdot}{\raisebox{-1.5ex}{\scalebox{3}{$\cdot$}}}

9 14
\usepackage{caption}

10
\usepackage{subcaption}

11 15
\usepackage{booktabs}              % for beautiful tables

12 16

13
\usepackage{epstopdf}              % for the .ps image

14
\epstopdfsetup{update}

15
\DeclareGraphicsExtensions{.eps}

16
\epstopdfDeclareGraphicsRule{.eps}{pdf}{.pdf}{ps2pdf -dEPSCrop -dNOSAFER #1 \OutputFile}

17 17
\usepackage{amsmath}               % great math stuff

18 18
\usepackage{amsfonts}              % for blackboard bold, etc

19 19

......
22 22
\newtheorem{definition}{Definition}[section]

23 23

24 24
\usepackage{listings}              % to add code

25

25 26
\lstset{language=Java}

26 27
\renewcommand{\lstlistingname}{Code} % change "Listing 1.1" to "Code 1.1"

27 28

28
\usepackage[utf8]{inputenc}

29 29
\usepackage{color}

30 30

31 31
\usepackage{cite}

......
75 75
  \catcode_=12

76 76
}

77 77

78
\newcommand{\HRule}{\rule{\linewidth}{0.5mm}} % New command to make the lines in the title page

78 79
\usepackage{booktabs} % To thicken table lines

79 80

80 81
\usepackage{algorithm}

81 82
\usepackage[noend]{algpseudocode} % For pseudocode

82 83

84
\usepackage[toc]{appendix}

85

86
% \usepackage[lofdepth,lotdepth]{subfig}

87

83 88
\usepackage[noabbrev,capitalise]{cleveref}

84 89

90
% Font for Unicode

91
\usepackage[utf8]{inputenc}

92
% \usepackage[vietnam,english]{babel}

93

94
\newcommand*{\blankpage}{%

95
\vspace*{\fill}

96
    \begin{center}

97
        % This page is intentionally left blank.

98
    \end{center}

99
\vspace{\fill}}

100

85 101
\begin{document}

86 102

103
\pagenumbering{gobble}

87 104
\include{titlepage}

105
\include{acknowledgements}

106
\blankpage

107

108
\pagenumbering{roman}

88 109
\include{toc}

89 110

90 111
\pagenumbering{arabic}

91 112
\include{ch1_introduction}

113
\blankpage

92 114
\include{ch2_centrality}

115
\blankpage

93 116
\include{ch3_computation_of_bc}

94 117
\include{ch4_experiment_setup}

118
\blankpage

95 119
\include{ch5_experiment_result}

120
\blankpage

121
\include{ch6_conclusion}

122
\include{appendix}

96 123

124
\addcontentsline{toc}{chapter}{\listfigurename}

125
\listoffigures

126
\addcontentsline{toc}{chapter}{\listtablename}

127
\listoftables

97 128

98
\clearpage

99 129
\bibliography{references}{}

100 130
% \bibliographystyle{IEEEtranSN}

101 131
\bibliographystyle{IEEEtranN}

102 132

133
\cleardoublepage

103 134
\end{document}

thesis/references.bib
6 6
               Ulrik Brandes},

7 7
  title     = {Heuristics for Speeding Up Betweenness Centrality Computation},

8 8
  booktitle = {2012 International Conference on Privacy, Security, Risk and Trust,

9
               {PASSAT} 2012, and 2012 International Confernece on Social Computing,

10
               SocialCom 2012, Amsterdam, Netherlands, September 3-5, 2012},

9
               {PASSAT} 2012},

11 10
  pages     = {302--311},

12 11
  year      = {2012},

13 12
  crossref  = {DBLP:conf/socialcom/2012},

14
  url       = {http://dx.doi.org/10.1109/SocialCom-PASSAT.2012.66},

15 13
  doi       = {10.1109/SocialCom-PASSAT.2012.66},

16 14
  timestamp = {Mon, 14 Jan 2013 18:11:28 +0100},

17
  biburl    = {http://dblp2.uni-trier.de/rec/bib/conf/socialcom/PuzisZEDB12},

18 15
  bibsource = {dblp computer science bibliography, http://dblp.org}

19 16
}

20 17

......
28 25
 location = {Esbjerg, Denmark},

29 26
 pages = {191--203},

30 27
 numpages = {13},

31
 url = {http://dx.doi.org/10.1007/978-3-540-89900-6_20},

32 28
 doi = {10.1007/978-3-540-89900-6_20},

33 29
 acmid = {1485471},

34 30
 publisher = {Springer-Verlag},

......
55 51
note = "",

56 52
issn = "0378-8733",

57 53
doi = "http://dx.doi.org/10.1016/j.socnet.2007.11.001",

58
url = "http://www.sciencedirect.com/science/article/pii/S0378873307000731",

59 54
author = "Ulrik Brandes",

60 55
keywords = "Betweenness centrality",

61 56
keywords = "Algorithms",

......
74 69
note = "Special Issue on Wireless Network Intrusion ",

75 70
issn = "0022-0000",

76 71
doi = "http://dx.doi.org/10.1016/j.jcss.2013.06.018",

77
url = "http://www.sciencedirect.com/science/article/pii/S002200001300127X",

78
author = "Leonardo Maccari and Renato Lo Cigno",

72
author = "Leonardo Maccari and Renato {Lo Cigno}",

79 73
keywords = "Shortest path betweenness",

80 74
keywords = "OLSR",

81 75
keywords = "Multi-hop networks",

......
95 89
note = "Modeling and Performance Evaluation of Wireless Ad-Hoc Networks ",

96 90
issn = "1570-8705",

97 91
doi = "http://dx.doi.org/10.1016/j.adhoc.2014.07.016",

98
url = "http://www.sciencedirect.com/science/article/pii/S1570870514001474",

99
author = "Leonardo Maccari and Renato Lo Cigno",

92
author = "Leonardo Maccari and Renato {Lo Cigno}",

100 93
keywords = "Mesh networks",

101 94
keywords = "Community networks",

102 95
keywords = "Network topology",

......
117 110
    priority = {2},

118 111
    publisher = {American Sociological Association},

119 112
    title = {{A Set of Measures of Centrality Based on Betweenness}},

120
    url = {http://links.jstor.org/sici?sici=0038-0431\%28197703\%2940\%3A1\%3C35\%3AASOMOC\%3E2.0.CO\%3B2-H},

121 113
    volume = {40},

122 114
    year = {1977}

123 115
}

......
132 124
note = "",

133 125
issn = "0378-8733",

134 126
doi = "http://dx.doi.org/10.1016/0378-8733(78)90021-7",

135
url = "http://www.sciencedirect.com/science/article/pii/0378873378900217",

136 127
author = "Linton C. Freeman",

137 128
abstract = "The intuitive background for measures of structural centrality in social networks is reviewed and existing measures are evaluated in terms of their consistency with intuitions and their interpretability. Three distinct intuitive conceptions of centrality are uncovered and existing measures are refined to embody these conceptions. Three measures are developed for each concept, one absolute and one relative measure of the centrality of positions in a network, and one reflecting the degree of centralization of the entire network. The implications of these measures for the experimental study of small groups is examined."

138 129
}

......
147 138
abstract="This is a note to introduce a new measure of a kind of structural centrality called pair-dependency. Pair-dependency explicates the centrality-related notion of the gatekeeper. Moreover, it turns out to be a fundamental structural property of communication networks that provides the basis for the derivation of two standard measures of structural centrality.",

148 139
issn="1573-7845",

149 140
doi="10.1007/BF00184720",

150
url="http://dx.doi.org/10.1007/BF00184720",

151 141
year="1980"

152 142
}

153 143

......
162 152
note = "",

163 153
issn = "0378-8733",

164 154
doi = "http://dx.doi.org/10.1016/0378-8733(91)90017-N",

165
url = "http://www.sciencedirect.com/science/article/pii/037887339190017N",

166 155
author = {Linton C. Freeman and Stephen P. Borgatti and Douglas R. White},

167 156
abstract = "A new measure of centrality, CF, is introduced. It is based on the concept of network flows. While conceptually similar to Freeman's original measure, CB, the new measure differs from the original in two important ways. First, CF is defined for both valued and non-valued graphs. This makes CF applicable to a wider variety of network datasets. Second, the computation of CF is not based on geodesic paths as is CB but on all the independent paths between all pairs of points in the network."

168 157
}

......
179 168
 issn = {0146-4833},

180 169
 pages = {68--73},

181 170
 numpages = {6},

182
 url = {http://doi.acm.org/10.1145/2500098.2500108},

183 171
 doi = {10.1145/2500098.2500108},

184 172
 acmid = {2500108},
`
... This diff was truncated because it exceeds the maximum size that can be displayed.

Also available in: Unified diff