Revision c6e2ce5a thesis/ch5_experiment_result.tex
thesis/ch5_experiment_result.tex  

1  1 
%!TEX root = main.tex 
2  2  
3 
\chapter{Simulation experiments} \label{sec:sim_experiment} 

3 
\chapter{Results} \label{sec:experiment_result} 

4 
The chapter begins with the result from the synthetic graphs in \cref{sec:result_synthetic}. Since HBC only improve performance for one type of synthetic graphs  the Community Network graphs, then in \cref{sec:result_synthetic_analysis} we provides commentary on the reason HBC works for CN, but not the other topology types ER and PL. Then we moves on to the next section presenting about the three real topologies of WCNs  Ninux, FFG, FFW in \cref{sec:result_wcn}. The chapter ends with the running time analysis for all those WCNs in both the Ubiquity router and a computer. 

5  
6 
\section{Running time for synthetic graphs} \label{sec:result_synthetic} 

7 
As mentioned in \cref{tab:experiment_design}, only the Community Network synthetic graphs are run on both the computer and router. While the remaining two synthetic graphs  Erdos and Power Law  are only tested for performance with the computer. 

8  
9 
The value on the yaxis for the running time of HBC and BBC for those figures \ref{fig:CN_both_BC_HBC}, \ref{fig:ER_PL_server_BC_HBC}, \ref{fig:CN_real_BC_HBC} is the average of average. 

10  
11 
Let $r_{ij}^{(100)}, i \in [1, 10]$ (because we repeat the same measurement 10 times) denote the running time of BBC algorithm for a CN graph with 100 nodes $g_j^{(100)}, j \in [1, 30]$ since for each type of graphs, we generate 30 graphs in total. Then the value on the yaxis corresponding to graph with 100 nodes is the average running time of all the repetitions for all CN graphs with 100 nodes. Formally, check out \cref{eq:avg_running_time} 

12  
13 
\begin{equation} 

14 
\label{eq:avg_running_time} 

15 
v^{(100)} = \frac{1}{300} \sum_{j=1}^{30} \sum_{i = 1}^{10} r_{ij}^{(100)} 

16 
\end{equation} 

17  
18 
For the CN graphs, \cref{fig:CN_both_BC_HBC} summarizes the running time observed for graphs with different node types. We see a huge improvement of time of HBC over BBC when performing the calculation on both the computer and the router. The difference between two graphs is the scale of the yaxis. The program executes much faster in the computer than in the router, but this is simply due to the higher processing power of the computer. What is interesting from \cref{fig:CN_both_BC_HBC} is how fast HBC complete the calculation comparing to BBC. We see that for CN graphs with 1000 nodes, HBC finishes the computation 100 faster than BBC with the computer, and 60 times fater with the router. Also for CN graphs with 1000 nodes, with the router, BBC takes more than 6 minutes to compute betweenness centrality, while HBC can finish in less than 7 seconds. This is the stark constrast on the performance. Given the router has small processing power, and it has other tasks to do such as forwarding packets, faster execution time can result in better performance overall. From looking at a graph, we can see that HBC running time increase linearly, while BBC running time quickly soars. 

19  
20 
\begin{figure}[hpt] 

21 
\begin{tabular}{cc} 

22 
\subfloat[fig 1][]{\includegraphics[width=0.45\textwidth]{images/CN_server_BC_HBC.png}} & 

23 
\subfloat[fig 2][]{\includegraphics[width=0.45\textwidth]{images/CN_server_speedup_ratio.png}} \\ 

24 
\subfloat[fig 3][]{\includegraphics[width=0.45\textwidth]{images/CN_artificial_router_BC_HBC.png}} & 

25 
\subfloat[fig 4][]{\includegraphics[width=0.45\textwidth]{images/CN_artificial_router_speedup_ratio.png}} 

26 
\end{tabular} 

27 
\caption[Performance for CN graphs on the computer and the Ubiquiti router]{Performance for CN graphs on the computer and the Ubiquiti router. Left column shows the running time for both algorithms. Right column shows the speedup ratio between HBC and BBC.} 

28 
\label{fig:CN_both_BC_HBC} 

29 
\end{figure} 

30  
31 
We have just looked at one type of graph where HBC outperforms BBC  the CN graphs. Next we are going to look at two other types of graph where using HBC worsens the overall performance. \cref{fig:ER_PL_server_BC_HBC} shows the running time on server for Power Law and Erdos graphs. Both lines follow the same pattern: the running time of BBC is a bit smaller than HBC's running time. 

32  
33  
34 
\begin{figure} 

35 
\subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/ER_server_BC_HBC.png}} 

36 
\subfloat[fig 2][]{\includegraphics[width=0.5\textwidth]{images/PL_server_BC_HBC.png}} 

37 
\caption{Performance for Erdos and Power Law graphs on the computer.} 

38 
\label{fig:ER_PL_server_BC_HBC} 

39 
\end{figure} 

40  
41 
\begin{figure} 

42 
\subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/ER_server_speedup_ratio.png}} 

43 
\subfloat[fig 2][]{\includegraphics[width=0.5\textwidth]{images/PL_server_speedup_ratio.png}} 

44 
\caption{Speed up ratio of HBC over BBC} 

45 
\label{fig:ER_PL_speed_up_ratio_BC_HBC} 

46 
\end{figure} 

47  
48 
The question then arises is when HBC can offer improvement over BBC, what kind of topologies should we use HBC to calculate betweenness centrality. 

49  
50 
\section{Synthetic graphs analysis} \label{sec:result_synthetic_analysis} 

51 
In the previous section, we noticed that HBC helps improving the performance of Community Network graphs. While for Erdos and Power Lab graphs, HBC takes longer time to run. This section analyzes all the 900 synthetic graphs to give an explanation on why HBC improves performance for some graphs, but not for the others. 

52  
53 
The section has two parts, the \cref{sec:result_analysis_num_of_bcc} seeks answer for whether the number of biconnected components (BCCs) found in a single graph affect the performance. \cref{sec:result_analysis_num_of_nodes} justifies that the number of vertices (or nodes) in any single biconnected components is the main factor affecting the performance. 

54  
55 
\textbf{Notations:} For each graph type [CN, PL, ER] and for each graph size [100, 200, ..., 1000], we generated 30 graphs in total. We call those graphs $G^{(i)}, i \in [1, 30]$. For each $G^{(i)}$, suppose that we can find $n^{(i)}$ biconnected components, $B_j^{(i)}, j \in [1, n^{(i)}]$. 

56  
57 
\subsection*{The number of biconnected components} \label{sec:result_analysis_num_of_bcc} 

58 
The average number of BCCs for each type of graphs is summarized in \cref{tab:num_of_bcc_for_graphs}, and visualized in \cref{fig:num_of_bcc_for_graphs}. 

59  
60 
For each graph type, and for each graph size: 

61 
\begin{equation} 

62 
N_{BCC} = \frac{1}{30} \sum_{i = 1}^{30} n^{(i)} 

63 
\end{equation} 

64  
65  
66 
\begin{figure}[hpt] 

67 
\centering 

68 
\includegraphics[width=0.6\textwidth]{images/num_of_bcc_for_graphs.png} 

69 
\caption[Average number of BCCs for synthetic graphs]{Average number of BCCs for synthetic graphs. Note that the yaxis is drawn in logscale with base 2.} 

70 
\label{fig:num_of_bcc_for_graphs} 

71 
\end{figure} 

72  
73 
For CN graphs, the number of BCCs is almost equal to the number of nodes in the graph. While for PL graphs, in general we find only 1 BCC, and this BCC is the original PL graph. The case for ER graphs is a bit different, there are several BCCs found, and the number of BCCs found increase with the number of nodes in ER graphs, e.g from 2.4 average number of BCCs for graph with 100 nodes, up to almost 16 BCCs for graph with 1000 nodes. 

74  
75 
Our hypothesis is that if the original graph can be divided into many BCCs, then HBC will save the running time. However, the data shows us that even though ER graphs can be divided into many BCCs, HBC does not perform better for ER graph, see more in \cref{fig:ER_PL_server_BC_HBC}. Hence, the number of BCCs can not be used as a prediction for whether or not HBC would decrease the calculating time for betweenness centrality. 

76  
77  
78 
\subsection*{The number of node in each biconnected component} \label{sec:result_analysis_num_of_nodes} 

79 
In the previous \cref{sec:result_analysis_num_of_bcc}, we showed that the number of BCCs found in each graph cannot predict whether HBC would run faster comparing to BBC. This subsection takes a different approach, it analyzes the biconnected components in term of the its size  the number of nodes in a biconnected component. More specifically, we focus on the biggest BCC. 

80  
81 
After the preprocessing step done in HBC algorithm, then it uses a modified version (see \cref{algo:accumulation_heuristic} of the original Brandes Betweenness Centrality to accumulate the betweenness centrality score. Therefore, if the size of each BCC is as big as the original graphs, then HBC does not save time when calculating betweenness. Not to mention the fact that HBC adds overhead to the overall running time when it preprocesses the original graph. 

82  
83 
\begin{figure}[hpt] 

84 
\centering 

85 
\includegraphics[width=0.6\textwidth]{images/max_nodes_in_graph.png} 

86 
\caption{The biggest biconnected components for each synthetic graph} 

87 
\label{fig:max_nodes_in_graph} 

88 
\end{figure} 

89  
90 
Let $MB^{(i)}$ denotes the maximum number of nodes for a biconnected component in a graph $G^{(i)}$: 

91  
92 
\begin{equation} 

93 
MaxB^{(i)} = \max(B_j^{(i)}), j \in [1, n^{(i)}] 

94 
\end{equation} 

95  
96 
Let $B_j^{(i)}$ denotes the number of nodes within each biconnected component $B_j^{(i)}$. Then the yvalue $N_{max}$ in the \cref{fig:max_nodes_in_graph} is defined as follow: 

97  
98 
\begin{equation} 

99 
\label{eq:avg_max_num_of_node_in_bcc} 

100 
N_{max} = \frac{1}{30} \sum_{i = 1}^{30} MaxB^{(i)} 

101 
\end{equation} 

102  
103 
For each graph type [CN, PL, ER] and for each graph size [100, 200, ..., 1000], we calculate the average size of the biggest biconnected component for each graph following the \cref{eq:avg_max_num_of_node_in_bcc}. And the result is plotted in \cref{fig:max_nodes_in_graph}. The graph shows that for PL and ER, the largest BCC contains almost all the nodes in the original graph (see the scatter points for PL and ER seems to follow the straight line dividing the Q1 into two halves). 

104  
105 
In the case of CN graphs, even for the graph size of 1000 nodes, the largest BCC on average only have 17 vertices. Therefore, computing the betweenness centrality for such the small BCC will improve the running time of HBC significantly. 

106  
107 
For the raw data, see \cref{tab:max_nodes_in_graph} 

108  
109  
110  
111 
\section{Three real WCN} \label{sec:result_wcn} 

112 
\cref{tab:real_wcn_analysis} shows the highlevel features of the FFG, FFW, Ninux such as the number of vertices and nodes, and the analysis on the biconnected components. It shows that the number of nodes in the largest biconnected component accounted for $27\%$, $46\%$, $35\%$ the total nodes in the FFG, FFW and Ninux network, respectively. 

113  
114 
\begin{table}[hpt] 

115 
\centering 

116 
\begin{tabular}{l c c c } 

117 
\toprule 

118 
& FFG & FFW & Ninux \\ 

119 
\midrule 

120 
number of vertices & 126 & 235 & 126 \\ 

121 
number of edges & 181 & 370 & 143 \\ 

122 
edges/nodes ratio & 1.44 & 1.57 & 1.13 \\ 

123 
density & 0.022 & 0.013 & 0.018 \\ 

124 
number of biconnected components & 67 & 120 & 80 \\ 

125 
max number of nodes in BCC & 35 & 107 & 44 \\ 

126 
\bottomrule 

127 
\end{tabular} 

128 
\caption{Features of the network under analysis} 

129 
\label{tab:real_wcn_analysis} 

130 
\end{table} 

131  
132 
For the real wireless community networks, HBC consistently outperform BBC in the running time. In the case of FFG, where the largest biconnected component includes 27\% of total nodes, we also observe the largest speedup ratio between HBC and BBC  more than 6 times when running on the router. On another hand, for FFW, where the biggest biconnected component has 46\% of total nodes, we only observe the speedup ratio around 3. But in all cases, HBC reduces the time to calculate betweenness centrality for all real topologies. For the running time of HBC and BBC, see more in \cref{fig:CN_real_BC_HBC}; for the speedup ratio, check out \cref{fig:CN_real_speedup_ratio_BC_HBC}. The raw data is provided in \cref{tab:CN_real_router_BC_HBC} and \cref{tab:CN_real_server_BC_HBC} 

133  
134 
\begin{figure}[hpt] 

135 
\subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_router_BC_HBC.png}} 

136 
\subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_server_BC_HBC.png}} 

137 
\caption[Running time of HBC and BBC for real WCNs]{Running time of HBC and BBC for real WCNs. The Xaxis denotes name of Wireless Community Networks.} 

138 
\label{fig:CN_real_BC_HBC} 

139 
\end{figure} 

140  
141 
\begin{figure}[hpt] 

142 
\subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_router_speedup_ratio.png}} 

143 
\subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_server_speedup_ratio.png}} 

144 
\caption[Speed up ratio of HBC over BBC for real WCNs]{Speed up ratio of HBC over BBC for real WCNs. The Xaxis is the name of Wireless Community Networks} 

145 
\label{fig:CN_real_speedup_ratio_BC_HBC} 

146 
\end{figure} 

147  
148 
The raw data is also provided in \cref{tab:CN_real_server_BC_HBC} and \cref{tab:CN_real_router_BC_HBC} 

149  
150  
151  
152  
153  
154 
Also available in: Unified diff