commit c6e2ce5aa5f9d0997081d855b59065926550e3c2
Author: Quynh PX Nguyen
Date: Fri Apr 1 12:36:16 2016 +0200
[Thesis] Final LaTeX version
diff --git a/thesis/ch5_experiment_result.tex b/thesis/ch5_experiment_result.tex
index d269645..31f718a 100644
--- a/thesis/ch5_experiment_result.tex
+++ b/thesis/ch5_experiment_result.tex
@@ -1,3 +1,154 @@
%!TEX root = main.tex
-\chapter{Simulation experiments} \label{sec:sim_experiment}
+\chapter{Results} \label{sec:experiment_result}
+ The chapter begins with the result from the synthetic graphs in \cref{sec:result_synthetic}. Since HBC only improve performance for one type of synthetic graphs -- the Community Network graphs, then in \cref{sec:result_synthetic_analysis} we provides commentary on the reason HBC works for CN, but not the other topology types ER and PL. Then we moves on to the next section presenting about the three real topologies of WCNs -- Ninux, FFG, FFW in \cref{sec:result_wcn}. The chapter ends with the running time analysis for all those WCNs in both the Ubiquity router and a computer.
+
+\section{Running time for synthetic graphs} \label{sec:result_synthetic}
+ As mentioned in \cref{tab:experiment_design}, only the Community Network synthetic graphs are run on both the computer and router. While the remaining two synthetic graphs -- Erdos and Power Law -- are only tested for performance with the computer.
+
+ The value on the y-axis for the running time of HBC and BBC for those figures \ref{fig:CN_both_BC_HBC}, \ref{fig:ER_PL_server_BC_HBC}, \ref{fig:CN_real_BC_HBC} is the average of average.
+
+ Let $r_{ij}^{(100)}, i \in [1, 10]$ (because we repeat the same measurement 10 times) denote the running time of BBC algorithm for a CN graph with 100 nodes $g_j^{(100)}, j \in [1, 30]$ since for each type of graphs, we generate 30 graphs in total. Then the value on the y-axis corresponding to graph with 100 nodes is the average running time of all the repetitions for all CN graphs with 100 nodes. Formally, check out \cref{eq:avg_running_time}
+
+ \begin{equation}
+ \label{eq:avg_running_time}
+ v^{(100)} = \frac{1}{300} \sum_{j=1}^{30} \sum_{i = 1}^{10} r_{ij}^{(100)}
+ \end{equation}
+
+ For the CN graphs, \cref{fig:CN_both_BC_HBC} summarizes the running time observed for graphs with different node types. We see a huge improvement of time of HBC over BBC when performing the calculation on both the computer and the router. The difference between two graphs is the scale of the y-axis. The program executes much faster in the computer than in the router, but this is simply due to the higher processing power of the computer. What is interesting from \cref{fig:CN_both_BC_HBC} is how fast HBC complete the calculation comparing to BBC. We see that for CN graphs with 1000 nodes, HBC finishes the computation 100 faster than BBC with the computer, and 60 times fater with the router. Also for CN graphs with 1000 nodes, with the router, BBC takes more than 6 minutes to compute betweenness centrality, while HBC can finish in less than 7 seconds. This is the stark constrast on the performance. Given the router has small processing power, and it has other tasks to do such as forwarding packets, faster execution time can result in better performance overall. From looking at a graph, we can see that HBC running time increase linearly, while BBC running time quickly soars.
+
+ \begin{figure}[hpt]
+ \begin{tabular}{cc}
+ \subfloat[fig 1][]{\includegraphics[width=0.45\textwidth]{images/CN_server_BC_HBC.png}} &
+ \subfloat[fig 2][]{\includegraphics[width=0.45\textwidth]{images/CN_server_speedup_ratio.png}} \\
+ \subfloat[fig 3][]{\includegraphics[width=0.45\textwidth]{images/CN_artificial_router_BC_HBC.png}} &
+ \subfloat[fig 4][]{\includegraphics[width=0.45\textwidth]{images/CN_artificial_router_speedup_ratio.png}}
+ \end{tabular}
+ \caption[Performance for CN graphs on the computer and the Ubiquiti router]{Performance for CN graphs on the computer and the Ubiquiti router. Left column shows the running time for both algorithms. Right column shows the speed-up ratio between HBC and BBC.}
+ \label{fig:CN_both_BC_HBC}
+ \end{figure}
+
+ We have just looked at one type of graph where HBC outperforms BBC - the CN graphs. Next we are going to look at two other types of graph where using HBC worsens the overall performance. \cref{fig:ER_PL_server_BC_HBC} shows the running time on server for Power Law and Erdos graphs. Both lines follow the same pattern: the running time of BBC is a bit smaller than HBC's running time.
+
+
+ \begin{figure}
+ \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/ER_server_BC_HBC.png}}
+ \subfloat[fig 2][]{\includegraphics[width=0.5\textwidth]{images/PL_server_BC_HBC.png}}
+ \caption{Performance for Erdos and Power Law graphs on the computer.}
+ \label{fig:ER_PL_server_BC_HBC}
+ \end{figure}
+
+ \begin{figure}
+ \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/ER_server_speedup_ratio.png}}
+ \subfloat[fig 2][]{\includegraphics[width=0.5\textwidth]{images/PL_server_speedup_ratio.png}}
+ \caption{Speed up ratio of HBC over BBC}
+ \label{fig:ER_PL_speed_up_ratio_BC_HBC}
+ \end{figure}
+
+ The question then arises is when HBC can offer improvement over BBC, what kind of topologies should we use HBC to calculate betweenness centrality.
+
+\section{Synthetic graphs analysis} \label{sec:result_synthetic_analysis}
+ In the previous section, we noticed that HBC helps improving the performance of Community Network graphs. While for Erdos and Power Lab graphs, HBC takes longer time to run. This section analyzes all the 900 synthetic graphs to give an explanation on why HBC improves performance for some graphs, but not for the others.
+
+ The section has two parts, the \cref{sec:result_analysis_num_of_bcc} seeks answer for whether the number of bi-connected components (BCCs) found in a single graph affect the performance. \cref{sec:result_analysis_num_of_nodes} justifies that the number of vertices (or nodes) in any single bi-connected components is the main factor affecting the performance.
+
+ \textbf{Notations:} For each graph type [CN, PL, ER] and for each graph size [100, 200, ..., 1000], we generated 30 graphs in total. We call those graphs $G^{(i)}, i \in [1, 30]$. For each $G^{(i)}$, suppose that we can find $n^{(i)}$ bi-connected components, $B_j^{(i)}, j \in [1, n^{(i)}]$.
+
+ \subsection*{The number of bi-connected components} \label{sec:result_analysis_num_of_bcc}
+ The average number of BCCs for each type of graphs is summarized in \cref{tab:num_of_bcc_for_graphs}, and visualized in \cref{fig:num_of_bcc_for_graphs}.
+
+ For each graph type, and for each graph size:
+ \begin{equation}
+ N_{BCC} = \frac{1}{30} \sum_{i = 1}^{30} n^{(i)}
+ \end{equation}
+
+
+ \begin{figure}[hpt]
+ \centering
+ \includegraphics[width=0.6\textwidth]{images/num_of_bcc_for_graphs.png}
+ \caption[Average number of BCCs for synthetic graphs]{Average number of BCCs for synthetic graphs. Note that the y-axis is drawn in log-scale with base 2.}
+ \label{fig:num_of_bcc_for_graphs}
+ \end{figure}
+
+ For CN graphs, the number of BCCs is almost equal to the number of nodes in the graph. While for PL graphs, in general we find only 1 BCC, and this BCC is the original PL graph. The case for ER graphs is a bit different, there are several BCCs found, and the number of BCCs found increase with the number of nodes in ER graphs, e.g from 2.4 average number of BCCs for graph with 100 nodes, up to almost 16 BCCs for graph with 1000 nodes.
+
+ Our hypothesis is that if the original graph can be divided into many BCCs, then HBC will save the running time. However, the data shows us that even though ER graphs can be divided into many BCCs, HBC does not perform better for ER graph, see more in \cref{fig:ER_PL_server_BC_HBC}. Hence, the number of BCCs can not be used as a prediction for whether or not HBC would decrease the calculating time for betweenness centrality.
+
+
+ \subsection*{The number of node in each bi-connected component} \label{sec:result_analysis_num_of_nodes}
+ In the previous \cref{sec:result_analysis_num_of_bcc}, we showed that the number of BCCs found in each graph cannot predict whether HBC would run faster comparing to BBC. This subsection takes a different approach, it analyzes the bi-connected components in term of the its size -- the number of nodes in a bi-connected component. More specifically, we focus on the biggest BCC.
+
+ After the preprocessing step done in HBC algorithm, then it uses a modified version (see \cref{algo:accumulation_heuristic} of the original Brandes Betweenness Centrality to accumulate the betweenness centrality score. Therefore, if the size of each BCC is as big as the original graphs, then HBC does not save time when calculating betweenness. Not to mention the fact that HBC adds overhead to the overall running time when it pre-processes the original graph.
+
+ \begin{figure}[hpt]
+ \centering
+ \includegraphics[width=0.6\textwidth]{images/max_nodes_in_graph.png}
+ \caption{The biggest bi-connected components for each synthetic graph}
+ \label{fig:max_nodes_in_graph}
+ \end{figure}
+
+ Let $MB^{(i)}$ denotes the maximum number of nodes for a bi-connected component in a graph $G^{(i)}$:
+
+ \begin{equation}
+ MaxB^{(i)} = \max(B_j^{(i)}), j \in [1, n^{(i)}]
+ \end{equation}
+
+ Let $|B_j^{(i)}|$ denotes the number of nodes within each bi-connected component $B_j^{(i)}$. Then the y-value $N_{max}$ in the \cref{fig:max_nodes_in_graph} is defined as follow:
+
+ \begin{equation}
+ \label{eq:avg_max_num_of_node_in_bcc}
+ N_{max} = \frac{1}{30} \sum_{i = 1}^{30} MaxB^{(i)}
+ \end{equation}
+
+ For each graph type [CN, PL, ER] and for each graph size [100, 200, ..., 1000], we calculate the average size of the biggest bi-connected component for each graph following the \cref{eq:avg_max_num_of_node_in_bcc}. And the result is plotted in \cref{fig:max_nodes_in_graph}. The graph shows that for PL and ER, the largest BCC contains almost all the nodes in the original graph (see the scatter points for PL and ER seems to follow the straight line dividing the Q1 into two halves).
+
+ In the case of CN graphs, even for the graph size of 1000 nodes, the largest BCC on average only have 17 vertices. Therefore, computing the betweenness centrality for such the small BCC will improve the running time of HBC significantly.
+
+ For the raw data, see \cref{tab:max_nodes_in_graph}
+
+
+
+\section{Three real WCN} \label{sec:result_wcn}
+ \cref{tab:real_wcn_analysis} shows the high-level features of the FFG, FFW, Ninux such as the number of vertices and nodes, and the analysis on the bi-connected components. It shows that the number of nodes in the largest bi-connected component accounted for $27\%$, $46\%$, $35\%$ the total nodes in the FFG, FFW and Ninux network, respectively.
+
+ \begin{table}[hpt]
+ \centering
+ \begin{tabular}{l c c c }
+ \toprule
+ & FFG & FFW & Ninux \\
+ \midrule
+ number of vertices & 126 & 235 & 126 \\
+ number of edges & 181 & 370 & 143 \\
+ edges/nodes ratio & 1.44 & 1.57 & 1.13 \\
+ density & 0.022 & 0.013 & 0.018 \\
+ number of bi-connected components & 67 & 120 & 80 \\
+ max number of nodes in BCC & 35 & 107 & 44 \\
+ \bottomrule
+ \end{tabular}
+ \caption{Features of the network under analysis}
+ \label{tab:real_wcn_analysis}
+ \end{table}
+
+ For the real wireless community networks, HBC consistently outperform BBC in the running time. In the case of FFG, where the largest bi-connected component includes 27\% of total nodes, we also observe the largest speed-up ratio between HBC and BBC -- more than 6 times when running on the router. On another hand, for FFW, where the biggest bi-connected component has 46\% of total nodes, we only observe the speed-up ratio around 3. But in all cases, HBC reduces the time to calculate betweenness centrality for all real topologies. For the running time of HBC and BBC, see more in \cref{fig:CN_real_BC_HBC}; for the speed-up ratio, check out \cref{fig:CN_real_speedup_ratio_BC_HBC}. The raw data is provided in \cref{tab:CN_real_router_BC_HBC} and \cref{tab:CN_real_server_BC_HBC}
+
+ \begin{figure}[hpt]
+ \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_router_BC_HBC.png}}
+ \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_server_BC_HBC.png}}
+ \caption[Running time of HBC and BBC for real WCNs]{Running time of HBC and BBC for real WCNs. The X-axis denotes name of Wireless Community Networks.}
+ \label{fig:CN_real_BC_HBC}
+ \end{figure}
+
+ \begin{figure}[hpt]
+ \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_router_speedup_ratio.png}}
+ \subfloat[fig 1][]{\includegraphics[width=0.5\textwidth]{images/CN_real_server_speedup_ratio.png}}
+ \caption[Speed up ratio of HBC over BBC for real WCNs]{Speed up ratio of HBC over BBC for real WCNs. The X-axis is the name of Wireless Community Networks}
+ \label{fig:CN_real_speedup_ratio_BC_HBC}
+ \end{figure}
+
+ The raw data is also provided in \cref{tab:CN_real_server_BC_HBC} and \cref{tab:CN_real_router_BC_HBC}
+
+
+
+
+
+