Task B: Evaluate your Data Structures and SIR model
In this second task, you will evaluate your implemented structures in terms of their time complexities for the different operations and different use case scenarios. Scenarios arise from the possible use cases of contract tracing and epidemic modelling on a graph. In addition, run some epidemic modelling simulations to evaluate your structures within an application (epidemic modelling and simulation) and to explore more about the effect of the model parameters on epidemic spreading.
Write a report on your analysis and evaluation of the different implementations. Consider and recommend in which scenarios each type of implementation would be most appropriate. Report the outcomes of your SIR epidemic model simulations.
Graph generators
To evaluate the data structures, we will generate a number of graphs, then apply a number of operations on them, see the Use Case Scenarios section next. In this section, we describe more about how to generate the graphs, and see on Canvas for further details and videos.
There are many ways to generate graphs, but we will focus on two:
Random (Erdos-Renyi) – there is a probability for generating an edge between any pair of vertices.
Scale-free – graphs whose degree distribution follow a power law (see https://en. wikipedia.org/wiki/Scale-free_network).
We suggest to use Pajek , which is available on multiple platforms and has visualisation, which allows one to look at the generated graphs. But it also has basic graph generation functionality and doesn’t require programming to use, unlike more powerful graph libraries.
Use Case Scenarios
Typically, you use real usage data to evaluate your data structures. However, for this assignment, you will write data generators to enable testing over different scenarios of interest. We are also interested in the effect of the average degree of the graph on these scenarios. There are many possibilities, but for this assignment, consider the following scenarios:
Scenario 1 k-hop Neighbourhoods: When doing contract tracing and/or epidemic modelling, computing neighbourhoods is an important function. In this scenario, the graph is not changing, but important operations such as k-hop neighbourhood (recall this is all neighbours up to k-hops away) are requested.
You are to evaluate the performance of the the k-hop neighbourhood implementations, as the average degree of the evaluated graph and k are varied.
Scenario 2 Dynamic Contact Conditions: In the real world, the contact conditions between people are likely not static and there will be need to update these over time. In this scenario, the contacts are changing (been added and deleted). In this scenario, you are to evaluate the performance of your implementations in terms of:
edge additions
edge deletions
You are to evaluate the performance the edge operations as the average degree and size (number of vertices) of the initial graph (before edge changes) are varied.
Scenario 3 Dynamic People Tracing: As time goes on, the people been traced will change. Although it is possible to have a large graph that tries to capture every possible person, it is computationally difficult to do so and run simulations on it. In this scenario, you are to evaluate the performance of your implementations in terms of:
vertex addition
vertex deletion
You are to evaluate the performance the vertex operations as the average degree and size (number of vertices) of the initial graph (before vertex changes) are varied.
SIR Model Epidemic Simulation
In addition to evaluating how well the graph representations perform for the contract tracing and epidemic modelling simulations, we also want to experiment with how the networked SIR model works and performs.
Run epidemic simulation for different graph types (Erdos-Renyi and Scale-free), different seed initialisations and different infection and recover probabilities. For performance evaluation, evaluate each of the three data structures on a subset of the possible simulation parameters and compare how they perform (in terms of timing efficiency).
We don’t expect a comprehensive analysis, but want you to explore and gain a better understanding of epidemic modelling and what it implies for how to limit the spread of epidemics.
Consider how you might present the results, but at a minimum plot the following:
Number of susceptible, infected and recovered after every iteration.
Data Generation
When generating the vertices and edges to add or remove and find neighbourhoods for, the distribution of these elements, compared to what is in the graph already, will have an effect on the timing performance. However, without the usage and query data, it is difficult to specify what this distributions might be. Instead, in this assignment, uniformly sample from a fixed range, e.g., 0 to max vertex label of your graph when generating the vertices and edges for removing and adding and k-hop neighbourhoods, and a different range (e.g., greater than the largest vertex label when adding vertices (we do not want to repeatingly add vertices that are in the graph already).
For generating graphs with different initial average degrees and sizes, you can either use Pajek or your favourite graph generator, we will not prescribe one particular approach. Whichever method you decide to use, remember to generate graphs of different average degrees to evaluate on. Due to the randomness of the data, you may wish to generate a few graphs with the same average degrees and sizes and take the average across a number of runs when performing the performance evaluation and analysis.
Analysis
In your analysis, you should evaluate each of your representations and data structures in terms of the different scenarios outlined above.
Note, you may be generating and evaluating a significant number of graphs and evaluation datasets, hence we advise you to get started on this part relatively early.
Report Structure
As a guide, the report could contain the following sections:
Explain your data and experimental setup. Things to include are (brief) explanations of the data scenarios you decide to evaluate on (e.g., how many additions/removals operations did you evaluate and why these), size of the graphs, the range of average degrees tested (add some brief explanation of why this range selection), describe how the scenarios were generated (a paragraph and perhaps a figure or high level pseudo code suffice), which approach(es) you decide to use for measuring the timing results, and briefly describe the fixed set(s) you sampled from to generate the elements for addition, removal of vertices and edges and k-hop neighbourhood of the graphs.
Evaluation of the data structures using the generated data (different scenarios, average degree, graph size). Analyse, compare and discuss your results. Provide your explanation on why you think the results are as you observed. You may consider using the known theoretical time complexities of the operations of each data structure to help in your explanation if useful.
Summarise your analysis as recommendations, e.g., for this certain data scenario of this average degree (and size), I recommend to use this data structure because... We suggest you refer to your previous analysis to help.
Evaluate the different parameters of the SIR model and its effect on the number
of susceptible, infected and recovered individuals over time. Which combination
of parameters has the fastest infection spread, which one has the most extensive
number of infected? How does the average timing compare between your three
graph representation? Can you explain why?
Students succeed in their courses by connecting and communicating with an expert until they receive help on their questions
Consult our trusted tutors.