Q1 Collect data from TMDb and visualize co-actor network
Q1.1 Collect data from TMDb and build a graph
For Q1.1, you will be using and submitting a Python file. Complete all tasks according to the instructions found in submission.py to complete the Graph class, the TMDbAPIUtils class, and the two global functions. The Graph class will serve as a reusable way to represent and write out your collected graph data. The TMDbAPIUtils class will be used to work with the TMDb API for data retrieval.
NOTE: You must only use a version of Python ≥ 3.7.0 and < 3.8 for this question. This question has been developed and tested for these versions. You must not use any other versions (e.g., Python 3.8). While we want to be able to extend to more Python versions, the specified versions are what we can definitively support at this time.
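One way to confirm the interpreter before running submission.py is a small version check. This is a minimal sketch, not part of the provided starter code; the function name is_supported_version is hypothetical.

```python
import sys


def is_supported_version(version_info=None):
    """Return True when the version is Python >= 3.7.0 and < 3.8.

    `version_info` defaults to the running interpreter; passing a tuple
    such as (3, 7, 4) makes the check easy to try out by hand.
    """
    if version_info is None:
        version_info = sys.version_info
    # Only the major and minor numbers matter for the 3.7.x range.
    return tuple(version_info[:2]) == (3, 7)


if __name__ == "__main__":
    if is_supported_version():
        print("Python version is within the supported range.")
    else:
        print("Unsupported Python version:", sys.version.split()[0])
```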
NOTE: You must only use the modules and libraries provided at the top of submission.py and modules from the Python Standard Library. Pandas and NumPy CANNOT be used. While we understand that they are useful libraries to learn, completing this question is not critically dependent on their functionality. In addition, to enable our TAs to provide better, more consistent support to our students, we have decided to focus on this subset of libraries.
NOTE: We will call each function once in submission.py during grading. The total runtime of submission.py must not exceed 10 minutes. Submissions exceeding this limit will receive zero credit. The average runtime of the code during grading is expected to be approximately 4 seconds. When we grade, we will take into account what your code does, and aspects that may be out of your control. For example, sometimes the server may be under heavy load, which may significantly increase the response time (e.g., the closer it is to the HW1 deadline, the longer the response time is likely to be!).
a) Implementation of the Graph class according to the instructions in submission.py
b) Implementation of the TMDbAPIUtils class according to the instructions in submission.py. You will use version 3 of the TMDb API to download data about actors and their co-actors. To use the TMDb API:
o Create a TMDb account and obtain your client id / client secret which are required to obtain an authentication Token. Refer to this document for detailed instructions (log in using your GT account).
o Refer to the TMDB API Documentation as you work on this question. The documentation contains a helpful ‘try-it-out’ feature for interacting with the API calls.
c) Producing correct nodes.csv and edges.csv. You must upload your nodes.csv and edges.csv files to Argo Lite as directed in Q1.2.
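For part (b), the sketch below shows one possible shape for a credits request and for filtering the response, using only the standard library. It assumes the version 3 endpoint /movie/{movie_id}/credits, authentication through an api_key query parameter, and a response whose "cast" list holds entries with "id" and "name" fields; verify each of these against the TMDb API documentation. All function names and the limit / exclude_ids parameters are hypothetical and are not the interface required by submission.py.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.themoviedb.org/3"


def build_movie_credits_url(movie_id, api_key):
    """Build the URL for a movie's credits (assumed v3 endpoint layout)."""
    query = urllib.parse.urlencode({"api_key": api_key})
    return f"{API_BASE}/movie/{movie_id}/credits?{query}"


def fetch_json(url, timeout=10):
    """Download a URL and decode the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_cast(credits, limit=None, exclude_ids=()):
    """Reduce a credits response to a list of {"id", "name"} dictionaries.

    Entries whose id is in `exclude_ids` are dropped first; the first
    `limit` remaining entries are kept (all of them when limit is None).
    """
    excluded = set(exclude_ids)
    members = [
        {"id": member["id"], "name": member["name"]}
        for member in credits.get("cast", [])
        if member["id"] not in excluded
    ]
    return members if limit is None else members[:limit]
```

Keeping URL construction, downloading, and filtering in separate functions lets the filtering logic be tried on a saved response without calling the server.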
NOTE: Q1.2 builds on the results of Q1.1
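Because Q1.2 consumes nodes.csv and edges.csv, it can help to picture how a graph object might write them out. The following is a minimal sketch only: the class name SimpleGraph and its methods are hypothetical and do not replace the Graph class specified in submission.py. It assumes an undirected graph with no duplicate edges and no self-loops, and the column names id, name, source, and target that the Argo Lite import steps in Q1.2 refer to.

```python
import csv
import io


def _to_csv(header, rows):
    """Render a header and rows as CSV text with '\n' line endings."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


class SimpleGraph:
    """Hypothetical undirected graph holding (id, name) nodes and edges."""

    def __init__(self):
        self.nodes = []  # list of (id, name) tuples
        self.edges = []  # list of (source, target) tuples

    def add_node(self, node_id, name):
        # Ignore a node whose id is already present.
        if all(existing_id != node_id for existing_id, _ in self.nodes):
            self.nodes.append((node_id, name))

    def add_edge(self, source, target):
        # Skip self-loops and edges already stored in either direction.
        if source == target:
            return
        if (source, target) in self.edges or (target, source) in self.edges:
            return
        self.edges.append((source, target))

    def nodes_csv(self):
        return _to_csv(["id", "name"], self.nodes)

    def edges_csv(self):
        return _to_csv(["source", "target"], self.edges)

    def save(self, nodes_path="nodes.csv", edges_path="edges.csv"):
        # newline="" keeps the line endings exactly as rendered above.
        with open(nodes_path, "w", newline="", encoding="utf-8") as handle:
            handle.write(self.nodes_csv())
        with open(edges_path, "w", newline="", encoding="utf-8") as handle:
            handle.write(self.edges_csv())
```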
Q1.2 Visualizing a graph of co-actors using Argo-Lite
Using Argo Lite, visualize a network of actors and their co-actors. You will produce an Argo Lite graph snapshot from your edges.csv and nodes.csv from Q1.1.c.
a. To get started, review Argo Lite’s readme on GitHub. Argo Lite has been open-sourced.
b. Importing your Graph
● Launch Argo Lite
● From the menu bar, click ‘Graph’ → ‘Import CSV’. In the dialogue that appears:
o Select ‘I have both nodes and edges file’
● Under Nodes, use ‘Choose File’ to select nodes.csv from your computer
o Leave 'Has Headers' selected
o Verify ‘Column for Node ID’ is ‘id’
● Under Edges, use ‘Choose File’ to select edges.csv from your computer
o Verify ‘Column for Source ID’ is ‘source’
o Set ‘Column for Target ID’ to ‘target’
o Verify ‘Selected Delimiter’ is ','
● At the bottom of the dialogue, verify that ‘After import, show’ is set to ‘All Nodes’
● The graph will load in the window. Note that the layout is paused by default; you can select ‘Resume’ or ‘Pause’ layout as needed.
● Dragging a node will 'pin' it, freezing its position. Selecting a pinned node, right clicking it, then choosing 'unpin selected' will unpin that node, so its position will once again be computed by the graph layout algorithm. Experiment with pinning and unpinning nodes.
NOTE: If a malformed .csv is uploaded, Argo Lite could become unresponsive. If you suspect this is the case, open the developer tools for your browser and review any console error messages.
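A quick local check before uploading can catch some malformed files. The sketch below is one possible approach under stated assumptions: it expects the id, source, and target columns named in the import steps above, and the function name find_csv_problems is hypothetical. It does not cover everything that could make Argo Lite reject a file.

```python
import csv
import io


def _is_ragged(row):
    """True when a row has more or fewer fields than the header."""
    return None in row or any(value is None for value in row.values())


def find_csv_problems(nodes_text, edges_text):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    node_reader = csv.DictReader(io.StringIO(nodes_text))
    edge_reader = csv.DictReader(io.StringIO(edges_text))

    # Check the header columns that the import dialogue expects.
    if "id" not in (node_reader.fieldnames or []):
        problems.append("nodes.csv: missing 'id' column")
    for column in ("source", "target"):
        if column not in (edge_reader.fieldnames or []):
            problems.append(f"edges.csv: missing '{column}' column")
    if problems:
        return problems

    node_ids = set()
    for number, row in enumerate(node_reader, start=1):
        if _is_ragged(row):
            problems.append(f"nodes.csv data row {number}: wrong number of fields")
        if row.get("id"):
            node_ids.add(row["id"])

    # Every edge endpoint should refer to a node listed in nodes.csv.
    for number, row in enumerate(edge_reader, start=1):
        if _is_ragged(row):
            problems.append(f"edges.csv data row {number}: wrong number of fields")
            continue
        for column in ("source", "target"):
            if row[column] not in node_ids:
                problems.append(
                    f"edges.csv data row {number}: unknown node id '{row[column]}'"
                )
    return problems
```

To use it on files, read each one with open(path, encoding="utf-8").read() and pass the two strings in.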
c. Setting graph display options
● On the ‘Graph Options’ panel, under 'Nodes' → 'Modifying All Nodes', expand the 'Color' menu
o Select Color by 'degree', with scale: ‘Linear Scale’
o Select a color gradient of your choice that will assign lighter colors to nodes with higher node degrees, and darker colors to nodes with lower degrees
● Collapse the 'Color' options, expand the 'Size' options.
o Set 'Scale by' to 'degree', with scale: 'Linear Scale'
o Select meaningful Size Range values of your choice or use the default range.
● Collapse the 'Size' options
● On the Menu, click ‘Tools’ → ‘Data Sheet’
● Within the ‘Data Sheet’ dialogue:
o Click ‘Hide All’
o Set ‘10 more nodes with highest degree’
o Click ‘Show’ and then close the ‘Data Sheet’ dialogue
● Click and drag a rectangle selection around the visible nodes
● With the nodes selected, configure their label visibility by setting the following:
o Go to 'Graph Options' → 'Labels'
o Click ‘Show Labels of Selected Nodes’
o At the bottom of the menu, set 'Label By' to 'name'
o Adjust the ‘Label Length’ so that the full text of the actor name is displayed
● Show only non-leaf vertices. On the Menu, click ‘Tools’ → ‘Data Sheet’ → ‘Show k More Nodes with Highest Degree’, where k is the number of nodes to show such that only nodes with a degree > 1 are visible. To make this easier, we suggest writing a utility function in your Graph class to find the count of leaf nodes in order to determine how many nodes should be shown.
The result of this workflow is a graph in which node size and color depend on node degree, and the nodes with the highest degree are emphasized by showing their labels.
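The suggested utility could look roughly like the sketch below, which counts leaf nodes (degree 1) and non-leaf nodes (degree > 1) from an edge list. It assumes an undirected graph in which each edge appears once and there are no self-loops; the function names are hypothetical and are not part of submission.py.

```python
from collections import Counter


def degree_by_node(edges):
    """Count how many edges touch each node id."""
    degrees = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def count_leaf_and_non_leaf(node_ids, edges):
    """Return (leaf_count, non_leaf_count) for the given nodes and edges.

    A leaf has degree exactly 1; a non-leaf has degree greater than 1.
    Nodes with no edges at all fall into neither group.
    """
    degrees = degree_by_node(edges)
    leaf_count = sum(1 for node_id in node_ids if degrees[node_id] == 1)
    non_leaf_count = sum(1 for node_id in node_ids if degrees[node_id] > 1)
    return leaf_count, non_leaf_count
```

If some of the highest-degree nodes are already visible from an earlier step, k would be the non-leaf count minus the number already shown, assuming those visible nodes are all non-leaf nodes.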
d. Designing a meaningful graph layout
Using the following guidelines, create a visually meaningful and appealing layout:
● Reduce as much edge crossing as possible
● Do not allow any nodes to overlap
● Keep the graph as compact and symmetric as possible
● Use the nodes’ spatial positions to convey information (e.g., “clusters” or groups)
● Experiment with showing additional node labels. If showing all node labels creates too much visual complexity, show at least 10 “important” node labels. You may decide what “importance” means to you. For example, you may consider nodes (actors) having higher connectivity as potentially more “important” (based on how the graph is built).
The objective of this task is to familiarize yourself with basic, important graph visualization features. Therefore, this is an open-ended task and most designs are acceptable. You should experiment with Argo Lite’s features, changing node size and shape, etc. In practice, it is not possible to create “perfect” visualizations for most graph datasets. The above guidelines are ones that generally help. However, like most design tasks, creating a visualization is about making selective design compromises. Some guidelines could create competing demands, and following all guidelines may not guarantee a “perfect” design.
If you want to save your Argo Lite graph visualization snapshot locally to your device, so you can continue working on it later, we recommend the following workflow.
● Select 'Graph' → 'Save Snapshot'
o In the 'Save Snapshot' dialog, click 'Copy to Clipboard'
o Open an external text editor program such as TextEdit or Notepad. Paste the clipboard contents of the graph snapshot, and save it to a file with a .json extension. You should be able to accomplish this with a default text editor on your computer by overriding the default file extension and manually entering ‘.json’.
o You may save your progress by saving a snapshot and later loading it into Argo Lite to continue your work.
● To load a snapshot, choose 'Graph' → 'Open Snapshot'
● Select the graph snapshot you created.
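If a saved snapshot fails to load, one thing worth ruling out is a truncated or partial paste. The sketch below checks only whether the saved text parses as JSON, on the assumption (suggested by the .json extension above) that a snapshot is a JSON document. The function name is hypothetical, and a file that passes is not guaranteed to be a valid Argo Lite snapshot.

```python
import json


def snapshot_text_is_json(text):
    """Return True when `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return True
```

For a file on disk, read it with open(path, encoding="utf-8").read() and pass the resulting string to the function.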
NOTE: Q1.2 (d) will not be graded on Gradescope. We will give a qualitative score on the overall design and presentation of your graph visualization in Argo Lite.
e. Publish and Share your graph snapshot
● Name your graph: On the top navigation bar, click on the label ‘Untitled Graph’. In the ‘Rename Snapshot’ dialogue window that appears, enter your GTUsername as the ‘Snapshot Name’ and click ‘Done’
● Select 'Graph' → 'Publish and Share Snapshot' → 'Share'
● Next, click 'Copy to Clipboard' to copy the generated URL
● Return the URL in the return_argo_lite_snapshot() function in submission.py
If you modify your graph after you publish and share a URL, you will need to re-publish and obtain a new URL of your latest graph.
NOTE: If this function returns a malformed or invalid snapshot URL, it will likely cause Gradescope to crash.
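Before submitting, the returned value can be sanity-checked locally. The sketch below tests only that a string has an http or https scheme, a host, and no whitespace; it cannot confirm that the snapshot actually exists. The helper name looks_like_http_url is hypothetical, and the URL in the usage comment is a placeholder rather than a real snapshot address.

```python
from urllib.parse import urlparse


def looks_like_http_url(candidate):
    """Return True for a string shaped like an http(s) URL with a host."""
    if not isinstance(candidate, str):
        return False
    # Leading, trailing, or embedded whitespace usually means a bad paste.
    if candidate != candidate.strip() or any(ch.isspace() for ch in candidate):
        return False
    parsed = urlparse(candidate)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Usage (placeholder URL, replace with the one copied from Argo Lite):
# looks_like_http_url("https://example.com/#your-snapshot-id")
```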