You are to analyze campaign contributions to the 2016 U.S. presidential primary

Dataset

You are to analyze campaign contributions to the 2016 U.S. presidential primary races made in California. Use the attached CSV file. Download it and save it in the same folder as this notebook.

General Guidelines:

· This is a real dataset, so it may contain errors and other peculiarities to work through

· This dataset is ~218 MB, which will take some time to load (and probably won't load in Google Sheets or Excel)

· If you make assumptions, annotate them in your responses

· While there is one code/markdown cell positioned after each question as a placeholder, some of your code/responses may require multiple cells

· Double-click the markdown cells that say YOUR ANSWER HERE to enter your written answers. If you need more cells for your written answers, make them markdown cells (rather than code cells)

Setup

Run the two cells below.

The first cell will load the data into a pandas data frame named contrib. Note that a custom date parser is defined to speed up loading. If Python were to guess the date format, it would take even longer to load.

The second cell subsets the data frame to focus on just the primary period through May 2016. Otherwise, we would see general election donations which would make it harder to draw conclusions about the primaries.

import pandas as pd

import matplotlib.pyplot as plt

import datetime

# These commands below set some options for pandas and to have matplotlib show the charts in the notebook

pd.set_option('display.max_rows', 1000)

pd.options.display.float_format = '{:,.2f}'.format

%matplotlib inline

# Define a date parser to pass to read_csv

d = lambda x: datetime.datetime.strptime(x, '%d-%b-%y')

# Load the data

# We have this defaulted to the folder OUTSIDE of your repo - please change it as needed

contrib = pd.read_csv('../../P XXXXXXXXXXCA.csv', index_col=False, parse_dates=['contb_receipt_dt'], date_parser=d)

print(contrib.shape)

# Note - for now, it is okay to ignore the warning about mixed types.

# Subset data to primary period 

contrib = contrib.copy()[contrib['contb_receipt_dt'] <= datetime.datetime(2016, 5, 31)]

print(contrib.shape)
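To see what the date format and the subset step do without loading the full file, here is a minimal sketch on a few made-up rows. The amounts and dates are invented for illustration, and the sketch parses the dates with pd.to_datetime after loading rather than with the date parser used above.

```python
import datetime
import io

import pandas as pd

# Made-up rows for illustration; the real file has many more columns.
sample_csv = io.StringIO(
    "contb_receipt_amt,contb_receipt_dt\n"
    "50.0,15-APR-16\n"
    "27.0,31-MAY-16\n"
    "100.0,01-JUL-16\n"
)

sample = pd.read_csv(sample_csv)

# Parse with an explicit format so pandas does not have to guess it.
sample["contb_receipt_dt"] = pd.to_datetime(
    sample["contb_receipt_dt"], format="%d-%b-%y"
)

# Keep only rows dated on or before May 31, 2016.
cutoff = datetime.datetime(2016, 5, 31)
primary = sample[sample["contb_receipt_dt"] <= cutoff].copy()

print(primary.shape)
```

Comparing the shape before and after the subset is a quick way to confirm that rows were actually dropped.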

1. Data Exploration

1a. First, take a preliminary look at the data.

· Print the shape of the data. What does this tell you about the number of variables and rows you have?

· Print a list of column names.

· Review the documentation for this data (link above). Do you have all of the columns you expect to have?

· Sometimes variable names are not clear unless we read the documentation. In your own words, based on the documentation, what information does the election_tp variable contain?

1b. Print the first 5 rows from the dataset to manually check some of the data.

This is a good idea to ensure the data loaded and the columns parsed correctly!
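A minimal sketch of the preliminary checks in 1a and 1b, run on a tiny made-up frame rather than the real data. The column name cand_nm and all of the values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for contrib; values are made up.
toy = pd.DataFrame(
    {
        "cand_nm": ["Candidate A", "Candidate B", "Candidate A"],
        "contb_receipt_amt": [50.0, 27.0, 100.0],
        "election_tp": ["P2016", "P2016", "G2016"],
    }
)

# shape is a (rows, columns) tuple.
n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")

# Column names as a plain Python list.
print(list(toy.columns))

# First 5 rows (fewer if the frame is shorter).
print(toy.head())
```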

1c. Pick three variables from the dataset above and run some quick sanity checks.

When working with a new dataset, it is important to explore and sanity check your variables. For example, you may want to examine the maximum and minimum values, a frequency count, or something else. Use the three markdown cells below to explain if your three chosen variables "pass" your sanity checks or if you have concerns about the integrity of your data and why.

1c YOUR RESPONSE HERE
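As one possible starting point for the sanity checks in 1c, the sketch below looks at the range of a numeric variable, counts negative amounts, and tallies a categorical variable including missing values. The data is made up, and which checks matter depends on the variables you pick.

```python
import pandas as pd

# Hypothetical stand-in for contrib; values are made up.
toy = pd.DataFrame(
    {
        "contb_receipt_amt": [50.0, -2700.0, 27.0, 10800.0, 27.0],
        "contbr_occupation": ["RETIRED", None, "TEACHER", "RETIRED", "ENGINEER"],
    }
)

amt = toy["contb_receipt_amt"]

# Minimum and maximum: unusually large or negative values deserve a closer look.
amount_range = (amt.min(), amt.max())

# Number of negative amounts.
n_negative = int((amt < 0).sum())

# Missing values and frequency counts for a categorical variable.
n_missing_occupation = int(toy["contbr_occupation"].isna().sum())
occupation_counts = toy["contbr_occupation"].value_counts(dropna=False)

print(amount_range, n_negative, n_missing_occupation)
print(occupation_counts)
```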

1d. Plotting a histogram

Make a polished, professional histogram of one of the variables you picked above. What insights can you draw from this histogram? Remember to include the following on your histogram:

· Include a title

· Include axis labels

· The correct number of bins to see the breakout of values

· Hint: For some variables the range of values is very large. To explore these better, make the initial histogram over the full range, and then make a second histogram 'zoomed' in on a smaller range.
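The full-range and zoomed histograms described in the hint could be sketched as follows. The amounts are made up, and the bin counts and the 0 to 100 zoom window are arbitrary choices for illustration, not recommended values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up contribution amounts with one large value.
amounts = pd.Series(
    [5, 10, 25, 25, 27, 50, 50, 100, 250, 2700], name="contb_receipt_amt"
)

fig, (ax_full, ax_zoom) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: the full range of values.
ax_full.hist(amounts, bins=20)
ax_full.set_title("Contribution amounts (full range)")
ax_full.set_xlabel("Contribution amount (USD)")
ax_full.set_ylabel("Number of contributions")

# Right panel: zoom in on amounts between 0 and 100 (inclusive).
zoomed = amounts[amounts.between(0, 100)]
ax_zoom.hist(zoomed, bins=10, range=(0, 100))
ax_zoom.set_title("Contribution amounts (0 to 100)")
ax_zoom.set_xlabel("Contribution amount (USD)")
ax_zoom.set_ylabel("Number of contributions")

fig.tight_layout()
```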

2. Exploring Campaign Contributions

Let's investigate the donations to the candidates.

2a. Present a table that shows the number of donations to each candidate sorted by number of donations.

· When presenting data as a table, it is often best to sort the data in a meaningful way. This makes it easier for your reader to examine what you've done and to glean insights. From now on, all tables that you present in this assignment (and course) should be sorted.

· Hint: Use the groupby method. Groupby is explained in Unit 13: async 13.3 & 13.5

· Hint: Use the sort_values method to sort the data so that candidates with the largest number of donations appear on top.

Which candidate received the largest number of contributions (variable 'contb_receipt_amt')?
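The groupby and sort_values hints above can be sketched on made-up data like this. The candidate column name cand_nm is an assumption, as are the placeholder candidate names.

```python
import pandas as pd

# Hypothetical stand-in for contrib; values are made up.
toy = pd.DataFrame(
    {
        "cand_nm": [
            "Candidate A", "Candidate B", "Candidate A",
            "Candidate C", "Candidate B", "Candidate B",
        ],
        "contb_receipt_amt": [50.0, 27.0, 100.0, 2700.0, 15.0, 27.0],
    }
)

# Count the donations per candidate, largest count first.
donation_counts = (
    toy.groupby("cand_nm")["contb_receipt_amt"]
    .count()
    .sort_values(ascending=False)
)

print(donation_counts)
```

Note that count() skips missing values in the counted column, so it can differ from the number of rows if amounts are missing.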

2b. Now, present a table that shows the total value of donations to each candidate, sorted by the total value of the donations.

Which candidate raised the most money in California?

2c. Combine the tables (sorted by either a or b above).

· Look at the two tables you presented above. If those tables are Series, convert them to DataFrames.

· Rename the variable (column) names to accurately describe what is presented.

· Merge together your tables to show the count and the value of donations to each candidate in one table.

· Hint: Use the merge method.
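One way the convert, rename, and merge steps could fit together is sketched below on made-up data. The column names num_donations and total_amount are hypothetical labels chosen for this example, and cand_nm is an assumed candidate column.

```python
import pandas as pd

# Hypothetical stand-in for contrib; values are made up.
toy = pd.DataFrame(
    {
        "cand_nm": [
            "Candidate A", "Candidate B", "Candidate A",
            "Candidate C", "Candidate B", "Candidate B",
        ],
        "contb_receipt_amt": [50.0, 27.0, 100.0, 2700.0, 15.0, 27.0],
    }
)

grouped = toy.groupby("cand_nm")["contb_receipt_amt"]

# Each aggregation returns a Series; convert to a DataFrame and rename.
counts = grouped.count().to_frame().rename(
    columns={"contb_receipt_amt": "num_donations"}
)
totals = grouped.sum().to_frame().rename(
    columns={"contb_receipt_amt": "total_amount"}
)

# Both frames are indexed by candidate, so merge on the index.
summary = counts.merge(totals, left_index=True, right_index=True)
summary = summary.sort_values("total_amount", ascending=False)

print(summary)
```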

2d. Calculate and add a new variable to the table from 2c that shows the average $ per donation. Print this table sorted by the average donation.
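The average is simply the total value divided by the number of donations. A minimal sketch, assuming a summary table with hypothetical columns num_donations and total_amount and made-up numbers:

```python
import pandas as pd

# Hypothetical summary table; values are made up.
summary = pd.DataFrame(
    {"num_donations": [2, 3, 1], "total_amount": [150.0, 69.0, 2700.0]},
    index=pd.Index(["Candidate A", "Candidate B", "Candidate C"], name="cand_nm"),
)

# Average dollars per donation = total value / number of donations.
summary["avg_donation"] = summary["total_amount"] / summary["num_donations"]
summary = summary.sort_values("avg_donation", ascending=False)

print(summary)
```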

2e. Plotting a Bar Chart

Make a single bar chart that shows two different bars per candidate with one bar as the total value of the donations and the other as average $ per donation.

· Show the candidate's name on the x-axis

· Show the amount on the y-axis

· Include a title

· Include axis labels

· Hint: Make the y-axis a log-scale to show both numbers! (matplotlib docs: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.yscale.html)
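A grouped bar chart with a log-scale y-axis could be sketched as follows. The summary table, its column names, and its values are all made up for illustration. Plotting straight from the DataFrame is one option among several.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical summary table; values are made up.
summary = pd.DataFrame(
    {"total_amount": [150.0, 69.0, 2700.0], "avg_donation": [75.0, 23.0, 2700.0]},
    index=pd.Index(["Candidate A", "Candidate B", "Candidate C"], name="cand_nm"),
)

# One group of bars per candidate, one bar per column.
ax = summary.plot(kind="bar", figsize=(8, 4), rot=45)

# A log scale keeps the small averages visible next to the large totals.
ax.set_yscale("log")
ax.set_title("Total and average donation by candidate")
ax.set_xlabel("Candidate")
ax.set_ylabel("Amount (USD, log scale)")
ax.figure.tight_layout()
```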

2f. Comment on the results of your data analysis in a short paragraph.

· There are several interesting conclusions you can draw from the table you have created.

· What have you learned about campaign contributions in California?

· We are looking for data insights here rather than comments on the code!

3. Exploring Donor Occupations

Above in part 2, we saw that some simple data analysis can give us insights into the campaigns of our candidates. Now let's quickly look to see what kind of person is donating to each campaign using the contbr_occupation variable.

3a. Show the top 5 occupations of individuals that contributed to Hillary Clinton.

· Subset your data to create a dataframe with only donations for Hillary Clinton.

· Then use the value_counts and head methods to present the top 5 occupations (contbr_occupation) for her donors.

· Note: we are just interested in the count of donations, not the value of those donations.

3b. Write a function called get_donors.

Imagine that you want to do the previous operation on several candidates. To keep your work neat, you want to take the work you did on the Clinton-subset and wrap it in a function that you can apply to other subsets of the data.

· The function should take a DataFrame as a parameter, and return a Series containing the counts for the top 5 occupations contained in that DataFrame.


def get_donors(df):
    """This function takes a dataframe that contains a variable named contbr_occupation.
    It outputs a Series containing the counts for the 5 most common values of that
    variable."""
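One possible body for this function simply chains value_counts and head, as in 3a. This is a sketch under that assumption, tried out on a made-up frame, and not necessarily the intended solution.

```python
import pandas as pd


def get_donors(df):
    """Return counts for the 5 most common contbr_occupation values in df."""
    # value_counts sorts by frequency (descending); head keeps the top 5.
    return df["contbr_occupation"].value_counts().head(5)


# Made-up occupations for illustration.
toy = pd.DataFrame(
    {
        "contbr_occupation": [
            "RETIRED", "TEACHER", "RETIRED", "ENGINEER", "NURSE",
            "ATTORNEY", "WRITER", "RETIRED", "TEACHER",
        ]
    }
)

top = get_donors(toy)
print(top)
```

By default value_counts excludes missing values, so donors with no recorded occupation are not counted here.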

3c. Now run the get_donors function on subsets of the dataframe corresponding to three candidates. Show each of the three candidates below.

· Hillary Clinton

· Bernie Sanders

· Donald Trump


# 3c YOUR CODE HERE

3d. Finally, use groupby to separate the entire dataset by candidate.

· Call .apply(get_donors) on your groupby object, which will apply the function you wrote to each subset of your data.

· Look at your output and marvel at what pandas can do in just one line!

3e. Comment on your data insights & findings in a short paragraph.

3f. Think about your findings in section 3 vs. your findings in section 2 of this assignment.

Do you have any new data insights into the results you saw in section 2 now that you see the top occupations for each candidate?

4. Plotting Data

There is an important element that we have not yet explored in this dataset - time.

4a. Present a single line chart with the following elements.

· Show the date on the x-axis

· Show the contribution amount on the y-axis

· Include a title

· Include axis labels

4b. Make a better time-series line chart

This chart is messy and it is hard to gain insights from it. Improve the chart from 4a so that your new chart shows a specific insight. In the spot provided, write the insight(s) that can be gained from this new time-series line chart.
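One common way to tidy a noisy time series is to aggregate it to a coarser interval before plotting. The sketch below sums made-up contributions by week. The weekly interval is an arbitrary choice for illustration, and other aggregations (by month, by candidate) may show a clearer insight.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for contrib; values are made up.
toy = pd.DataFrame(
    {
        "contb_receipt_dt": pd.to_datetime(
            ["2016-01-04", "2016-01-06", "2016-01-12", "2016-01-13", "2016-01-20"]
        ),
        "contb_receipt_amt": [50.0, 27.0, 100.0, 15.0, 250.0],
    }
)

# Sum contributions per week (weeks end on Sunday with the "W" rule).
weekly_totals = (
    toy.set_index("contb_receipt_dt")["contb_receipt_amt"].resample("W").sum()
)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(weekly_totals.index, weekly_totals.values, marker="o")
ax.set_title("Weekly contribution totals")
ax.set_xlabel("Week ending")
ax.set_ylabel("Total contributions (USD)")
fig.autofmt_xdate()
```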
