Question 4.
For this question, you will be using a CSV file with the filename bikes.csv. The CSV file contains 110 entries recording the number of users of the Capital Bikeshare bicycle-sharing system based in Washington D.C., USA, with 60 samples from the year 2011 and 50 from the year 2012. Each row includes the year, month and date of the record, and various weather features, as well as the number of casual users and the number of registered users on that day. For example, the first row of the data is:
This says that on January 8, 2011, the average temperature was 0°C, the humidity was 54%, the wind speed was 18 km/h, and there were 68 casual users and 891 registered users of the bicycles on that day.
(a) The number of casual users on each day is recorded in the casual column. Use R to compute an estimate for the mean number of casual bicycle users per day, and then compute a 95% confidence interval for the mean number of casual bicycle users per day. Give a brief summary of your answer that could be understood by someone with no background in statistics.
(b) The number of registered users on each day is recorded in the registered column. Use R to compute an estimate for the mean number of registered bicycle users per day, and then compute a 95% confidence interval for the mean number of registered bicycle users per day. Give a brief summary of your answer that could be understood by someone with no background in statistics.
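As a rough illustration of the kind of computation parts (a) and (b) ask for, the sketch below estimates a mean and a 95% confidence interval on a small invented vector. It assumes a t-based interval is the intended method, which the question does not state; the vector x and its values are made up and do not come from bikes.csv.

```r
# Toy data only: these counts are invented for illustration,
# they are not taken from bikes.csv.
x <- c(12, 15, 9, 20, 17, 11, 14, 18)

# Point estimate of the mean.
x.mean <- mean(x)

# t.test() reports a t-based confidence interval for the mean.
# conf.level = 0.95 is the default, written out here for clarity.
ci <- t.test(x, conf.level = 0.95)$conf.int

# The same interval by hand: mean +/- t quantile * standard error.
n <- length(x)
se <- sd(x) / sqrt(n)
manual <- x.mean + c(-1, 1) * qt(0.975, df = n - 1) * se

print(x.mean)
print(ci)
print(manual)
```

The two intervals should agree, which is a quick way to check that the interval is being read off the t.test output correctly.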
In the remaining parts of this question, you will need to extract relevant rows from the data frame to address specific questions. In the following description, it is assumed that you have imported the dataset into a variable called bikes.
You can create additional data frames by requesting only the rows that meet certain conditions. For example, to create a data frame called bikes.2011 that contains only the records from the year 2011, you can use the following syntax:
bikes.2011 <- bikes[bikes$year == 2011, ]
Note that the comma at the end is essential (as it indicates that all columns should be taken).
You can use the nrow function to count the number of rows in a data frame. If you have done this successfully, then typing nrow(bikes.2011) should show that there are 60 records from the year 2011.
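The row-selection syntax above can be tried on a small stand-in data frame before applying it to the real data. In this sketch the data frame toy and all of its values are invented; only the column names year and casual mirror the description of bikes.csv.

```r
# Toy data frame: the values are invented and only mimic the
# layout described for bikes.csv.
toy <- data.frame(
  year   = c(2011, 2011, 2012, 2011, 2012),
  casual = c(10, 25, 40, 5, 30)
)

# Keep the rows where year equals 2011.
# The trailing comma leaves the column index empty, so all columns are kept.
toy.2011 <- toy[toy$year == 2011, ]

# Count the rows that matched the condition.
print(nrow(toy.2011))  # 3

# The subset is still a data frame with both columns.
print(names(toy.2011))
```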
(c) Use R to compute a 95% confidence interval for the mean total number of bicycle users (casual plus registered) per day in the year 2011. Give a brief summary of your answer that could be understood by someone with no background in statistics. You will need to sum together the casual and registered users for each day before computing the interval. Remember that addition in R acts on vectors, and data frame columns are vectors.
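The hint about vector addition can be illustrated on invented numbers. The sketch below adds two columns element by element and passes the result to t.test; the data frame toy is hypothetical, and the use of a t-based interval is an assumption rather than something the question specifies.

```r
# Toy data frame with invented counts (not from bikes.csv).
toy <- data.frame(
  casual     = c(10, 25, 40, 5),
  registered = c(100, 150, 200, 90)
)

# Data frame columns are vectors, so + adds them element by element,
# giving one total per row (that is, per day).
total <- toy$casual + toy$registered
print(total)  # 110 175 240 95

# The summed vector can then be used like any other numeric vector,
# for example to obtain a t-based 95% confidence interval for its mean.
total.ci <- t.test(total)$conf.int
print(total.ci)
```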
(d) Use R to compute a 95% confidence interval for the mean total number of bicycle users (casual plus registered) per day for days where the average temperature is 10°C or more. Give a brief summary of your answer that could be understood by someone with no background in statistics.
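Row conditions are not limited to equality; an inequality such as "10 or more" works the same way. The sketch below shows this on invented data. The column name temp is an assumption, since the actual name of the temperature column in bikes.csv is not given here (names(bikes) would show it).

```r
# Toy data frame: values are invented, and the column name `temp`
# is a guess at how the temperature column might be called.
toy <- data.frame(
  temp       = c(0, 12, 10, 7, 21),
  casual     = c(5, 40, 22, 9, 60),
  registered = c(90, 210, 180, 120, 300)
)

# >= keeps rows where the temperature is 10 or more (10 itself is included).
toy.warm <- toy[toy$temp >= 10, ]
print(nrow(toy.warm))  # 3

# Totals for the selected rows only.
warm.total <- toy.warm$casual + toy.warm$registered
print(warm.total)  # 250 202 360
```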