2. (Problem 1.45 in KNN)
The primary objective of the Study on the Efficacy of Nosocomial Infection Control (SENIC Project) was to determine whether infection surveillance and control programs have reduced the rates of nosocomial (hospital-acquired) infection in United States hospitals. The data set in SENIC.txt consists of a random sample of 113 hospitals selected from the original 338 hospitals surveyed. Each line of the data set has an identification number ID and provides information on 11 other variables for a single hospital.
The average length of stay in a hospital (Stay) is anticipated to be related to infection risk (Risk), available facilities and services (AFS), and routine chest X-ray ratio (Xray). (See Appendix C.1 for details on these and the other variables included in the data set.)
(a) Obtain scatterplots of average length of stay against each of the three predictor variables, and overlay lowess smoothers. Does a linear mean function seem plausible in each case? Explain.
(b) Obtain the least squares estimate for the linear regression of average length of stay on each of the three predictor variables, and overlay the least squares lines on your scatterplots. Does the simple linear regression model seem plausible in each case? Explain.
(c) Calculate MSE for each of the three linear regression fits. Which predictor variable leads to the smallest variability around the fitted regression line? Was this result apparent from your plots in parts (a) and (b)? Explain.
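The three computations in parts (a) through (c) can be sketched as follows. This is a minimal illustration under stated assumptions, not a worked solution: the function names `lowess_sketch`, `fit_line`, and `mse_simple` are hypothetical, the smoother is a simplified local-linear lowess with tricube weights and no robustness iterations, and the arrays at the bottom are toy values rather than SENIC data (the column layout of SENIC.txt is described in Appendix C.1 and is not assumed here). MSE is taken as SSE / (n - 2), the usual estimate for simple linear regression.

```python
import numpy as np

def lowess_sketch(x, y, frac=2/3):
    """Simplified lowess: local linear fits with tricube weights.
    Assumption: no robustness iterations, unlike full lowess."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))       # neighbours used per local fit
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[k - 1]                # distance to the k-th nearest point
        if h > 0:
            w = np.clip(1 - (d / h) ** 3, 0, None) ** 3
        else:                                # many tied x values: use the ties only
            w = (d == 0).astype(float)
        X = np.column_stack([np.ones(n), x - x[i]])
        WX = X * w[:, None]
        # weighted least squares; the intercept is the smoothed value at x[i]
        beta = np.linalg.lstsq(WX.T @ X, WX.T @ y, rcond=None)[0]
        fitted[i] = beta[0]
    order = np.argsort(x)
    return x[order], fitted[order]

def fit_line(x, y):
    """Least squares intercept b0 and slope b1 for y regressed on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b0 = ybar - b1 * xbar
    return b0, b1

def mse_simple(x, y):
    """MSE = SSE / (n - 2) for the simple linear regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b0, b1 = fit_line(x, y)
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2) / (len(x) - 2)

# Toy values for illustration only (not taken from SENIC.txt).
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_toy = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
xs_sorted, smooth = lowess_sketch(x_toy, y_toy)
b0, b1 = fit_line(x_toy, y_toy)
mse = mse_simple(x_toy, y_toy)
```

With the real data, each function would be called once per predictor (Risk, AFS, Xray) with Stay as the response. The sorted pairs from `lowess_sketch` and the line `b0 + b1 * x` can then be drawn over the scatterplot with any plotting library, and the three MSE values compared directly.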