The client would like to know the number of vehicles in the sample after cleaning

Part A

Objective:

The purpose of this project is to provide you with an opportunity to demonstrate an advanced level of synthesis, understanding and communication of the concepts, statistical methods and practical analyses within R that you have learnt throughout this course.

Please remember that STA8005 is a postgraduate level course which requires that students demonstrate an advanced level of knowledge, skills, reasoning and problem-solving. Also, this project is a significant assessment item worth 40% of your final grade. As such you should expect to find it challenging and expect to spend considerable time working on it. I encourage you to start as soon as possible. You do not need to have completed all of the course work and topics to make a start on becoming familiar with the data.

The Data:

A consultancy firm has asked you to explore some data about vehicles and address three specific aspects of interest (Tasks 1, 2 and 3 below) for their client, and then report your process and findings in a written report.

The data file vehicles.txt contains data for 12 variables from 400 vehicles. The variables relate to the size, fuel efficiency and price of the vehicles. Each of the 12 variables is defined below. Before beginning the Tasks, you may need to do some data cleaning due to missing data or outliers. All analysis for the following tasks should be based on your cleaned data set. For the purpose of this exercise, assume that the data meet any required MVN assumptions.
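The cleaning step can be illustrated with a minimal sketch. The brief expects the work to be done in R; the Python below only shows the logic. The toy rows are invented (they are not values from vehicles.txt), and the 1.5 × IQR rule for flagging outliers is one common convention assumed here, not something the brief prescribes.

```python
from statistics import quantiles

# Invented toy rows (name, retail in USD, cylinders); None marks a missing value.
# These are NOT values from vehicles.txt.
raw = [
    ("A", 20000, 4),
    ("B", None, 6),
    ("C", 24000, 4),
    ("D", 22000, None),
    ("E", 26000, 6),
    ("F", 28000, 6),
    ("G", 30000, 8),
    ("H", 250000, 8),
]

def drop_incomplete(rows):
    """Keep only rows in which every field is present."""
    return [r for r in rows if all(v is not None for v in r)]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed convention)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

complete = drop_incomplete(raw)
flagged = iqr_outliers([row[1] for row in complete])

print("rows kept after removing missing data:", len(complete))
print("retail values flagged for review:", flagged)
```

Flagged values are candidates for inspection rather than automatic deletion; whichever rule is used, the report should state what was removed and why.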

Definition of 12 variables in vehicles.txt:

• Name: The vehicle make and model name

• retail: Suggested Retail Price, what the manufacturer thinks the vehicle is worth, including adequate profit for the automaker and the dealer (U.S. Dollars)

• cost: Dealer Cost (or "invoice price"), what the dealership pays the manufacturer (U.S. Dollars)

• engine_size: Engine Size (litres)

• cylinders: Number of Cylinders (4, 6 or 8)

• horsepower: Horsepower (ft-lb/s) (foot-pounds per second)

• city_mpg: City Miles Per Gallon

• highway_mpg: Highway Miles Per Gallon

• weight: Weight (Pounds)

• wheel_base: Wheel Base (inches)

• length: Length (inches)

• width: Width (inches)

Task 1: The client would like to know the number of vehicles in the sample after cleaning. They would also like to know the number of vehicles with 4, 6 or 8 cylinders recorded in the data and the mean and standard deviation of the retail price of each cylinder group.

Action: Clean the data as necessary, then describe the changes you have made and the final structure of the data you will analyse. Provide a frequency table of the number of vehicles by cylinder group and describe it. Find the mean and standard deviation of the retail price for each cylinder group. Interpret the interesting aspects of this data summary.
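As a rough illustration of the summary Task 1 asks for, the sketch below counts vehicles per cylinder group and computes the mean and sample standard deviation of retail price in each group. The brief expects R; this Python version only shows the logic, and the records are invented toy values, not data from vehicles.txt. The function name summarise_by_cylinders is hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Invented toy records (cylinders, retail in USD); NOT values from vehicles.txt.
vehicles = [
    (4, 18000), (4, 22000), (4, 20000),
    (6, 30000), (6, 34000),
    (8, 52000), (8, 60000), (8, 56000),
]

def summarise_by_cylinders(rows):
    """Return {cylinders: {"n", "mean", "sd"}} for retail price."""
    groups = defaultdict(list)
    for cylinders, retail in rows:
        groups[cylinders].append(retail)
    summary = {}
    for cylinders in sorted(groups):
        prices = groups[cylinders]
        # The sample standard deviation needs at least two observations.
        sd = stdev(prices) if len(prices) > 1 else float("nan")
        summary[cylinders] = {"n": len(prices), "mean": mean(prices), "sd": sd}
    return summary

summary = summarise_by_cylinders(vehicles)
for cylinders, stats in summary.items():
    print(f"{cylinders} cylinders: n={stats['n']}, "
          f"mean={stats['mean']:.2f}, sd={stats['sd']:.2f}")
```

The "n" column is the frequency table; the mean and standard deviation columns support the interpretation the Action asks for.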

They would also like to know how vehicles of different engine_size relate to one another, based on the variables retail, cylinders, horsepower, city_mpg and highway_mpg. Which engine sizes are most similar to each other and which are most different?

Action: First, create a new variable called engine_gr and recode the engine size variable so that:

Engine size <2 = engine_gr 1

Engine size >=2 & <3 = engine_gr 2

Engine size >=3 & <4 = engine_gr 3

Engine size >=4 & <5 = engine_gr 4

Engine size >=5 = engine_gr 5
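The recoding rules above can be sketched as a small function. The brief expects R; this Python version only illustrates the banding logic. The function name engine_group and the sample engine sizes are invented for illustration.

```python
from collections import Counter

def engine_group(engine_size):
    """Map an engine size in litres to engine_gr 1..5 using the bands in the brief."""
    if engine_size < 2:
        return 1
    if engine_size < 3:
        return 2
    if engine_size < 4:
        return 3
    if engine_size < 5:
        return 4
    return 5

# Invented engine sizes (litres); NOT values from vehicles.txt.
sizes = [1.6, 2.0, 2.4, 3.0, 3.5, 4.6, 5.0, 5.7]
groups = [engine_group(s) for s in sizes]
counts = Counter(groups)

for level in sorted(counts):
    print(f"engine_gr {level}: {counts[level]} vehicle(s)")
```

Each lower bound is inclusive and each upper bound exclusive, so an engine of exactly 2.0 litres falls in engine_gr 2 and one of exactly 5.0 litres in engine_gr 5.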

Provide a table showing the number of vehicles in each engine_gr level and comment on it. Perform a cluster analysis to show the multivariate relationships among engine sizes (engine_gr), provide the relevant output, and interpret it. Note: there are several ways you could perform the cluster analysis, so be sure to explain what you tried and why you decided on your final choice.
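One possible starting point, sketched below, is to standardise the variables, average them within each engine_gr level, and compare the resulting group centroids by Euclidean distance. This is only the distance step that a hierarchical clustering could build on, not a full cluster analysis, and it is one of several reasonable approaches. The brief expects R; the Python is illustrative only, the rows are invented toy values, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev

# Invented toy rows, NOT values from vehicles.txt:
# (engine_gr, retail, cylinders, horsepower, city_mpg, highway_mpg)
rows = [
    (1, 15000, 4, 110, 30, 38),
    (1, 17000, 4, 120, 28, 36),
    (2, 22000, 4, 160, 24, 32),
    (2, 26000, 6, 180, 22, 30),
    (3, 32000, 6, 220, 19, 27),
    (3, 36000, 6, 240, 18, 26),
    (5, 60000, 8, 340, 13, 19),
    (5, 64000, 8, 360, 12, 18),
]

def standardise(columns):
    """Scale each column to mean 0 and standard deviation 1."""
    scaled = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        scaled.append([(x - m) / s for x in col])
    return scaled

def group_centroids(data):
    """Mean of the standardised variables within each engine_gr level."""
    labels = [r[0] for r in data]
    columns = list(zip(*[r[1:] for r in data]))
    scaled_rows = list(zip(*standardise(columns)))
    centroids = {}
    for level in sorted(set(labels)):
        members = [v for lab, v in zip(labels, scaled_rows) if lab == level]
        centroids[level] = [mean(col) for col in zip(*members)]
    return centroids

centroids = group_centroids(rows)
distances = {
    (a, b): dist(centroids[a], centroids[b])
    for a, b in combinations(centroids, 2)
}
closest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)

print("most similar engine_gr pair:", closest)
print("most different engine_gr pair:", farthest)
```

Standardising first keeps retail (tens of thousands of dollars) from dominating variables measured on much smaller scales, such as cylinders or miles per gallon.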

Hint
Standard deviation: A statistic that measures the dispersion of a data set relative to its mean. It is calculated as the square root of the variance, which is determined from each data point's deviation from the mean. In finance, it treats all uncertainty as risk, even when it is in favor of the investor....
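For reference, the sample form of this calculation can be written as follows, assuming observations $x_1, \dots, x_n$ with mean $\bar{x}$. The $n-1$ denominator is the sample convention (the one R's sd() uses); a population standard deviation would divide by $n$ instead.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```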
