Perform a principal component analysis using SAS on the correlation matrix for the

Question 1. PCA analysis with 5 plots

Answer the following from your SAS output (be sure to include your code, outputs and justifications).

i. Prepare the dataset as input for a PCA in SAS.

ii. Perform a principal component analysis using SAS on the correlation matrix for the p = 9 variables, using the whole data set of molecules. Show your full SAS code and output.

iii. Also perform the procedures to obtain the following 5 plots related to PROC PCA. Refer to Irene's SAS notes for Assignment 2 & Lab for PCA Week 8-9.pdf

• Scree plot

• Profile plot

• Component Pattern plots

• Score plots

• Loading Plots

Using the plots, the SAS notes and your SAS outputs, report and answer the following (justify your answers).

a) Report the eigenvalues and the eigenvectors.

b) What percentage of the total sample variation is accounted for by each PC individually, from the first PC through the ninth PC?

c) What percentage of the total sample variation is accounted for cumulatively by the first PC through the ninth PC?

d) Write out the formulation for the PCs.

e) Interpret the PCs via eigenvalues.

f) Interpret the PCs using your component pattern profiles from SAS.

g) Can the data be effectively summarised in fewer than 9 dimensions? Justify your answer using BOTH relevant plots and eigenvalues.
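For parts b) to d), the standard relationships for a PCA on the correlation matrix of p = 9 variables can be written as below. This is a sketch only: the notation (λ_j for the eigenvalues, e_j for the unit eigenvectors, z for the standardised variables) is an assumption and may differ from the course notes.

```latex
% Assumed notation (not from the assignment):
%   R            = 9 x 9 sample correlation matrix
%   \lambda_j    = j-th largest eigenvalue of R
%   \mathbf{e}_j = corresponding unit-length eigenvector
%   \mathbf{z}   = vector of the 9 standardised variables
\begin{align*}
Y_j &= \mathbf{e}_j^{\top}\mathbf{z}
     = e_{j1} z_1 + e_{j2} z_2 + \cdots + e_{j9} z_9,
     \qquad j = 1, \dots, 9, \\
\sum_{j=1}^{9} \lambda_j &= \operatorname{tr}(R) = 9, \\
\text{proportion explained by PC } j &= \frac{\lambda_j}{9}, \\
\text{cumulative proportion for the first } k \text{ PCs}
    &= \frac{1}{9} \sum_{j=1}^{k} \lambda_j .
\end{align*}
```

Because the eigenvalues of a correlation matrix sum to p, the cumulative proportion over all nine PCs is always 100%; the question of interest in part g) is how quickly it approaches that value.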

Question 2: PCA with reduced k < p for plots

Choose the reduced dimensionality k < 9 that you think is appropriate for reducing the data from 9 to k dimensions, based on your PCA findings in Question 1. Justify your choice of k carefully.
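Two common rules of thumb for choosing k can be checked with a few lines of arithmetic once the eigenvalues are known. The sketch below is in Python rather than SAS and is only illustrative: the function name `choose_k`, the 80% threshold and the example eigenvalues are all assumptions, not values from the assignment or the SAS output.

```python
def choose_k(eigenvalues, threshold=0.80):
    """Return (k_threshold, k_kaiser) for a list of eigenvalues.

    k_threshold: smallest k whose cumulative proportion of variance
                 reaches `threshold` (an assumed cut-off, not a rule
                 given in the assignment).
    k_kaiser:    number of eigenvalues strictly greater than 1, the
                 usual rule of thumb for a correlation-matrix PCA.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)

    cumulative = 0.0
    k_threshold = len(ordered)
    for k, lam in enumerate(ordered, start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            k_threshold = k
            break

    k_kaiser = sum(1 for lam in ordered if lam > 1.0)
    return k_threshold, k_kaiser


# Hypothetical eigenvalues that sum to 9; replace with the SAS output.
EXAMPLE_EIGENVALUES = [4.5, 2.0, 1.0, 0.6, 0.4, 0.25, 0.15, 0.07, 0.03]

if __name__ == "__main__":
    print(choose_k(EXAMPLE_EIGENVALUES))
```

The two rules need not agree, which is why the question asks for a justification based on both the plots and the eigenvalues rather than a single cut-off.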

a) Recreate the 5 plots related to PROC PCA for your given k.

b) Using the plots based on your reduced dimensionality k from part a) and your outputs, interpret the first k PCs via eigenvalues.

c) Using the plots based on your reduced dimensionality k from part a) and your outputs, interpret the first k PCs via the outputs (you choose the optimal k).

d) Which of the k PCs are skewed? Use your plots to answer this.

Question 3: DISCRIM ON 2 GROUPS OF MOLECULES

1. Prepare the dataset as input for a discriminant analysis in SAS.

2. Generate the means, standard deviations and the variance-covariance matrix of the data for the violators.

3. Generate the means, standard deviations and the variance-covariance matrix of the data for the non-violators.

4. Produce the correlation matrix and an associated scatterplot of the inputted data for the violators.

5. Produce the correlation matrix and an associated scatterplot of the inputted data for the non-violators.

6. Using SAS PROC DISCRIM and your resultant outputs, answer the following questions. Use priors "violators" = 0.30 and "non-violators" = 0.70.

7. Is Σ1 = Σ2? Justify your answer.

8. How is a molecule with X₀ᵀ = (MW, LogP, LogD, Hdonors, Hacceptors, PSA, ROT, NATOM, NRING) = (445.429, -2.7, -3.28938, 8, 12, 207.27, 9, 55, 3) allocated? That is, allocate it to either the violators or the non-violators group.

9. Write down the resultant confusion matrix.
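The allocation in item 8 and the confusion matrix in item 9 both follow from the posterior probabilities, which combine the squared distance of a molecule from each group mean with the stated priors. The Python sketch below illustrates that arithmetic under the normal-theory rule; the function names are hypothetical, and the squared distances (and, for the unequal-covariance case, the log determinants) must come from your own SAS output. No result for the molecule in item 8 is computed here.

```python
import math

# Priors stated in the assignment.
PRIORS = {"violators": 0.30, "non-violators": 0.70}


def posteriors(sq_distances, priors, log_dets=None):
    """Posterior probability of each group for one observation.

    sq_distances: squared Mahalanobis distance from the observation to
                  each group mean (values taken from your own analysis).
    log_dets:     ln|S_j| per group when separate covariance matrices
                  are used; None when a pooled matrix is used.
    """
    generalized = {}
    for group, d2 in sq_distances.items():
        g1 = log_dets[group] if log_dets is not None else 0.0
        g2 = -2.0 * math.log(priors[group])  # penalty from the prior
        generalized[group] = d2 + g1 + g2

    # Subtract the minimum before exponentiating for numerical stability.
    smallest = min(generalized.values())
    weights = {g: math.exp(-0.5 * (v - smallest))
               for g, v in generalized.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def allocate(sq_distances, priors, log_dets=None):
    """Allocate to the group with the largest posterior probability."""
    post = posteriors(sq_distances, priors, log_dets)
    return max(post, key=post.get)


def confusion_matrix(actual, predicted, labels):
    """Counts indexed as counts[actual_group][predicted_group]."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts
```

When a molecule is equally distant from both group means under a pooled covariance matrix, the posteriors reduce to the priors, so it is allocated to the non-violators; the priors matter most for borderline molecules.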

Question 4: STEPWISE DISCRIM ON 4 GROUPS OF MOLECULES

STEPWISE DISCRIM using the oral by violatory status groups defined below.

1. For Question 4 you will need to create the following variable i.e. an interaction term between oral status and score 9_ Log D violation status at 4 levels as defined below:


2. Cross-tabulate oral status by violatory status for the whole group, in SAS or otherwise. How many molecules are in each of these 4 levels? Create a table or histogram.

3. Run a STEPWISE DISCRIM analysis using the above 4 level grouping variable.

4. Which variables best discriminate the 4 oral by violatory groups/classes? See notes on STEPDISC below and extra SAS notes (Week 10).

5. Write a clear description of your conclusions, including the SAS code and outputs.
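The 4-level grouping variable in items 1 and 2 is a cross of two binary flags, so the counts per level can be sanity-checked outside SAS. The Python sketch below is illustrative only: the level labels, the function names and the toy rows are hypothetical, since the assignment's own level definitions are not reproduced in this text.

```python
from collections import Counter


def four_level_group(is_oral, is_violator):
    """Combine two binary flags into one of four group labels.

    The label wording is a placeholder; substitute the level
    definitions given in the assignment.
    """
    oral_part = "oral" if is_oral else "non-oral"
    violation_part = "violator" if is_violator else "non-violator"
    return f"{oral_part} / {violation_part}"


def crosstab_counts(rows):
    """Count molecules per level; rows are (is_oral, is_violator) pairs."""
    return Counter(four_level_group(o, v) for o, v in rows)


# Toy rows for illustration only, not data from the assignment.
TOY_ROWS = [(True, False), (True, True), (False, False), (True, False)]

if __name__ == "__main__":
    for level, n in sorted(crosstab_counts(TOY_ROWS).items()):
        print(level, n)
```

The same counts should match the cell frequencies of the SAS cross-tabulation, which gives a quick check that the grouping variable was coded as intended before running the stepwise analysis.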

Hint
Principal component analysis (PCA) is a tool for analyzing quantitative data. It is helpful when analyzing data sets with many variables measured on each sample. New variables (principal components) are computed as linear combinations of the original variables, chosen so that each successive component captures as much of the remaining data variance as possible....
