PCA (Principal Component Analysis) is a dimensionality reduction technique

Pass Task 8.1P: PCA dimensionality reduction

Task description:

PCA (Principal Component Analysis) is a dimensionality reduction technique that projects data into a lower-dimensional space. It can be used to reduce high-dimensional data to 2 or 3 dimensions so that we can visualize, and hopefully better understand, the data.
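
To make the idea of "projecting into a lower-dimensional space" concrete, here is a minimal sketch that computes a 2-component PCA by hand with NumPy's SVD. The toy data and all variable names (`latent`, `mixing`, `X_2d`, `explained`) are assumptions made for illustration; the task itself uses scikit-learn's PCA rather than this manual route.

```python
import numpy as np

# Hypothetical toy data: 100 samples, 5 correlated features built from 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 5))

# Centre the data, then take the SVD; the rows of Vt are the principal directions.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project every sample onto the first two principal directions.
X_2d = X_centered @ Vt[:2].T

# Fraction of the total variance kept by those two components.
explained = (S[:2] ** 2).sum() / (S ** 2).sum()

print(X.shape, "->", X_2d.shape)
print(f"variance retained: {explained:.3f}")
```

Because the toy data is essentially two-dimensional plus a little noise, two components should retain nearly all of the variance; real datasets usually lose more.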

In this task, you will use PCA to reduce the dimensionality of a given dataset and visualize the data.

You are given:

• The breast cancer dataset, which can be loaded with:

from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

Detailed information is available at: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html
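
Before scaling anything, it can help to check what the loader returns. The short sketch below only inspects the object; which attributes to print is a choice made here, not something the task requires.

```python
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# 569 samples, each described by 30 numeric features.
print(cancer.data.shape)          # (569, 30)

# Class labels encoded in cancer.target (0 and 1).
print(cancer.target_names)

# Names of the first three features, which the 3D plot of raw features will use.
print(cancer.feature_names[:3])
```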

• PCA(n_components=2)

• 3D plot settings: (Please refer to prac7 for 3D plot examples)

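import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # only needed on older Matplotlib to register the "3d" projection

fig = plt.figure(figsize=(10, 8))
cmap = plt.get_cmap("Spectral")  # plt.cm.get_cmap is removed in recent Matplotlib
# Axes3D(fig, ...) no longer attaches itself to the figure in recent Matplotlib,
# so create the axes through the figure and set the view angle separately
ax = fig.add_axes([0, 0, .95, 1], projection="3d")
ax.view_init(elev=10, azim=10)
ax.scatter(x, y, z, c=cancer.target, cmap=cmap)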

• Other settings of your choice

You are asked to:

• use StandardScaler() to fit and transform cancer.data

• apply PCA(n_components=2) to fit and transform the scaled cancer.data

• print the shape of the scaled dataset and the shape of the PCA-transformed dataset for comparison

• create a 2D plot with the first principal component on the x axis and the second principal component on the y axis

• set a proper xlabel and ylabel for the 2D plot

• print the shape and the values of the PCA components

• create a 3D plot with the first 3 features of the scaled cancer.data (as x, y and z)

• create a 3D plot with the first principal component on the x axis and the second principal component on the y axis, with no value passed for the z axis

• set a proper title for each of the two 3D plots
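
The scaling, PCA and printing steps in the list above can be sketched as follows. This is one possible arrangement, not a model answer; the variable names `X_scaled`, `pca` and `X_pca` are assumptions chosen for readability.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()

# Standardise each feature to zero mean and unit variance so that
# features with large numeric ranges do not dominate the components.
X_scaled = StandardScaler().fit_transform(cancer.data)

# Keep the two directions of largest variance in the scaled data.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("scaled shape:    ", X_scaled.shape)          # (569, 30)
print("PCA shape:       ", X_pca.shape)             # (569, 2)

# Each row of components_ is one principal direction,
# expressed as a weight for each of the 30 original features.
print("components shape:", pca.components_.shape)   # (2, 30)
print(pca.components_)
```

The number of rows stays the same through both steps; only the number of columns drops, from 30 features to 2 components.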

The sample output shown in the following figures is for demonstration purposes only. Your output may differ from it.
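
As a rough guide to producing such figures, the sketch below draws the 2D plot and the two 3D plots described in the task. The helper `make_3d_axes`, the label and title strings, and the figure size of the 2D plot are all assumptions; the 3D settings follow the ones given in the task, adapted to current Matplotlib.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(cancer.data)
X_pca = PCA(n_components=2).fit_transform(X_scaled)
cmap = plt.get_cmap("Spectral")

# 2D plot: first principal component against the second, coloured by class.
fig2d, ax2d = plt.subplots(figsize=(8, 6))
ax2d.scatter(X_pca[:, 0], X_pca[:, 1], c=cancer.target, cmap=cmap)
ax2d.set_xlabel("First principal component")
ax2d.set_ylabel("Second principal component")


def make_3d_axes(title):
    """Hypothetical helper: a 3D axes using the view settings from the task."""
    fig = plt.figure(figsize=(10, 8))
    ax = fig.add_axes([0, 0, 0.95, 1], projection="3d")
    ax.view_init(elev=10, azim=10)
    # y=0.95 keeps the title inside the figure, since the axes fill its full height.
    ax.set_title(title, y=0.95)
    return ax


# 3D plot of the first three scaled features.
ax_raw = make_3d_axes("First three scaled features")
ax_raw.scatter(X_scaled[:, 0], X_scaled[:, 1], X_scaled[:, 2],
               c=cancer.target, cmap=cmap)

# 3D plot of the two principal components; with no z values passed,
# the points are drawn in the z = 0 plane.
ax_pca = make_3d_axes("First two principal components (no z values)")
ax_pca.scatter(X_pca[:, 0], X_pca[:, 1], c=cancer.target, cmap=cmap)
```

In a script, finish with plt.show() or save each figure with savefig; in a notebook the figures render on their own.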


