Assignment - Dimensionality Reduction (assignment.ipynb)
This assignment is based on content discussed in module 6 and uses the famous MNIST dataset, a set of images of handwritten digits (https://en.wikipedia.org/wiki/MNIST_database).
The dataset is provided to you as a .csv file (mnist_dataset.csv).
Learning outcomes
• Apply a Random Forest classification algorithm to the MNIST dataset
• Perform dimensionality reduction of features using PCA and compare classification on the reduced dataset with classification on the original one
• Apply dimensionality reduction techniques: t-SNE and LLE
Question 1. Load the MNIST dataset and split it into a training set and a test set (take the first 60,000 instances for training, and the remaining 10,000 for testing).
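A minimal sketch of the ordered split, assuming pandas is available. The column name `label` and the idea that every other column is a pixel feature are guesses about the layout of mnist_dataset.csv, so check the file header first. The helper `split_first_n` is a hypothetical name, and the tiny frame below only stands in for the real data.

```python
import pandas as pd


def split_first_n(df, label_col="label", n_train=60_000):
    """Split a DataFrame into train/test parts by row order (no shuffling).

    `label_col` is an assumed column name; adjust it to the real CSV header.
    """
    X = df.drop(columns=[label_col]).to_numpy()
    y = df[label_col].to_numpy()
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


# Tiny stand-in frame so the sketch runs without the real file.
demo = pd.DataFrame(
    {
        "label": [0, 1, 2, 3, 4],
        "px0": [0, 10, 20, 30, 40],
        "px1": [5, 15, 25, 35, 45],
    }
)
X_train, X_test, y_train, y_test = split_first_n(demo, n_train=3)
print(X_train.shape, X_test.shape)  # (3, 2) (2, 2)

# With the real data (assumed layout):
# df = pd.read_csv("mnist_dataset.csv")
# X_train, X_test, y_train, y_test = split_first_n(df)
```

Slicing by position rather than calling a random splitter matches the "first 60,000 / remaining 10,000" wording of the question.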
Question 2. Train a Random Forest classifier on the dataset and time how long it takes, then evaluate the resulting model on the test set.
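The timing-and-evaluation pattern might look like the sketch below. To stay runnable without the course file, it uses scikit-learn's bundled 8x8 digits data (1,797 samples) as a stand-in, not MNIST. The split point of 1,500 and the hyperparameters are illustrative assumptions, not values prescribed by the assignment.

```python
import time

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Stand-in data: scikit-learn's small digits set, NOT the MNIST CSV.
X, y = load_digits(return_X_y=True)
n_train = 1500  # hypothetical split point for the stand-in data
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Time only the call to fit(), using a monotonic high-resolution clock.
start = time.perf_counter()
rf.fit(X_train, y_train)
fit_seconds = time.perf_counter() - start

# Evaluate on the held-out rows.
accuracy = accuracy_score(y_test, rf.predict(X_test))
print(f"fit time: {fit_seconds:.2f}s, test accuracy: {accuracy:.3f}")
```

In a notebook, the `%time` or `%%time` cell magics are an alternative way to measure the fit, but `time.perf_counter()` keeps the measurement in a variable for later comparison.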
Question 3. Next, use PCA to reduce the dataset’s dimensionality, with an explained variance ratio of 95%. Train a new Random Forest classifier on the reduced dataset and see how long it takes. Was training much faster? Next evaluate the classifier on the test set: how does it compare to the previous classifier?
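A sketch of the PCA step, again on scikit-learn's bundled digits data as a stand-in for MNIST. Passing a float between 0 and 1 as `n_components` asks PCA to keep the smallest number of components whose cumulative explained variance reaches that fraction. The split point and forest settings are illustrative assumptions.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: scikit-learn's small digits set, NOT the MNIST CSV.
X, y = load_digits(return_X_y=True)
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# Keep enough components to explain at least 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full")
X_train_reduced = pca.fit_transform(X_train)  # fit on the training set only
X_test_reduced = pca.transform(X_test)        # reuse the same projection

rf_reduced = RandomForestClassifier(n_estimators=100, random_state=42)
start = time.perf_counter()
rf_reduced.fit(X_train_reduced, y_train)
fit_seconds_reduced = time.perf_counter() - start

accuracy_reduced = rf_reduced.score(X_test_reduced, y_test)
print(f"components kept: {pca.n_components_} of {X.shape[1]}")
print(f"fit time: {fit_seconds_reduced:.2f}s, test accuracy: {accuracy_reduced:.3f}")
```

Fitting PCA on the training set alone and only transforming the test set avoids leaking test information into the projection. For a fair comparison, the classifier settings should match those used on the unreduced data.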
Question 4. Use t-SNE to reduce the dimensionality of the MNIST dataset and show the result graphically.
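One possible shape for the t-SNE plot, assuming matplotlib is installed. The sketch embeds a 500-sample subset of scikit-learn's bundled digits data (a stand-in for MNIST) into two dimensions and colors each point by its digit label. The subset size, colormap, and `init="pca"` are illustrative choices.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Stand-in data: a small subset keeps the run short; t-SNE is slow on large sets.
X, y = load_digits(return_X_y=True)
X_small, y_small = X[:500], y[:500]

tsne = TSNE(n_components=2, init="pca", random_state=42)
X_tsne = tsne.fit_transform(X_small)  # shape: (500, 2)

# Scatter plot of the embedding, one color per digit class.
fig, ax = plt.subplots(figsize=(7, 6))
points = ax.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y_small, cmap="tab10", s=10)
fig.colorbar(points, ax=ax, label="digit")
ax.set_title("t-SNE embedding (stand-in digits data)")
ax.set_xlabel("t-SNE dimension 1")
ax.set_ylabel("t-SNE dimension 2")
# In a notebook the figure renders inline; in a script, call plt.show()
# or fig.savefig(...) to see it.
```

On the full 70,000-row dataset t-SNE can take a long time, so running it on a random subset first is a reasonable way to check the plotting code.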
Question 5. Compare with other dimensionality reduction methods: Locally Linear Embedding (LLE) or Multidimensional Scaling (MDS).
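A sketch of how LLE and MDS could be run side by side, assuming matplotlib is installed and using a 300-sample subset of scikit-learn's bundled digits data as a stand-in for MNIST. The neighbor count, the dense eigen-solver (chosen here only because the subset is small), and the subset size are illustrative assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import MDS, LocallyLinearEmbedding

# Stand-in data: a small subset, since both methods scale poorly with sample count.
X, y = load_digits(return_X_y=True)
X_small, y_small = X[:300], y[:300]

reducers = {
    "LLE": LocallyLinearEmbedding(
        n_components=2, n_neighbors=10, eigen_solver="dense", random_state=42
    ),
    "MDS": MDS(n_components=2, random_state=42),
}

# Fit each method on the same subset and collect the 2-D embeddings.
embeddings = {name: reducer.fit_transform(X_small) for name, reducer in reducers.items()}

# One panel per method, points colored by digit label.
fig, axes = plt.subplots(1, len(embeddings), figsize=(12, 5))
for ax, (name, emb) in zip(axes, embeddings.items()):
    ax.scatter(emb[:, 0], emb[:, 1], c=y_small, cmap="tab10", s=10)
    ax.set_title(f"{name} (stand-in digits data)")
```

MDS works from the full pairwise distance matrix, which for 70,000 samples stored as 64-bit floats would need roughly 39 GB, so a subset is likely necessary on the real dataset as well.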