Pass Task 7.1P: K-Means and Hierarchical Clustering

Task description:

In machine learning, clustering is used to analyze and group data that has no pre-labelled classes, or no class attribute at all. K-Means clustering and hierarchical clustering are both unsupervised learning algorithms.

A cluster is a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters. K-Means divides the objects into clusters such that each object is in exactly one cluster, not several.
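The partitioning idea can be illustrated with a minimal NumPy sketch of the two alternating K-Means steps (assign each point to its nearest centroid, then move each centroid to the mean of its points). The toy points, the starting centroids, and the helper names `assign_to_nearest` and `update_centroids` are hypothetical and chosen only for illustration; this is not the scikit-learn implementation.

```python
import numpy as np

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    # Pairwise squared Euclidean distances, shape (n_points, n_centroids).
    diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    # argmin picks exactly one centroid per point, so clusters never overlap.
    return sq_dists.argmin(axis=1)

def update_centroids(points, labels, k):
    """Recompute each centroid as the mean of the points assigned to it.

    Assumes every cluster has at least one point (true for this toy data).
    """
    return np.array([points[labels == j].mean(axis=0) for j in range(k)])

# Hypothetical toy data: two obvious groups in 2-D.
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])  # hypothetical starting guesses

for _ in range(5):  # a few iterations are enough for this toy data
    labels = assign_to_nearest(points, centroids)
    centroids = update_centroids(points, labels, k=2)

print(labels)
print(centroids)
```

Because each point receives a single label from `argmin`, every object ends up in exactly one cluster.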

In hierarchical clustering, clusters have a tree-like structure, or parent-child relationship. In the agglomerative (bottom-up) form, each object starts in its own cluster, and the two most similar clusters are merged repeatedly until all objects are in the same cluster.
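The merge sequence can be seen on a tiny example. The sketch below uses SciPy's `scipy.cluster.hierarchy.linkage` with average linkage on four hypothetical one-dimensional points; each row of the result records one merge (the two clusters joined, the distance between them, and the size of the new cluster).

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical toy data: four points on a line, one observation per row.
points = np.array([[0.0], [1.0], [5.0], [7.0]])

# Each row of Z is one merge: [cluster_a, cluster_b, distance, new_cluster_size].
# Original points are numbered 0..3; merged clusters get the ids 4, 5, ...
Z = hierarchy.linkage(points, method="average")
print(Z)

# Cutting the tree so that at most two clusters remain.
two_clusters = hierarchy.fcluster(Z, t=2, criterion="maxclust")
print(two_clusters)
```

The last row of `Z` always describes the final merge, after which all objects share a single cluster.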

In this task, you use the K-Means and Agglomerative Hierarchical algorithms to cluster a synthetic dataset and compare their results.
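When comparing two clusterings, keep in mind that cluster numbers are arbitrary: one model's cluster 0 may be the other's cluster 2. One possible way to compare label assignments regardless of numbering is the adjusted Rand index from scikit-learn. The task does not ask for this metric, so treat it as an optional assumption; the label lists below are hypothetical.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical label assignments for four objects.
labels_a = [0, 0, 1, 1]
labels_b = [1, 1, 0, 0]  # same grouping as labels_a, different numbering
labels_c = [0, 1, 0, 1]  # a different grouping

# Identical groupings score 1.0 even though the label numbers differ.
ari_same = adjusted_rand_score(labels_a, labels_b)
# Disagreeing groupings score lower.
ari_diff = adjusted_rand_score(labels_a, labels_c)
print(ari_same, ari_diff)
```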

You are given:

• np.random.seed(0)

• make_blobs function with inputs:

o n_samples: 200

o centers: [3, 2], [6, 4], [10, 5]

o cluster_std: 0.9

• KMeans class with settings: init = "k-means++", n_clusters = 3, n_init = 12

• AgglomerativeClustering class with settings: n_clusters = 3, linkage = 'average'

• Other settings of your choice
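The given settings could be put together roughly as follows. This is a sketch under stated assumptions, not a reference solution: the variable names (`X`, `y`, `k_means`, `agglom`) are arbitrary, and every setting not listed above is left at its scikit-learn default.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Seed NumPy's global generator, as the task specifies.
np.random.seed(0)

# 200 samples around the three given centers; y holds the generating blob index.
X, y = make_blobs(
    n_samples=200,
    centers=[[3, 2], [6, 4], [10, 5]],
    cluster_std=0.9,
)

# K-Means with the given settings.
k_means = KMeans(init="k-means++", n_clusters=3, n_init=12)
k_means.fit(X)
k_means_labels = k_means.labels_
k_means_centers = k_means.cluster_centers_

# Agglomerative clustering with average linkage.
agglom = AgglomerativeClustering(n_clusters=3, linkage="average")
agglom_labels = agglom.fit_predict(X)
```

Since `make_blobs` and `KMeans` are not given a `random_state` here, both draw from NumPy's global generator, which is why the seed is set first.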

You are asked to:

• plot your created dataset

• plot the two clustering models for your created dataset

• give the K-Means plot the title “KMeans”

• give the Agglomerative Hierarchical plot the title “Agglomerative Hierarchical”

• calculate the distance matrix for Agglomerative Clustering from the input feature matrix (linkage = complete)

• display the dendrogram
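One way the requested steps might look with matplotlib and SciPy is sketched below. Only the “KMeans” and “Agglomerative Hierarchical” titles come from the task. The other titles, the figure sizes, the Euclidean metric, and the use of `pdist` for the distance matrix are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

np.random.seed(0)
X, _ = make_blobs(n_samples=200, centers=[[3, 2], [6, 4], [10, 5]], cluster_std=0.9)

k_means_labels = KMeans(init="k-means++", n_clusters=3, n_init=12).fit_predict(X)
agglom_labels = AgglomerativeClustering(n_clusters=3, linkage="average").fit_predict(X)

# Raw dataset next to the two clusterings, coloured by cluster label.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].scatter(X[:, 0], X[:, 1], marker=".")
axes[0].set_title("Dataset")  # assumed title, not specified by the task
axes[1].scatter(X[:, 0], X[:, 1], c=k_means_labels, marker=".")
axes[1].set_title("KMeans")
axes[2].scatter(X[:, 0], X[:, 1], c=agglom_labels, marker=".")
axes[2].set_title("Agglomerative Hierarchical")

# Pairwise Euclidean distances between all rows of the feature matrix.
condensed = pdist(X)                 # condensed form, one entry per pair
dist_matrix = squareform(condensed)  # full symmetric 200 x 200 matrix

# Complete-linkage hierarchy; linkage expects the condensed form.
Z = hierarchy.linkage(condensed, method="complete")

fig_dendro, ax_dendro = plt.subplots(figsize=(12, 5))
dendro = hierarchy.dendrogram(Z, no_labels=True, ax=ax_dendro)
ax_dendro.set_title("Dendrogram (complete linkage)")  # assumed title
```

In a script, calling `plt.show()` afterwards displays both figures; a notebook typically renders them automatically. Passing the condensed distances to `linkage` avoids the ambiguity of a square matrix, which SciPy would otherwise interpret as a set of observations.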

The sample output shown in the following figure is for demonstration purposes only. Yours might differ from the one provided.

Hint

K-Means Clustering: This is an unsupervised learning algorithm used to solve clustering problems. It groups an unlabeled dataset into different clusters. Here, K is the number of pre-defined clusters to be created in the process. For example, if K=2, there will be 2 clusters, and...
