## Original link: http://tecdat.cn/?p=22838

## Problems using the iris dataset in R

Part (a): K-means clustering

Cluster the data into two groups with K-means.

Draw a graph showing the clustering.

Cluster the data into three groups with K-means.

Draw a graph showing the clustering.

Part (b): hierarchical clustering

Cluster the observations using complete linkage.

Cluster the observations using average and single linkage.

Draw dendrograms for the above clustering methods.

## Q01: use the iris dataset built into R

(a): K-means clustering

First, discuss and/or consider standardizing the data.

```
# Mean and standard deviation of each measurement column
data.frame(
  Average = apply(iris[, 1:4], 2, mean),
  Standard.deviation = apply(iris[, 1:4], 2, sd)
)
```

In this case we standardize the data, because petal width is on a much smaller scale than the other measurements.
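Standardization can be done with base R's `scale()`; a minimal sketch (the object name `iris_scaled` is my own, not from the original):

```r
# Standardize the four measurement columns to mean 0, sd 1
iris_scaled <- scale(iris[, 1:4])

# Check: each column should now have mean ~0 and sd 1
round(colMeans(iris_scaled), 10)
apply(iris_scaled, 2, sd)
```

The same effect can be had inside the clustering call by wrapping the data in `scale()`, as in the code below.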

## Cluster the data into two groups with K-means

With a sufficiently large `nstart`, the algorithm is more likely to find the solution with the smallest total within-cluster sum of squares.

`km2 <- kmeans(scale(iris[, 1:4]), centers = 2, nstart = 100)`

## Draw a graph to show the clustering

```
# Plot the data, coloured by the two predicted clusters
plot(iris$Sepal.Width, iris$Sepal.Length, col = km2$cluster,
     xlab = "Sepal.Width", ylab = "Sepal.Length")
```

To take petal length and width into account as well, it is better to reduce the dimension with PCA first.

```
# Create the PCA model on the scaled measurements
pca.mod <- prcomp(iris[, 1:4], scale. = TRUE)
# Put the predicted group last
pca.df <- data.frame(pca.mod$x, Pred = factor(km2$cluster))
# Draw the first two principal components, coloured by cluster
plot(pca.df$PC2, pca.df$PC1, col = pca.df$Pred,
     xlab = "PC2", ylab = "PC1")
```

To interpret the PCA plot, consider the variance explained by each principal component.

```
## Proportion of variance explained by each principal component
var.explained <- pca.mod$sdev^2 / sum(pca.mod$sdev^2)
pca.var <- data.frame(
  PC = paste0("PC", seq_along(var.explained)),
  Variance = var.explained
)
```

`barplot(pca.var$Variance, names.arg = pca.var$PC, ylab = "Proportion of variance explained")`

The first two principal components explain more than 80% of the variance in the data, so this is a good two-dimensional visualization.
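The variance explained can also be read directly off the fitted `prcomp` object (assuming the model is stored as `pca.mod`, as above):

```r
# Standard deviation, proportion of variance, and cumulative
# proportion for each principal component
summary(pca.mod)

# The cumulative proportions by hand, from the component sdevs
cumsum(pca.mod$sdev^2) / sum(pca.mod$sdev^2)
```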

## Cluster the data into three groups with K-means

In the principal component plot above the clusters already look clear, and since we know there should in fact be three groups, we can fit a three-cluster model.

```
# K-means with three clusters on the scaled data
km3 <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 100)
# Print the predicted groups
print(km3$cluster)
```

## Draw a graph to show the clustering

```
# Plot the data, coloured by the three predicted clusters
plot(iris$Sepal.Length, iris$Sepal.Width, col = km3$cluster,
     xlab = "Sepal.Length", ylab = "Sepal.Width")
```

## PCA plot

To take petal length and width into account as well, it is again better to reduce the dimension with PCA first.

```
library(ggplot2)

# Create the PCA model
pca.mod <- prcomp(iris[, 1:4], scale. = TRUE)
# Put the predicted group last
pca.df <- data.frame(pca.mod$x, Pred = factor(km3$cluster))
# Draw the chart: first two principal components with 90% normal
# confidence ellipses around the predicted k-means clusters
ggplot(pca.df, aes(x = PC2, y = PC1, colour = Pred)) +
  geom_point() +
  stat_ellipse(type = "norm", level = 0.90) +
  labs(colour = "Predicted\ncluster",
       caption = "First two principal components of the iris data; ellipses are 90% normal confidence regions; clusters predicted by k-means")
```

## PCA biplot

The sepal length vs. sepal width plot already separates the clusters reasonably well. To choose which variables to put on the x and y axes, we can use a biplot.

`biplot(pca.mod)`

The biplot shows that petal length and sepal width explain most of the variation in the data, so a more appropriate chart is:

`plot(iris$Petal.Length, iris$Sepal.Width, col = km3$cluster, xlab = "Petal.Length", ylab = "Sepal.Width")`

To evaluate all possible combinations of variables:

```
library(dplyr)
library(tidyr)
library(ggplot2)

# One panel per measurement, points coloured by predicted cluster
iris %>%
  mutate(Pred = factor(km3$cluster), obs = row_number()) %>%
  pivot_longer(cols = Sepal.Length:Petal.Width) %>%
  ggplot(aes(x = obs, y = value, colour = Pred)) +
  geom_point() +
  facet_grid(name ~ ., scales = "free_y")
```

## Hierarchical clustering

## Cluster the observations with complete linkage

The observations can be clustered with complete linkage (note that the data should be standardized first).

`hc.complete <- hclust(dist(scale(iris[, 1:4])), method = 'complete')`

## Cluster the observations with average and single linkage

```
hc.average <- hclust(dist(scale(iris[, 1:4])), method = 'average')
hc.single  <- hclust(dist(scale(iris[, 1:4])), method = 'single')
```

## Plot the predicted clusters

With the models fitted, the dendrogram can be cut by specifying the required number of groups.

```
# Cut the complete-linkage tree into three groups
groupPred <- cutree(hc.complete, k = 3)
# Pairs plot of the measurements, coloured by predicted group
plot(iris[, 1:4], col = groupPred)
```
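Since the true species labels are known for iris, one way to sanity-check the cut is to cross-tabulate the predicted groups against `Species` (a sketch; `groupPred` is the `cutree()` output from above):

```r
# Rows: hierarchical cluster assignment; columns: true species
table(groupPred, iris$Species)
```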

## Draw dendrograms for the above clustering methods

Plot the dendrograms with shrunken labels so the trees stay readable.

```
# Dendrograms for the three linkage methods
models <- list(Complete = hc.complete, Average = hc.average, Single = hc.single)
for (type in names(models)) {
  plot(models[[type]], main = paste(type, "linkage"), cex = 0.3)
}
```
