### Description

This tool performs k-means clustering on a set of fields in a feature data source table (e.g. a point attribute table). K-means clustering is a statistical method of grouping data. You specify the number of groups and the fields that contain the relevant data, and the algorithm assigns each record to one of those groups so as to minimise the within-group sum of squared distances to the group centres. Any number of fields can be used in the clustering process. Although this is not a spatial algorithm, you can supply spatial data to it. For instance, if the x and y coordinates of a point data set are supplied, the result is a set of spatial groupings of locations.
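The grouping step can be illustrated with a minimal Python sketch of Lloyd's k-means algorithm. This is an illustration under stated assumptions, not the code the tool runs: R's kmeans command uses the Hartigan-Wong algorithm by default, and the function name `lloyd_kmeans` and its parameters are hypothetical. Each record is a tuple holding one value per selected field, and the returned IDs play the role of the values written to `outfld`.

```python
import random

def lloyd_kmeans(records, k, max_iters=10, seed=None):
    """Minimal Lloyd-style k-means (illustrative; not R's Hartigan-Wong).

    records: list of equal-length numeric tuples, one value per field.
    Returns (labels, centres); labels[i] is the 1-based group ID of records[i].
    """
    rng = random.Random(seed)
    # Start from k distinct records chosen at random.
    centres = [list(r) for r in rng.sample(records, k)]
    labels = None
    for _ in range(max(1, max_iters)):
        # Assignment step: each record joins the centre with the smallest
        # squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - c) ** 2 for a, c in zip(r, centres[j])))
            for r in records
        ]
        # Update step: move each centre to the mean of its members
        # (an emptied cluster keeps its previous centre).
        for j in range(k):
            members = [r for r, lab in zip(records, new_labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return [lab + 1 for lab in labels], centres

# Hypothetical x, y coordinates with two obvious spatial groupings.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = lloyd_kmeans(pts, k=2, seed=0)
print(labels)  # the first three points share one ID, the last three the other
```

Because these points form two well-separated groups, every random start converges to the same partition; only the ID numbers assigned to each group may swap between runs.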

K-means is a stochastic process, so it may not yield exactly the same results each time you run it unless your data contain clear and obvious groupings that correspond to the number of groups you have specified. The stochasticity arises because the initial ‘centres’ of the clusters are chosen at random: when only the number of groups is given, the R kmeans command selects k distinct records as the starting centres. The ‘iters’ parameter corresponds to the ‘iter.max’ parameter in the R kmeans command and sets the maximum number of iterations the algorithm may perform while refining the cluster assignments.
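The source of the run-to-run variation can be sketched in Python, assuming (as R's kmeans does when given a number of groups) that the starting centres are k records drawn at random. The function `initial_centres` and its `seed` argument are hypothetical: fixing the seed makes the draw repeatable, while leaving it as `None` lets each run start from different centres and therefore possibly end in a different grouping.

```python
import random

def initial_centres(records, k, seed=None):
    """Draw k records at random to serve as starting cluster centres.

    `seed` is a hypothetical parameter: the same seed always yields the
    same starting centres; seed=None gives a fresh draw on each call.
    """
    if k > len(records):
        raise ValueError("k cannot exceed the number of records")
    rng = random.Random(seed)
    # sample() picks k distinct positions, so no record is used twice.
    return rng.sample(records, k)

points = [(0.0, 0.0), (1.0, 0.5), (4.0, 4.0), (5.0, 3.5), (9.0, 1.0)]
print(initial_centres(points, 2, seed=42))
print(initial_centres(points, 2, seed=42))  # identical to the line above
print(initial_centres(points, 2))           # may differ from run to run
```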

The comma-delimited output text file created by this command contains a summary of the statistical output of the tool: the number of records in each group, the within-group sums of squares, and the coordinates of the cluster centres in n-dimensional space.
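To illustrate what these summary statistics mean, the Python sketch below computes, for each cluster, the number of records, the within-group sum of squares (the sum of squared Euclidean distances from each member to its cluster centre) and the centre coordinates, then renders them as comma-delimited text. The helper names and the column layout (`cluster`, `size`, `withinss`, then one column per field) are hypothetical choices and are not necessarily the layout of the file this tool writes.

```python
import csv
import io

def cluster_summary(records, labels):
    """Per-cluster (size, within-group sum of squares, centre), keyed by ID."""
    summary = {}
    for cid in sorted(set(labels)):
        members = [r for r, lab in zip(records, labels) if lab == cid]
        # Centre = mean of the members, field by field.
        centre = [sum(col) / len(members) for col in zip(*members)]
        # Within-group SS = sum of squared distances from members to centre.
        wss = sum(sum((a - c) ** 2 for a, c in zip(r, centre)) for r in members)
        summary[cid] = (len(members), wss, centre)
    return summary

def summary_to_csv(summary, field_names):
    """Render the summary as comma-delimited text (hypothetical column layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["cluster", "size", "withinss"] + list(field_names))
    for cid, (size, wss, centre) in summary.items():
        writer.writerow([cid, size, wss] + centre)
    return buf.getvalue()

recs = [(0, 0), (0, 2), (10, 10), (12, 10)]
ids = [1, 1, 2, 2]
print(summary_to_csv(cluster_summary(recs, ids), ["X", "Y"]))
```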

For further information on the R kmeans command, type ‘?kmeans’ at the R prompt and press Enter.

This command is driven by R. Type ‘citation()’ at the R prompt to see the suggested citation for R.

### Syntax

kmeans(in, k, flds, outfld, [iters], [outfile], [where]);

| Parameter | Description |
|---|---|
| in | the input feature data source |
| k | the number of groups into which the data are partitioned |
| flds | a list of the numerical fields that the algorithm is based on |
| outfld | the output field that will contain the group membership ID numbers |
| [iters] | the maximum number of iterations allowed |
| [outfile] | if specified, the cluster size and within-group sums of squares are written to this delimited text file |
| [where] | the selection statement that will be applied to the feature data source to identify a subset of features to process (see full Help documentation for further details) |

### Example

kmeans(in="C:\data\plots.shp", k=10, flds=c("X","Y","NDVI","Slope"), outfld="CLUSTER");

kmeans(in="C:\data\mypoints.shp", k=100, flds=c("X","Y"), outfld="CLUSTER", iters=100, outfile="C:\data\kmeansdata.csv");