k-means Clustering and Its Real Use Cases in the Security Domain

k-means is one of the oldest and most commonly used clustering algorithms. Its simple implementation makes it a great starting point for new ML enthusiasts.

Jul 21, 2021

What is Clustering?

Clustering is the task of dividing a population or set of data points into groups such that the data points in one group are more similar to each other than to the data points in other groups. In simple words, the aim is to find groups with similar traits and assign them to clusters.

What is k-means?

The goal of the k-means algorithm is to find groups in the data, with the number of groups represented by the variable k. The algorithm works iteratively to assign each data point to one of k groups based on the features that are provided.

The outputs of running k-means on a dataset are:

k centroids: one centroid for each of the k clusters identified in the dataset.
Labels: the complete dataset, labeled so that each data point is assigned to exactly one of the clusters.
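
The iterative assignment described above, and the two outputs it produces, can be sketched in a few lines of plain Python. This is only a minimal illustration, not a production implementation: the function names (kmeans, nearest), the random choice of starting centroids, the stopping rule, and the sample points are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, max_iters=100, seed=0):
    """Minimal k-means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k data points picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iters):
        # Assignment step: attach every point to its closest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # Nothing moved, so the assignment is stable.
        centroids = new_centroids
    return centroids, labels


# Made-up sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)  # k centroids, one per cluster
print(labels)     # one cluster index per data point
```

The centroids list and the labels list correspond to the two outputs listed above. Which cluster gets index 0 depends on the random starting centroids, so only the grouping is meaningful, not the index itself.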

Where can I apply k-means?

k-means can typically be applied to data that has a smaller number of dimensions, is numeric, and is continuous. Think of a scenario in which you want to make groups of similar things from a randomly distributed collection of things; k-means is very suitable for such scenarios.

Some use cases in the security domain

Insurance fraud detection

Machine learning has a critical role to play in fraud detection and has numerous applications in automobile, healthcare, and insurance fraud detection. Using historical data on fraudulent claims, it is possible to isolate new claims based on their proximity to clusters that indicate fraudulent patterns. Since insurance fraud can potentially have a multi-million dollar impact on a company, the ability to detect fraud is crucial. There is published work on using clustering in automobile insurance to detect fraud.
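
As a rough sketch of the proximity idea, a new claim can be compared against centroids learned from historical claims and sent for review when its nearest cluster is one that was dominated by known fraud. Everything specific here is hypothetical and chosen only for illustration: the two features, the centroid values, the set of fraud-heavy clusters, and the function names.

```python
import math

# Hypothetical centroids, as if produced by k-means on historical claims.
# Each is (normalized claim amount, normalized days since policy start).
centroids = [(0.2, 0.7), (0.9, 0.1), (0.5, 0.5)]

# Assumption: an analyst marked cluster 1 as dominated by known fraud.
fraud_clusters = {1}


def nearest_cluster(claim, centroids):
    """Return (index, distance) of the centroid closest to the claim."""
    distances = [math.dist(claim, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


def needs_review(claim, centroids, fraud_clusters):
    """Flag a claim whose nearest cluster is marked as fraud-heavy."""
    index, _ = nearest_cluster(claim, centroids)
    return index in fraud_clusters


# Made-up incoming claims, using the same two normalized features.
new_claims = [(0.85, 0.15), (0.25, 0.65)]
flags = [needs_review(c, centroids, fraud_clusters) for c in new_claims]
print(flags)  # [True, False]
```

A flag here means only that the claim resembles a fraud-heavy cluster and deserves a closer look; it is not a verdict on the claim.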

Cyber-Profiling criminals

Cyber-profiling is the process of collecting data from individuals and groups to identify significant correlations. The idea of cyber-profiling is derived from criminal profiles, which give the investigation division information it can use to classify the types of criminals who were at the crime scene. There is a white paper on how to cyber-profile users in an academic environment based on user data preferences.

I am open to any queries and suggestions.