Høgskolen i Gjøvik

Storløkken, Roger

Labelling clusters in an anomaly based IDS by means of cluster quality indexes

A major problem in anomaly detection systems based on clustering is determining the nature of the obtained clusters. The clustering algorithms used to group the activity data into clusters do not have the knowledge needed to determine whether the content of a cluster is benign or malicious. We therefore need a labelling algorithm to label the obtained clusters properly. A classical approach is to measure the cardinality of the clusters and label some percentage of the smallest clusters as malicious. This approach has some limitations, however. In particular, it does not properly detect massive Denial-of-Service attacks, whose traffic tends to form large clusters rather than small ones.
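The cardinality-based approach can be illustrated with a minimal Python sketch. The function name `label_by_cardinality`, the parameter `malicious_fraction`, and the cluster sizes in the example are assumptions made for illustration; they are not taken from the thesis.

```python
from collections import Counter


def label_by_cardinality(assignments, malicious_fraction):
    """Label the smallest clusters as malicious (illustrative sketch only).

    assignments: one cluster id per activity record.
    malicious_fraction: hypothetical share of clusters, counted from the
        smallest upwards, that is labelled malicious.
    """
    sizes = Counter(assignments)
    # Order cluster ids from smallest to largest; ties are broken by id.
    ordered = sorted(sizes, key=lambda cid: (sizes[cid], cid))
    n_malicious = int(malicious_fraction * len(ordered))
    malicious = set(ordered[:n_malicious])
    return {cid: ("malicious" if cid in malicious else "benign")
            for cid in ordered}


# Hypothetical data: cluster 2 stands in for a massive Denial-of-Service flood.
assignments = [0] * 50 + [1] * 3 + [2] * 900 + [3] * 7
labels = label_by_cardinality(assignments, malicious_fraction=0.5)
print(labels)
```

In this made-up example the two smallest clusters (1 and 3) are labelled malicious, while the 900-record flood in cluster 2 is labelled benign simply because it is the largest cluster. That is the limitation described above.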

Another approach for labelling clusters, which addresses the Denial-of-Service limitation, is to combine clustering quality indexes with different clustering parameters, e.g. the cluster diameter, to extract characteristics from the clusters. Based on these characteristics, labelling algorithms are developed to label the content of the clusters properly. The main idea behind this approach is that clustering evaluation techniques can indicate the existence of a massive Denial-of-Service attack and that a very compact cluster may be an attack cluster [7, 8, 9]. Only a few of the clustering quality indexes have previously been used in this labelling approach. Other indexes, in combination with different cluster parameters, might improve the performance of the IDS.
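As a rough sketch of the compactness idea, the following Python code computes the diameter of each cluster and flags very compact clusters as possible attack clusters. It uses the cluster diameter only and does not implement any particular clustering quality index. The function names, the threshold `max_attack_diameter`, and the sample points are assumptions for illustration, not the labelling algorithm developed in the thesis.

```python
import math
from itertools import combinations


def diameter(points):
    """Largest Euclidean distance between any two points of a cluster.

    A cluster with fewer than two points has diameter 0.0.
    """
    return max((math.dist(p, q) for p, q in combinations(points, 2)),
               default=0.0)


def label_by_compactness(clusters, max_attack_diameter):
    """Flag clusters whose diameter is at most a (hypothetical) threshold.

    clusters: dict mapping cluster id -> list of feature vectors.
    """
    return {cid: ("malicious" if diameter(pts) <= max_attack_diameter
                  else "benign")
            for cid, pts in clusters.items()}


# Hypothetical two-dimensional feature vectors.
clusters = {
    "tight": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    "spread": [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)],
}
print(label_by_compactness(clusters, max_attack_diameter=0.5))
```

Unlike the cardinality rule, this criterion does not depend on cluster size, so a large but very compact cluster can still be flagged. A usable labelling algorithm would have to combine such a parameter with a quality index and a principled choice of threshold.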

Accuracy and efficiency are very important performance measures for an IDS. High accuracy is necessary to give valuable information to the IDS analyst monitoring the systems. An excessive false positive rate (the rate of false alarms triggered by benign activities) will leave the analyst frustrated, and important alarms may then be ignored. It is also important that an IDS works in real time, or as close to real time as possible. Real-time operation is necessary to be able to take countermeasures against attacks in progress before they can do much harm.
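To make the accuracy measures concrete, the sketch below computes the false positive rate as the share of benign records that raise an alarm, together with the detection rate as the share of malicious records that raise one. The function names and the sample labels are hypothetical, and the thesis may define its measures differently.

```python
def false_positive_rate(true_labels, predicted_labels):
    """Share of truly benign records that were labelled malicious."""
    benign = [p for t, p in zip(true_labels, predicted_labels)
              if t == "benign"]
    if not benign:
        raise ValueError("no benign records to evaluate")
    return sum(p == "malicious" for p in benign) / len(benign)


def detection_rate(true_labels, predicted_labels):
    """Share of truly malicious records that were labelled malicious."""
    attacks = [p for t, p in zip(true_labels, predicted_labels)
               if t == "malicious"]
    if not attacks:
        raise ValueError("no malicious records to evaluate")
    return sum(p == "malicious" for p in attacks) / len(attacks)


# Hypothetical ground truth and IDS output for six records.
truth = ["benign"] * 4 + ["malicious"] * 2
predicted = ["benign", "malicious", "benign", "benign",
             "malicious", "benign"]
print(false_positive_rate(truth, predicted))  # 0.25
print(detection_rate(truth, predicted))       # 0.5
```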

Research questions

  1. Which clustering quality index is best suited for labelling activity clusters in a clustering-based intrusion detection system, with respect to accuracy and efficiency?
  2. Which combinations of clustering parameters and/or methods with clustering quality indexes give the best performance of the IDS?
19.11.2007