Topic covered by this thesis
Benchmarking of Intrusion Detection Systems (IDS) is needed to determine how good a system is and to compare different systems against each other. For testing these systems, data sets are provided that contain different attack profiles among benign traffic. Different methods have been proposed to test these systems, based either on simulated network traffic or on data sets extracted from real network flows. Both approaches have advantages and disadvantages. Simulated network traffic has been criticized because it is not certain that it accurately represents the real world. Its main advantage is that all aspects of the environment are known and no unknown attacks can occur, which eliminates the problem of identifying unseen attacks that arises with real traffic. Nevertheless, it is sometimes preferable to generate benchmarking data sets from data that are as close to the real world as possible. This master thesis will look into common attack features gathered from a real network environment, as well as the factors that are relevant for generating IDS data sets. Detection of common attack features will be based on event sequencing.
As mentioned above, there is a need to generate test data sets for IDS benchmarking that are based on real data and can be shared openly between organizations. In this thesis we will look into which features of traffic are relevant for Intrusion Detection Systems, based on traffic gathered from an academic network. The challenge is to analyze network logs and traffic and to extract the characteristics that constitute an attack. These data will be analyzed and processed with a method called sequencing of events to find the features that should be included in a benchmarking data set. Sequences of events can be defined as data describing the behavior and actions of users or systems. The master thesis will try to collect these sequences and categorize them into either frequent or generalized episodes. These episodes will then be used in a methodology intended to produce data sets for IDS benchmarking.
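To make the idea of frequent episodes more concrete, the following is a minimal sketch and not the method developed in this thesis. It assumes that a log has already been reduced to time-ordered (timestamp, event type) pairs and counts how many time windows contain a given ordered tuple of event types. The function name, the parameters, the event labels and the choice to open one window at every event are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_serial_episodes(events, window, length, min_windows):
    """Count ordered event-type tuples that recur across time windows.

    events: list of (timestamp, event_type) pairs, sorted by timestamp.
    window: window width, in the same unit as the timestamps.
    length: number of events in each candidate episode.
    min_windows: minimum number of windows an episode must appear in.
    """
    counts = Counter()
    for i, (start, _) in enumerate(events):
        # Event types that fall inside the window opening at this event.
        in_window = [etype for ts, etype in events[i:] if ts - start < window]
        # combinations() preserves input order, so each tuple is an ordered
        # subsequence; the set counts an episode at most once per window.
        for episode in set(combinations(in_window, length)):
            counts[episode] += 1
    return {ep: n for ep, n in counts.items() if n >= min_windows}


# Hypothetical event labels, for illustration only.
log = [
    (0, "scan"), (1, "login_fail"), (2, "login_fail"), (3, "login_ok"),
    (10, "scan"), (11, "login_fail"), (12, "login_ok"),
]
episodes = frequent_serial_episodes(log, window=5, length=2, min_windows=3)
```

Enumerating every ordered tuple per window is only practical for short windows and short episodes. A real implementation would more likely generate candidates level by level and prune those whose sub-episodes are infrequent.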