How can common properties of attacks be extracted from a large and ever-growing set of attacks against computer networks? When benchmarking Intrusion Detection Systems, a data set is required for testing a system's ability to detect attacks. There is a need to generate test data sets for benchmarking intrusion detection systems that are based on real data from an academic network. A method called `sequence of events' will be used to determine which features of the traffic are relevant for intrusion detection and, because of that, should be included in the data set. A sequence of events can be defined as data describing the behavior of users or systems. These sequences can be collected as `episodes', which will be used to produce the data set. Using a method such as event sequencing, new properties of an attack may be found.
As corporations and individuals grow increasingly reliant on the Internet, so does the need to protect themselves from the threats this medium poses. One of the newer technologies to emerge is the Intrusion Detection System (IDS), for either individual computers or entire networks. An IDS tries to detect attacks that pose a threat to the hosts it protects. These systems are tested and benchmarked using data sets that contain attack signatures. Only a few of these data sets are openly shared among researchers\cite{kddcup}, which makes it hard to develop new systems and to compare them against systems based on different ideas.
When testing an IDS there are two main options: run the attack signatures against the system on a live network, or set up a test lab network to run the tests on. Both have advantages and drawbacks. A test lab gives full (100\%) control over which attacks the system is exposed to; on the other hand, it only tests the system in an artificial scenario, where false positives are hard to measure.
Event sequencing is based on the occurrence of events and their order, and on finding patterns or relations between those events. Sequences of events that occur relatively close to each other, and that depend on other events in the sequence, are referred to as episodes. By mapping attacks to sequences of events and determining which features are important to each attack, these sequences can be assembled into `episodes' and used in a data set for benchmarking IDS.
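To make the idea of ordered events within a time span more concrete, the Python sketch below counts how often a serial episode (a set of event types that must appear in a fixed order) occurs inside a sliding time window. It follows the common sliding-window (WINEPI-style) approach to frequent-episode mining; that choice, the event representation (integer timestamps paired with string event types), the function names \texttt{occurs\_in\_order} and \texttt{episode\_frequency}, and the window parameters are illustrative assumptions, not the method fixed by this project.

```python
from typing import List, Sequence, Tuple

# Assumed event representation: (integer timestamp in seconds, event type).
Event = Tuple[int, str]


def occurs_in_order(window: List[str], episode: Sequence[str]) -> bool:
    """True if the event types in `episode` appear in `window` in the given
    order, possibly with other events in between (a serial episode)."""
    remaining = iter(window)
    # Each `any` call consumes the iterator up to and including the match,
    # so later episode steps can only match events that come afterwards.
    return all(any(e == wanted for e in remaining) for wanted in episode)


def episode_frequency(events: List[Event], episode: Sequence[str],
                      width: int, step: int = 1) -> float:
    """Fraction of sliding windows [t, t + width) in which `episode` occurs.

    Window start times run from (first timestamp - width + step) up to the
    last timestamp, so windows that only partially overlap the start and end
    of the sequence are included.
    """
    if not events:
        return 0.0
    ordered = sorted(events)
    first, last = ordered[0][0], ordered[-1][0]
    total = hits = 0
    t = first - width + step
    while t <= last:
        # Event types whose timestamps fall inside the current window.
        window = [etype for ts, etype in ordered if t <= ts < t + width]
        total += 1
        if occurs_in_order(window, episode):
            hits += 1
        t += step
    return hits / total
```

Because the in-order check consumes a single iterator, an episode is matched as a subsequence rather than a contiguous run, which tolerates unrelated events interleaved between the steps of an attack.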
More formal definitions of the terms used are as follows:
Intrusion detection,
is the process of monitoring the events occurring in a computer system or network
and analyzing them for signs of security problems.
A threat,
in a communication network is a potential event or series of events that could result
in the violation of one or more security goals.
An attack,
is the actual implementation of a threat.
Sequence of events,
is data describing the behavior and actions of users or systems.
Frequent episodes,
are collections of events that occur relatively close to each other in a given partial order and that recur frequently in the data.
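One common way to make ``frequent'' precise, taken here from the window-based formulation used in frequent-episode mining rather than from anything this project has fixed, is to slide a window of width $\mathit{win}$ over the event sequence $\mathbf{s}$ and measure the fraction of windows in which the episode $\alpha$ occurs:
\[
  \mathrm{fr}(\alpha, \mathbf{s}, \mathit{win}) =
  \frac{\left|\{\, w \in \mathcal{W}(\mathbf{s}, \mathit{win}) : \alpha \mbox{ occurs in } w \,\}\right|}
       {\left|\mathcal{W}(\mathbf{s}, \mathit{win})\right|},
\]
where $\mathcal{W}(\mathbf{s}, \mathit{win})$ is the set of all windows of width $\mathit{win}$ on $\mathbf{s}$. An episode is then called frequent if $\mathrm{fr}(\alpha, \mathbf{s}, \mathit{win}) \geq \mathit{min\_fr}$ for a chosen threshold $\mathit{min\_fr}$; both $\mathit{win}$ and $\mathit{min\_fr}$ are assumed parameters that would have to be tuned to the traffic being analyzed.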
Benchmarking and testing IDS is a difficult and complex task. It is normally done with a data set that contains data from a real network with some attacks embedded, to check whether the IDS can detect them. The problem is that most of these data sets are not publicly available to researchers, which makes it difficult to compare new IDS against existing ones. We will examine which features of an attack are important for Intrusion Detection Systems, focusing on analyzing traffic by using sequences of events. This will be done on a live academic network, to avoid creating artificial or simulated data sets. The challenge will be to identify which features belong to an attack and should be put into the data set. Once a set of events has been identified, the next challenge will be to assemble these events into `episodes' before integrating them into the test data set.
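Assembling identified events into episodes can be done in several ways. As a minimal illustration, and not the method this project commits to, the Python sketch below keeps only the event types judged relevant and splits each source host's events into candidate episodes wherever the time between consecutive events exceeds a threshold (simple gap-based grouping, often called sessionization, which is a simpler technique than frequent-episode mining). The record layout, the \texttt{build\_episodes} function, the per-host grouping and the \texttt{max\_gap} parameter are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical event record: (timestamp in seconds, source host, event type).
Event = Tuple[int, str, str]


def build_episodes(events: List[Event], relevant: Set[str],
                   max_gap: int) -> List[List[Event]]:
    """Group relevant events into candidate episodes per source host.

    A new episode starts whenever consecutive events from the same host are
    more than `max_gap` seconds apart.
    """
    # Keep only relevant event types, bucketed by host, in time order.
    by_host: Dict[str, List[Event]] = {}
    for ev in sorted(events):
        if ev[2] in relevant:
            by_host.setdefault(ev[1], []).append(ev)

    episodes: List[List[Event]] = []
    for host_events in by_host.values():
        current = [host_events[0]]
        for prev, ev in zip(host_events, host_events[1:]):
            if ev[0] - prev[0] > max_gap:
                # Gap too large: close the current episode and start a new one.
                episodes.append(current)
                current = []
            current.append(ev)
        episodes.append(current)
    return episodes
```

Grouping by source host is only one possible key; an attack spread over several hosts might instead call for grouping by destination or by a wider key, so in practice the key would follow from the features found to be relevant.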
Testing and benchmarking IDS using simulated network traffic has been criticized in several papers\cite{marc, nist}. Given the limited availability of data sets for researchers \cite{kddcup}, it is hard to test new IDS implementations and ideas against current ones. The data sets currently available are also becoming dated and may give an inaccurate picture of the activity on today's networks. By creating a new data set for researchers and manufacturers, both may benefit from a common reference data set to work with. Basing the data set on a scientific method such as event sequencing also opens the way for further theoretical research.
The research questions we will examine are the following: