GA-FFNN: An Intelligent Classification Approach for Signature based IDS

Lokesh
Nov 30, 2022

Abstract. An Intrusion Detection System (IDS), a second line of defense, plays a major role in safeguarding network infrastructure from the threats posed by “Black hat” attackers. The ever-advancing nature of cyber-attacks makes designing and developing an efficient IDS a complex task. Hence, this paper presents an intelligent IDS that combines a Feed Forward Neural Network (FFNN) with a Genetic Algorithm (GA) for parameter optimization and for the classification of malicious and normal data. GA-FFNN was evaluated on the NSL-KDD dataset, and its performance was validated with classification accuracy, detection rate and false alarm rate.

Keywords: Genetic Algorithm, Feed Forward Neural Network, Parameter optimization, IDS

1 Introduction

With the growth of the Internet and computer networks, digital transformation has led to the massive generation of sensitive information over the network, which may be affected when intrusions or vulnerabilities occur [1]. Recent security incidents such as the Nexus repository breach [2], ransomware [3], WannaCry [4], the Yahoo password leak [5] and the data theft at Adobe [6] underline the importance of protecting sensitive information against intruders. Earlier, traditional security measures such as antivirus, access control and firewalls were used to protect networks from various threats. However, these mechanisms are inadequate on their own against the dynamic nature of intrusions, which has motivated researchers to develop a more robust security mechanism, the Intrusion Detection System, to fight ever-advancing intrusions [7].

According to NIST, “Intrusion detection is defined as an automated process which identifies any suspicious activities that compromise the Confidentiality, Integrity, and Availability (CIA) of the computer or network resources”. Based on the detection methodology, IDSs are classified into two types: (i) misuse (signature-based) detection and (ii) anomaly detection. The former detects intrusions by matching predefined patterns and produces fewer false positives; however, it fails to identify new, previously unseen attacks. The latter identifies both known and unknown attacks, but at the cost of a higher false positive rate [8]. Several researchers prefer misuse detection over anomaly detection to achieve high classification accuracy.
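To make the contrast between the two detection styles concrete, here is a toy Python sketch. The signature set, the choice of the NSL-KDD `count` feature for the anomaly profile, and the k = 3 standard-deviation threshold are illustrative assumptions, not part of the paper's method.

```python
from statistics import mean, stdev

# Hypothetical signature database of (protocol_type, service, flag) tuples.
SIGNATURES = {("tcp", "http", "S0"), ("icmp", "ecr_i", "SF")}


def misuse_detect(conn):
    """Misuse detection: flag only exact matches with a known attack pattern."""
    return (conn["protocol_type"], conn["service"], conn["flag"]) in SIGNATURES


def fit_normal_profile(normal_counts):
    """Anomaly detection, training: profile normal traffic by mean and std. dev."""
    return mean(normal_counts), stdev(normal_counts)


def anomaly_detect(conn, profile, k=3.0):
    """Anomaly detection: flag values more than k std. devs from the normal mean."""
    mu, sigma = profile
    return abs(conn["count"] - mu) > k * sigma


profile = fit_normal_profile([10, 12, 11, 9, 13, 10, 11, 12])
known = {"protocol_type": "tcp", "service": "http", "flag": "S0", "count": 11}
novel = {"protocol_type": "udp", "service": "private", "flag": "SF", "count": 250}
print(misuse_detect(known), anomaly_detect(known, profile))  # → True False
print(misuse_detect(novel), anomaly_detect(novel, profile))  # → False True
```

The known attack is caught only by its signature, while the novel burst of connections is caught only by the deviation test, which is also what makes anomaly detection prone to false alarms on unusual but benign traffic.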

In general, intrusion detection is treated as a classification problem that discriminates between “normal” and “malicious” data [9]. This has led researchers to combine machine learning algorithms such as Artificial Neural Network (ANN), K-Nearest Neighbor and Random Forest with IDS to achieve better classification accuracy and detection rate [10]. Among these, ANN is significant for designing an intelligent IDS because it can handle imbalanced or incomplete datasets. The major problem in existing ANN-based IDSs is that training becomes unstable on high-dimensional data and may be trapped in local minima [11]. To overcome this challenge, GA-FFNN IDS is proposed, in which the hyperparameters of FFNN (learning rate, number of hidden units, dropout and penalty) are optimized using a genetic algorithm to improve the stability of the ANN-based IDS. The major contributions of this work are:

  1. The proposed GA-FFNN was designed to classify normal and malicious data.
  2. The hyperparameters of FFNN were optimized with GA, which avoids premature convergence.
  3. The effectiveness of the proposed algorithm has been evaluated on the benchmark IDS dataset NSL-KDD, and its performance has been validated with accuracy and detection rate.


2 Materials and Methods:

2.1 Genetic Algorithm:

Genetic Algorithm is an adaptive, meta-heuristic optimization approach inspired by Darwin's theory of evolution, in which stronger individuals are selected in a competitive environment, as in biological evolution [19]. GA represents each potential solution of a problem as an individual chromosome, expressed as a set of parameters. Because GA searches a large solution space with a whole population rather than a single point, it is less prone to being trapped in local optima, although it does not guarantee the global optimum. The working of the Genetic Algorithm is described in Algorithm 1.
Algorithm 1: Genetic Algorithm
Procedure:
Step 1: Begin the algorithm by initializing a random population.
Step 2: At each generation, GA uses the current individuals to generate the next population, as follows.
Step 3: Compute the fitness value of each individual.
Step 4: Select the best individuals in the current population.
Step 5: Apply crossover and mutation operations to the selected individuals.
Step 6: Replace the current population with the resulting offspring to create the next generation.
Step 7: Terminate the algorithm when the stopping criterion is satisfied; otherwise, return to Step 3.
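Algorithm 1 can be sketched as a classic bit-string GA in Python. This is an illustration, not the paper's implementation: binary tournament selection, one-point crossover, bit-flip mutation, the rates, and the fixed generation budget as stopping criterion are all assumptions, and OneMax (count the 1-bits) is used as a stand-in fitness so the example runs on its own.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50,
                      crossover_rate=0.9, mutation_rate=0.01, seed=0):
    """Bit-string GA mirroring Steps 1-7 of Algorithm 1 (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: random initial population of bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # Step 7: stop after a fixed number of generations
        scores = [fitness(ind) for ind in population]  # Step 3: fitness values

        def select():
            # Step 4: binary tournament, the fitter of two random individuals wins.
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[i] if scores[i] >= scores[j] else population[j]

        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select(), select()
            # Step 5: one-point crossover ...
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            # ... followed by bit-flip mutation.
            for child in (c1, c2):
                for k in range(n_bits):
                    if rng.random() < mutation_rate:
                        child[k] = 1 - child[k]
                offspring.append(child)
        # Steps 2 and 6: the offspring replace the current population.
        population = offspring[:pop_size]
        best = max(population + [best], key=fitness)  # remember the best seen so far
    return best


# Usage: OneMax, where fitness is simply the number of 1-bits.
best = genetic_algorithm(fitness=sum, n_bits=30)
print(sum(best), "of 30 bits set")
```

Keeping the best individual seen so far outside the population is one simple way to make sure the returned solution never gets worse from one generation to the next.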

2.2 Feedforward Neural Network:
FFNN is a deep learning model often called a Multilayer Perceptron (MLP). The FFNN architecture comprises an input layer, hidden layers and an output layer (Figure 1). Each neuron in one layer is connected to all the neurons of the next layer, so the network is fully connected, and it learns through supervised training. FFNN uses the ReLU (Rectified Linear Unit) activation function in the hidden layers [20].

$$X_i = \mathrm{ReLU}(Y_i) = \max(0, Y_i) \qquad (1)$$

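The net function is defined as

$$Y_i = b_i + \sum_j w_{ij}\, X_j \qquad (2)$$

where $b_i$ is the bias of neuron $i$, $w_{ij}$ is the weight connecting neuron $j$ of the previous layer to neuron $i$, $X_j$ is the output of neuron $j$ in the previous layer, and $i$ indexes the neurons of each layer in the network.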
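As a small worked example of Eqs. (1) and (2), the following Python sketch computes the outputs of one fully connected ReLU layer. The weights, biases and inputs are arbitrary values chosen so the arithmetic is exact; they do not come from the paper.

```python
def relu(y):
    """Eq. (1): X_i = max(0, Y_i)."""
    return max(0.0, y)


def dense_forward(x_prev, weights, biases):
    """Eqs. (1)-(2) for one fully connected layer.

    weights[i][j] connects neuron j of the previous layer to neuron i of this layer.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        y = b + sum(w_ij * x_j for w_ij, x_j in zip(w_row, x_prev))  # net input, Eq. (2)
        outputs.append(relu(y))                                       # activation, Eq. (1)
    return outputs


# Worked example: 3 inputs feeding 2 hidden neurons.
x = [1.0, 2.0, 3.0]
W = [[0.5, -1.0, 0.25],
     [0.25, 0.5, 0.25]]
b = [0.0, -1.0]
print(dense_forward(x, W, b))  # → [0.0, 1.0]
```

The first neuron's net input is 0.5 - 2.0 + 0.75 = -0.75, which ReLU clips to 0; the second is -1 + 0.25 + 1.0 + 0.75 = 1.0, which passes through unchanged.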

Fig. 1. Architecture of FFNN

Working of FFNN:

Step 1: Define the input as the data matrix (number of samples × number of features in the dataset) and the output as the decision class.

Step 2: Set the number of input-layer neurons to the number of features and compute the net function using Eqn. (2).
Step 3: Set the maximum number of epochs to 100 and the error threshold to 0.01.
Step 4: Use ReLU (Eqn. (1)) as the activation function of the neurons in the hidden layers.
Step 5: Compute the error using

$$\text{Error} = -\sum_{n}\sum_{m} X_{m,n}\,\log \hat{X}_{m,n} \qquad (3)$$

where $X_{m,n}$ and $\hat{X}_{m,n}$ are the actual and predicted outputs for class $m$ of data point $n$, and the outer sum runs over all data points.

Step 6: If the error is greater than or equal to 0.01, update the weights of the network and repeat the iteration (i.e. epoch = epoch + 1).

Step 7: Otherwise, return the decision class.
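The steps above can be sketched in plain Python as a tiny FFNN with one ReLU hidden layer. To keep it short and self-contained, this sketch makes several assumptions that are not stated in the paper: a single sigmoid output for the binary normal/attack label, binary cross-entropy (the two-class case of the cross-entropy in Step 5, with the usual leading minus sign) averaged over samples so that the 0.01 threshold does not depend on dataset size, and plain full-batch gradient descent rather than Adam.

```python
import math
import random


def train_ffnn(X, y, n_hidden=4, lr=0.1, max_epochs=100, threshold=0.01, seed=0):
    """Steps 1-7 for a one-hidden-layer FFNN trained by full-batch gradient descent."""
    rng = random.Random(seed)
    n_in = len(X[0])  # Step 2: one input neuron per feature
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    history = []
    for _ in range(max_epochs):  # Step 3: epoch budget
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1, gw2, gb2, error = [0.0] * n_hidden, [0.0] * n_hidden, 0.0, 0.0
        for x, t in zip(X, y):
            z = [b1[i] + sum(W1[i][j] * x[j] for j in range(n_in)) for i in range(n_hidden)]
            h = [max(0.0, zi) for zi in z]  # Step 4: ReLU hidden layer
            logit = b2 + sum(w * hv for w, hv in zip(w2, h))
            p = min(max(1.0 / (1.0 + math.exp(-logit)), 1e-12), 1.0 - 1e-12)
            error -= t * math.log(p) + (1 - t) * math.log(1.0 - p)  # Step 5
            d_out = p - t  # gradient of the loss w.r.t. the output logit
            gb2 += d_out
            for i in range(n_hidden):
                gw2[i] += d_out * h[i]
                d_hid = d_out * w2[i] * (1.0 if z[i] > 0 else 0.0)  # backprop through ReLU
                gb1[i] += d_hid
                for j in range(n_in):
                    gW1[i][j] += d_hid * x[j]
        error /= len(X)
        history.append(error)
        if error < threshold:  # Step 7: error small enough, stop training
            break
        m = len(X)  # Step 6: otherwise update the weights and run another epoch
        for i in range(n_hidden):
            w2[i] -= lr * gw2[i] / m
            b1[i] -= lr * gb1[i] / m
            for j in range(n_in):
                W1[i][j] -= lr * gW1[i][j] / m
        b2 -= lr * gb2 / m
    return (W1, b1, w2, b2), history


def predict(params, x):
    """Decision class from the trained weights: 1 = attack, 0 = normal."""
    W1, b1, w2, b2 = params
    h = [max(0.0, b + sum(w * xj for w, xj in zip(row, x))) for row, b in zip(W1, b1)]
    return 1 if b2 + sum(w * hv for w, hv in zip(w2, h)) > 0 else 0


X_toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y_toy = [0, 1, 1, 1]  # logical OR as a stand-in for normal/attack labels
params, history = train_ffnn(X_toy, y_toy)
print(len(history), round(history[-1], 4), [predict(params, x) for x in X_toy])
```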

3 Proposed Methodology

Step 1: Set the number of input neurons to the number of features, and select the hyperparameters to optimize: the number of hidden neurons, learning rate (l), momentum (m), dropout (d), number of epochs and batch size.

Step 2: Initialize the maximum number of iterations, the population size, and fitness = 0.
Step 3: Optimize l, m and d using Algorithm 1.
Step 4: Compute the error using Eqn. (3).
Step 5: Set the fitness to the classification accuracy and retain the best value.

Step 6: Terminate when the optimal parameters are obtained or the maximum number of iterations is reached.

Step 7: Otherwise, update the population based on the best fitness value.
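A minimal sketch of this GA-based hyperparameter search in Python follows. The gene ranges, uniform crossover, Gaussian mutation, truncation selection and the iteration budget are assumptions for illustration. In a real run, `fitness_fn` would train an FFNN with the decoded hyperparameters and return its accuracy; here the hypothetical `toy_fitness` stands in for that so the sketch runs without data, and it should not be read as the paper's fitness.

```python
import math
import random

# Illustrative gene ranges (assumptions, not values reported in the paper).
BOUNDS = {
    "hidden": (4, 64),        # number of hidden neurons
    "log_lr": (-4.0, -1.0),   # learning rate l = 10 ** log_lr
    "momentum": (0.0, 0.99),  # momentum m
    "dropout": (0.0, 0.5),    # dropout d
}


def random_chromosome(rng):
    c = {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
    c["hidden"] = round(c["hidden"])
    return c


def decode(c):
    """Map a chromosome to FFNN hyperparameters."""
    return {"hidden": int(c["hidden"]), "lr": 10 ** c["log_lr"],
            "momentum": c["momentum"], "dropout": c["dropout"]}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: a[k] if rng.random() < 0.5 else b[k] for k in BOUNDS}


def mutate(c, rng, rate=0.2):
    """Gaussian mutation, clipped to each gene's range."""
    out = dict(c)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            out[k] = min(hi, max(lo, out[k] + rng.gauss(0, 0.1 * (hi - lo))))
    out["hidden"] = round(out["hidden"])
    return out


def ga_ffnn_search(fitness_fn, pop_size=10, max_iter=20, seed=0):
    """fitness_fn(params) should train an FFNN and return its accuracy (Steps 4-5)."""
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    best, best_fit = None, 0.0  # Step 2: fitness = 0
    for _ in range(max_iter):  # Step 6: stop after max_iter iterations
        scored = sorted(((fitness_fn(decode(c)), c) for c in population),
                        key=lambda s: s[0], reverse=True)
        if best is None or scored[0][0] > best_fit:  # Step 5: keep the best accuracy
            best_fit, best = scored[0]
        # Step 7: rebuild the population from the fittest half (truncation selection).
        parents = [c for _, c in scored[:pop_size // 2]]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return decode(best), best_fit


def toy_fitness(p):
    """Hypothetical stand-in for 'train FFNN with p and return validation accuracy'."""
    return (1.0 - 0.1 * abs(math.log10(p["lr"]) + 2)
            - 0.005 * abs(p["hidden"] - 32) - 0.2 * p["dropout"])


params, acc = ga_ffnn_search(toy_fitness)
print(params, round(acc, 3))
```

Encoding the learning rate by its base-10 exponent lets crossover and mutation explore several orders of magnitude evenly, which is a common choice for this kind of search.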


4 Experimental Analysis and Discussions:

4.1 Experimental Setup

To carry out the experiments of GA-FFNN, the NSL-KDD dataset was used. The GA-FFNN algorithm was implemented in Python 3.4 on an Intel® Core™ i5 processor @ 2.40 GHz with 8 GB RAM running Windows 10. The Weka tool was also used for validation. The experiments were divided into three phases:

(i) Data preprocessing,

(ii) Training and Testing and

(iii) Evaluate the performance of GA-FFNN based on classification accuracy.

4.2 Data Preprocessing:

NSL-KDD dataset:

Tavallaee et al. proposed NSL-KDD, an improved version of the KDD ’99 dataset, to remove the shortcomings of KDD-CUP [21]. Compared with the KDD ’99 dataset, NSL-KDD contains no duplicate records in the training and test sets. Whereas the KDD ’99 training set contains 1,074,992 distinct connection records, the standard NSL-KDD release provides 125,973 training records (KDDTrain+) and 22,544 test records (KDDTest+). Each record is a single connection vector with 41 features, comprising basic features, content-related features, time-related traffic features and host-based traffic features, whose values are nominal, binary or numeric. Each connection vector is labelled either as normal or as an attack.

Attacks are grouped into four categories: DoS, U2R, R2L and Probe. Data mapping and data normalization techniques were applied as in our previous work [10].
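The paper defers the exact mapping and normalization to [10], so the following Python sketch only illustrates one common choice, labelled here as an assumption: one-hot encoding of a nominal feature (`protocol_type`, whose NSL-KDD values are tcp, udp and icmp), min-max scaling of a numeric feature (`src_bytes`), and collapsing all attack labels to class 1. The three records are made up.

```python
# Toy records: (protocol_type, src_bytes, label); values are made up.
PROTOCOLS = ["tcp", "udp", "icmp"]  # the three protocol_type values in NSL-KDD


def one_hot(value, categories):
    """Data mapping: a nominal value becomes a 0/1 indicator vector."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max(values):
    """Normalization: rescale a numeric column to [0, 1] (constant columns -> 0)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def binary_label(label):
    """Collapse NSL-KDD labels to 0 (normal) or 1 (any attack type)."""
    return 0 if label == "normal" else 1


rows = [("tcp", 181, "normal"), ("udp", 0, "normal"), ("icmp", 1032, "smurf")]
scaled = min_max([r[1] for r in rows])
X = [one_hot(proto, PROTOCOLS) + [s] for (proto, _, _), s in zip(rows, scaled)]
y = [binary_label(label) for _, _, label in rows]
print(X)
print(y)  # → [0, 0, 1]
```

In practice the minimum and maximum would be computed on the training split only and reused for the test split, so that no information from the test data leaks into training.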

4.3 Training and testing:

Subsequently, the entire dataset was partitioned into 80% for training (TrainNSL) and 20% for testing (TestNSL).
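A shuffled 80/20 partition like this one can be sketched with the Python standard library alone. The seed and the helper name `split_80_20` are assumptions, and this simple version is not stratified, so class proportions in the two parts may differ slightly.

```python
import random


def split_80_20(X, y, train_fraction=0.8, seed=42):
    """Shuffle sample indices, then split into training and testing sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, y_train, X_test, y_test = split_80_20(X, y)
print(len(X_train), len(X_test))  # → 8 2
```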

4.4 Evaluate the performance of GA-FFNN based on classification accuracy:

The proposed GA-FFNN was designed to classify whether an incoming network traffic pattern is malicious or normal. It has been evaluated and validated with the following metrics: classification accuracy, detection rate and false alarm rate. The GA-FFNN architecture consists of one input layer, two hidden layers and an output layer.

The Adam optimizer was used to train the network weights. Figure 3 shows that the classification accuracy of the proposed GA-FFNN is higher than that of existing classifiers such as Random Forest, BayesNet, K-Star and BFFO-CNN. Table 2 compares the detection rate and false alarm rate of the different classifiers, where the proposed approach outperforms the existing approaches.

Fig. 3. Classification Accuracy
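For reference, the three evaluation metrics can be computed from the confusion-matrix counts as sketched below, treating attack as the positive class (1) and normal as 0. The labels in the usage example are made up; this uses the standard definitions and is not the authors' evaluation code.

```python
def ids_metrics(y_true, y_pred):
    """Accuracy, detection rate and false alarm rate with 1 = attack, 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0    # share of attacks caught
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0  # share of normals flagged
    return accuracy, detection_rate, false_alarm_rate


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(ids_metrics(y_true, y_pred))  # → (0.8, 0.75, 0.16666666666666666)
```

Here three of the four attacks are detected (detection rate 0.75), and one of the six normal connections raises a false alarm (1/6).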

5 Conclusions

This paper has presented a Genetic Algorithm-based Feedforward Neural Network (GA-FFNN) that uses GA to optimize the parameters of FFNN and classifies malicious samples from normal samples. The NSL-KDD dataset has been used to evaluate the proposed GA-FFNN, and the results were validated with classification accuracy, detection rate and false alarm rate. In the experiments, the proposed classification approach, GA-FFNN, provided better accuracy than the existing approaches.

This work can be further extended to feature selection by varying the genetic operations used to optimize the parameters of FFNN.

References

1. M. Raman, K. Kannan, S. Pal: Rough set-hypergraph-based feature selection approach for intrusion detection systems. Def. Sci. (2016).

2. K. Zurkus (2019): www.infosecurity-magazine.com/news/thousands-left-vulnerable-in-nexus (accessed July 2019).

3. T. Armerding: The 18 biggest data breaches of the 21st century (2018). https://www.csoonline.com/article/2130877/data-breach/the-biggest-data-breaches-of-the-21st-century.html (accessed July 2019).

4. G. Swenson: Bolstering Government Cybersecurity Lessons Learned from WannaCry (2017). https://www.nist.gov/speech-testimony/bolstering-government-cybersecurity-lessons-learned-wannacry (accessed July 2019).

5. Yahoo password leak (2017): https://www.cnet.com/news/massive-breach-leaks-773-million-emails-21-million-passwords/ (accessed July 2019).

6. Adobe breach (2013): https://krebsonsecurity.com/tag/adobe-breach/ (accessed July 2019).

7. M. R. Gauthama Raman, K. Kirthivasan, V. S. Shankar Sriram: Development of rough set-hypergraph technique for key feature identification in intrusion detection systems. Comput. Electr. Eng., 1–12 (2017).

8. K. Scarfone, P. Mell: Guide to Intrusion Detection and Prevention Systems (IDPS). NIST Spec. Publ. (2007).

9. M. Almseidin et al.: Evaluation of machine learning algorithms for intrusion detection system. In: 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), IEEE (2017).

10. M. R. Gauthama Raman, N. Somu, K. Kirthivasan, V. S. S. Sriram: An efficient intrusion detection system based on hypergraph-Genetic algorithm for parameter optimization and feature selection in support vector machine. Knowledge-Based Syst. 134, 1–12 (2017).

11. R. Beghdad: Critical study of neural networks in detecting intrusions. Computers & Security 27(5–6), 168–175 (2008).

12. Y. Shin et al.: Development of NOx reduction system utilizing artificial neural network (ANN) and genetic algorithm (GA). Journal of Cleaner Production (2019).

13. F. Xu et al.: Training Feed-Forward Artificial Neural Networks with a Modified Artificial Bee Colony Algorithm. Neurocomputing (2019).

14. C. Blum, K. Socha: Training feed-forward neural networks with ant colony optimization: An application to pattern classification. In: Fifth International Conference on Hybrid Intelligent Systems (HIS’05), IEEE (2005).

15. Z. Chiba et al.: Intelligent Approach to Build a Deep Neural Network Based IDS for Cloud Environment Using Combination of Machine Learning Algorithms. Computers & Security (2019).

16. R. Vijayanand, D. Devaraj, B. Kannapiran: Intrusion detection system for wireless mesh network using multiple support vector machine classifiers with genetic-algorithm-based feature selection. Computers & Security 77, 304–314 (2018).

17. Akashdeep, I. Manzoor, N. Kumar: A feature reduced intrusion detection system using ANN classifier. Expert Syst. Appl. 88, 249–257 (2017).

18. M. R. G. Raman, N. Somu, K. Kirthivasan, V. S. S. Sriram: A Hypergraph and Arithmetic Residue-based Probabilistic Neural Network for classification in Intrusion Detection Systems. Neural Networks 92, 89–97 (2017).

19. L. Davis: Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York (1991).

20. J. Engel: Teaching feed-forward neural networks by simulated annealing. Complex Syst. 2, 641–648 (1988).

21. M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani: A detailed analysis of the KDD CUP 99 dataset. In: IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), IEEE, pp. 1–6 (2009).


The College of Engineering, Guindy is a public engineering college in Chennai, India and is India's oldest technical institution, founded in 1794.