Detecting and Classifying Attacks in Computer Networks Using Feed-Forward and Elman Neural Networks

V. Alarcon-Aquino1, J. A. Mejia-Sanchez1, R. Rosas-Romero1, J. F. Ramirez-Cruz2

1 Department of Electrical and Electronic Engineering, Communication and Signal Processing Group, CENTIA, Universidad de las Américas-Puebla, 72820 Cholula, Puebla, MEXICO
[email protected]
2 Department of Computer Science, Instituto Tecnologico de Apizaco, Tlaxcala, MEXICO

Abstract. In this paper, we present an approach for detecting and classifying attacks in computer networks by using neural networks. Specifically, a design of an intruder detection system is presented to protect the hypertext transfer protocol (HTTP). We propose the use of an application-based model using neural networks to properly model non-linear data. The benefit of this perspective is that it works directly on the causes of an attack, which are determined directly by the commands used in the protected application. The intruder detection system is designed by defining three different neural networks: two multi-layer feed-forward networks and the Elman recurrent network. The results reported in this paper show that the Elman recurrent network achieved a detection rate of around ninety percent, which demonstrates the reliability of the designed system to detect and classify attacks in high-level network protocols.

Keywords: Intrusion Detection, Neural networks, HTTP protocol.

1 Introduction

With the explosive growth of computer networking and electronic commerce environments, the security of networking systems has become very important [3], [5]. Network anomaly and intrusion detection in wide area networks and e-commerce infrastructures is gaining practical importance (see e.g., [5], [8]). Detecting computer attacks thus poses a significant problem for the global Internet, and the network intrusion detection (NID) area is devoted to detecting this activity [3], [5]. Several approaches have been proposed to solve the problem of network intrusion detection (see e.g., [3], [5], [8]-[11], [13]-[14]). The problem of NID may be approached from a statistical perspective, as discussed in [5]; alternatively, as discussed in [3], the solution may depend on whether a host-based model or a network-based model is used. For a host-based model, intrusion detection systems (IDS) base their decisions on information obtained from a single or multiple host systems, while for a network-based model, IDS base their decisions on monitoring the traffic in the network to which the hosts are connected [15].

It has been shown that neural networks have special advantages in IDS, such as self-adaptive ability and internal parallel computation (see e.g., [3], [9]-[11], [13]-[14]). Elman recurrent neural networks have recently been considered for network intrusion detection (see e.g., [9], [11], [14]). In the work reported in this paper we propose an intruder detection system to correctly detect and classify attacks in high-level network protocols by using two feed-forward neural networks and the Elman recurrent neural network. The aim is to compare these networks and to test the underlying fundamentals of our approach to network intrusion detection. Furthermore, a competitive transfer function to classify attacks is reported. We present the use of an application-based model using neural networks to properly model non-linear data. The benefit of this perspective is that it works directly on the causes of an attack, which are determined directly by the commands used in the protected application.

The remainder of this paper is organised as follows. In Section 2, we present a description of intrusion detection techniques. Section 3 discusses the HTTP protocol. In Section 4, we present a brief overview of neural networks. In Section 5, we propose the IDS to detect and classify attacks in high-level network protocols by using neural networks. Performance evaluation of the intruder detection system is presented in Section 6. In Section 7, conclusions of this paper are reported.

2 Intrusion Detection Techniques

In this section we present a brief description of intrusion detection techniques. There are two primary models for analyzing events to detect attacks [9], [15]: the misuse detection model and the anomaly detection model.

The misuse detection model uses patterns of known system attacks to match and identify attacks. In other words, the misuse detection approach compares an Internet command with a database of signatures (known attacks) in order to recognize whether the Internet request is intrusive or normal. However, this is a static approach in the sense that an intrusive request that is not in the database is not recognized as an attack, because the database is built from past experience and cannot anticipate future attacks [8]. Note that new variations of an attack may find a way inside the network system because there is no record of this type of attack in the database, which makes this type of system very vulnerable and dependent on constant database updates [9].

The anomaly detection model detects intrusions by searching for abnormal network traffic (see e.g., [5]). That is, this model tries to determine whether a deviation from the established normal usage patterns can be flagged as an intrusion [15]. The anomaly detection approach, which is based on finding patterns in Internet data, determines whether the data are intrusive or normal. With this type of characterization we may be able to decide whether unknown data is an attack without requiring previous information about it [10].

3 Description of the HTTP protocol

According to the Spanish company S21SEC [12], the hypertext transfer protocol (HTTP) is the main and most used protocol for computer communications, and it is therefore the main source of attacks on the Internet. To design an intruder detection system to detect attacks on this protocol, it is thus necessary to assemble a set of data corresponding to normal and intrusive behaviour. These data serve as training data and have to be characterized by the proposed system; afterwards the system may be able to classify the nature of new data. There are at least five categories of HTTP request [7], [11]:

NORMAL: This classification includes the normal behaviour of a system command, meaning that no attack is involved.
COMMAND INJECTION: This category includes all commands executed directly on the system due to vulnerabilities in data validation, basically executing shell codes written in machine language.
SQL DATABASE ATTACK: This is another type of command injection attack, but it is executed on SQL databases.
XSS (CROSS-SITE SCRIPTING): This includes all commands executed via HTML, Java, or JavaScript.
PATH MODIFICATION: This is the path manipulation of a file or directory that provides privileges to the attacker.

The intruder detection system reported in this paper should be able to detect an HTTP request and classify it adequately into one of these five categories.

4 Feed-forward and Elman neural networks

Neural networks may be used as the pattern extractor of an intruder detection system, which may adequately characterize the nature of any HTTP request. Figure 1 shows a single artificial neuron model [4]. The input vector $\{X_1, X_2, \ldots, X_p\}$ is multiplied by a weight vector $\omega_{kj}$, $j = 0, 1, \ldots, p$, and then summed to get a single scalar $\upsilon_k$, which is then passed through a transfer function $\varphi(\cdot)$ that delivers the neuron's output $y_k$. Holding the input vector and the desired neuron output fixed, we only have to determine the weight vector that allows this neuron to deliver the desired output for the corresponding input vector. The artificial neuron can deliver any output value for any input vector as long as we can find the corresponding weight vector. This process is known as the learning process of a neuron [4]. An output value between zero and one can be obtained with the sigmoid transfer function. These values are then used to decide whether an input vector is an attack or normal behaviour. The input vector to the neural network system represents the HTTP request made by any user in the communication network.

Fig. 1. Neuron Model
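The neuron model of Fig. 1 can be illustrated with a short Python sketch. It assumes that the weight with index j = 0 acts as a bias paired with a fixed input of 1, which is one common convention and is not stated explicitly in the text; the function names are illustrative only.

```python
import math


def sigmoid(v):
    """Sigmoid transfer function; maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(inputs, weights):
    """Output y_k of a single neuron.

    weights[0] is treated as the bias weight (assumption: fixed input X_0 = 1),
    weights[1:] are paired with the inputs X_1..X_p.
    """
    if len(weights) != len(inputs) + 1:
        raise ValueError("expected one weight per input plus a bias weight")
    # Weighted sum v_k of the inputs.
    v = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    # Transfer function delivers the neuron's output.
    return sigmoid(v)


# Hypothetical two-input neuron: the weighted sum is zero, so the
# sigmoid sits exactly at its midpoint.
print(neuron_output([1.0, 1.0], [-1.0, 0.5, 0.5]))
```

Training then amounts to searching for the `weights` list that makes `neuron_output` match the desired target for each input vector.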

4.1 Feed-forward neural networks

More than one neuron is needed to correctly model a non-linear data flow. A multi-layer feed-forward neural network consists of a number of neurons structured in several layers, that is, an input layer, hidden layers, and an output layer (see Fig. 2). The output layer may contain one or more neurons, allowing the network to deliver one or more outputs for one or more inputs. As for a single neuron, the training process for a neural network consists of finding the value of each weight vector corresponding to each neuron so that we can obtain the desired output for each combination of inputs. This process is accomplished by an algorithm known as back-propagation [4].

Fig. 2. Multi-layer feed-forward neural network
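As a rough illustration of the structure in Fig. 2, the following Python sketch propagates an input vector through a stack of fully connected sigmoid layers. The layer sizes, the random weight range, and all function names are assumptions made for the example and are not taken from the paper.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_layer(n_in, n_out, rng):
    """One weight list per neuron; index 0 is the bias weight (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(layer, inputs):
    """Each neuron forms a weighted sum of the inputs and applies the sigmoid."""
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in layer]


def network_forward(layers, inputs):
    """Feed the activations forward, layer by layer, from input to output."""
    activations = inputs
    for layer in layers:
        activations = layer_forward(layer, activations)
    return activations


rng = random.Random(0)
sizes = [4, 3, 3, 2]  # hypothetical: 4 inputs, two hidden layers of 3, 2 outputs
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
outputs = network_forward(layers, [0.0, 1.0, 1.0, 0.0])
print(outputs)
```

With untrained random weights the outputs carry no meaning; back-propagation adjusts the entries of `layers` until the outputs match the targets.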

4.1.1 Back-propagation

The back-propagation algorithm changes the weight values by propagating recursively over the network from the output layer to the input layer; its main objective is to minimize the error at the network's output by modifying the weight vectors. The back-propagation algorithm is based on a gradient descent algorithm whose main goal is to search over the error surface of a neural network for the combination of weights that allows the network to perform with minimum error. This search is accomplished by calculating the gradient of the error surface: the negative of the gradient's direction indicates the direction in which the surface decreases most rapidly, and the gradient's magnitude indicates the distance over which this direction is valid. With this algorithm we may be able to train the neural network to design the intruder detection system.

However, there are important considerations to take into account before using this algorithm. In a pattern classification problem such as the one we are facing, neurons work near the limits of zero and one, because we train them to deliver this kind of output to indicate whether a request is an attack or normal behaviour. The problem with the back-propagation algorithm is that it computes the gradient of the error surface from the partial derivative of the transfer function at each neuron. This partial derivative is the slope of the transfer function, and a sigmoidal transfer function has its minimum slope near the output limits of one and zero [6]. A small slope decreases the magnitude of the gradient, making the algorithm very slow for this kind of problem. The problem of working with a slow algorithm to train a neural network is that it can get stuck in a local minimum of the surface. If a local minimum is far from the global minimum, the network performs poorly.
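The saturation effect described above can be checked numerically. The sketch below uses the standard identity that the slope of the logistic sigmoid equals y(1 - y), where y is the sigmoid output; the function names are illustrative.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def sigmoid_slope(v):
    """Derivative of the logistic sigmoid, expressed through its output y."""
    y = sigmoid(v)
    return y * (1.0 - y)


# The slope peaks at the midpoint (output 0.5) and collapses as the
# output approaches 0 or 1, which is where a classifier is trained to operate.
for v in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"v={v:+.1f}  output={sigmoid(v):.4f}  slope={sigmoid_slope(v):.4f}")
```

Because every gradient component is multiplied by such a slope, weights feeding saturated neurons receive very small updates under plain gradient descent.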
4.1.2 Faster training

RESILIENT BACK-PROPAGATION: The purpose of the resilient back-propagation training algorithm is to eliminate the harmful effects of the small magnitudes of the partial derivatives discussed above. Only the sign of the derivative is used to determine the direction of the weight update; the magnitude of the derivative has no effect on the weight update. The size of the weight change is determined by a separate update value. The update value for each weight and bias is increased by a factor $\Delta_{inc}$ if the derivative of the performance function with respect to that weight has the same sign for two successive iterations. The update value is decreased by a factor $\Delta_{dec}$ if the derivative with respect to that weight changes sign from the previous iteration. Thus, whenever the weights are oscillating, the weight change is reduced. If a weight continues to change in the same direction for several iterations, then the magnitude of the weight change is increased, allowing faster training [1].

CONJUGATE GRADIENT: The basic back-propagation algorithm adjusts the weights in the gradient descent direction (the negative of the gradient). This is the direction in which the performance function decreases most rapidly. It turns out that, although the function decreases most rapidly along the negative of the gradient, this does not necessarily produce the fastest convergence. In the conjugate gradient algorithms a search is performed along conjugate directions, which generally produces faster convergence than the gradient descent directions. In the basic back-propagation algorithm the learning rate is used to determine the length of the weight update (step size). In the conjugate gradient algorithm, the step size is adjusted at each iteration: a search is made along the conjugate gradient direction to determine the step size that minimizes the performance function along that line [1].

4.2 Elman recurrent neural network

Another option for designing the intruder detection system based on neural networks is the Elman recurrent neural network, which has the same structure as a multi-layer feed-forward network with the addition of a feedback loop that enables the network to find temporal training patterns. The Elman neural network is a recurrent network that connects the feedback loop from the output of the hidden layer to its input [2], [4]. This recurrent connection allows the Elman network to both detect and generate time-varying patterns. This neural network structure is used to obtain a different performance figure, allowing us to make a design decision.
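The hidden-layer feedback of an Elman network can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a single hidden layer, sigmoid units, no bias terms, and hand-picked toy weights; the names `elman_step`, `w_in`, `w_ctx`, and `w_out` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def elman_step(x, context, w_in, w_ctx, w_out):
    """One time step: the hidden layer sees the input plus its own previous output."""
    hidden = [
        sigmoid(sum(wi * xi for wi, xi in zip(w_in[h], x))
                + sum(wc * c for wc, c in zip(w_ctx[h], context)))
        for h in range(len(w_in))
    ]
    output = [sigmoid(sum(wo * hj for wo, hj in zip(row, hidden)))
              for row in w_out]
    # The hidden activations are returned so they can serve as the next context.
    return output, hidden


def elman_run(sequence, w_in, w_ctx, w_out):
    """Process a sequence, starting from an all-zero context."""
    context = [0.0] * len(w_in)
    outputs = []
    for x in sequence:
        y, context = elman_step(x, context, w_in, w_ctx, w_out)
        outputs.append(y)
    return outputs


# Toy weights (assumed values): 1 input, 2 hidden units, 1 output.
w_in = [[1.0], [-1.0]]
w_ctx = [[0.5, 0.5], [0.5, -0.5]]
w_out = [[1.0, 1.0]]
outputs = elman_run([[1.0], [1.0]], w_in, w_ctx, w_out)
print(outputs)
```

Presenting the same input twice yields two different outputs, because the second step also depends on the stored hidden state; this is the mechanism that lets the network respond to temporal patterns.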

5 Detection of Attacks based on Neural Networks

As mentioned previously, any HTTP request may fall into one of five categories, which include normal and abnormal behaviours. The intruder detection system is designed to take an HTTP request and determine the corresponding category, which is accomplished by a MATLAB® script. For this purpose, the dataset is divided into two parts. The first part is seventy percent of the dataset and serves as training data; the second part is the remaining thirty percent and serves as test data. The dataset contains 488 requests corresponding to abnormal behaviour and 285 corresponding to normal behaviour [7].

5.1 Data Pre-processing

An example of a typical HTTP request is given by //nombre.exe?param1=..\..\archivo. Most of this string consists of filenames, parameters, and alphanumeric strings, which normally change from system to system. The most significant parts of this string are the file extensions and the special characters. As a result, every alphanumeric string is replaced with the special character '@' [7]. Thus, the example request shown above takes the following shape: //@.exe?@=..\..\@. Now we have only the significant part of every request. The next step is to convert these characters to their corresponding ASCII decimal values.

Once the data is in decimal form, we need to consider that HTTP requests do not have a fixed length; however, a neural network needs a fixed input length because the number of neurons in the input layer depends on this length. The next step in data pre-processing is therefore to fix the request length. This is accomplished by a sliding window approach, whose main function is to convert a variable-length vector into several fixed-length vectors. The length is determined by a constant defined by the nature of the problem. The sliding window approach can be described as follows. Consider a decimal vector with six elements. Now suppose that a neural network requires a fixed input length M equal to three. The sliding window approach delivers (N-M+1) vectors of fixed length M, where N is the length of the original vector; that is, four vectors in this example. As a result, we have fixed-length vectors which are able to work as input vectors for a neural network. In order to obtain better network performance, these vectors are then converted to their binary form [7]. After the binary conversion, a binary matrix is obtained, which has to be converted to a single vector of length M×8, where M is the fixed length defined by the sliding window.

5.2 Neural Network Architecture

As explained previously, the length of the input vector is M×8. In this case we have chosen M equal to eight; thus, for this problem an input length equal to 64 is obtained. The value of M was chosen to generate an adequate size for the dataset after using the sliding window approach. Therefore, the neural network has 64 neurons in the input layer, while the output layer consists of five output neurons, one for each of the five categories into which an HTTP request may fall. The network is trained to output a 1 in the position of the output vector corresponding to the category of the HTTP request and to fill the rest of the output vector with 0's.

Three neural networks are defined with 64 input neurons and five output neurons but with different numbers of hidden neurons and different structures. The first network is a multi-layer feed-forward network (FF1) with two hidden layers of 15 neurons each (see Fig. 3a). The number of hidden layers and hidden neurons is selected empirically, based on trial and experience. This network is trained with a resilient back-propagation algorithm, and all neurons use sigmoid transfer functions so that the network can output values close to one or zero, as this is a pattern classification problem. This neural network is trained with 70% of the dataset, achieving an error goal of 0.015 (see Fig. 3b).
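The pre-processing chain of Section 5.1 (masking, ASCII conversion, sliding window, binary expansion) can be sketched in Python as follows. The masking rule is an assumption: the text only says that alphanumeric strings are replaced by '@' while file extensions are kept, so the sketch treats an alphanumeric run that directly follows a dot as an extension. The sketch also assumes plain ASCII input, so that every character fits in 8 bits. All function names are illustrative.

```python
import re

# Assumption: an alphanumeric run right after a dot is a file extension and
# is kept; any other alphanumeric run is replaced by '@'.
_ALNUM_RUN = re.compile(r"(?<![.A-Za-z0-9])[A-Za-z0-9]+")


def mask_request(request):
    """Replace system-specific names and parameters with '@'."""
    return _ALNUM_RUN.sub("@", request)


def to_ascii(masked):
    """Convert each character to its ASCII decimal value."""
    return [ord(ch) for ch in masked]


def sliding_windows(values, m):
    """Return the N-M+1 windows of length m (empty if the vector is shorter than m)."""
    return [values[i:i + m] for i in range(len(values) - m + 1)]


def to_bits(window):
    """Expand each decimal value to 8 bits and flatten into one vector of length m*8."""
    return [int(b) for v in window for b in format(v, "08b")]


def preprocess(request, m=8):
    """Full chain: one binary input vector per sliding-window position."""
    codes = to_ascii(mask_request(request))
    return [to_bits(w) for w in sliding_windows(codes, m)]


example = r"//nombre.exe?param1=..\..\archivo"
print(mask_request(example))          # //@.exe?@=..\..\@
vectors = preprocess(example, m=8)
print(len(vectors), len(vectors[0]))  # 10 windows of 64 bits each
```

The masked example has N = 17 characters, so M = 8 gives 17 - 8 + 1 = 10 input vectors of 64 bits, matching the 64 input neurons described in Section 5.2. How requests shorter than M are handled is not stated in the text; the sketch simply returns no vectors for them.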

Fig. 3. (a) Multilayer feed-forward neural network. (b) First feed-forward network training
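The first network (FF1) is trained with resilient back-propagation. The sign-based update rule described in Section 4.1.2 can be sketched in Python as shown below. This is a simplified variant without weight backtracking, applied to a toy one-dimensional error function; the factors 1.2 and 0.5 and the step bounds are commonly quoted values and are assumptions here, since the paper does not report the settings it used.

```python
def rprop_step(weights, grads, prev_grads, steps,
               delta_inc=1.2, delta_dec=0.5, step_min=1e-6, step_max=50.0):
    """One resilient back-propagation update over a flat list of weights."""
    new_weights, new_steps = [], []
    for w, g, pg, s in zip(weights, grads, prev_grads, steps):
        if g * pg > 0:
            # Same sign on two successive iterations: grow the update value.
            s = min(s * delta_inc, step_max)
        elif g * pg < 0:
            # Sign change (oscillation): shrink the update value.
            s = max(s * delta_dec, step_min)
        # Only the sign of the derivative decides the direction of the move.
        if g > 0:
            w -= s
        elif g < 0:
            w += s
        new_weights.append(w)
        new_steps.append(s)
    return new_weights, new_steps


# Toy problem (not from the paper): minimise E(w) = (w - 3)^2.
weights, steps, prev_grads = [0.0], [0.1], [0.0]
for _ in range(200):
    grads = [2.0 * (weights[0] - 3.0)]
    weights, steps = rprop_step(weights, grads, prev_grads, steps)
    prev_grads = grads
print(weights[0])
```

Because the gradient magnitude never enters the update, the step stays useful even when saturated sigmoid units make the derivatives very small.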

The second network is a multi-layer feed-forward network (FF2) with two hidden layers of 20 neurons each. Besides the larger hidden layers, this network differs from the first one in the training algorithm, which in this case is a conjugate gradient algorithm. This neural network is also trained with 70% of the dataset, achieving an error goal of 0.015 (see Fig. 4a). The third network is the Elman recurrent network with two hidden layers of 30 neurons each. According to [2], an Elman network needs more neurons to solve a problem than a multi-layer feed-forward network would need for the same problem. This is the reason why we choose 30 neurons for each hidden layer. A resilient back-propagation algorithm is used to train this network. Training results are shown in Fig. 4b.

Fig. 4. (a) Second feed-forward network training. (b) Elman network training

6 Performance Evaluation

The three neural networks are assessed with the remaining 30% of the dataset. For each input vector, the network delivers an output which is passed through a competitive MATLAB® transfer function. This ensures that the output corresponding to the most activated neuron takes a value of one and the rest of the neurons take a value of zero [7] (see Fig. 5). We then compare the outputs with the targets and count the errors.

For this particular problem we have chosen two performance criteria for each network. The first criterion is the identification percentage, that is, the proportion of outputs that exactly match the desired target and therefore correctly identify the category of an HTTP request. The second criterion is the detection percentage, that is, the proportion of outputs that correctly detect whether a request is normal or abnormal, regardless of the category into which an abnormal request may fall. The detection percentage should be at least as large as the identification percentage, because every correctly identified request is also correctly detected, whereas an abnormal request assigned to the wrong attack category still counts as detected.

Table 1 shows the percentages obtained for each neural network as well as the false positives and false negatives. A false positive occurs when the system classifies an action as an intrusion when it is a valid action, whereas a false negative occurs when an intrusion actually happens but the system allows it to pass as non-intrusive behaviour. It can be seen that the best performance is achieved by the Elman network, with about 90% correct detection and 87% correct identification. However, the other networks performed almost as well as the Elman network. Once the network is correctly trained, the weight vectors are defined and we are able to use this trained network as an intruder detection system for the HTTP protocol.

Fig. 5. Identification and detection.
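The two criteria can be made concrete with a short Python sketch. The winner-take-all function below mimics the behaviour described for the competitive transfer function; placing the NORMAL category at output index 0 is an assumption, and the four output vectors are made-up toy values, not results from the paper.

```python
NORMAL = 0  # assumption: output position 0 encodes the NORMAL category


def compet(outputs):
    """Winner-take-all: 1 for the most activated neuron, 0 elsewhere.

    Ties are resolved in favour of the lowest index (a choice of this sketch).
    """
    winner = max(range(len(outputs)), key=outputs.__getitem__)
    return [1 if i == winner else 0 for i in range(len(outputs))]


def evaluate(network_outputs, targets):
    """Return identification %, detection %, false positives, false negatives."""
    identified = detected = false_pos = false_neg = 0
    for out, target in zip(network_outputs, targets):
        predicted = compet(out).index(1)
        actual = target.index(1)
        if predicted == actual:
            identified += 1          # exact category match
        if (predicted == NORMAL) == (actual == NORMAL):
            detected += 1            # normal/abnormal decision is right
        elif actual == NORMAL:
            false_pos += 1           # valid request flagged as an intrusion
        else:
            false_neg += 1           # intrusion passed as normal
    n = len(targets)
    return 100.0 * identified / n, 100.0 * detected / n, false_pos, false_neg


# Toy data (hypothetical): four requests, five categories.
toy_outputs = [
    [0.9, 0.1, 0.0, 0.0, 0.0],   # normal, recognised as normal
    [0.1, 0.8, 0.05, 0.0, 0.0],  # attack, right category
    [0.1, 0.2, 0.7, 0.0, 0.0],   # attack, wrong attack category
    [0.6, 0.3, 0.0, 0.0, 0.0],   # attack, passed as normal
]
toy_targets = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
print(evaluate(toy_outputs, toy_targets))
```

In the toy data the third request is detected but not identified, which is why the detection percentage ends up above the identification percentage.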

Table 1. Performance of each neural network

                       FF1      FF2      ELMAN
Identification (%)     84.85    84.42    87.01
Detection (%)          88.74    89.18    90.91
False Positives (%)    4.97     5        4.86
False Negatives (%)    0.97     0.96     0.94

7 Conclusions

This paper has presented an approach for detecting and classifying intrusions in high-level network protocols by using neural networks. The results reported in this paper show that the best performance is achieved by the Elman network, with about 90% correct detection and 87% correct identification. Note that better performance may be achieved if the network is trained with more training data, keeping the neural network's knowledge updated. The anomaly detection approach has proven to be a good solution for this kind of problem, achieving about 90% correct detection, which is by far a better performance than what misuse detection can accomplish with new and unknown data. Future work will focus on investigating further enhancements using recurrent neural networks trained by real-time recurrent learning algorithms. Furthermore, a hardware implementation on an FPGA, which may work with a firewall for detecting intruders and protecting the system, is also being considered.

References

[1] M. Beale and H. Demuth, Neural Network Toolbox, Math Works, Inc., Massachusetts, USA, (2003).
[2] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, USA, (1995).
[3] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, Network-Based Intrusion Detection Using Neural Networks, Intelligent Engineering Systems through Artificial Neural Networks, Proc. of ANNIE-2002, vol. 12, ASME Press, New York, (2002) pp. 579-584.
[4] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, (1994).
[5] C. Manikopoulos and S. Papavassiliou, Network Intrusion and Fault Detection: A Statistical Anomaly Approach, IEEE Communications Magazine, October (2002) pp. 76-82.
[6] T. Masters, Practical Neural Network Recipes in C++, Academic Press, Inc., California, USA, (1993).
[7] J. A. Mejia-Sanchez, Detección de Intrusos en Redes de Comunicaciones Utilizando Redes Neuronales, Department of Electrical and Electronic Engineering, Universidad de las Américas Puebla, Mexico, May (2004).
[8] B. Mukherjee, L. T. Heberlein, and K. N. Levitt, Network Intrusion Detection, IEEE Network, May/June (1994).
[9] J. P. Planquart, Application of Neural Networks to Intrusion Detection, SANS Institute, July (2001).
[10] N. Pongratz, Application of Neural Networks to Recognize Computer Identity Hijacking, University of Wisconsin, (2001).
[11] E. Torres, Immunologic System for intrusion detection at http protocol level, Department of Systems Engineering, Pontificia Universidad Javeriana, Colombia, May (2003).
[12] S21SEC, http://www.s21sec.com
[13] L. de Sa Silva, A. C. Ferrari dos Santos, J. D. S. Da Silva, A. Montes, A Neural Network Application for Attack Detection in Computer Networks, IEEE International Joint Conference on Neural Networks, Vol. 2, July (2004) pp. 1569-1574.
[14] X. Jing-Sheng, S. Ji-Zhou, Z. Xu, Recurrent Network in Network Intrusion Detection System, IEEE International Conference on Machine Learning and Cybernetics, Vol. 5, August (2004) pp. 2676-2679.
[15] Y. Bai and H. Kobayashi, Intrusion Detection Systems: Technology and Development, IEEE International Conference on Advanced Information Networking and Application (AINA'03), (2003).

Lecture Notes in Computer Science
