IJRIT International Journal of Research in Information Technology, Volume 2, Issue 4, April 2014, Pg: 80- 88
International Journal of Research in Information Technology (IJRIT) www.ijrit.com
ISSN 2001-5569
String Pattern Matching For High Speed in NIDS
Mr. B. VARUNKUMAR, Mrs. S.V. SURYAKALA
M.Tech Student, Department of Electronics and Communication Engineering, SRM University, Chennai, India
Asst. Professor, Department of Electronics and Communication Engineering, SRM University, Chennai, India
Abstract: This paper presents a Network Intrusion Detection System (NIDS) implemented on an FPGA. Networked systems are increasingly affected by malicious patterns, and matching those patterns in an NIDS demands exceptionally high performance, which in turn calls for hardware support. The aim of this work is to increase the throughput and speed of the system while reducing the area it requires. String pattern matching is performed in two ways: on ASCII-based bits using the Aho-Corasick algorithm, and on bit-based input using a short pattern matching algorithm. The ASCII-based design runs at 160.21 MHz, whereas the bit-based design runs at 360.01 MHz. Our implementation has a throughput of 1 Gbps. Malicious patterns can be detected at this performance with low power consumption. For pattern matching, a Finite State Machine (FSM) is used to step through the sequential states.
Keywords: NIDS, FSM, Aho-Corasick algorithm, Frequency, Power Consumption.
1. INTRODUCTION
Pattern matching for network security and intrusion detection demands exceptionally high performance. Much work has been done in this field, and yet efficient, flexible, and powerful systems still have significant room for improvement. Methods commonly used to protect against security breaches include firewalls with filtering mechanisms to screen out obviously dangerous packets, and intrusion detection systems, which use much more sophisticated rules and pattern matching to sense potentially malicious packets. These techniques require significant computational resources, and highly parallel, flexible fabrics such as FPGAs provide opportunities for dramatic improvements.
The Internet has grown explosively into a giant open network. Internet attacks require little effort and monetary investment to create, are difficult to trace, and can be launched from virtually anywhere in the world. Therefore, computer networks are constantly assailed by attacks and scams, ranging from nuisance hacking to more nefarious probes and attacks.
The most commonly used network protection systems are the firewall and the Network Intrusion Detection System (NIDS). They are critical network security tools that help protect high-speed computer networks from malicious users. Firewalls and NIDSs are installed at the border of a network to inspect and monitor the incoming and outgoing network traffic. A firewall, which performs only layer-3 or layer-4 filtering, processes packets based on their headers. An NIDS, in contrast, provides not only layer-3 or layer-4 filtering, but also layer-7 filtering. An NIDS searches both packet headers and payloads to identify attack patterns (or signatures). Hence, an NIDS can detect and prevent harmful content, such as computer worms, malicious code, or attacks being transmitted over the network. Such systems examine network communications, identify patterns of computer attacks, and then take action to either terminate the connections or alert system administrators.
With the rapid expansion of the Internet and the explosion in the number of attacks, the design of Network Intrusion Detection Systems has been a big challenge. Advances in optical networking technology are pushing link rates beyond OC-768 (40 Gbps). This throughput is impossible to achieve using existing software-based solutions [9], and thus the matching must be performed in hardware.
Most hardware-based solutions for high-speed string matching in NIDS fall into three main categories: ternary content addressable memory (TCAM)-based, dynamic/static random access memory (DRAM/SRAM)-based, and SRAM-logic-based solutions. Although TCAM-based engines can retrieve results in just one clock cycle, they are power hungry and their throughput is limited by the relatively low speed of TCAMs. On the other hand, SRAM-based and SRAM-logic-based solutions require multiple cycles to perform a search. Therefore, pipelining techniques are commonly used to improve the throughput. In the SRAM-logic-based approach, a portion of the dictionary is implemented using logic resources, making this approach logic bound and hard to scale to larger dictionaries. The SRAM-based approaches, which are memory bound, result in inefficient memory utilization. This inefficiency limits the size of the supported dictionary. In addition, it is difficult to use external SRAM in these architectures, due to the constraint on the number of I/O pins. This constraint restricts the number of external stages, while the amount of on-chip memory bounds the size of the memory for each pipeline stage. Due to these two limitations, state-of-the-art SRAM-based solutions do not scale well to larger dictionaries. This scalability problem has been a dominant issue for the implementation of NIDSs in hardware.
The key issues to be addressed in designing an architecture for string pattern matching engines are:
1. Size of the supported dictionary,
2. Throughput,
3. Scalability with respect to the size of the dictionary, and
4. Dictionary update.
To address these challenges, we propose a preprocessing algorithm and a scalable, high-throughput, Memory-efficient Architecture for large-scale String Matching (MASM). This architecture utilizes a binary search tree (BST) structure to improve the storage efficiency. MASM also provides a fixed latency due to its linear pipelined architecture. This paper makes the following contributions:
• The Aho-Corasick algorithm is applied to ASCII-based string pattern matching. It requires more area, but the speed of the system increases compared with normal performance.
• The short pattern matching algorithm is applied to bit-based string pattern matching. It consumes less area and also increases the speed.
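To make the Aho-Corasick step more concrete, the following is a minimal software sketch in Python of the textbook algorithm (goto, failure, and output tables). It is not the FPGA design reported in this paper; the function names `build_automaton` and `search` and the example dictionary are assumptions chosen for illustration.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure and output tables for a list of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep failure link 0,
    # deeper states derive theirs from their parent's link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def search(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
```

Each input character causes one state update, which is the property that makes the algorithm a natural fit for a one-character-per-clock hardware FSM.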
2. BACKGROUND AND RELATED WORK
2.1. String Pattern Matching
String pattern matching (or simply string matching) is one of the most important functions of an NIDS, as it provides the content-search capability. A string matching algorithm compares all the string patterns in a given dictionary (or database) to the traffic passing through the device. Note that string matching is also referred to as exact string matching.
Among currently available NIDS solutions, Snort [2] is a popular open source and cross-platform NIDS. Snort uses signatures and packet headers to detect malicious internet activities. As an open source system, Snort rules are contributed by the network security community to make widely accepted and effective rule-sets. These rule-sets, which include both packet headers and signatures (strings and regular expressions), have grown quite rapidly, as rules are added as soon as they are extracted by network security experts. The string patterns constitute the largest portion of the signatures in a Snort database. There are over 8K string signatures in the current Snort database.
Fig 1. Multiple string matching, where the state machine recognizes the appearance of any of the search strings anywhere in the entire data stream.
To address some of the challenges in pattern matching, we propose a preprocessing algorithm and a scalable, high-throughput, Memory-efficient Architecture for large-scale String Matching (MASM). This architecture utilizes a binary search tree (BST) structure to improve the storage efficiency. MASM also provides a fixed latency due to its linear pipelined architecture. This paper makes the following contributions:
• An algorithm called leaf-attaching to efficiently make a given dictionary disjoint without increasing the number of patterns.
• An architecture that achieves a memory efficiency of 0.56 byte/char (for Rogets) and 1.32 byte/char (for Snort). State-of-the-art designs can only achieve a memory efficiency of over 2 byte/char in the best case.
• The implementation on ASIC and FPGA shows a sustained aggregated throughput of 24 and 3.2 Gbps, respectively.
• The design can be duplicated to improve the throughput by exploiting its simple architecture.
2.2. SPLIT ALGORITHM FOR PATTERN MATCHING
Most patterns use only a small subset of the 256 possible characters. Some pattern characters are frequent and appear in a transition in almost every state, while others appear infrequently. The pattern matching module, shown in Fig 2, operates as a pipeline stage.
2.3. ARCHITECTURE
There are two matching steps in the architecture: 1) pattern matching and 2) label matching, handled by the pattern matching module (PMM) and the label matching module (LMM), respectively. The input data stream is fed into the PMM L bytes at a time. This input window is advanced 1 byte per clock cycle. The PMM then matches the input string against the pattern database, while the LMM matches the {prefix; suffix; match vector} combination to validate the long pattern and outputs the matching result. In the LMM, all entries are uniquely defined. Hence, any matching mechanism can be utilized. The critical point is the relationship between the size of the input window L and the number of entries in the LMM. The window size L should be greater than or equal to the matching latency of the LMM. For this reason, L should be chosen according to the size of the dictionary.
The block diagram of the basic pipeline and a single stage of a BST are shown in Fig. 2. To take advantage of the dual-port feature offered by SRAM, the architecture is configured as dual linear pipelines. This configuration doubles the matching rate. At each stage, the memory has two sets of Read/Write ports so that two strings can be input every clock cycle. The content of each entry in the memory includes: 1) a pattern P, 2) a match vector MV, and 3) a pattern label PL. In each pipeline stage, there are four data fields forwarded from the previous stage:
1. The input string SI,
2. The matching status vector MSV,
3. The memory access address Addr, and
4. The matched label ML.
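The input windowing described above (L bytes presented to the PMM, advanced by 1 byte per clock cycle) can be illustrated with a short software sketch. This is only a behavioural model under stated assumptions: the names `sliding_windows` and `match_stream` are hypothetical, the stream is zero-padded at the end so that every byte offset receives a full window, and the dictionary lookup is a plain prefix test rather than the pipelined BST.

```python
def sliding_windows(stream: bytes, window_len: int):
    """Yield (cycle, window) pairs; the window advances one byte per cycle."""
    # Assumption: pad with zero bytes so the last offsets still get L bytes.
    padded = stream + bytes(window_len - 1)
    for cycle in range(len(stream)):
        yield cycle, padded[cycle:cycle + window_len]

def match_stream(stream: bytes, patterns, window_len: int):
    """Report (cycle, pattern) whenever a pattern starts at the window head."""
    hits = []
    for cycle, window in sliding_windows(stream, window_len):
        for pat in patterns:
            if window.startswith(pat):
                hits.append((cycle, pat))
    return hits

print(match_stream(b"xxclient", [b"client", b"cli"], 8))
```

Because one window is produced per cycle, the number of cycles equals the number of input bytes, which is what gives the architecture its fixed per-byte processing rate. The constraint that L be at least the LMM latency is not modelled here.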
Fig 2. Basic pattern matching module with pipelining stage
The forwarded memory address is used to retrieve the pattern and its associated data stored in the local memory. This information is compared with the input string to determine the matching status. In case of a match, the matched label (ML) and the matching status vector (MSV) are updated. The comparison result (1 if the input string is greater than the node’s pattern, 0 otherwise) is appended to the current memory address and forwarded to the next stage.
2.4. Comparator Module
There is one comparator in each stage of the pipeline. It compares the input string with the node’s pattern, and uses the node’s match vector (MV) to produce the matching status vector (MSV). Fig. 4 depicts the block diagram of an 8-byte comparator. The inputs include:
1. An input string SI,
2. A pattern P,
3. A match vector MV,
4. A pattern label PL, and
5. A match label.
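The address-forwarding rule described before Section 2.4 (append 1 to the address if the input string is greater than the node's pattern, 0 otherwise) can be sketched in software as follows. This is a simplified illustration, not the hardware design: it assumes a balanced BST built from a sorted dictionary, one memory per pipeline stage modelled as a Python dict, and exact equality instead of the prefix matching with match vectors. The names `build_stages` and `lookup` are hypothetical.

```python
def build_stages(sorted_patterns):
    """Place sorted patterns into per-stage memories of a balanced BST."""
    stages = []

    def place(lo, hi, depth, addr):
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        while len(stages) <= depth:
            stages.append({})
        stages[depth][addr] = sorted_patterns[mid]
        place(lo, mid, depth + 1, addr * 2)          # left child: append bit 0
        place(mid + 1, hi, depth + 1, addr * 2 + 1)  # right child: append bit 1

    place(0, len(sorted_patterns), 0, 0)
    return stages

def lookup(stages, key):
    """Walk one stage per step, forwarding the address and the match result."""
    addr = 0
    matched = None
    for memory in stages:
        node = memory.get(addr)
        if node is None:
            break  # empty slot: nothing further to compare on this path
        if key == node:
            matched = node
        # Append the comparison bit to form the next stage's address.
        addr = (addr << 1) | (1 if key > node else 0)
    return matched

stages = build_stages(sorted([b"get", b"put", b"cli", b"abc", b"xyz"]))
print(len(stages), lookup(stages, b"put"), lookup(stages, b"foo"))
```

Since the address for stage k is just the k comparison bits gathered so far, no child pointers need to be stored, which is one plausible reason a BST layout helps storage efficiency.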
Fig 4. Block diagram of 8 byte comparator
Fig 4.1. Operation table of the 8-bit matching vector decoder
The input string and the pattern P go into the “byte comparator,” which performs byte-wise comparisons of the two inputs. The results (M7-M0) are fed into the “matching vector decoder,” which operates based on the truth table shown in Fig. 4.1. The output of the decoder is AND-ed with the node’s match vector. The result is then AND-ed with the output of the “string comparator,” which compares the pattern label and the match label, to produce an 8-bit matching status vector (MSV).
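The comparator data path can be sketched behaviourally as below. The decoder's truth table is given only as a figure, so this sketch makes an explicit assumption: the decoder is taken to output a prefix vector in which position i is set only if bytes 0 through i all match. The bit ordering (index 0 is the first byte), the function names, and the example values are likewise assumptions for illustration.

```python
def byte_compare(s: bytes, p: bytes):
    """Byte-wise equality flags, one per byte position."""
    return [a == b for a, b in zip(s, p)]

def decode_prefix(flags):
    """Assumed decoder: position i is True only if bytes 0..i all matched."""
    out, ok = [], True
    for flag in flags:
        ok = ok and flag
        out.append(ok)
    return out

def comparator(s, p, mv, pattern_label, match_label):
    """Return the matching status vector (MSV) as a list of booleans."""
    prefix = decode_prefix(byte_compare(s, p))
    labels_equal = pattern_label == match_label  # the "string comparator"
    # AND the decoder output with the match vector, then with the label result.
    return [pre and bit and labels_equal for pre, bit in zip(prefix, mv)]

# Hypothetical node: stored pattern "clients!", with the prefixes of
# length 3 ("cli") and 6 ("client") marked as dictionary patterns.
mv = [False, False, True, False, False, True, False, False]
print(comparator(b"clientXY", b"clients!", mv, 5, 5))
```

Under this reading, a single stored entry can report several shorter patterns at once, because each set bit of MV marks one prefix length that is itself a pattern.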
3. FSM STATES
Pattern matching is carried out as an FSM flow that passes through a number of states, depending on the complexity of the patterns.
Fig 5. FSM state block
The memory efficiency is analyzed in two ways:
• Number of states used in matching.
• Number of logic elements utilized after synthesis.
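The first of these metrics can be estimated in software before synthesis. The sketch below assumes the FSM has one state per distinct pattern prefix (that is, one state per trie node, plus the start state), which is how an Aho-Corasick style machine is usually laid out; the function name `count_trie_states` is hypothetical, and the count says nothing about the logic elements a synthesis tool would report.

```python
def count_trie_states(patterns):
    """Count FSM states as distinct prefixes, including the start state."""
    prefixes = {""}  # the empty prefix is the start state
    for pat in patterns:
        for i in range(1, len(pat) + 1):
            prefixes.add(pat[:i])
    return len(prefixes)

print(count_trie_states(["client"]))
print(count_trie_states(["he", "she", "his", "hers"]))
```

Patterns that share prefixes share states, so the state count grows more slowly than the total number of characters in the dictionary.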
Fig 6. FSM flow for ASCII based bits
Fig 7. FSM flow for bit based approach
4. AREA REPORT OF THE DESIGN
Table 4.1. Area report of pattern matching

NAME OF BITS | TOP LEVEL ENTITY NAME | TOTAL LOGIC ELEMENTS | TOTAL COMBINATIONAL FUNCTIONS | AREA CONSUMED
ASCII bits   | AHOCORASICK           | 137                  | 137                           | 3%
BIT BASED    | SHORT TOP MODULE      | 128                  | 120                           | 2%
4.1. Fmax REPORT
Table 4.2. Comparison of Fmax

      | ASCII BASED | BIT BASED
Fmax  | 160.21 MHz  | 361.01 MHz
Clock | CLK         | CLK
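As a rough cross-check of the Fmax figures against the throughput of about 1 Gbps stated in the abstract, throughput can be estimated as clock frequency times the number of input bits consumed per clock cycle. The paper does not state the input width of either design, so the 8 bits per cycle used below for the ASCII-based design (one character per clock) is an assumption, and `throughput_gbps` is a hypothetical helper.

```python
def throughput_gbps(fmax_mhz: float, bits_per_cycle: int) -> float:
    """Peak throughput in Gbps for a design that consumes a fixed number of bits per clock."""
    return fmax_mhz * bits_per_cycle / 1000.0

# Assumption: the ASCII-based design consumes one 8-bit character per clock.
print(round(throughput_gbps(160.21, 8), 3))
```

Under that assumption the ASCII-based design would peak at roughly 1.28 Gbps, which is in line with the 1 Gbps figure. The same function can be applied to the bit-based design once its input width per cycle is known.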
5. SIMULATION RESULTS
Simulation result of pattern matching
Simulation result for a series of patterns
In the series of pattern matching, the characters c, l, i, e, n, t of the text are matched one per cycle, i.e., one at each state.
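The per-cycle behaviour described for the text c, l, i, e, n, t can be mimicked with a small software FSM that advances one state per matched character. This is a behavioural sketch only, not the simulated HDL: the names `step` and `run` are hypothetical, and the simple fallback rule (return to state 1 on a 'c', otherwise to state 0) is valid only because the first character 'c' does not recur inside "client".

```python
PATTERN = "client"

def step(state: int, ch: str) -> int:
    """Next state after consuming one character (one clock cycle)."""
    if ch == PATTERN[state]:
        return state + 1
    # Simplified fallback, valid because 'c' occurs only at position 0.
    return 1 if ch == PATTERN[0] else 0

def run(text: str):
    """Return the state after each cycle and the cycles at which a match completes."""
    state = 0
    trace, matches = [], []
    for cycle, ch in enumerate(text):
        state = step(state, ch)
        trace.append(state)
        if state == len(PATTERN):
            matches.append(cycle)
            state = 0  # restart for the next occurrence
    return trace, matches

print(run("xclient"))
```

The trace shows the state index rising by one on each matching character, which corresponds to the one-character-per-state progression reported in the simulation.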
6. CONCLUSION
Fixed-length and arbitrary-length matching algorithms for string matching have been implemented. The algorithm achieves better memory efficiency than the state of the art, and the speed of the processor system has been increased.
7. REFERENCES
[1] Z.K. Baker and V.K. Prasanna, “A Methodology for Synthesis of Efficient Intrusion Detection Systems on FPGAs,” FCCM ’04: Proc. 12th Ann. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 135-144, 2004.
[2] A. Basu and G. Narlikar, “Fast Incremental Updates for Pipelined Forwarding Engines,” Proc. IEEE INFOCOM ’03, pp. 64-74, 2003.
[6] CACTI Tool, http://quid.hpl.hp.com:9081/cacti/, 2012.
[3] C.R. Clark and D.E. Schimmel, “Scalable Pattern Matching for High Speed Networks,” FCCM ’04: Proc. 12th Ann. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 249-257, 2004.
[4] P. Gupta and N. McKeown, “Algorithms for Packet Classification,” IEEE Network, vol. 15, no. 2, pp. 24-32, Mar/Apr. 2001.
[5] N. Hua, H. Song, and T.V. Lakshman, “Variable-Stride Multi-Pattern Matching for Scalable Deep Packet Inspection,” Proc. IEEE INFOCOM ’09, Apr. 2009.
[6] H.-J. Jung, Z. Baker, and V. Prasanna, “Performance of FPGA Implementation of Bit-Split Architecture for Intrusion Detection Systems,” Proc. Int’l Parallel and Distributed Processing Symp., p. 177, 2006.
[7] S. Dharmapurikar and J. Lockwood. Fast and scalable pattern matching for network intrusion detection systems. IEEE Journal on Selected Areas in Communications, 24(10):1781–1792, 2006.
[8] M. French, E. Anderson, and D.-I. Kang. Autonomous system on a chip adaptation through partial runtime reconfiguration. In FCCM ’08: Proceedings of the 2008 16th International Symposium on Field-Programmable Custom Computing Machines, pages 77–86, Washington, DC, USA, 2008. IEEE Computer Society.
[9] P. Gupta and N. McKeown. Algorithms for packet classification. IEEE Network, 15(2):24–32, 2001.
[10] N. Hua, H. Song, and T. V. Lakshman. Variable-stride multi-pattern matching for scalable deep packet inspection. In INFOCOM 2009. The 28th Conference on Computer Communications. IEEE, April 2009.
[11] R. Scrofano, M. B. Gokhale, F. Trouw, and V. K. Prasanna. Accelerating molecular dynamics simulations with reconfigurable computers. IEEE Trans. Parallel Distrib. Syst., 19(6):764–778, 2008.
[12] I. Sourdis and D. Pnevmatikatos. Fast, large-scale string match for a 10Gbps FPGA-based network intrusion. FPL, 2003:880–889, 2003.
[13] L. Tan, B. Brotherton, and T. Sherwood. Bit-split string-matching engines for intrusion detection and prevention. ACM Trans. Archit. Code Optim., 3(1):3–34, 2006.
[14] L. Tan and T. Sherwood. A high throughput string matching architecture for intrusion detection and prevention. In ISCA ’05: Proceedings of the 32nd annual international symposium on Computer Architecture, pages 112–122, Washington, DC, USA, 2005. IEEE Computer Society.
[15] Y.-H. E. Yang and V. K. Prasanna. Memory-efficient pipelined architecture for large-scale string matching. In FCCM ’09: Proceedings of the 17th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Washington, DC, USA, 2009. IEEE Computer Society.
[16] F. Yu, R. H. Katz, and T. V. Lakshman. Gigabit rate packet pattern-matching using TCAM. In ICNP ’04: Proceedings of the 12th IEEE International Conference on Network Protocols, pages 174–183, Washington, DC, USA, 2004. IEEE Computer Society.