Accelerating String Matching Using Multi-threaded Algorithm on GPU
Cheng-Hung Lin*, Sheng-Yu Tsai**, Chen-Hsiung Liu**, Shih-Chieh Chang**, Jyuo-Min Shyu**
*National Taiwan Normal University, Taiwan
**National Tsing Hua University, Taiwan
Introduction
• Network Intrusion Detection Systems (NIDS) are widely used to detect network attacks.
• The pattern matching engine dominates the performance of an NIDS.
• Traditional pattern matching approaches on a uniprocessor are too slow for today's network speeds.
• Hardware approaches for accelerating pattern matching:
  – Logic-based
  – Memory-based
  – Multiprocessor-based
GPU for Pattern Matching
• Parallel computation on GPU is suitable for accelerating pattern matching.
[Figure: the input AAAAAAAAAAAAAAAAAAAAAAAB scanned by 1 thread takes 24 cycles; split into 4 segments scanned by Threads #1 to #4, it takes 6 cycles.]
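The cycle counts in the figure follow from dividing the input evenly among the threads. The sketch below reproduces that arithmetic in Python; `split_segments` is a hypothetical helper, and the cost model (one character per thread per cycle, all threads in lockstep) is an assumption made for illustration, not a measurement.

```python
def split_segments(text: str, num_threads: int) -> list[str]:
    """Split text into num_threads contiguous segments of near-equal length."""
    seg_len = -(-len(text) // num_threads)  # ceiling division
    return [text[i:i + seg_len] for i in range(0, len(text), seg_len)]

text = "A" * 23 + "B"  # a 24-character input shaped like the one on the slide
segments = split_segments(text, 4)

# Assumed cost model: one character per cycle, threads run in lockstep,
# so the elapsed cycles equal the length of the longest segment.
cycles_single = len(text)
cycles_parallel = max(len(s) for s in segments)
print(cycles_single, cycles_parallel)
```

Under this model the speedup is simply the number of threads, which is why the next slides focus on what the segmentation breaks rather than on the speedup itself.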
Boundary Problem
• Boundary problem
  – A pattern that straddles the boundary between adjacent segments cannot be detected.
  – The result is a false negative.
[Figure: the input AAAAAAAAAAAAAAAAAABBBBBB split into segments for Threads #1 to #4; the occurrence at a segment boundary is marked "False Negative".]
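The false negative can be reproduced with a few lines of Python. This is only an illustration: the pattern "AB", the segment length of 6, and the helper `scan_segment` are assumptions chosen to match the 24-character example.

```python
def scan_segment(segment: str, pattern: str) -> int:
    """Count occurrences of pattern that lie entirely inside segment."""
    return sum(segment.startswith(pattern, i) for i in range(len(segment)))

text = "A" * 18 + "B" * 6   # 24 characters: the only "AB" spans offsets 17 and 18
pattern = "AB"              # assumed pattern for this illustration
seg_len = 6                 # assumed: 4 threads, 6 characters each
segments = [text[i:i + seg_len] for i in range(0, len(text), seg_len)]

found_whole = scan_segment(text, pattern)                       # one scan over everything
found_split = sum(scan_segment(s, pattern) for s in segments)   # isolated segments
print(found_whole, found_split)
```

The single occurrence is visible to a scan over the whole input but to none of the isolated segments, because its two characters fall on opposite sides of a segment boundary.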
Overlapped Computation
• To resolve the boundary problem
  – Scan across boundaries
• Problem
  – Overhead of overlapped computation
  – Throughput reduction
[Figure: by scanning past the end of its segment, Thread #3 can identify "AB".]
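Scanning across boundaries can be sketched as follows. The function `scan_with_overlap` is hypothetical; it lets each simulated thread read its own segment plus (pattern length − 1) extra characters and reports only matches that start inside its own segment, so no match is reported twice.

```python
def scan_with_overlap(text: str, pattern: str, seg_len: int):
    """Return (match start offsets, total characters read by all threads)."""
    overlap = len(pattern) - 1
    matches, chars_read = [], 0
    for start in range(0, len(text), seg_len):      # one iteration per thread
        window = text[start:start + seg_len + overlap]
        chars_read += len(window)
        # Only starts inside the thread's own segment are reported.
        for i in range(min(seg_len, len(window))):
            if window.startswith(pattern, i):
                matches.append(start + i)
    return matches, chars_read

text = "A" * 18 + "B" * 6
matches, chars_read = scan_with_overlap(text, "AB", seg_len=6)
overhead = chars_read - len(text)   # characters read more than once
print(matches, chars_read, overhead)
```

The third simulated thread now finds the match, at the price of characters that are read by two threads. That extra work is the overhead the slide refers to.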
Aho-Corasick Algorithm
• The Aho-Corasick (AC) algorithm is widely used for pattern matching because it matches multiple patterns in a single pass.
  – It compiles multiple patterns into a composite state machine.
• Patterns: (1) AB (2) ABG (3) BEDE (4) EF
[Figure: AC state machine for the four patterns, with states 0 to 9; state 0 loops on [^ABE].]
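Compiling the four patterns into one state machine starts with a trie of valid transitions. The sketch below builds that trie in Python; `build_trie`, `goto` and `output` are illustrative names, and states are numbered in insertion order, which may not match the numbering in the figure.

```python
def build_trie(patterns):
    """Build the valid-transition part of the state machine."""
    goto = [{}]      # goto[state][char] -> next state; state 0 is the root
    output = {}      # final state -> pattern recognized there
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state] = p
    return goto, output

goto, output = build_trie(["AB", "ABG", "BEDE", "EF"])
print(len(goto), output)
```

Patterns that share a prefix, such as AB and ABG, share states, which is why four patterns with 11 characters in total need only 10 states.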
Aho-Corasick Algorithm (cont.)
• The Aho-Corasick (AC) state machine is composed of two kinds of transitions:
  – Solid lines represent valid transitions.
  – Dotted lines represent failure transitions.
• A failure transition backtracks the state machine so that it can recognize patterns that start at different locations.
[Figure: the AC state machine scanning the input string A B E D E; matches starting at location 1 and location 2 are marked.]
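The role of failure transitions can be seen by running a textbook Aho-Corasick matcher on the example input. This is a generic sketch of the standard algorithm, not the authors' implementation; `build_ac` and `ac_search` are illustrative names, and locations are reported 1-based to match the figure.

```python
from collections import deque

def build_ac(patterns):
    """Build goto, failure and output tables for the given patterns."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    queue = deque(goto[0].values())      # depth-1 states fail to the root
    while queue:                         # breadth-first over the trie
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out

def ac_search(text, patterns):
    """Return (1-based start location, pattern) for every match in text."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]                  # failure transition: backtrack
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 2, p))
    return hits

hits = ac_search("ABEDE", ["AB", "ABG", "BEDE", "EF"])
print(hits)
```

After recognizing AB, the machine has no valid transition on E, so it follows the failure transition to the state for B and continues from there, which is how the second match starting one character later is still found in a single pass.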
Problems of AC on GPU
• Direct implementation of AC on GPU
  – To resolve the boundary problem, each thread has a lower bound on its scanning length:
    • Constraint = segment length + overlapped length
    • Overlapped length = length of the longest pattern − 1
  – Overhead of overlapped computation
[Figure: the input AAAAAAAAAAAAAAAAAABBBBBB.]
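As a worked instance of the constraint, the sketch below applies the two formulas to the four example patterns. The segment length of 6 is an assumption carried over from the 24-character, 4-thread example, and `min_scan_length` is a hypothetical helper.

```python
def min_scan_length(segment_len: int, patterns: list[str]):
    """Return (minimum scanning length per thread, overlapped length)."""
    overlap = max(len(p) for p in patterns) - 1   # longest pattern - 1
    return segment_len + overlap, overlap

patterns = ["AB", "ABG", "BEDE", "EF"]
scan_len, overlap = min_scan_length(6, patterns)   # assumed segment length: 6
relative_overhead = overlap / 6                    # extra reads per segment character
print(scan_len, overlap, relative_overhead)
```

The overlap depends only on the longest pattern, so shrinking the segments to use more threads makes the relative overhead worse.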
Problems of AC on GPU (cont.)
Failureless-AC State Machine
• AC state machine
  – A failure transition backtracks the state machine so that it can recognize patterns that start at different locations.
  [Figure: AC state machine scanning the input string A B E D E; matches starting at location 1 and location 2 are marked.]
• Failureless-AC state machine
  – Removes the failure transitions
  – Terminates when there is no valid transition
  – Recognizes patterns starting at location 1
  [Figure: Failureless-AC state machine scanning the input string A B E D E from location 1; it stops when no valid transition exists.]
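A Failureless-AC traversal is the trie walk with an early exit. The sketch below is a minimal Python illustration; `build_trie` and `failureless_match` are hypothetical names, and `start` is a 0-based offset, so location 1 in the figure corresponds to `start=0`.

```python
def build_trie(patterns):
    """Valid transitions only: no failure transitions are computed."""
    goto, output = [{}], {}
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state] = p
    return goto, output

def failureless_match(text, start, goto, output):
    """Return the patterns that begin exactly at text[start]."""
    state, found = 0, []
    for ch in text[start:]:
        if ch not in goto[state]:
            break                      # no valid transition: terminate
        state = goto[state][ch]
        if state in output:
            found.append(output[state])
    return found

goto, output = build_trie(["AB", "ABG", "BEDE", "EF"])
at_location_1 = failureless_match("ABEDE", 0, goto, output)
at_location_2 = failureless_match("ABEDE", 1, goto, output)
print(at_location_1, at_location_2)
```

A single traversal from location 1 finds AB and then stops at E. The match at location 2 is found only by a separate traversal that starts there.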
Parallel Failureless-AC Algorithm
• Parallel Failureless-AC (PFAC) algorithm
  – Allocate one thread to each byte of the input to traverse the Failureless-AC state machine.
[Figure: the input XXXXXXXXXABEDEXXXXXXXXXX with one thread per byte.]
Mechanism of PFAC
[Figure: Thread #n and Thread #n+1 start at adjacent bytes of the input …XXXXABEDEXXX… and each traverses the Failureless-AC state machine.]
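The PFAC mechanism can be simulated sequentially in Python, with each loop iteration standing in for one GPU thread. This is a sketch under that assumption, not GPU code; `pfac` and `build_trie` are illustrative names, and reported offsets are 0-based.

```python
def build_trie(patterns):
    """Failureless-AC state machine: valid transitions only."""
    goto, output = [{}], {}
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state] = p
    return goto, output

def pfac(text, patterns):
    """Return (start offset, pattern) pairs found by per-byte traversals."""
    goto, output = build_trie(patterns)
    matches = []
    for tid in range(len(text)):          # tid plays the role of a thread index
        state = 0
        for pos in range(tid, len(text)):
            nxt = goto[state].get(text[pos])
            if nxt is None:
                break                     # this thread terminates
            state = nxt
            if state in output:
                matches.append((tid, output[state]))
    return matches

text = "X" * 9 + "ABEDE" + "X" * 10       # input shaped like the slide example
matches = pfac(text, ["AB", "ABG", "BEDE", "EF"])
print(matches)
```

Because every byte has its own traversal, a pattern is always found by the thread that starts on its first character, so there are no segment boundaries to miss.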
Reducing Overlapped Computation
• Direct implementation of AC algorithm
  – Each thread has a lower bound on its scanning length
  – Overlapped computation (overlapped length = 3)
• PFAC algorithm
  – No boundary problem
  – Each thread has a variable scanning length
  – Most threads terminate early
  – Reduced overlapped computation
[Figure: the input …CCCCCCCCBCCCCCC…, annotated with the values 1, 1 and 3.]
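How early the threads terminate can be counted directly. The sketch below records how many characters each simulated PFAC thread reads on an input shaped like the one in the figure; the counting convention (the character that stops a thread counts as read) and the names `build_trie` and `pfac_reads` are assumptions for illustration.

```python
def build_trie(patterns):
    """Failureless-AC state machine: valid transitions only."""
    goto = [{}]
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
    return goto

def pfac_reads(text, goto):
    """Characters read by each thread, including the one that stops it."""
    reads = []
    for tid in range(len(text)):
        state, n = 0, 0
        for ch in text[tid:]:
            n += 1
            if ch not in goto[state]:
                break
            state = goto[state][ch]
        reads.append(n)
    return reads

goto = build_trie(["AB", "ABG", "BEDE", "EF"])
text = "C" * 8 + "B" + "C" * 6
reads = pfac_reads(text, goto)
extra = sum(n - 1 for n in reads)   # reads beyond each thread's own byte
print(reads, extra)
```

On this input only the thread that starts on B reads past its own byte, whereas a direct AC implementation would add the fixed overlap of 3 characters to every thread.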
Experimental Environments
• CPU: Intel® Core™ i7 CPU 950 @ 3.07 GHz
  – 4 cores
  – 12 GB DDR3 memory
• GPU: NVIDIA® GeForce® GTX 480 @ 1.4 GHz
  – 480 cores
  – 1536 MB GDDR5 memory
• Patterns: string patterns of Snort V2.4
  – 1,998 rules containing 41,997 characters
  – 27,754 states in total
• Input: normal and worst case
  – DEFCON packets
Experimental Results
Table 1: Throughput of normal case inputs
Memory efficiency = (Throughput × number of characters) / Memory
Conclusions
• We have proposed a novel parallel string matching algorithm that is well suited to GPUs and is free from the boundary detection problem.
• The proposed algorithm creates a new state machine with lower complexity and memory usage than the traditional Aho-Corasick state machine.
• The new algorithm achieves a significant speedup over the traditional Aho-Corasick algorithm accelerated by OpenMP on CPU.
• Compared with other GPU approaches, the new algorithm is 11.6 times faster than the state-of-the-art approach.