Inferring Protocol State Machine from Real-World Trace

Yipeng Wang 1,2, Zhibin Zhang 1, and Li Guo 1

1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
2 Graduate University, Chinese Academy of Sciences, Beijing, China
[email protected]

Abstract. Application-level protocol specifications are helpful for network security management, including intrusion detection, intrusion prevention, and malicious code detection. However, current methods for obtaining unknown protocol specifications rely heavily on manual operations, such as reverse engineering. This poster provides a novel insight into inferring a protocol state machine from the real-world trace of an application. The chief feature of our method is that it requires no a priori knowledge of the protocol format; instead, our technique is based on the statistical nature of protocol specifications. We evaluate our approach with text and binary protocols, and our experimental results demonstrate that the proposed method performs well in practice.


Introduction and System Architecture

Finding protocol specifications is a crucial issue in network security, and detailed knowledge of a protocol specification is helpful in many network security applications, such as intrusion detection systems and vulnerability discovery. Within protocol specification extraction, inferring the protocol state machine is particularly important in practice. ScriptGen [1] is an attempt to infer a protocol state machine from network traffic. However, that technique is limited by its lack of generalization.

This poster provides a novel insight into inferring a protocol state machine from the real-world packet trace of an application. Moreover, we propose a system that can automatically extract the protocol state machine of stateful network protocols from Internet traffic. The input to our system is the real-world trace of a specific application, and the output of our system is the protocol state machine of that application. Furthermore, our system has the following features: (a) it needs no knowledge of the protocol format, (b) it is appropriate for both text and binary protocols, and (c) the protocol state machine it infers is of good quality.

The objective of our system is to infer the specifications of a protocol that is used for communication between different hosts. To this end, our system carries out the whole process in four phases, as follows.

Network data collection. In this phase, the network traffic of a specific application (such as SMTP or DNS) is carefully collected. In this poster, we collect packets on a specific transport-layer port.

S. Jha, R. Sommer, and C. Kreibich (Eds.): RAID 2010, LNCS 6307, pp. 498–499, 2010. © Springer-Verlag Berlin Heidelberg 2010

[Figure 1: two state-machine diagrams, SMTP (left) and XUNLEI (right). Labels surviving extraction: 250 and 220 in the SMTP panel; byte sequences 0x32 0x00 0x00 0x00 0x06 0x00 0x00, 0x32 0x00 0x00 0x00 0x07, 0x32 0x00 0x00 0x00 0x08, 0x32 0x00 0x00 0x00 0x11, and 0x32 0x00 0x00 0x00 0x12 in the XUNLEI panel.]

Fig. 1. The Protocol State Machine of SMTP and XUNLEI Protocol

Packet analysis. In this phase, we first look for high-frequency units in the offline application-layer packet headers obtained during network data collection. Then, we employ the Kolmogorov-Smirnov (K-S) test to determine the optimal number of units. Finally, we replay each application-layer packet header and construct protocol format messages from the target units.

Message clustering. In this phase, we extract a feature from each protocol format message; the feature is used to measure the similarity between messages. Then, the partitioning around medoids (PAM) clustering algorithm is applied to group similar messages into clusters. Finally, the medoid message of each cluster becomes a protocol state message.

State machine inference. To infer the protocol state machine, we need to know the packet state sequence of each flow. To label the packet states, we first find the nearest medoid message for each packet and assign that medoid's label to the packet. Then, by finding the relationships between the different state types, a protocol state machine is constructed. After state machine minimization, we obtain the final protocol state machine.
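The clustering and labeling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the poster does not specify the message feature or distance, so the byte 2-gram sets and Jaccard distance used here are assumptions, as are all function names. The clustering routine is a simplified k-medoids loop (alternating assignment and medoid update) rather than the full PAM swap search, and state machine minimization is omitted.

```python
from collections import Counter

def ngrams(msg: bytes, n: int = 2) -> frozenset:
    """Assumed feature: the set of distinct byte n-grams in a message."""
    return frozenset(msg[i:i + n] for i in range(len(msg) - n + 1))

def distance(a: bytes, b: bytes) -> float:
    """Jaccard distance between the n-gram sets of two messages (assumed metric)."""
    fa, fb = ngrams(a), ngrams(b)
    union = fa | fb
    if not union:
        return 0.0
    return 1.0 - len(fa & fb) / len(union)

def k_medoids(msgs, k, iters=20):
    """Simplified k-medoids; returns the medoid messages (the 'state messages')."""
    k = min(k, len(msgs))
    medoids = list(range(k))  # naive deterministic init: the first k messages
    for _ in range(iters):
        # Assignment step: attach every message to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, msg in enumerate(msgs):
            nearest = min(medoids, key=lambda m: distance(msg, msgs[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # can happen with duplicate messages; keep old medoid
                new_medoids.append(m)
                continue
            best = min(members,
                       key=lambda c: sum(distance(msgs[c], msgs[j]) for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return [msgs[m] for m in medoids]

def label(msg, medoids):
    """State label of a message: index of its nearest medoid message."""
    return min(range(len(medoids)), key=lambda s: distance(msg, medoids[s]))

def infer_transitions(flows, medoids):
    """Count state-to-state transitions over all flows, starting from 'START'."""
    counts = Counter()
    for flow in flows:
        prev = "START"
        for msg in flow:
            cur = label(msg, medoids)
            counts[(prev, cur)] += 1
            prev = cur
    return counts
```

Given a list of flows, each a list of message byte strings, one would flatten the flows into a message list, call `k_medoids` on it, and pass the resulting medoids to `infer_transitions`; the counted transitions form an unminimized state machine.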



We use SMTP (a text protocol) and XUNLEI (a binary protocol) to test and verify our method. The inferred protocol state machine of SMTP is shown on the left of Fig. 1, and that of XUNLEI on the right. Moreover, our evaluation experiments show that our system is capable of parsing about 86% of SMTP flows and about 90% of XUNLEI flows.
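The poster does not define how a flow counts as "parsed". One plausible reading, sketched below under that assumption, is that a flow is parsed when every step of its state-label sequence follows a transition present in the inferred machine. The function names, the `START` state, and the example labels are hypothetical.

```python
def accepts(labels, transitions, start="START"):
    """Return True if every consecutive step of the flow is a known transition."""
    state = start
    for lab in labels:
        if (state, lab) not in transitions:
            return False
        state = lab
    return True

def coverage(flows, transitions):
    """Fraction of flows (as label sequences) accepted by the state machine."""
    if not flows:
        return 0.0
    return sum(accepts(f, transitions) for f in flows) / len(flows)

# Hypothetical machine and labeled flows, for illustration only.
example_transitions = {("START", "EHLO"), ("EHLO", "MAIL"), ("MAIL", "RCPT")}
example_flows = [
    ["EHLO", "MAIL", "RCPT"],  # accepted
    ["EHLO", "RCPT"],          # rejected: no EHLO -> RCPT transition
    ["EHLO", "MAIL"],          # accepted
    ["MAIL"],                  # rejected: no START -> MAIL transition
]
example_coverage = coverage(example_flows, example_transitions)
```

Under this reading, the reported figures of about 86% and 90% would correspond to the value returned by `coverage` on held-out SMTP and XUNLEI flows.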

Reference

1. Leita, C., Mermoud, K., Dacier, M.: ScriptGen: an automated script generation tool for honeyd. In: Annual Computer Security Applications Conference (2005)
