1. INTRODUCTION

Today more and more sensitive data is stored digitally. Bank accounts, medical records and personal emails are some of the categories of data that must be kept secure. The science of cryptography tries to counter this lack of security. Data confidentiality, authentication, non-repudiation and data integrity are some of its main services. The evolution of cryptography has led to very complex cryptographic models that could not have been implemented a few years ago. Systems of increasing complexity are usually more secure, but they result in lower throughput and higher energy consumption. The evolution of a cipher therefore has no practical impact if it has only a theoretical background. Every encryption algorithm should exploit the conditions of the specific system as much as possible without ignoring its physical, area and timing limitations. This requires new design architectures for secure and reliable crypto systems [2].

In recent years many cryptographic implementations have been proposed, in software or in hardware. The choice of implementation depends on the application and on the algorithm to be implemented. Software-based approaches lead to slow implementations and are very energy inefficient [3].

A cryptographic algorithm, also called a cipher, is the mathematical function used for encryption and decryption. A symmetric key algorithm uses the same cryptographic key to encrypt and decrypt the message. There are two basic types of symmetric algorithms: block ciphers and stream ciphers [4]. Block ciphers operate on blocks of plaintext and ciphertext, usually of 64 bits but sometimes longer. Stream ciphers operate on streams of plaintext and ciphertext one bit or byte (sometimes even one 32-bit word) at a time. With a block cipher, the same plaintext block will always encrypt to the same ciphertext block under the same key. With a stream cipher, the same plaintext bit or byte will encrypt to a different bit or byte every time.

The RC5 cipher is used for symmetric encryption to provide privacy in many cryptographic standards, such as the WTLS layer of the Wireless Application Protocol [5-7]. It is a fully parameterized block cipher: the key length, the number of rounds and the block size may all be specified before the cipher starts generating ciphertext. RC5 stands out among ciphers for its intrinsic algorithmic simplicity, and it is considered one of the fastest block ciphers. The key element of RC5 is the circular rotation: its security strength relies on non-linear register rotations as its sole non-linear operator. This kind of diffusion is simpler to implement than that of other ciphers such as DES, which uses S-boxes and data permutations.

In general, software implementations of ciphers are often computationally slow, so hardware devices are considered an efficient alternative. In this paper, the architecture of a reconfigurable cryptographic system is proposed. The proposed system operates as encoder and decoder for symmetric cryptographic algorithms. It checks the throughput and security requirements and redefines the number of cryptographic rounds and the number of block bits accordingly. Finally, a parametric VLSI design methodology is proposed for easily constructing cryptographic systems for symmetric encryption.

Authorized licensed use limited to: IMEC. Downloaded on April 10, 2009 at 10:57 from IEEE Xplore. Restrictions apply.

This paper is organized as follows. Section 2 presents the RC5 standard. Section 3 gives information about the RC5 system architecture. Section 4 examines the proposed cryptographic architecture. Section 5 proposes a parametric VLSI design methodology for constructing cryptographic systems. Finally, experimental results are given in section 6 and conclusions in section 7.

2. RC5 ENCRYPTION ALGORITHM

RC5 is a parameterized encryption algorithm. A particular RC5 algorithm is denoted as RC5-w/r/b. These three parameters are summarized below:

w: The word size in bits. Each input (plaintext) block contains two w-bit words and the output (ciphertext) block is 2w bits long. The standard value of w is 32 bits; allowable values of w are 16, 32 and 64 bits.
r: The number of rounds. Allowable values of r are 0, 1, ..., 255.
b: The number of bytes in the user's secret key K. Allowable values of b are 0, 1, ..., 255.

RC5 consists of a key expansion process, an encryption scheme and a decryption scheme. These procedures use three primitive operations and their inverses:

Addition of words modulo $2^w$, and the inverse operation, subtraction modulo $2^w$.
Bit-wise exclusive-OR of words, denoted by XOR.
Rotation: the rotation of word x left by y bits is denoted L.C.S. (left circular shift). The inverse operation is the rotation of word x right by y bits, denoted R.C.S. (right circular shift).

The key expansion process expands the user's secret key K to fill the expanded key table S, so that S resembles an array of t = 2(r+1) random binary words determined by K.

3. RC5 BASIC SYSTEM ARCHITECTURE

The main units of the system are the key expansion unit, the RAM blocks used for key storage, and the RC5 core, the unit in which both the encryption and decryption schemes are performed. RC5 needs an expanded table of keys, denoted S, which is derived from the secret key K. The size of S depends on r and is t = 2(r+1) words. First the key expansion unit computes the values of the keys. The keys are stored in the RAM, and the RC5 core then uses them. Figure 1 presents a typical implementation of a complete RC5 system.

The conventional architecture of the RC5 core implementation is illustrated in Figure 2. It performs encryption and decryption with two different cores, Encrypt and Decrypt, respectively. This architecture is based on the RC5 specifications, which define two different schemes: one for encryption and one for decryption. The Initial Unit divides the input message into two w-bit words (A and B). Specifically, in the case of encryption, additions modulo $2^w$ are performed between the first two keys S(0), S(1) and the words A and B, respectively. The Encryption Core performs the encryption transformation of r rounds. In every round i, the two corresponding keys S(2i), S(2i+1) are used. The components defined for the primitive decryption operation are subtraction modulo $2^w$, the right circular shift register (R.C.S.) and the XOR operation (Figure 3). When the r rounds have been performed, the Final Unit concatenates the two w-bit words into the 2w-bit output message. In the case of decryption, subtractions modulo $2^w$ with the first two keys S(0), S(1) are performed before the 2w-bit output message is produced.
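The flow described in sections 2 and 3 (the three primitive operations, key expansion into the table S, then the round operations of the core) can be sketched in software as follows. This is only an illustrative model, not the hardware design of this paper: it assumes w = 32, the function names are hypothetical, and the constants P32 and Q32 and the key-mixing loop follow the RC5 specification [1] rather than anything stated in this text.

```python
# Software sketch of RC5-32/r/b (assumption: w = 32; names are illustrative).
W = 32                               # word size w in bits
MASK = (1 << W) - 1                  # keeps results modulo 2^w
P32, Q32 = 0xB7E15163, 0x9E3779B9    # magic constants for w = 32, taken from [1]


def lcs(x, y):
    """Left circular shift (L.C.S.) of the w-bit word x by y bits."""
    y %= W
    return ((x << y) | (x >> (W - y))) & MASK


def rcs(x, y):
    """Right circular shift (R.C.S.), the inverse of lcs."""
    y %= W
    return ((x >> y) | (x << (W - y))) & MASK


def expand_key(key, r):
    """Expand the secret key K (bytes) into the table S of t = 2(r+1) words."""
    u = W // 8                                   # bytes per word
    c = max(1, (len(key) + u - 1) // u)          # words needed to hold K
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):        # load K little-endian into L
        L[i // u] = ((L[i // u] << 8) + key[i]) & MASK
    t = 2 * (r + 1)
    S = [(P32 + i * Q32) & MASK for i in range(t)]
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):               # mix K into S
        A = S[i] = lcs((S[i] + A + B) & MASK, 3)
        B = L[j] = lcs((L[j] + A + B) & MASK, A + B)
        i, j = (i + 1) % t, (j + 1) % c
    return S


def encrypt(A, B, S, r):
    """Initial additions with S(0), S(1), then r rounds using S(2i), S(2i+1)."""
    A = (A + S[0]) & MASK
    B = (B + S[1]) & MASK
    for i in range(1, r + 1):
        A = (lcs(A ^ B, B) + S[2 * i]) & MASK
        B = (lcs(B ^ A, A) + S[2 * i + 1]) & MASK
    return A, B


def decrypt(A, B, S, r):
    """Inverse rounds (subtraction, R.C.S., XOR), then subtract S(0), S(1)."""
    for i in range(r, 0, -1):
        B = rcs((B - S[2 * i + 1]) & MASK, A) ^ A
        A = rcs((A - S[2 * i]) & MASK, B) ^ B
    return (A - S[0]) & MASK, (B - S[1]) & MASK
```

For example, `S = expand_key(bytes(16), 12)` yields a table of 26 words, and `decrypt(*encrypt(a, b, S, 12), S, 12)` returns the original pair `(a, b)`.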

Figure 1. The main units of the RC5 Crypto System

Figure 2. The conventional architecture of the RC5 core

4. PROPOSED CRYPTO IMPLEMENTATIONS

4.1. Applying Pipelining Technique

We implemented the RC5 Round (encoder and decoder) based on the RC5 standard [1] and then applied the pipelining technique, with the results presented in Table 1.

Table 1. Synthesis Results for the RC5 Round

Architecture     Area (CLBs)   Delay (ns)
non-pipelined    1884          47.03
pipelined        1905          12.98

From the above results we conclude that the pipelining technique increases the covered area by 1.1% and the system's throughput by 262%. Therefore, in the proposed architecture the basic element, the RC5 Round, is implemented with pipelined logic. Figure 3 presents the encoder and decoder for the RC5 Round.
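The two percentages can be checked from the values in Table 1. The short calculation below is a sketch under two assumptions: that throughput is inversely proportional to the delay, and that 47.03 ns is the delay of the non-pipelined design and 12.98 ns that of the pipelined one (the only assignment consistent with a throughput increase). The variable names are illustrative.

```python
# Values from Table 1; the assignment of the delays to the two designs is an assumption.
area_non_pipelined, area_pipelined = 1884, 1905        # CLBs
delay_non_pipelined, delay_pipelined = 47.03, 12.98    # ns

# Relative area overhead of the pipelined design.
area_increase_pct = (area_pipelined - area_non_pipelined) / area_non_pipelined * 100

# Throughput is taken as proportional to 1/delay, so the gain is the delay ratio minus one.
throughput_increase_pct = (delay_non_pipelined / delay_pipelined - 1) * 100

print(f"area increase: {area_increase_pct:.1f}%")
print(f"throughput increase: {throughput_increase_pct:.0f}%")
```

Under these assumptions the calculation reproduces the 1.1% area increase and the 262% throughput increase quoted in the text.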

The implementation of the system with B internal RC5 Crypto Cores is presented in Figure 4. However, as B increases, the system becomes expensive in terms of covered area resources. For this reason, introducing a feedback option is essential: the system can then support highly secure transactions without additional area resources, although the achieved throughput decreases. The keys of the system come from the memory, where they are stored temporarily. At the end of every encryption/decryption operation the keys are erased from memory and updated. Consequently, any cryptanalytic attacker has to restart the attack from the beginning.

Figure 3. The Encoder and Decoder of the RC5 Round

4.2. Top Level Architecture of Proposed System

The architecture of the proposed system is presented in Figure 4. The red lines represent the signals that come from the control unit. The basic control signals are the encryption/decryption signal, which decides which type of coding is executed, and the mode signal, which determines the maximum block size that the system can process. In the proposed system the maximum block size is 128 bits. The system could support the coding of 32, 64, 96 and 128 bits; the 96-bit case is rejected because it is not supported by the RC5 standard. Another basic signal is the line that checks the number of cryptographic rounds the system has executed. It controls the last multiplexer of the system, which decides whether the data are fed back to the input or forwarded to the output.
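The feedback behaviour described above can be modelled in a few lines of software. This is a behavioural sketch only, not the VHDL of the proposed system: `run_system` and `round_stage` are hypothetical names, and the sketch assumes that the required number of rounds R is a multiple of the number of internal cores B, so that the data make R/B passes through the chain.

```python
def run_system(block, round_stage, B, R):
    """Pass `block` through B round stages per pass until R rounds are done.

    round_stage(data, i) stands for one RC5 Round (round number i).
    Returns the output data and the number of passes through the chain.
    """
    if B <= 0 or R < B or R % B != 0:
        raise ValueError("this sketch assumes R is a positive multiple of B")
    data, rounds_done, passes = block, 0, 0
    while True:
        for _ in range(B):                 # the B internal cores in sequence
            rounds_done += 1
            data = round_stage(data, rounds_done)
        passes += 1
        if rounds_done == R:               # round counter drives the last multiplexer:
            return data, passes            # forward the data to the output
        # otherwise the multiplexer feeds the data back to the input


# Example: record which rounds were applied, with B = 6 cores and R = 12 rounds.
trace, passes = run_system([], lambda d, i: d + [i], B=6, R=12)
print(passes, trace)
```

With B = 6 and R = 12 the data make two passes, which corresponds to the 12-round configuration discussed with the experimental results.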

Figure 4. Top Level Architecture of the proposed system

The system can process 32-, 64- and 128-bit messages. It is suitable for systems such as servers and smart card readers. It communicates with entities that use different security levels and block sizes. In the first stage of the process, it is examined whether the two systems can cooperate. If this step is passed, the cryptographic process starts.
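The block-size rule can be illustrated as follows. A block holds two w-bit words, and the allowable word sizes are 16, 32 and 64 bits, which is why 96 bits (w = 48) is rejected. The text does not specify how the two systems check that they can cooperate, so the `negotiate` function below is purely an assumption (it picks the largest block size both sides support); all names are hypothetical.

```python
ALLOWED_WORD_BITS = (16, 32, 64)   # allowable values of w in RC5
MAX_BLOCK_BITS = 128               # maximum block size of the proposed system


def supported_block(block_bits):
    """True if block_bits = 2w for an allowable w and fits the system."""
    return (block_bits <= MAX_BLOCK_BITS
            and block_bits % 2 == 0
            and block_bits // 2 in ALLOWED_WORD_BITS)


def negotiate(local_modes, remote_modes):
    """Assumed cooperation check: largest common supported block size, else None."""
    common = [m for m in local_modes if m in remote_modes and supported_block(m)]
    return max(common) if common else None


print([m for m in (32, 64, 96, 128) if supported_block(m)])
print(negotiate([32, 64, 128], [32, 64]))
```

If `negotiate` returns `None`, the two systems have no block size in common and, in this model, the cryptographic process would not start.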


5. CONSTRUCTING CRYPTO CORES

In order to study the behavior of the system for different numbers of cryptographic rounds, block bits and available Crypto Cores, we wrote parametric VHDL code. All VHDL files are parametric and their values are determined by the file pack.vhd. With this technique it is easy to construct reconfigurable systems by setting the number of cipher cores and the system's maximum block bits each time. The first lines of the pack.vhd file, which give values to the basic parameters of the system, are presented below.

    library ieee;
    use ieee.std_logic_1164.all;

    package my_pack is
      constant N : integer := 10#127#;
      constant P : integer := 10#31#;
      constant S : integer := 10#6#;   -- number of pipelined stages of RC5
      constant R : integer := 10#6#;   -- number of rounds
      constant B : integer := 10#3#;   -- number of blocks = (RC5 core + reg)
      constant C : integer := R/B;     -- number of feedbacks

Figure 5. The first lines of pack.vhd

6. EXPERIMENTAL RESULTS

The proposed architecture has been captured in VHDL. All the internal components of the design were synthesized, placed and routed using Xilinx FPGA devices [8]. Synthesis results for the proposed system implementation are shown in Table 2.

Table 2. Comparison of Synthesis Results

Number of RC5 Cores    B=2     B=3     B=4     B=5     B=6
IOs                    646     774     902     1030    1158
Function Generators    8933    12745   16564   20377   24189
CLB Slices             4467    6373    8282    10189   12094
Dffs                   4485    6216    7951    9682    11413
Maximum Frequency      68.2 MHz

From the table above we observe that adding RC5 Crypto Cores increases the covered area resources. However, the maximum frequency of the system does not decrease, thanks to the pipelining technique. Adding Crypto Cores yields secure, high-throughput systems. For B=6, with the feedback method applied, the system operates with 12 cryptographic rounds, so it is a highly secure cryptographic system. Table 3 presents the system's throughput as a function of the number of bits, the number of internal RC5 Crypto Cores and the number of cryptographic rounds. The throughput of the system increases linearly with the parameter B and with the number of block bits. The system also becomes more secure [3]. In contrast, when we make the system more secure by increasing the number of cryptographic rounds through the feedback method, throughput is lost.

Table 3. The function of throughput

Word (bit)           32              64              128
Throughput (Mbps)    (B/R)*2182.4    (B/R)*4364.8    (B/R)*8729.6
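The coefficients in Table 3 equal the maximum frequency of 68.2 MHz multiplied by the number of bits (68.2 x 32 = 2182.4, 68.2 x 64 = 4364.8, 68.2 x 128 = 8729.6). The table is therefore consistent with a throughput of (B/R) x f x bits, although this decomposition is an inference from the numbers and is not stated in the text. A minimal sketch under that assumption, with a hypothetical function name:

```python
F_MAX_MHZ = 68.2   # maximum frequency from Table 2


def throughput_mbps(block_bits, B, R, f_mhz=F_MAX_MHZ):
    """Throughput in Mbps for B internal cores and R cryptographic rounds.

    Assumes throughput = (B / R) * frequency * block size, which
    reproduces the coefficients of Table 3.
    """
    return (B / R) * f_mhz * block_bits


for bits in (32, 64, 128):
    # B = 6 cores with feedback up to R = 12 rounds halves the coefficient.
    print(bits, round(throughput_mbps(bits, B=6, R=12), 1))
```

The example shows the trade-off discussed above: doubling the rounds through feedback (R = 12 with B = 6) halves the throughput for every block size.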

7. CONCLUSIONS

In this paper a reconfigurable hardware design architecture for symmetric encryption is proposed. The required level of security and the available system resources control system parameters such as the number of cryptographic rounds and the number of processed bits. Additional rounds make a cryptographic system more secure. Depending on the system specifications, the system can increase its security by applying the feedback method and by increasing the number of processed bits. The feedback method leads to less efficient systems, while increasing the number of processed bits leads to more expensive systems that demand more area resources. In every case we seek a golden ratio between covered area resources and security. When highly secure systems are needed, we design implementations with more internal RC5 cores, as presented in section 5.

8. REFERENCES

[1] Ronald L. Rivest, "The RC5 Encryption Algorithm", Proceedings of the 1994 Leuven Workshop on Fast Software Encryption (Springer 1995), pages 86-96.
[2] P. Kitsos, O. Koufopavlou, G. Selimis and N. Sklavos, "Low Power Cryptography", accepted for publication in the Journal of Physics Conference Series.
[3] James Goodman and Anantha P. Chandrakasan, "An Energy-Efficient Reconfigurable Public-Key Cryptography Processor", IEEE Journal of Solid-State Circuits, vol. 36, no. 11, November 2001.
[4] Bruce Schneier, "Applied Cryptography: Protocols, Algorithms, and Source Code in C", John Wiley & Sons, Inc., 1996.
[5] Burton S. Kaliski Jr. and Yiqun Lisa Yin, "On the Security of the RC5 Encryption Algorithm", RSA Laboratories Technical Report, September 1998.
[6] N. Sklavos and O. Koufopavlou, "Mobile Communications World: Security Implementations Aspects - A State of the Art", Computer Science Journal of Moldova, Institute of Mathematics & Computer Science, vol. 11, no. 2, 2003.
[7] N. Sklavos, A. P. Fournaris and O. Koufopavlou, "WAP Security: Implementation Cost and Performance Evaluation of a Scalable Architecture for RC5 Parameterized Block Cipher", Proceedings of the IEEE Mediterranean Electrotechnical Conference (MELECON '04), Dubrovnik, Croatia, May 12-15, 2004.
[8] Xilinx, San Jose, California, USA, Virtex, 2.5 V Field Programmable Gate Arrays, www.xilinx.com, 2003.
