EEL 5934 Reconfigurable Computing Course Project
Implementation of Data Encryption Standard in FPGA

Course Instructor: Dr. Herman Lam, Associate Professor, Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL

Student: Shankar Myilswamy, UF ID: 42159129

ABSTRACT:
The Data Encryption Standard (DES) has been widely adopted in cryptographic applications because of its simplicity and its effective encryption and decryption capability. DES has several variants, including Simple DES (SDES) and Triple DES. Even the Simple DES algorithm involves many rounds of key generation and message encryption, and the final cipher text is obtained only after intensive computation with the key. In this project we propose to speed up the computation of the encrypted message by introducing pipelining and parallelism. Because the DES algorithm is computationally intensive, implementing it in Field Programmable Gate Arrays (FPGAs) can improve its speed and hence its performance. In this paper we develop an SDES implementation on an FPGA. The 64-bit encryption key is fed into the data path first. Four 64-bit input messages are then read from the file simultaneously, and after an initial latency four 64-bit encrypted messages are obtained. Because the design is pipelined, it delivers four messages per clock cycle after the initial latency; the parallelism yields four encrypted messages per clock cycle instead of one.

KEY WORDS: DES, SDES, parallelism, pipelining, encryption, speedup.

INTRODUCTION:
Historically, cryptography referred almost exclusively to encryption, which is the process of converting ordinary information (plaintext) into incomprehensible text (cipher text). Decryption is the reverse: moving from the incomprehensible cipher text back to plaintext. A cipher (or cypher) is a pair of algorithms that perform the encryption and the reversing decryption. The detailed operation of a cipher is controlled both by the algorithm and, in each instance, by a key. The key is a secret parameter (ideally known only to the communicants) for a specific message exchange context.
Keys are important because ciphers without variable keys can be trivially broken with only the knowledge of the cipher used, and are therefore of little use for most purposes. Modern cryptography intersects the disciplines of mathematics, computer science, and engineering. Applications of cryptography include ATM cards, computer passwords, and electronic commerce.

DES ALGORITHM:
The Data Encryption Standard is a widely used cryptographic algorithm, with applications ranging from ATM card encryption to e-mail privacy and secure remote access. It operates on a 64-bit message block and uses a 56-bit key to encrypt the message.

Basic Algorithm:
The 64-bit message is first permuted. The 64-bit input key is permuted and reduced to 56 bits by discarding the bits whose positions are multiples of 8 (bits 8, 16, ..., 64), since these are parity bits and play no part in the encryption. The permuted key is split into two 28-bit halves, C0 and D0.
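As a minimal sketch of this key preparation step (a Python model, not the project's VHDL), the code below applies the standard DES PC-1 permutation to a 64-bit key and splits the result into C0 and D0. The helper names `permute` and `pc1_split` and the sample key are illustrative choices, not taken from the project.

```python
# Standard DES PC-1 table (1-based bit positions, bit 1 = most significant).
# No entry is a multiple of 8, which is how the parity bits are dropped.
PC1 = [
    57, 49, 41, 33, 25, 17, 9,
    1, 58, 50, 42, 34, 26, 18,
    10, 2, 59, 51, 43, 35, 27,
    19, 11, 3, 60, 52, 44, 36,
    63, 55, 47, 39, 31, 23, 15,
    7, 62, 54, 46, 38, 30, 22,
    14, 6, 61, 53, 45, 37, 29,
    21, 13, 5, 28, 20, 12, 4,
]


def permute(value, table, in_bits):
    """Build a word whose i-th bit is input bit table[i] (1-based, MSB first)."""
    out = 0
    for pos in table:
        out = (out << 1) | ((value >> (in_bits - pos)) & 1)
    return out


def pc1_split(key64):
    """Apply PC-1 to a 64-bit key and return the 28-bit halves (C0, D0)."""
    k56 = permute(key64, PC1, 64)
    return k56 >> 28, k56 & ((1 << 28) - 1)


# Illustrative sample key (an assumption of this sketch).
c0, d0 = pc1_split(0x133457799BBCDFF1)
print(f"C0 = {c0:028b}")
print(f"D0 = {d0:028b}")
```

In hardware this step costs no logic at all: a fixed permutation is just a rewiring of the key bits.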

These halves are each rotated left by one or two bit positions per round to form the pairs (C1, D1) through (C16, D16). C1 and D1 are concatenated and permuted again to form the 48-bit sub key K1; repeating this for all 16 pairs of C and D yields the 16 sub keys. The permuted message bits are split into two 32-bit halves, L0 and R0. The values (L1, R1) through (L16, R16) are then calculated as

$$L_n = R_{n-1}$$
$$R_n = L_{n-1} \oplus f(R_{n-1}, K_n)$$

where $\oplus$ denotes bitwise modulo-2 addition (XOR) and $f$ is the round function described below. $R_{n-1}$ is 32 bits long and the key $K_n$ is 48 bits long, so $R_{n-1}$ is first expanded to 48 bits for the modulo-2 addition with $K_n$. This is done with the E bit-selection table, which maps 32 bits to 48 bits by repeating some of the existing bits. The resulting 48-bit intermediate value is passed through the S-box tables, which reduce it to a 32-bit output. This 32-bit value is permuted and XORed with $L_{n-1}$ to obtain $R_n$.
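The rotation of C and D and the L/R recurrence can be sketched in Python as follows. The rotation schedule `SHIFTS` is the standard DES one. Everything else is simplified for illustration: `toy_f` is a placeholder and is not the DES round function (the real f uses the E table, the S-boxes and a permutation), the stand-in sub keys skip the second key permutation, and the initial and final message permutations are omitted. The sketch only shows the structure, including the fact that running the same rounds with the keys in reverse order undoes the encryption.

```python
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]  # standard DES schedule
MASK28 = (1 << 28) - 1
MASK32 = (1 << 32) - 1


def rotl28(x, n):
    """Rotate a 28-bit value left by n positions."""
    return ((x << n) | (x >> (28 - n))) & MASK28


def cd_pairs(c0, d0):
    """Return [(C1, D1), ..., (C16, D16)] from the initial halves."""
    pairs, c, d = [], c0, d0
    for s in SHIFTS:
        c, d = rotl28(c, s), rotl28(d, s)
        pairs.append((c, d))
    return pairs


def toy_f(r, k):
    """Placeholder round function (assumption for illustration, NOT the DES f)."""
    return ((r * 0x9E3779B1) ^ k) & MASK32


def rounds(l, r, keys, f=toy_f):
    """Apply L_n = R_(n-1), R_n = L_(n-1) XOR f(R_(n-1), K_n) once per key."""
    for k in keys:
        l, r = r, l ^ f(r, k)
    return r, l  # halves are swapped after the last round, as in DES


pairs = cd_pairs(0xF0CCAAF, 0x556678F)
# Stand-in sub keys: real DES permutes each (Cn, Dn) pair down to 48 bits.
keys = [(c ^ d) & MASK28 for c, d in pairs]

ct = rounds(0x01234567, 0x89ABCDEF, keys)
pt = rounds(*ct, keys[::-1])  # same rounds, keys reversed
print(pt == (0x01234567, 0x89ABCDEF))
```

Because the shifts sum to 28, the last pair (C16, D16) equals (C0, D0). The inversion property holds for any choice of f, which is why the same data path can serve for both encryption and decryption.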

The final (L16, R16) values are concatenated (in standard DES, in the swapped order R16 L16) and passed through the inverse of the initial permutation to obtain the cipher text. Decryption uses the same algorithm as encryption: the cipher text is passed in as the input together with the same key, and the sub keys are applied in reverse order.

DESIGN AND IMPLEMENTATION:
The block diagram of the implemented DES algorithm is shown in the figure below.

The functional components of the DES design are shown in the figure. The design has 10 BRAMs at the input and 8 BRAMs at the output. The first two input BRAMs hold the input key read from the file. The key is passed to the data path in the first clock cycle, during which the remaining BRAMs are inactive. In the next clock cycle, four 64-bit input messages are supplied through the remaining eight BRAMs. Because these BRAMs operate in parallel, all the message bits reach the data path in the same clock cycle. The data path processes the input messages and writes its output to the eight output BRAMs, where the four 64-bit cipher texts become available.

DATA ARRANGEMENT:
The four 64-bit input messages are fed into eight BRAMs. Each BRAM word is 32 bits wide, so each message is split into two 32-bit halves that are stored in adjacent BRAMs. The data path reads the two halves from the adjacent BRAMs and concatenates them to form the 64-bit value. After the calculation, the data path splits each cipher text into two 32-bit halves again and passes them to the output BRAMs.
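A small behavioural model may make this layout concrete. The sketch below is in Python rather than VHDL, and the ordering it uses (high half in the even-numbered BRAM, low half in the next one, one BRAM pair per parallel message lane) is an assumption for illustration; the report does not state the actual ordering. The function names are hypothetical.

```python
LANES = 4  # four messages are processed in parallel


def split64(word):
    """Split a 64-bit word into (high, low) 32-bit halves."""
    return (word >> 32) & 0xFFFFFFFF, word & 0xFFFFFFFF


def join64(high, low):
    """Concatenate two 32-bit halves back into a 64-bit word."""
    return (high << 32) | low


def load_brams(messages):
    """Distribute messages over 2*LANES BRAMs; each group of LANES shares an address."""
    brams = [[] for _ in range(2 * LANES)]
    for i, msg in enumerate(messages):
        lane = i % LANES
        high, low = split64(msg)
        brams[2 * lane].append(high)      # assumed: high half in the even BRAM
        brams[2 * lane + 1].append(low)   # assumed: low half in the adjacent BRAM
    return brams


def read_group(brams, addr):
    """Model one clock cycle: read one address from every BRAM, rebuild LANES words."""
    return [join64(brams[2 * lane][addr], brams[2 * lane + 1][addr])
            for lane in range(LANES)]


# Eight illustrative messages fill two addresses in each of the eight BRAMs.
msgs = [0x0123456789ABCDEF + i for i in range(8)]
brams = load_brams(msgs)
print([hex(w) for w in read_group(brams, 0)])
```

With this arrangement a single shared address selects all four messages of a group, which is consistent with the address generator being a simple counter.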

The arrangement of the data in the BRAMs is shown in the figure above. The 32-bit values are loaded continuously and concatenated to form the 64-bit messages or key values.

DATAPATH:
The data path performs the actual computation of the cipher text. The 32-bit words from the different BRAMs are concatenated into 64-bit messages. Because this implementation handles four 64-bit messages at a time, they are fed into four message encryption blocks. The 64-bit key is passed to the key generation block, which generates the 16 sub keys. One sub key is generated per clock cycle, so key generation has a latency of 16 clock cycles. The four 64-bit messages undergo the initial permutation, which takes one clock cycle; because the messages are fed in parallel, all four permuted values are obtained in that single cycle. These values are passed to the LR calculation block along with the sub keys. The LR calculation proceeds in parallel with the sub key calculation to gain performance. The L16 and R16 values are calculated and concatenated to form a 64-bit value, to which the inverse permutation is applied to form the cipher text.

CONTROLLER:
The controller controls the data path and the address generators. The data placement scheme in the block RAMs and the data path design greatly simplify the controller design. In general, the controller is responsible for enabling the different components at the right time. When the go signal is asserted, the controller enables the address generator at the next rising clock edge so that the BRAM data is ready to be fetched. At the following rising edge, the data path is enabled and reads the BRAM values. The controller waits for the initial latency and enables each block in the data path in turn to sequence the calculation. After the latency, the controller enables the output address generator to write the calculated cipher text values into the output BRAMs. Once the go signal is deasserted, the controller stops enabling the blocks, and the whole operation stalls until the go signal is asserted again.

ADDRESS GENERATOR:
The address generator is essentially an internal counter whose value changes every clock cycle. The counter value is used as the block RAM address. The address generator is activated by the controller; it drives the enable signal of the BRAMs and sends them the address. The input and output address generators supply 8 addresses to the different BRAMs.

GLUE LOGIC:
The glue logic sends the control signals to the controller. The go signal enables the controller to send control signals to the data path and the address generators. The controller asserts the done signal to the glue logic to indicate the end of the computation.

VHDL, DIMETALK AND NALLATECH BOARD:
The input data is read by the main function of the API used for testing on the Nallatech board. The entire algorithm is also implemented in C to determine its software performance on a microprocessor.
Since our ultimate aim is to measure the speedup when the design is implemented on an FPGA, we built the entire system in VHDL. To demonstrate the validity of the VHDL code, the figures below show the waveforms of the input, intermediate and output signals for some random testbench inputs. The entire DES circuit from VHDL is then instantiated in Dimetalk. It waits for the "go" and "n" signals from the microprocessor and generates 4 sets of signals (en, wen and addr) to communicate with the block RAMs. The 32-bit output data is driven on the data out pin and stored in block RAM. After all the data has been processed, the done signal is asserted. The entire circuit, together with the modules available in Dimetalk such as the Memory Map, BRAMs and Dimetalk clock, is shown in the figure below.

TESTING AND RESULTS:
Our system has been tested with several input messages, and in all cases the output obtained in hardware was verified successfully. The cipher texts obtained after execution match the correct cipher texts for the given input messages. The figures on the following page show sample input data and the corresponding output data for the hardware case.

Initially, our design implemented pipelining and 2 stages of parallelism. In that design we fed 2 messages per clock cycle as input and obtained the correct cipher texts stored in the output BRAMs. Once we increased the number of parallel stages to four to improve the speedup, however, the computed cipher texts were no longer stored properly in the output BRAMs. Despite this storage problem, we were successful in improving the speedup (4 cipher texts are available every clock cycle after the initial latency), and we verified that the cipher texts computed with 4 stages of parallelism match the correct cipher texts for the given input messages.

The software implementation of the algorithm was not successful, and hence the software execution time is yet to be determined. However, since we obtain 4 cipher texts every clock cycle, we expect a fourfold improvement when executing on the FPGA, and increasing the number of parallel stages should increase the speedup further relative to software execution, which is sequential in nature.

The waveforms shown below indicate the following:
- The sub key calculation at each clock cycle
- The L and R value calculation at each clock cycle
- The cipher text output after the initial latency for one set of four 64-bit messages

SPEED UP:
- 400 messages are fed as input to both the software C code and the hardware.
- Time taken by software: 20,000 us
- Time taken by FPGA with 4-stage parallel architecture: 62 us (time calculated after the data is written into the BRAM)
- Time taken by FPGA with 4-stage parallel architecture: 241 us (time calculated before the data is written into the BRAM)

Case 1: Speedup = Tsw/Tfpga = 20000/62 ≈ 323 times faster
Case 2: Speedup = Tsw/Tfpga = 20000/241 ≈ 83 times faster

So if we feed more messages in parallel, we expect a better speedup.
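The arithmetic behind these figures can be checked with a few lines of Python. The `cycles` helper is a simplified throughput model that is an assumption of this sketch, not a measurement from the project: it assumes a fixed start-up latency followed by one group of parallel messages per clock cycle. The 17-cycle latency in the example (16 sub key cycles plus one initial permutation cycle) is only an illustrative guess.

```python
import math


def speedup(t_sw_us, t_fpga_us):
    """Speedup = software time / FPGA time (both in microseconds)."""
    return t_sw_us / t_fpga_us


def cycles(n_messages, lanes, latency):
    """Assumed model: start-up latency, then `lanes` messages per clock cycle."""
    return latency + math.ceil(n_messages / lanes)


case1 = speedup(20000, 62)    # timing excludes the BRAM write
case2 = speedup(20000, 241)   # timing includes the BRAM write
print(round(case1), round(case2))

# Effect of parallelism under the assumed model (latency value is hypothetical).
for n in (4, 400, 400000):
    gain = cycles(n, 1, 17) / cycles(n, 4, 17)
    print(n, round(gain, 2))
```

Under this model the gain from four lanes approaches 4 only when the number of messages is large compared with the latency, which is consistent with the observation that small data sets benefit less.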

CONCLUSION:
The 64-bit DES implemented in an FPGA is most useful when the data set is very large. In applications such as satellite or mobile communication, the amount of data to be encrypted is huge, and the pipelined and parallelized FPGA implementation of DES can therefore improve the speed considerably. For smaller data sets, the speedup will be much less significant.

REFERENCES:
- William Stallings, "Cryptography and Network Security: Principles and Practices", Third Edition, Pearson Education, 2003.
- http://orlingrabbe.com/des.htm
- http://www.tropsoft.com/strongenc/des.htm
- Bruce Schneier, "Applied Cryptography", John Wiley & Sons Inc, 2001.
- Seung-Jo Han, Heang-Soo Oh, Jongan Park, "The improved Data Encryption Standard (DES) Algorithm", Spread Spectrum Techniques and Applications Proceedings, IEEE 4th International Symposium, 22-25 Sep 1996.
