CNN CHIP AND FPGA TO EXPLORE COMPLEXITY
P. Arena, L. Fortuna, G. Vagliasindi
DIEES - Dipartimento di Ingegneria Elettrica, Elettronica e dei Sistemi, Facoltà di Ingegneria - Università degli Studi di Catania, Viale A. Doria, 6. 95125 Catania, Italy
[email protected]
ABSTRACT
The novel ACE16Kv1 chip has demonstrated the performance of Cellular Nonlinear Networks. Its signal processing capabilities need a powerful interface to be exploited in depth. In this paper, the authors present the main outline of a project designed to exploit the full performance of the ACE16Kv1 chip and its core in order to study emergent chaotic dynamics. The hardware system is based mainly on two chips: the ACE16Kv1 and an FPGA, which interfaces the 128x128 CNN cells with a PC, where a high-level program allows the user to interact with the system.
1. INTRODUCTION
The VLSI implementation of the Cellular Nonlinear Network (CNN) [1] opens the way to a series of experiments on chaotic dynamics. Results on first-order autonomous and non-autonomous space-invariant CNNs have recently been presented in [2][3]. In particular, the study in [3] focuses on the mathematical model of the CNN chips: the Full Signal Range (FSR) model [4]. The presence of 128x128 differential equation solvers on a single chip makes numerical simulation on digital machines obsolete. In fact, these machines work according to the Single Instruction Single Data (SISD) paradigm; this means that, to simulate several differential equations, the computing effort needed to solve one equation is multiplied by the number of equations. Three VLSI implementations of CNNs have been realized: the ACE4K [5], the CACE1K [6] and the ACE16Kv1 [7]. The first is composed of 64x64 identical analog CNN cells interfaced with the external world by an analog bus. Two boards have been built around it: the Aladdin System [8] and the Aladdin Visual Computer [9]. The CACE1K is an analog chip of 32x32 cells, each with a pair of state variables, forming a two-layer array. Its interface is also an analog bus. The third is the truly innovative one; it is composed of 128x128 cells (four times as many as the ACE4K), all interfaced by a 32-bit digital bus. Moreover, it hosts a CMOS sensor capable of transferring the sensed image to the internal cells. These features yield a truly parallel processor that is easy to use thanks to the digital interface.
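To make the cost argument concrete, the following is a minimal sketch of what a sequential (SISD) machine has to do for a single integration step of a space-invariant CNN. It assumes a forward-Euler discretization, 3x3 templates, a zero boundary condition and the limiting form of the FSR model in which the state is hard-limited to [-1, 1]; the function name `fsr_euler_step` and all parameter names are hypothetical and are not taken from the chip documentation.

```python
def fsr_euler_step(x, u, A, B, z, dt):
    """One forward-Euler step of a space-invariant CNN (FSR-style sketch).

    x, u : 2-D lists holding the cell states and the cell inputs
    A, B : 3x3 feedback and control templates (lists of lists)
    z    : bias term, dt : integration step (both assumptions of this sketch)
    """
    rows, cols = len(x), len(x[0])
    new_x = [[0.0] * cols for _ in range(rows)]
    # A sequential machine visits the cells one by one: the work grows
    # linearly with the number of cells (rows * cols).
    for i in range(rows):
        for j in range(cols):
            acc = z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    # Cells outside the array contribute zero (assumed boundary).
                    if 0 <= ni < rows and 0 <= nj < cols:
                        acc += A[di + 1][dj + 1] * x[ni][nj]
                        acc += B[di + 1][dj + 1] * u[ni][nj]
            value = x[i][j] + dt * (-x[i][j] + acc)
            # FSR assumption: the state itself is limited to [-1, 1].
            new_x[i][j] = max(-1.0, min(1.0, value))
    return new_x
```

For a 128x128 array, the two outer loops run 16384 times per time step, whereas on the chip all cells evolve simultaneously.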
A. Basile, Automation and Robotics Team, STMicroelectronics, Stradale Primosole 50, 95121 Catania, Italy
[email protected]
The easiest way to manage the ACE16Kv1 chip is to use an FPGA, so that a VHDL program can avoid the development of a complex Printed Circuit Board (PCB). In fact, a recent FPGA chip is able to manage the data bus and the control bus of the ACE16Kv1 and, at the same time, to interface with a memory that stores the instruction sequence and with a parallel port that receives this sequence from a high-level program. This paper is organized as follows. Section 2 presents the board built around the ACE16Kv1 and the Xilinx Spartan-3 XC3S400-PQ208 FPGA chip, together with their features. It also gives an overview of the main characteristics of the VHDL code implementing the Finite State Machines within the FPGA, with a brief look at the PC-based software layer. Section 3 presents the test environment based on this board. Finally, Section 4 summarizes the performance of the complete project, with particular reference to some test benches.
Figure 1. Schematic view of the designed platform.
2. THE PLATFORM
The designed platform is based on several hardware, firmware and software components. The hardware is composed of two PCBs. The first is a simple bank of buffers that interfaces the parallel port of the Personal Computer with the other PCB; its task is to prevent conflicts on the parallel data bus that could damage the I/O. The second PCB contains all the main components of the system: the FPGA chip, the ACE16Kv1, a flash memory and a series of LEDs and push buttons to interact with the system. An overall schematic view is given in Fig. 1. Particular attention has been devoted to the link between the ACE16Kv1 and the FPGA, because of the high number of pins on the ACE16Kv1. It is packaged in a ceramic QFP-144, with:
* 32 pins for the Data Bus;
* 20 pins for the Address Bus;
* 5 pins for Memory Access Control Signals;
* 3 pins for Data Communications Control Signals;
* 3 pins for Address Event Control Signals;
* 11 pins for Test purposes.
The other pins are related to Supply and Ground. All communication between the hosting system and the ACE16Kv1 passes through the control of these pins. The communication is completely digital and based on a simple handshaking protocol controlled by the ACE16Kv1 I/O entity. The role of these pins is to manage the three blocks of the CNN chip: the Central Core, the Program Circuitry and the Input/Output Block. The first is the main one; it consists of the array of 128x128 cells plus the following functional blocks:
* Photo-receptors: different kinds are present in the chip in order to acquire images directly through the focal plane in very different lighting conditions;
* Analog Processing Units: these permit the execution of analog processing instructions, such as convolution, on grey-scale images;
* Logic Processing Units: a programmable two-input, one-output logic operator that permits the execution of logic operations, such as XOR, AND or OR, on black-and-white images;
* Memory Units: these allow the storage of up to eight grey-scale images and two binary ones, permitting the execution of complex algorithms that need to store intermediate processing results.
The complexity of the ACE16Kv1 is related to its potential and its flexibility; in fact, it is internally tuned via a series of micro-operations that control all the micro-switches of the chip (see Fig. 2). Varying the behavior of the functional blocks changes the evolution of the cells and hence the phenomena observed at the macroscopic level. These switches are modified through a programming block, which consists of 8 SRAM blocks where the digital and the analog instructions are stored. Each instruction can be selected independently, giving a high number of different configurations. In order to manage all these blocks and micro-switches, an intelligent entity has been designed: the Engine ACE16K. It knows how these switches have to be controlled during the evolution of any algorithm. Overall, the developed platform is meant to be the medium for exploring the signal-processing world by using a PC and simple software. To this end, the hardware and software aspects are driven by the following guidelines:
* The hardware should be flexible enough to drive the CNN chip completely. The FPGA provides this plasticity via a high-level language, the Hardware Description Language (HDL). In order to exploit the maximum performance of the chip in various applications, the HDL program can be rearranged without any hardware modification;
* The PC interface should be user-friendly, so that the platform and the ACE16Kv1 are abstracted away from the user.
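As an illustration of the programmable two-input, one-output logic operator listed among the functional blocks, the sketch below models such an operator as a 4-bit truth table applied pixel by pixel to two binary images. The truth-table encoding, the constant names and the helper functions `make_logic_op` and `apply_to_images` are assumptions made for this example; they do not describe how the chip actually stores or selects its logic instructions.

```python
# Truth tables encoded as 4-bit integers (assumed encoding):
# bit number (2*a + b) holds the output for inputs a and b.
AND_TABLE = 0b1000
OR_TABLE = 0b1110
XOR_TABLE = 0b0110


def make_logic_op(truth_table):
    """Return a two-input, one-output operator defined by a 4-bit truth table."""
    def op(a, b):
        return (truth_table >> ((a << 1) | b)) & 1
    return op


def apply_to_images(op, img_a, img_b):
    """Apply the operator pixel by pixel to two binary images of equal size."""
    return [[op(pa, pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


if __name__ == "__main__":
    a = [[0, 1], [1, 0]]
    b = [[0, 1], [0, 1]]
    print(apply_to_images(make_logic_op(XOR_TABLE), a, b))
```

Changing the 4-bit table is enough to select any of the sixteen possible two-input Boolean functions, which is the sense in which such an operator is programmable.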
Figure 2. A global block diagram of the ACE16Kv1 elementary cell [7].
The first guideline leads to placing as much of the hardware as possible within the FPGA, while the fixed parts remain outside it. Therefore, referring to Fig. 1, outside the Xilinx chip there are:
* An EEPROM used to store the bitstream that configures the device;
* A FLASH memory to store the instruction sequence dictated by the PC;
* The LED Indicators, used to provide information about the status of the FPGA internal blocks;
* The Push Buttons, which provide a way to drive the configuration of the FPGA;
* An I/O Board that buffers the data to and from the PC;
* A 50 MHz oscillator.
The I/O between the PC and the boards occurs through the parallel interface based on the EPP protocol [10], which permits a transfer rate of 2 MByte/s. Within the FPGA, the entities developed are:
* A PC I/O Block, which is connected to the PCB attached to the PC Parallel Port, realizes the physical implementation of the EPP protocol and codes the data in a 32-bit format;
* The Engine ACE16K, the smartest block, which interprets the 32 bits coming from the PC and performs the correct operations to manage the pins of the ACE chip;
* An ACE16K I/O block, which is interfaced to the ACE16Kv1 and prevents damage due to unsafe signal combinations;
* A Memory Manager, which drives a common 29Fxxx Flash Memory;
* The Digital Clock Manager (DCM), an Intellectual Property core developed by Xilinx whose main role is to eliminate skew in the distribution of the clock inside the FPGA and which provides clock multiplication and division capability.
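Since the EPP data path is one byte wide while the Engine ACE16K works on 32-bit words, the PC I/O Block has to assemble four bytes into each word. The sketch below shows this packing in software form; the most-significant-byte-first order and the function names `pack_word` and `unpack_word` are assumptions, since the byte order used by the actual VHDL entity is not stated here.

```python
def pack_word(data):
    """Assemble four bytes (most significant first, assumed) into a 32-bit word."""
    if len(data) != 4:
        raise ValueError("exactly four bytes are needed for one 32-bit word")
    word = 0
    for byte in data:
        if not 0 <= byte <= 0xFF:
            raise ValueError("each element must fit in one byte")
        word = (word << 8) | byte
    return word


def unpack_word(word):
    """Split a 32-bit word back into four bytes, most significant first."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


if __name__ == "__main__":
    word = pack_word([0x12, 0x34, 0x56, 0x78])
    print(hex(word), unpack_word(word))
```

The same routine, run in the opposite direction, describes how a 32-bit result read from the chip would be returned to the PC one byte at a time.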
Figure 3. Graphical User Interface.
The intelligent entity is the Engine ACE16K, which acts as a microprocessor core: it translates the incoming bits into a sequence of micro-operations that have to be performed in the correct order. This block also drives the other entities. On the other side, to make the platform user-friendly, a Graphical User Interface (GUI) was developed. It lets users exploit the CNN chip through a simple point-and-click system. This software runs in a Windows OS environment (see Fig. 3). Its functions are: to perform some basic tests on the ACE16Kv1; to implement low-level and high-level commands; and to collect single instructions into a more complex flow of commands. In other words, it is an easy interface that gives access, through a series of push buttons and combo boxes, to the programmability of the ACE16Kv1 chip and the FPGA. The low-level instructions are strictly related to the micro-switches and the micro-operations, while the high-level commands are collections of these for pre-programmed tasks. The program then translates each command into the correct sequence of bits sent to the FPGA, where the Engine ACE16K entity decodes these instructions in order to drive the 74 pins of the ACE16Kv1.
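The figure of 74 pins follows directly from the pin groups listed for the QFP-144 package. The short check below adds them up; the dictionary keys are hypothetical labels chosen for this sketch, while the counts are those given in the text.

```python
# Signal pin groups of the ACE16Kv1 as listed in the text (labels are hypothetical).
SIGNAL_PINS = {
    "data_bus": 32,
    "address_bus": 20,
    "memory_access_control": 5,
    "data_communications_control": 3,
    "address_event_control": 3,
    "test": 11,
}
PACKAGE_PINS = 144  # ceramic QFP-144

signal_total = sum(SIGNAL_PINS.values())
# The text assigns all remaining pins to supply and ground.
supply_and_ground = PACKAGE_PINS - signal_total

if __name__ == "__main__":
    print(signal_total, supply_and_ground)
```

The remaining pins of the package are therefore the ones the text assigns to supply and ground.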
3. TEST ENVIRONMENT
The FPGA optimizations yield a minimum clock period of about 5.3 ns. This means that the core could work at about 190 MHz, more than enough to satisfy the ACE16Kv1 rated specification [7] of 30 MHz. However, the platform performance is penalized by the bottleneck of the PC Parallel Port. For example, transferring an image takes 8 ms, while transferring an instruction takes 2 µs. A future version could enhance the PC-FPGA connection by implementing a PCI manager within the FPGA, so as to obtain a throughput of 132 MByte/s. This could reduce the communication overhead and offer a PC coprocessor system. So far, a series of tests has been executed on the designed platform. For example, a simple image download and readout reveals the correct behavior of the Digital to Analog Converter (DAC), of the Analog to Digital Converter (ADC), of the internal buffers and of the implemented entities. More detailed tests have been performed using the eleven test pins already described; they make it possible to check the Weight and Reference Buffers, the Programming Memory, the Input Node of the Cells, the DAC and the ADC. More complete tests will use this platform to replicate the identification of MARFE in Tokamak machines (FTU) [11], medical image processing [12] and robotic locomotion via wave generation [13]. The last of these could verify the CMOS sensor of the ACE16Kv1 by implementing a fully autonomous navigation system.
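The timing figures quoted in this section can be reproduced with a few lines of arithmetic. The sketch below assumes one byte per pixel for a 128x128 image, four bytes per 32-bit instruction and 1 MByte = 10^6 bytes; these assumptions are not stated explicitly in the text, but they are consistent with the quoted 8 ms and 2 µs.

```python
IMAGE_SIDE = 128            # cells per side of the array
BYTES_PER_PIXEL = 1         # assumption: 8-bit grey levels
INSTRUCTION_BYTES = 4       # one 32-bit instruction word
EPP_RATE = 2e6              # bytes per second (2 MByte/s)
PCI_RATE = 132e6            # bytes per second (132 MByte/s)
MIN_CLOCK_PERIOD_NS = 5.3   # minimum clock period reported for the FPGA core


def transfer_time_s(num_bytes, rate_bytes_per_s):
    """Time in seconds needed to move num_bytes at the given sustained rate."""
    return num_bytes / rate_bytes_per_s


def max_clock_mhz(period_ns):
    """Maximum clock frequency in MHz for a given minimum period in ns."""
    return 1e3 / period_ns


image_bytes = IMAGE_SIDE * IMAGE_SIDE * BYTES_PER_PIXEL
image_epp_ms = transfer_time_s(image_bytes, EPP_RATE) * 1e3
instruction_epp_us = transfer_time_s(INSTRUCTION_BYTES, EPP_RATE) * 1e6
image_pci_ms = transfer_time_s(image_bytes, PCI_RATE) * 1e3

if __name__ == "__main__":
    print(f"max clock: {max_clock_mhz(MIN_CLOCK_PERIOD_NS):.1f} MHz")
    print(f"image over EPP: {image_epp_ms:.3f} ms")
    print(f"instruction over EPP: {instruction_epp_us:.1f} us")
    print(f"image over PCI: {image_pci_ms:.3f} ms")
```

Under these assumptions an image takes roughly 8.2 ms over EPP and roughly 0.12 ms over PCI, which shows how strongly the parallel port dominates the overall transfer cost.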
4. CONCLUSION
The ACE16Kv1 is a real-time signal processor able to solve many complex systems. The authors have designed a modular platform able to exploit its performance, which is fundamental for a wide range of applications of CNN-based systems. At present, a full study of the performance has not yet been carried out. However, the preliminary results reveal that many application fields remain to be explored, where wave-based computing could demonstrate its potential.
5. ACKNOWLEDGMENT
The authors wish to thank Prof. A. Rodriguez-Vazquez of the Institute of Microelectronics of Seville, Centro Nacional de Microelectronica (IMSE-CNM), Universidad de Sevilla (Spain) for the precious tips provided.
6. REFERENCES
[1] L.O. Chua, L. Yang, "Cellular Neural Networks: Theory and Applications", IEEE Trans. on Circuits and Systems-I, vol. 35, pp. 1257-1290, 1988.
[2] M. Biey, M. Gilli, P. Checco, "Complex dynamic phenomena in space-invariant cellular neural networks", IEEE Trans. on Circuits and Systems-I, vol. 49, no. 3, pp. 340-345, March 2002.
[3] F. Corinto, M. Gilli, P.P. Civalleri, "On dynamic behavior of full range CNNs", Proc. of the 2003 IEEE ISCAS, vol. 5, pp. V-765 - V-768, May 2003.
[4] S. Espejo, R. Carmona, R. Dominguez-Castro, A. Rodriguez-Vazquez, "A VLSI-Oriented Continuous-Time CNN Model", International Journal of Circuit Theory and Applications, vol. 24, pp. 341-356, May-June 1996.
[5] G. Liñán, S. Espejo, R. Dominguez-Castro, A. Rodriguez-Vazquez, "The CNNUC3: an analog I/O 64x64 CNN universal machine chip prototype with 7-bit analog accuracy", Proc. of the 6th International Workshop on CNNA, Catania, Italy, May 2000.
[6] R. Carmona, F. Jimenez-Garrido, R. Dominguez-Castro, S. Espejo, A. Rodriguez-Vazquez, "CMOS realization of a 2-layer CNN universal machine chip", Proc. of the 7th International Workshop on CNNA 2002, Frankfurt, Germany, July 2002, pp. 444-451.
[7] A. Rodriguez-Vazquez, G. Liñán, L. Carranza, E. Roca-Moreno et al., "ACE16k: the Third Generation of Mixed-Signal SIMD-CNN ACE Chips Toward VSoCs", IEEE Trans. on Circuits and Systems-I, vol. 51, pp. 851-863, May 2004.
[8] A. Zarandy, T. Roska, P. Szolgay, S. Zöld, P. Foldesy, I. Petris, "CNN Chip Prototyping and Development Systems", European Conference on Circuit Theory and Design ECCTD'99, Design Automation Day proceedings (ECCTD'99-DAD), pp. 69-81, Stresa, Italy, 1999.
[9] http://www.analogic-computers.com.
[10] IEEE 1284-1994, "Standard Signaling Method for a Bidirectional Parallel Peripheral Interface for PCs".
[11] P. Arena, A. Basile, L. Fortuna, G. Mazzitelli, A. Rizzo, M. Zammataro, "CNN-Based Real-Time Video Detection of Plasma Instability In Nuclear Fusion Applications", Proc. of the 2004 IEEE ISCAS, Vancouver, Canada, May 2004.
[12] P. Arena, A. Basile, M. Bucolo, L. Fortuna, "Image processing for medical diagnosis using CNN", Nuclear Instruments and Methods in Physics Research A, vol. 497, pp. 174-178, 2003.
[13] A. Adamatzky, P. Arena, A. Basile, R. Carmona-Galan, B. De Lacy Costello, L. Fortuna, M. Frasca, A. Rodriguez-Vazquez, "Reaction-Diffusion Navigation Robot Control: From Chemical to VLSI Analogic Processors", IEEE Trans. on CAS-I, vol. 51, no. 5, May 2004.