Modeling FPGA-Based Cyber-Physical Systems


Dan Fay, Department of ECE, University of Colorado, Boulder, CO 80309
Graham Schelle, Department of Computer Science, University of Colorado, Boulder, CO 80309
Li Shang, Department of ECE, University of Colorado, Boulder, CO 80309
Dirk Grunwald, Department of Computer Science, University of Colorado, Boulder, CO 80309

I. ABSTRACT

Cyber-Physical Systems (CPS), such as autonomous robot navigation and other computing systems that must dynamically interact with the real world in real time, are an important class of emerging applications. Many CPS applications employ soft computation, where the computations do not require numerically exact answers. This error tolerance can potentially be exploited to improve performance, efficiency, or reliability when the application is implemented on an FPGA. This paper presents a software-based framework for studying how best to exploit soft computation when implementing CPS applications on FPGAs. Initial results from running a robot vision application, the Polynomial Mahalanobis Distance, are presented.

II. INTRODUCTION

Cyber-Physical Systems (CPS) are an important set of emerging applications. CPS are systems that must dynamically interact with the real world, ranging from avionics to advanced vehicle dynamics controllers to autonomous robots. CPS face complex real-world interactions that make their design a complicated multi-dimensional optimization problem. Since CPS can involve safety-critical applications, they often have very high reliability requirements, both in terms of correctness and in meeting hard real-time deadlines.

CPS applications possess several important characteristics that make them well suited for execution on an FPGA. On an FPGA, it is straightforward to implement high-performance applications with deterministic execution behavior, which makes it easier to guarantee real-time performance. FPGAs are relatively power efficient compared to microprocessors, making them a good choice for the embedded systems that CPS use. Finally, CPS can profitably exploit the run-time reconfiguration capabilities of FPGAs to improve their execution performance and efficiency.

While CPS often have very high reliability requirements, computing on CPS frequently employs soft computations, where a correct answer is not a single, numerically exact answer. As a result, it is possible to obtain correct answers even while using less-accurate compute operations, such as reduced-precision floating point operations. The resulting improvements in performance, area, and power efficiency can be employed for increased functionality or efficiency, or can even be used to improve the system's overall reliability by using the extra resources to implement fault-tolerant computing techniques such as re-execution and Triple Modular Redundancy (TMR). FPGAs' configuration flexibility makes them a promising platform for exploiting soft computation.

A major problem with exploring CPS designs on an FPGA is that FPGA development continues to be significantly more difficult than writing microprocessor-based software. While the tool flow for targeting applications to FPGAs has vastly improved with tools such as Handel-C [6], Single Assignment C (SA-C) [8], Altera's DSP Builder [2], and Xilinx's System Generator for DSP [9], these tools suffer from several serious shortcomings. The first problem is that many cutting-edge CPS applications are written in languages other than C/C++, such as MATLAB or Soar [5]. While Xilinx's System Generator for DSP [9] and Altera's DSP Builder [2] work with MATLAB applications, they do not support implementing floating point on FPGAs. Even when the application happens to be written in C/C++, many C-to-RTL programs require substantial programmer intervention in order to successfully compile to RTL. To successfully target an application to the FPGA, the researcher must hand-select functions and frequently must also annotate the code with special compiler directives. The researcher must then wait a considerable amount of time for the code to be synthesized into a configuration bitstream for the FPGA.

III. TEST SETUP

To facilitate rapid design-space exploration of Cyber-Physical Systems implemented on FPGAs, we present a tool flow that allows researchers to quickly conduct value profiles and to study the effects of reducing the precision of floating point values and of injecting faults into floating point values. We also describe a metric to estimate the effects on area/performance, power consumption, and reconfiguration time that is based on synthesizing a variable precision floating point library onto an FPGA target.
The tool suite consists of several tools written for the Pin dynamic binary instrumentation framework [7]. These Pin-based tools, whose flow is outlined in Figure 1, improve researcher productivity by allowing the researcher to study an application's dynamic data values without needing access to the source code or having to rewrite the program (and the libraries that it uses) by hand. The tool suite currently consists of three tools:

1) A value profiling tool. This tool provides several different forms of information about the exponent: it outputs the minimum and maximum dynamic exponent, a histogram of the number of dynamic values in different exponent ranges, and the minimum bit-width needed to express the full range of exponents seen by the tool.
2) A precision clamping tool, which provides a way to study the application-level effects of reducing the floating point precision. It accomplishes this by masking off a user-specified number of lower-order bits from the result of every floating point operation in the application or in one or more user-specified functions.
3) A fault injector. This tool simulates the effects on an application when a single-bit, hard stuck-at fault occurs. The current version of the fault injector simulates a single-bit stuck-at fault at any bit position in the floating-point value. The bit's position is specified by the user, along with the type of stuck-at fault: stuck-at-0, stuck-at-1, or stuck-at-flipped (i.e., the fault injector changes 0->1 and 1->0).

[Figure 1: flow diagram. Blocks: Application; Pin Framework Value Profiler; Minimum Exponent; Pin Framework Precision Clamping; Baseline Results; Tested Results; Perceptual Difference Utility; Pass/Fail; Minimum Mantissa; Variable Precision FPU Library; FPGA Synthesis Software; Area Estimate; Power Estimate; Reconfiguration Time Estimate.]

Fig. 1. The tool flow for the area, power consumption, and reconfiguration time estimates. First, the value profiler is run to determine the smallest exponent width needed. Next, the precision clamping framework seeks to find the lowest floating point precision that can still produce correct results. Once the minimum exponent and floating point precision are determined, the area, power consumption, and reconfiguration time are estimated.

Soft computations' inexact nature makes it potentially difficult to determine what is "correct" and what is "incorrect". Initial testing of the tool flow used the Polynomial Mahalanobis Distance [4] application, a robot vision algorithm that uses training data taken from near-field stereo vision processing to label traversable paths in the far field. Written in MATLAB, the Polynomial Mahalanobis Distance benchmark provides an image of the labeled path as its output. To compare the outputs, the Perceptual Image Difference Utility [1], which uses several published techniques to determine whether there is a human-visible difference between two images, was used to check whether the output image was perceptibly different from the reference image.

The raw data for the FPGA area calculations come from synthesizing a single floating point unit onto an FPGA. The floating point units used come from the RPL Floating Point Library [3], a parameterizable, synthesizable floating point library. Written in VHDL, these library modules can leverage specialized blocks on the FPGA.

IV. INITIAL RESULTS

Presented in the attached appendix are the initial results from the studies on the Polynomial Mahalanobis Distance application. Figures 2 and 3 show the effect on the final result of reducing the floating point precision of certain functions. Table I shows the minimum exponent size for the functions that cumulatively account for 90% of the dynamic floating point operations. Table II shows the results of the precision clamping test for these top functions. Finally, Table III shows the fault injection results.

REFERENCES

[1] Perceptual image difference utility, April 2008. http://pdiff.sourceforge.net.
[2] Altera. DSP Builder reference manual, December 2007. http://www.altera.com/literature/manual/mnl_dsp_builder.pdf.
[3] P. Belanovic and M. Leeser. A library of parameterized floating-point modules and their use. In FPL '02: Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications, pages 657–666, London, UK, 2002. Springer-Verlag.
[4] G. Z. Grudic and J. Mulligan. Outdoor path labeling using polynomial Mahalanobis distance. In Robotics: Science and Systems, 2006.
[5] J. E. Laird, A. Newell, and P. S. Rosenbloom. Soar: an architecture for general intelligence. Artif. Intell., 33(1):1–64, 1987.
[6] I. Page. Hardware-software co-synthesis research at Oxford, 1997.
[7] PIN Dynamic Instrumentation Tool. http://rogue.colorado.edu/pin/.
[8] S.-B. Scholz. Single Assignment C: efficient support for high-level array operations in a functional setting. J. Funct. Program., 13(6):1005–1059, 2003.
[9] Xilinx. System Generator for DSP, March 2008. http://www.xilinx.com/support/sw_manuals/sysgen_ref.pdf.

Fig. 2. Effect on the result from reducing the floating point precision of the pow function for the 8th order Polynomial Mahalanobis Distance. From left to right: unaltered double precision, 16-bit precision, 8-bit precision, and 6-bit precision.

Fig. 3. Effect on the result from reducing the floating point precision of the dgemm function for the 8th order Polynomial Mahalanobis Distance. From left to right: unaltered double precision, 12-bit precision, 10-bit precision, and 8-bit precision.

Function Name                                                 Minimum Exponent Size
dgemm                                                         5
Z21find_nonzero_elem_idxI7NDArrayE17octave_value_listRKT_iii  3
floor                                                         4
Z6D_NINTd                                                     3
ZN10idx_vector14idx_vector_rep15tree_to_mat_idxEdRb           3
ieee754_pow                                                   6
ZNK7NDArray3sumEi                                             5
ieee754_exp                                                   10
ZmlIdE7MArrayNIT_ERKS2_RKS1                                   3
Z7productIdE7MArrayNIT_ERKS2_S4                               5
Z4kronIdEvRK6Array2IT_ES4_RS2                                 0
ZmiIdE7MArrayNIT_ERKS2_S4                                     4
ZplIdE7MArrayNIT_ERKS2_RKS1                                   4

TABLE I. The minimum exponent size, in bits, required for each function to fully express the entire dynamic range of values.
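The exponent-width figures in Table I can be illustrated with a minimal Python sketch. It is not the Pin-based value profiler itself: it assumes, purely for illustration, that the minimum width is the number of bits needed to index every exponent between the smallest and largest one observed, and that zeros and non-finite values are skipped. The function name `profile_exponents` is hypothetical.

```python
import math


def profile_exponents(values):
    """Return (min_exp, max_exp, bits) for the finite, nonzero values seen.

    Assumption (not from the paper): 'bits' is the width needed to index
    every exponent in [min_exp, max_exp], i.e. ceil(log2(span)).
    """
    exps = []
    for v in values:
        if v == 0.0 or not math.isfinite(v):
            continue  # zero, inf and NaN carry no ordinary exponent
        # frexp returns m, e with 0.5 <= |m| < 1, so the IEEE-style
        # exponent (1 <= |m'| < 2) is e - 1.
        exps.append(math.frexp(v)[1] - 1)
    if not exps:
        return None, None, 0
    lo, hi = min(exps), max(exps)
    span = hi - lo + 1
    # (span - 1).bit_length() equals ceil(log2(span)) for span >= 1
    return lo, hi, (span - 1).bit_length()


if __name__ == "__main__":
    print(profile_exponents([0.5, 1.0, 2.0, 4.0]))
```

Under this assumption, a function whose values all share one exponent needs zero exponent bits, which is one possible reading of the 0 entry in Table I.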

Function Name                                                 2nd Order       4th Order       8th Order
ZplIdE7MArrayNIT_ERKS2_RKS1                                   2:40 (-38)      2:1 (1)         2:1 (1)
floor                                                         0:N/A (N/A)     0:N/A (N/A)     0:N/A (N/A)
ZNK7NDArray3sumEi                                             10:12 (-2)      12:11 (1)       17:16 (1)
Z7productIdE7MArrayNIT_ERKS2_S4                               9:10 (-1)       11:10 (1)       14:13 (1)
dgemm                                                         16:18 (-2)      13:12 (1)       15:14 (1)
Z4kronIdEvRK6Array2IT_ES4_RS2                                 0:N/A (N/A)     0:N/A (N/A)     0:N/A (N/A)
Z21find_nonzero_elem_idxI7NDArrayE17octave_value_listRKT_iii  0:N/A (N/A)     0:N/A (N/A)     0:N/A (N/A)
ZmiIdE7MArrayNIT_ERKS2_S4                                     6:5 (1)         6:5 (1)         7:6 (1)
ieee754_pow                                                   9:36 (-27)      10:9 (1)        13:12 (1)
Z6D_NINTd                                                     15:32 (-17)     15:N/A (N/A)    15:N/A (N/A)
ZmlIdE7MArrayNIT_ERKS2_RKS1                                   9:8 (1)         9:8 (1)         13:12 (1)
ieee754_exp                                                   52:N/A (N/A)    52:N/A (N/A)    52:N/A (N/A)
ZN10idx_vector14idx_vector_rep15tree_to_mat_idxEdRb           15:N/A (N/A)    15:N/A (N/A)    15:N/A (N/A)

TABLE II. Results from the precision clamping. Table entries use the following format: lowest passing precision : highest failing precision (difference between lowest passing precision and highest failing precision).
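The masking step behind the precision clamping results can be sketched in Python as follows. This is an illustrative stand-in for the Pin tool, assuming IEEE 754 double-precision values and assuming that "precision" counts the mantissa bits that are kept. The names `clamp_precision` and `summarize`, and the `passes` callback (standing in for a full run plus the perceptual image comparison), are hypothetical.

```python
import struct

MANTISSA_BITS = 52  # explicit mantissa bits in an IEEE 754 double


def clamp_precision(x, kept_bits):
    """Zero all but the top 'kept_bits' mantissa bits of a double."""
    if not 0 <= kept_bits <= MANTISSA_BITS:
        raise ValueError("kept_bits must be between 0 and 52")
    drop = MANTISSA_BITS - kept_bits
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    mask = 0xFFFFFFFFFFFFFFFF & ~((1 << drop) - 1)
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]


def summarize(passes, max_bits=MANTISSA_BITS):
    """Return (lowest passing, highest failing) precision.

    'passes' is a hypothetical callback: passes(kept_bits) -> bool.
    Either element is None when no precision passes or fails.
    """
    outcome = {b: passes(b) for b in range(max_bits + 1)}
    passing = [b for b, ok in outcome.items() if ok]
    failing = [b for b, ok in outcome.items() if not ok]
    return (min(passing) if passing else None,
            max(failing) if failing else None)


if __name__ == "__main__":
    print(clamp_precision(1.75, 1))       # keeps one mantissa bit
    print(summarize(lambda b: b >= 13))   # a monotonic pass/fail pattern
```

Scanning every precision, as `summarize` does, is only one way such a table could be produced. It does show how a non-monotonic pass/fail pattern yields a negative difference like the 2nd order entries in Table II.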

Function Name                                                 2nd Order       4th Order       8th Order
ZplIdE7MArrayNIT_ERKS2_RKS1                                   0:0:0           0:0:0           0:0:0
floor                                                         0:0:0           0:0:0           0:0:0
ZNK7NDArray3sumEi                                             0:0:1           0:0:0           0:0:0
Z7productIdE7MArrayNIT_ERKS2_S4                               1:2:2           1:0:1           1:0:1
dgemm                                                         1:1:1           0:0:1           0:0:1
Z4kronIdEvRK6Array2IT_ES4_RS2                                 2:2:1           1:0:1           1:0:1
Z21find_nonzero_elem_idxI7NDArrayE17octave_value_listRKT_iii  0:0:0           0:0:0           0:0:0
ZmiIdE7MArrayNIT_ERKS2_S4                                     0:0:1           0:0:1           0:0:1
ieee754_pow                                                   1:1:1           1:0:1           1:0:1
Z6D_NINTd                                                     1:0:0           0:0:0           0:0:0
ZmlIdE7MArrayNIT_ERKS2_RKS1                                   2:1:1           1:1:1           1:1:1
ieee754_exp                                                   8:4:2           8:6:2           8:6:2
ZN10idx_vector14idx_vector_rep15tree_to_mat_idxEdRb           0:0:1           0:0:0           0:0:0

TABLE III. Results from the fault injection tests. Each entry gives the number of bits for which a fault ultimately leads to a perceptually different result. Table entries use the following format: stuck-at-zero : stuck-at-one : stuck-at-flip.
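The three stuck-at fault types counted in Table III can be sketched in Python as bit operations on a 64-bit IEEE 754 double. This is a minimal illustration and not the Pin-based fault injector: the function name `inject_fault`, the fault-kind strings, and the bit numbering (bit 0 is the least significant mantissa bit, bit 63 is the sign) are assumptions made here.

```python
import struct


def inject_fault(x, bit, kind):
    """Apply a single-bit stuck-at fault to the encoding of a double.

    bit:  position 0..63 (assumed numbering: 0 = mantissa LSB, 63 = sign)
    kind: "stuck-at-0", "stuck-at-1" or "stuck-at-flipped"
    """
    if not 0 <= bit <= 63:
        raise ValueError("bit must be between 0 and 63")
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    mask = 1 << bit
    if kind == "stuck-at-0":
        bits &= ~mask   # force the bit to 0
    elif kind == "stuck-at-1":
        bits |= mask    # force the bit to 1
    elif kind == "stuck-at-flipped":
        bits ^= mask    # invert the bit: 0->1 and 1->0
    else:
        raise ValueError("unknown fault kind: " + kind)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


if __name__ == "__main__":
    print(inject_fault(1.0, 63, "stuck-at-1"))   # sign bit forced to 1
    print(inject_fault(1.0, 51, "stuck-at-1"))   # top mantissa bit forced to 1
```

In a study like the one summarized above, such a function would be applied to every floating point result of the chosen function, and the final image would then be compared against the reference.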
