FPGA CAD Research: An Experiment (Progress Report #2: 13/4/05~20/4/05)
Xiaoxiang Shi ([email protected])
Department of Computer Science, Xidian University

Abstract This report describes our work this week. First, we outline a typical CAD flow, the same one used in the previous demo experiment. We then run a second experiment on a sequential MCNC benchmark circuit and detail the experiment methodology, which the previous report (Report 1) omitted. After that, we present the main results. Finally, we draw some conclusions and outline plans for future research.

1 Introduction Last week, we failed to compile the SIS [1] synthesis package provided by the University of California, Berkeley. Research on SIS is pointless if we cannot compile it. This week, after considerable effort, we succeeded in compiling the package and in integrating the technology-mapping algorithm FlowMap [2] into it, which is our main achievement of the week. We then conducted another experiment with the compiled tools, and they appear to work well. The rest of this progress report is organized as follows. Section 2 presents the typical CAD flow used in this demo experiment. Section 3 introduces the experiment methodology, including the file formats and commands. Section 4 describes the demo experiment and gives its results. Section 5 concludes and outlines future work. Several appendices supplement the report.

2 Typical CAD Flow Figure 1 illustrates the CAD flow we typically use. First, the SIS synthesis package performs technology-independent logic optimization of each circuit. Next, FlowMap technology-maps each circuit into 4-LUTs and flip flops. The output of FlowMap is a .blif format netlist of LUTs and flip flops. Our T-VPack [3, 4, 5, 6] program then packs this netlist of 4-LUTs and flip flops into more coarse-grained logic blocks, and outputs a netlist in the .net format that VPR uses. VPR [3, 4, 7, 8, 9, 10, 11] can then place the circuit and either globally route it or perform combined global and detailed routing on it. The output of VPR consists of a file describing the circuit placement, another file describing the circuit's routing, and various statistics such as the minimum number of tracks per channel required for a successful routing and the total wirelength. To find the minimum number of tracks required for successful routing, VPR attempts to route the circuit several times, allowing a different number of tracks per channel in each attempt.


[Figure 1: Typical CAD Flow. Circuit -> Logic Optimization (SIS) and Technology Map to LUTs (FlowMap) -> .blif Format Netlist of LUTs and Flip Flops -> Pack FFs and LUTs into Logic Blocks (T-VPack; input: Logic Block Parameters) -> .net Format Netlist of Logic Blocks -> VPR: Place Circuit or Read in an Existing Placement (inputs: FPGA Architecture Description File; Existing Placement or Placement from Another CAD Tool) -> Perform Either Global or Combined Global/Detailed Routing -> Placement and Routing Output Files, Placement and Routing Statistics]
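The repeated routing attempts described in this section amount to a search over channel widths. The following is a minimal sketch in Python, assuming a hypothetical predicate `can_route(width)` that stands in for one full routing attempt, and assuming that routability is monotonic in the width. The starting width of 12 and the doubling-then-bisecting schedule are illustrative choices, not necessarily what VPR does internally.

```python
from typing import Callable


def min_channel_width(can_route: Callable[[int], bool], start: int = 12) -> int:
    """Return the smallest width w >= 1 for which can_route(w) is True.

    can_route is a hypothetical stand-in for one routing attempt. It is
    assumed to be monotonic: if width w routes, every larger width routes.
    """
    # Phase 1: double the width until a routable upper bound is found.
    high = max(1, start)
    while not can_route(high):
        high *= 2

    # Phase 2: bisect between 1 and the known-good upper bound.
    low = 1
    while low < high:
        mid = (low + high) // 2
        if can_route(mid):
            high = mid       # mid works, so the minimum is mid or smaller
        else:
            low = mid + 1    # mid fails, so the minimum is larger
    return high
```

With a toy predicate that succeeds for widths of 6 or more (the minimum reported for s1423 in Section 4), `min_channel_width(lambda w: w >= 6)` returns 6.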

3 Experiment Methodology This section gives detailed instructions for the demo experiment, including the circuit description, synthesis commands, target FPGA architecture (device database), and the formats of the files generated in the experiment.
Circuit: The previous demo experiment used the MCNC benchmark circuit e64, a combinational circuit. For comparison, this experiment uses another MCNC benchmark circuit, s1423, which is sequential and may be somewhat larger than e64. The circuit file is in .blif format; see [13] for more about the BLIF format.
Synthesis commands: We synthesize and optimize the logic of s1423 with the script.algebraic and script.rugged scripts in SIS, taking the smaller of the two output circuits in each case. script.rugged is a newer script that proved very robust in our experiments. See Section 4 for the exact commands.
Technology mapping commands: After synthesis and logic optimization, the circuit is technology-mapped into 4-input look-up tables and flip flops with the script.4 (FlowMap) script in SIS. See Section 4 for the exact commands.

FPGA architecture: After synthesis, technology mapping and packing, the circuit is a netlist that our place-and-route tool VPR takes as input. The experiment must also specify the architecture of the target FPGA through a device database file (.arch file). See Appendix A for details of the FPGA architecture.
Netlist file format: The packing tool T-VPack outputs a netlist file in .net format, which is the main input to the place-and-route phase. See Appendix B for a description of this file.
Placement file format: The placement file is either generated by VPR after placement or taken from another CAD tool. It is in .p format and is described in detail in Appendix C.
Routing file format: The routing file is generated by VPR after it successfully routes the circuit. It is in .r format and is described in Appendix D.

4 Demo Experiment Results These results are for the MCNC benchmark circuit s1423. This is one of the smallest circuits we use to benchmark FPGAs: after packing it contains 329 logic blocks (321 four-input look-up tables and 74 flip flops, per the T-VPack output below). It is, however, faster to download pictures from a circuit this size than from a larger one, and s1423 is still large enough to be interesting.

Detailed commands in this demo experiment (in Linux 9.0, shell terminal):

[root@xidian demo]# ./sis    (or ./xsis)
sis> read_blif s1423.blif
sis> print_stats
top pi=18 po=6 nodes=813 latches=74 lits(sop)=1289
sis> source script.algebraic
sis> print_stats
top pi=18 po=6 nodes=219 latches=74 lits(sop)= 710
sis> source script.rugged
sis> print_stats
top pi=18 po=6 nodes=149 latches=74 lits(sop)= 805
sis> source script.4
sis> print_stats
top pi=18 po=6 nodes=321 latches=74 lits(sop)= 1355
sis> write_blif s.blif
sis> quit
[root@xidian demo]# ./t-vpack s.blif s.net
Input netlist file: s.blif Model: top
Primary Inputs: 18. Primary Outputs: 5. LUTs: 321. Latches: 74.
Total Blocks: 418. Total Nets: 413
After packing to LUT+FF Logic Blocks:
LUT+FF Logic Blocks: 329. Total Nets: 347.
Completed clustering consistency check successfully.
[root@xidian demo]# ./vpr s.net devicedb.arch s.p s.r

Initial Random Placement


Final Placement

Completely (Detailed) Routed Circuit

The minimum channel width factor for successful routing is 6; that is the routing shown here. We have highlighted one block (in green) by clicking on it. Its fanout is shown in red, and its fanin is shown in blue.
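The print_stats lines from the SIS session in this section can be tabulated to follow how the circuit changes at each step. The sketch below, in Python, copies the four lines from the session; the function names `parse_stats` and `summarize` are our own and are not part of SIS.

```python
import re

# print_stats lines copied from the SIS session in this section.
STATS = {
    "read_blif": "top pi=18 po=6 nodes=813 latches=74 lits(sop)=1289",
    "script.algebraic": "top pi=18 po=6 nodes=219 latches=74 lits(sop)= 710",
    "script.rugged": "top pi=18 po=6 nodes=149 latches=74 lits(sop)= 805",
    "script.4": "top pi=18 po=6 nodes=321 latches=74 lits(sop)= 1355",
}


def parse_stats(line: str) -> dict:
    """Turn the 'key=value' pairs of one print_stats line into a dict of ints."""
    pairs = re.findall(r"([\w()]+)=\s*(\d+)", line)
    return {key: int(value) for key, value in pairs}


def summarize(stats: dict) -> list:
    """Return (step, nodes, literals) rows in the order the steps were run."""
    rows = []
    for step, line in stats.items():
        fields = parse_stats(line)
        rows.append((step, fields["nodes"], fields["lits(sop)"]))
    return rows


if __name__ == "__main__":
    for step, nodes, lits in summarize(STATS):
        print(f"{step:18s} nodes={nodes:4d} lits(sop)={lits:5d}")
```

The node count falls from 813 to 149 during logic optimization and rises to 321 after mapping to 4-LUTs, which agrees with the 321 LUTs that T-VPack reports.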


5 Conclusions and Future Work This week, the FPGA CAD research group members Lv, Wang, Zhao and I successfully compiled the synthesis package SIS and integrated the technology-mapping algorithm FlowMap into it. This is good news for all of us. On the basis of these tools, we conducted another demo experiment on the sequential MCNC benchmark circuit s1423 and reported the results. At present, however, the tools accept circuits only in .blif, .eqn, .slif and .edif formats. As a next step, we would like to add new features such as support for VHDL and Verilog circuits. Currently, Xiaoguang Wang is studying SIS and Vis, the front end of SIS that supports Verilog format circuits. Jinpeng Lv is doing research on technology-mapping algorithms, and Gang Zhao is looking for methods to describe FPGAs in terms of a device database. All three are also busy with their graduation papers.

Appendix A: FPGA Architecture for the Experiment The FPGA we used is an array (island-style) FPGA. It consists of an array of logic blocks and routing channels. Two I/O pads fit into the height of one row or the width of one column, as shown below. All the routing channels have the same width (number of wires).

FPGA Architecture

Each circuit must be mapped into the smallest square FPGA that can accommodate it. For example, a circuit containing 14 logic blocks and 10 I/O pads would be mapped into an FPGA consisting of a 4x4 array of logic blocks. The FPGA logic block consists of a 4-input look-up table (LUT) and a flip flop, as shown below. There is only one output, which can be either the registered or the unregistered LUT output. The logic block has four inputs for the LUT and a clock input. Since the clock is normally routed via a special-purpose dedicated routing network in commercial FPGAs, we do not route it. That is, we may completely ignore the clock net, since it is assumed to be routed on a special global network.

Logic Block Structure
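The sizing rule stated in this appendix (smallest square array that accommodates the circuit) can be written as a short calculation. The sketch below, in Python, assumes that the pad capacity of an n x n array is 4 * n * 2, that is, two pads per row or column on each of the four sides with no pads in the corners; the function name `smallest_square_array` is our own.

```python
def smallest_square_array(num_clbs: int, num_pads: int, pads_per_slot: int = 2) -> int:
    """Return the side length n of the smallest n x n logic-block array.

    The array must hold num_clbs logic blocks (n * n >= num_clbs), and its
    perimeter must hold num_pads I/O pads. The perimeter capacity is assumed
    to be 4 sides * n rows or columns * pads_per_slot pads each.
    """
    n = 1
    while n * n < num_clbs or 4 * n * pads_per_slot < num_pads:
        n += 1
    return n
```

For the example in the text, `smallest_square_array(14, 10)` gives 4. Taking the T-VPack counts from Section 4 at face value (329 logic blocks, 18 + 5 = 23 pads), the same rule gives 19; the report does not state the array size that VPR actually used.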

The locations of the FPGA logic block pins are shown below. Each input is accessible from one side of the logic block, while the output pin can connect to routing wires in both the channel to the right and the channel below the logic block.


Logic Block Pin Locations

Each logic block input pin can connect to any one of the wiring segments in the channel adjacent to it. Each logic block output pin can connect to any of the wiring segments in the channels adjacent to it. (In the usual FPGA terminology, then, Fc = W, the number of tracks per channel.) The figure below should make the situation clear.

Logic Block Pin to Routing Channel Interconnect

Similarly, an I/O pad can connect to any one of the wiring segments in the channel adjacent to it. For example, an I/O pad at the top of the chip can connect to any of the W wires (where W is the channel width) in the horizontal channel immediately below it. The FPGA routing is unsegmented. That is, each wiring segment spans only one logic block before it terminates in a switch box. By turning on some of the programmable switches within a switch box, longer paths can be constructed.


Unsegmented FPGA Routing

Whenever a vertical and a horizontal channel intersect there is a switch box. In this architecture, when a wire enters a switch box, there are three programmable switches that allow it to connect to three other wires in adjacent channel segments. (In terms of the usual FPGA terminology then, Fs = 3.) The pattern, or topology, of switches used in this architecture is the planar or domain-based switch box topology. In this switch box topology, a wire in track number 1 connects only to wires in track number 1 in adjacent channel segments, wires in track number 2 connect only to other wires in track number 2, and so on. The figure below illustrates the connections in a switch box.

Switch Box Topology
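The planar switch box rule described in this appendix can be stated in a few lines of code. The sketch below, in Python, uses side names ("left", "top", "right", "bottom") and a function name that are our own labels for illustration; only the rule itself (same track number on each of the other three sides, so Fs = 3) comes from the text.

```python
SIDES = ("left", "top", "right", "bottom")


def planar_connections(side: str, track: int) -> list:
    """Wires reachable from (side, track) in a planar (domain-based) switch box.

    A wire entering on one side can connect to the wire with the same track
    number on each of the other three sides, which gives Fs = 3.
    """
    if side not in SIDES:
        raise ValueError(f"unknown side: {side}")
    return [(other, track) for other in SIDES if other != side]
```

Because the track number never changes across a switch box, a net that starts on track t stays on track t for its whole route through the general routing.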


Appendix B: Netlist File Format In either a netlist or a placement file, a sharp (#) sign indicates that the remainder of the line is a comment. A backslash (\) at the end of a line means the line is continued on the line below. Three different circuit elements exist: input pads, output pads, and logic blocks. In addition, the netlist file can also declare that a signal is global; that is, that a signal will be routed via a special, dedicated resource and should therefore be ignored during routing.

Input Pads
Input pads are declared via .input lines:

.input my_pad
pinlist: my_net

The lines above indicate that there is an input pad named my_pad which drives a net called my_net. Pads can have the same name as signal nets with no conflicts.

.input alpha
pinlist: alpha   # No conflict between pad and net name.

Output Pads
Output pads are declared via .output lines:

.output my_opad
pinlist: some_net

The lines above declare a pad named my_opad. It is connected to net some_net.

FPGA Logic Blocks
Logic blocks are declared via the .clb keyword:

.clb my_logic_block
pinlist: in_a in_b in_c in_d out_net clk
subblock: sb_one 0 1 2 3 4 5   # Ignore this line.

Recall that the logic block contains a 4-input LUT and a flip flop. The lines above declare a logic block named my_logic_block. The pinlist line lists the nets connected to this logic block. The nets connected to the four LUT inputs are listed first, followed by the net connected to the logic block output and then the net connected to the logic block clock pin. The subblock line gives information that is useful for timing analysis. A logic block may not need signals connected to all of its LUT inputs or to its clock pin. In this case the unconnected pins are marked as open.

.clb my_logic_block
pinlist: in_a open open in_d out_net open   # Only 2 LUT inputs, no clock
subblock: sb_one 0 1 2 3 4 5   # Ignore this line.

Global Signals
Some signals in FPGAs are normally routed via dedicated, special-purpose networks, rather than the general routing resources. For the demo experiment, clocks are assumed to be routed on a dedicated network, so they should not be routed by our tools. Global signals are indicated in the netlist file by .global lines:

.global clk   # Don't route clk net.

For the demo experiment, the only net marked as global is the clock net.
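The netlist rules in this appendix (comments, line continuation, the .input, .output, .clb and .global keywords, and open pins) are enough to write a small reader. The following Python sketch covers only the subset described here; it is not T-VPack's or VPR's own parser, and the names `logical_lines`, `parse_netlist` and the returned dictionary layout are our own assumptions.

```python
SAMPLE = """
.input alpha
pinlist: alpha            # No conflict between pad and net name.
.output my_opad
pinlist: some_net
.clb my_logic_block
pinlist: in_a open open in_d out_net open   # Only 2 LUT inputs, no clock
subblock: sb_one 0 1 2 3 4 5                # Ignore this line.
.global clk   # Don't route clk net.
"""


def logical_lines(text: str):
    """Yield comment-free logical lines, joining backslash continuations."""
    pending = ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()   # drop the comment, if any
        if line.endswith("\\"):
            pending += line[:-1] + " "         # continued on the next line
            continue
        line = (pending + line).strip()
        pending = ""
        if line:
            yield line


def parse_netlist(text: str) -> dict:
    """Parse the .net subset described in this appendix.

    Returns {"blocks": [...], "globals": [...]}; each block is a dict with
    "type" ("input", "output" or "clb"), "name", "pins" and "subblock".
    """
    blocks, global_nets = [], []
    for line in logical_lines(text):
        keyword, *rest = line.split()
        if keyword in (".input", ".output", ".clb"):
            blocks.append({"type": keyword[1:], "name": rest[0],
                           "pins": [], "subblock": None})
        elif keyword == ".global":
            global_nets.extend(rest)
        elif keyword == "pinlist:":
            # "open" marks an unconnected pin; store it as None.
            blocks[-1]["pins"] = [None if net == "open" else net for net in rest]
        elif keyword == "subblock:":
            blocks[-1]["subblock"] = rest
        else:
            raise ValueError(f"unrecognized line: {line}")
    return {"blocks": blocks, "globals": global_nets}
```

Calling `parse_netlist(SAMPLE)` yields three blocks and one global net; the logic block's pin list keeps the pin order from the text (four LUT inputs, output, clock) with None for each open pin.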

Appendix C: Placement File Format The first line of the placement file lists the netlist file and the architecture description file used by VPR when it created the placement. The second line of the placement file gives the size of the logic block array (e.g. 20 x 20 logic blocks). All the following lines have the format:

block_name x y subblock_number

The block name is the name of this block, as given in the input netlist. X and y are the row and column in which the block is placed, respectively. The subblock number is meaningful only for pads. Since we can place two pads in a row or column (see Appendix A: FPGA Architecture for the Experiment), the subblock number specifies which of the possible pad locations (either location 0 or location 1) in row x and column y contains this pad. Note that the first pad occupied at some (x, y) location is always that with subblock number 0. For logic blocks (.clbs), the subblock number is always zero. The placement files output by VPR also include a fifth field as a comment. You can ignore this field. The figure below shows the coordinate system used by VPR via a small 2 x 2 (logic block array) FPGA. Logic blocks all go in the area with x between 1 and 2 and y between 1 and 2, inclusive. All pads either have x equal to 0 or 3 or y equal to 0 or 3. Notice that there are no pads in the chip corners, so no I/O pins can be placed there.

Placement Coordinate System


A sample placement file is given below. The first six blocks are I/O pads, while the last two blocks are logic blocks.

Netlist file: xor5.net   Architecture file: sample.arch
Array size: 2 x 2 logic blocks

#block name   x    y    subblk   block number
#----------   --   --   ------   ------------
a             0    1    0        #0  -- NB: block number is a comment.
b             1    0    0        #1     Ignore it.
c             0    2    1        #2
d             1    3    0        #3
e             1    3    1        #4
out:xor5      0    2    0        #5
xor5          1    2    0        #6
[1]           1    1    0        #7
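The placement format described in this appendix can be read with a few lines of Python. The sketch below is our own illustration, not VPR code: the function names are hypothetical, and the sample data is our reading of the sample file in this appendix (columns taken as name, x, y, subblock). The pad test relies on the coordinate rule stated above, that pads have x equal to 0 or nx + 1, or y equal to 0 or ny + 1.

```python
import re

SAMPLE = """\
Netlist file: xor5.net   Architecture file: sample.arch
Array size: 2 x 2 logic blocks

#block name   x    y    subblk   block number
#----------   --   --   ------   ------------
a             0    1    0        #0  -- NB: block number is a comment.
b             1    0    0        #1     Ignore it.
c             0    2    1        #2
d             1    3    0        #3
e             1    3    1        #4
out:xor5      0    2    0        #5
xor5          1    2    0        #6
[1]           1    1    0        #7
"""


def parse_placement(text: str) -> dict:
    """Parse a placement file into header fields and block locations."""
    # Strip comments and blank lines first; the header is then lines 0 and 1.
    lines = [raw.split("#", 1)[0].strip() for raw in text.splitlines()]
    lines = [line for line in lines if line]
    header = re.match(r"Netlist file:\s*(\S+)\s+Architecture file:\s*(\S+)", lines[0])
    size = re.match(r"Array size:\s*(\d+)\s*x\s*(\d+)", lines[1])
    if header is None or size is None:
        raise ValueError("missing placement header")
    blocks = {}
    for line in lines[2:]:
        name, x, y, subblock = line.split()
        blocks[name] = (int(x), int(y), int(subblock))
    return {"netlist": header.group(1), "arch": header.group(2),
            "size": (int(size.group(1)), int(size.group(2))), "blocks": blocks}


def is_pad_location(x: int, y: int, nx: int, ny: int) -> bool:
    """Pads sit on the perimeter: x in {0, nx + 1} or y in {0, ny + 1}."""
    return x in (0, nx + 1) or y in (0, ny + 1)
```

Applied to the sample, the reader finds six blocks at pad locations and two inside the 2 x 2 logic block array, which matches the description of the sample file.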

Appendix D: Routing File Format The first line of the routing file gives the array size, nx x ny. The remainder of the routing file lists the global or the detailed routing for each net, one by one. Each routing begins with the word net, followed by the net index used internally by VPR to identify the net and, in brackets, the name of the net given in the netlist file. The following lines define the routing of the net. Each begins with a keyword that identifies a type of routing segment. The possible keywords are SOURCE (the source of a certain output pin class), SINK (the sink of a certain input pin class), OPIN (output pin), IPIN (input pin), CHANX (horizontal channel), and CHANY (vertical channel). Each routing begins on a SOURCE and ends on a SINK. In brackets after the keyword is the (x, y) location of this routing resource. Finally, the pad number (if the SOURCE, SINK, IPIN or OPIN was on an I/O pad), pin number (if the IPIN or OPIN was on a clb), class number (if the SOURCE or SINK was on a clb) or track number (for CHANX or CHANY) is listed, whichever one is appropriate. The meaning of these numbers should be fairly obvious in each case. If we are attaching to a pad, the pad number given for a resource is the subblock number defining to which pad at location (x, y) we are attached. See the figure in Appendix C for a diagram of the coordinate system used by VPR. In a horizontal channel (CHANX) track 0 is the bottommost track, while in a vertical channel (CHANY) track 0 is the leftmost track. Note that if only global routing was performed, the track number for each of the CHANX and CHANY resources listed in the routing will be 0, as global routing does not assign tracks to the various nets.
For an N-pin net, we need N-1 distinct wiring paths to connect all the pins. The first wiring path will always go from a SOURCE to a SINK. The routing segment listed immediately after a SINK is the part of the existing routing to which the new path attaches. It is important to realize that the first resource after a SINK is the connection into the already specified routing tree; when computing routing statistics, be sure not to count the same segment several times by ignoring this fact.
An example routing for one net is listed below.

Net 5 (xor5)

SOURCE (1,2)  Class: 1   # Source for pins of class 1.
  OPIN (1,2)  Pin: 4
 CHANX (1,1)  Track: 1
 CHANX (2,1)  Track: 1
  IPIN (2,2)  Pin: 0
  SINK (2,2)  Class: 0   # Sink for pins of class 0 on a clb.
 CHANX (1,1)  Track: 1   # Note: Connection to existing routing!
 CHANY (1,2)  Track: 1
 CHANX (2,2)  Track: 1
 CHANX (1,2)  Track: 1
  IPIN (1,3)  Pad: 1
  SINK (1,3)  Pad: 1     # This sink is an output pad at (1,3), subblock 1.
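The warning about double counting can be made concrete with the example net from this appendix. The Python sketch below parses the listing and counts channel segments two ways: naively, and as a set of distinct segments so that the attachment point re-listed after a SINK is counted once. The regular expression and function names are our own assumptions, written for the line format shown here rather than taken from VPR.

```python
import re

ROUTE = """\
Net 5 (xor5)
SOURCE (1,2)  Class: 1   # Source for pins of class 1.
OPIN (1,2)  Pin: 4
CHANX (1,1)  Track: 1
CHANX (2,1)  Track: 1
IPIN (2,2)  Pin: 0
SINK (2,2)  Class: 0     # Sink for pins of class 0 on a clb.
CHANX (1,1)  Track: 1    # Note: Connection to existing routing!
CHANY (1,2)  Track: 1
CHANX (2,2)  Track: 1
CHANX (1,2)  Track: 1
IPIN (1,3)  Pad: 1
SINK (1,3)  Pad: 1       # This sink is an output pad at (1,3), subblock 1.
"""

RESOURCE = re.compile(
    r"(SOURCE|SINK|OPIN|IPIN|CHANX|CHANY)\s*\((\d+),\s*(\d+)\)\s*(\w+):\s*(-?\d+)"
)


def parse_route(text: str) -> list:
    """Return (keyword, x, y, label, number) tuples for one net's routing."""
    resources = []
    for raw in text.splitlines():
        match = RESOURCE.match(raw.split("#", 1)[0].strip())
        if match:  # the "Net ..." header line does not match and is skipped
            kind, x, y, label, number = match.groups()
            resources.append((kind, int(x), int(y), label, int(number)))
    return resources


def wiring_segments(resources: list) -> set:
    """Distinct channel segments used by the net.

    Collecting segments in a set means the attachment point that is listed
    again after a SINK is counted only once.
    """
    return {r for r in resources if r[0] in ("CHANX", "CHANY")}
```

For this net the listing contains six CHANX/CHANY lines but only five distinct segments, because CHANX (1,1) track 1 appears a second time as the attachment point of the second path.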

Nets which are specified to be global in the netlist file (generally clocks) are not routed. Instead, a list of the blocks (name and internal index) which this net must connect is printed out. The location of each block and the class of the pin to which the net must connect at each block is also printed. For clbs, the class is simply whatever class was specified for that pin in the architecture input file. For pads the pinclass is always -1; since pads do not have logically-equivalent pins, pin classes are not needed. An example listing for a global net is given below:

Net 146 (pclk): global net connecting:
Block pclk (#146) at (1, 0), pinclass -1.
Block pksi_17_ (#431) at (3, 26), pinclass 2.
Block pksi_185_ (#432) at (5, 48), pinclass 2.
Block n_n2879 (#433) at (49, 23), pinclass 2.

6 References
[1] E. M. Sentovich et al., "SIS: A System for Sequential Circuit Analysis," Tech. Report No. UCB/ERL M92/41, University of California, Berkeley, 1992.
[2] J. Cong and Y. Ding, "FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs," IEEE Trans. CAD, Jan. 1994, pp. 1-12.
[3] V. Betz, J. Rose and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs, Kluwer Academic Publishers, 1999.
[4] V. Betz, "Architecture and CAD for the Speed and Area Optimization of FPGAs," Ph.D. Dissertation, University of Toronto, 1998.
[5] V. Betz and J. Rose, "Cluster-Based Logic Blocks for FPGAs: Area-Efficiency vs. Input Sharing and Size," CICC, 1997, pp. 551-554.
[6] A. Marquardt, V. Betz and J. Rose, "Using Cluster-Based Logic Blocks and Timing-Driven Packing to Improve FPGA Speed and Density," ACM/SIGDA Int. Symp. on FPGAs, 1999, pp. 37-46.
[7] V. Betz and J. Rose, "Directional Bias and Non-Uniformity in FPGA Global Routing Architectures," ICCAD, 1996, pp. 652-659.
[8] V. Betz and J. Rose, "On Biased and Non-Uniform Global Routing Architectures and CAD Tools for FPGAs," CSRI Technical Report #358, Department of Electrical and Computer Engineering, University of Toronto, 1996.
[9] V. Betz and J. Rose, "VPR: A New Packing, Placement and Routing Tool for FPGA Research," Seventh International Workshop on Field-Programmable Logic and Applications, 1997, pp. 213-222.
[10] A. Marquardt, V. Betz and J. Rose, "Timing-Driven Placement for FPGAs," ACM/SIGDA Int. Symp. on FPGAs, 2000, pp. 203-213.
[11] V. Betz and J. Rose, "Automatic Generation of FPGA Routing Architectures from High-Level Descriptions," ACM/SIGDA Int. Symp. on FPGAs, 2000, pp. 175-184.
[12] R. K. Brayton, R. Rudell, A. Sangiovanni-Vincentelli and A. R. Wang, "MIS: A Multiple-Level Logic Optimization System," IEEE Transactions on Computer-Aided Design, CAD-6(6), November 1987, pp. 1062-1081.
[13] R. Lisanke, "Logic synthesis benchmark circuits for the International Workshop on Logic Synthesis," May 1989.
