Thursday, January 31, 2008

FPGA-Based Prototyping - "Productivity to Burn"

This article highlights recent tool advances that can help you set up, implement, and verify your FPGA-based ASIC prototype faster than ever before.

Programmable Logic DesignLine

These days, a large portion of ASIC, SoC, and ASSP designs are at least partially prototyped in one or more FPGAs. This amounts to many thousands of prototyping projects per year. Compared with other ASIC verification methods, however, FPGA prototyping is mistakenly seen by many as an ad hoc mix of tools that must be cobbled together by hand. In reality, powerful integrated tools, platforms, and expertise exist that greatly improve the productivity of FPGA-based ASIC prototyping. This article highlights recent tool advances that can help you set up, implement, and verify your prototype faster than ever before.

Prototype setup

Use the right FPGAs
ASIC designs are generally larger (and often faster) than FPGA devices, and they tend to push the envelope of FPGA performance and density. Thus, we will almost always be using the largest FPGAs available, in the fastest speed grade. Anything less might seem to save money, but in the end could make timing closure harder to reach, make the design harder to fit, or both.

Less obvious is the fact that we should also use the device in the package with the most available I/O pins and with the most flexible support for clocking, voltage, and I/O standards. I/O often becomes the most valuable resource on the devices, especially when designs are partitioned across multiple FPGAs. Synplicity builds HAPS (High-Performance ASIC Prototyping System) boards using the largest Xilinx devices, namely the Virtex-5 XC5VLX330 in the -2 speed grade and the 1760-ball grid array package.

Design partitioning
If a design takes up more than 70% to 80% of the largest FPGA available to you, then it is worth considering partitioning it across more than one device. Some designs have natural and obvious partitions based on existing hierarchical boundaries. The aim is to ensure that we do not create new critical paths in the design that cross between FPGAs on the board.
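The 70% to 80% rule of thumb can be turned into a quick check once synthesis has produced resource estimates. The sketch below is only an illustration: the `Resources` fields, the function names, and every number in it are hypothetical and are not taken from any device data sheet or tool report.

```python
from dataclasses import dataclass, asdict


@dataclass
class Resources:
    """Resource counts for a design estimate or a device (illustrative fields)."""
    luts: int
    registers: int
    block_rams: int
    io_pins: int


def utilization(design, device):
    """Return the fraction of each device resource that the design consumes."""
    capacity = asdict(device)
    return {name: used / capacity[name] for name, used in asdict(design).items()}


def should_consider_partitioning(design, device, threshold=0.70):
    """True if any single resource exceeds the chosen utilization threshold."""
    return any(fraction > threshold
               for fraction in utilization(design, device).values())


# Hypothetical numbers, chosen only to exercise the check.
device = Resources(luts=200_000, registers=200_000, block_rams=300, io_pins=1_200)
design = Resources(luts=150_000, registers=90_000, block_rams=120, io_pins=400)

print(utilization(design, device))
print(should_consider_partitioning(design, device, threshold=0.70))
```

Checking every resource rather than only logic matters because, as noted above, I/O is often the first resource to run out.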

To address this, Synplicity's Certify tool has numerous partitioning aids, from low-level block zippering up to completely automatic full-RTL partitioning. These allow as much or as little manipulation of the design as is required to meet the performance goals, and none of them requires any change to the original ASIC RTL.

An important manipulation of the design that Certify can perform without changing the RTL is to automatically add I/O pin multiplexing between FPGAs. This uses time-sharing of the wires between the FPGAs so that they carry two or more signals, thereby alleviating potential I/O bottlenecks which often arise at the partition boundaries.
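The idea behind pin multiplexing can be sketched in a few lines. The model below is a simplified illustration, not a description of how Certify implements it: it assumes a transfer clock running at exactly `mux_ratio` times the design clock, ignores synchronization and latency, and uses hypothetical function names throughout.

```python
import math


def wires_needed(num_signals, mux_ratio):
    """Physical inter-FPGA wires required when each wire carries mux_ratio signals."""
    return math.ceil(num_signals / mux_ratio)


def serialize(samples, mux_ratio):
    """Sending FPGA: spread each design-clock sample over mux_ratio time slots.

    samples holds one tuple per design clock, each with mux_ratio bits.
    The returned list is the bit sequence seen on the single shared wire.
    """
    wire = []
    for sample in samples:
        if len(sample) != mux_ratio:
            raise ValueError("each sample must hold exactly mux_ratio bits")
        wire.extend(sample)
    return wire


def deserialize(wire, mux_ratio):
    """Receiving FPGA: regroup the time slots into one tuple per design clock."""
    return [tuple(wire[i:i + mux_ratio]) for i in range(0, len(wire), mux_ratio)]


# Three design clocks of a 4-signal bundle sharing one wire.
samples = [(1, 0, 0, 1), (0, 0, 1, 1), (1, 1, 1, 0)]
wire = serialize(samples, mux_ratio=4)
assert deserialize(wire, mux_ratio=4) == samples

print(wires_needed(300, 4))  # 300 signals at a 4:1 ratio need 75 wires
```

The trade-off is visible in the model: a higher ratio saves pins, but the shared wire must toggle that many times per design clock, which limits the achievable system speed.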

Many of Certify's partitioning aids have been developed to meet the needs of users performing hundreds of partitioning projects within major semiconductor and system labs since 1999.

RTL manipulation
ASIC designs often contain design features that are FPGA-hostile. For example, ASIC designs are typically sprinkled with instantiations of elements or macros from ASIC technology libraries or macro generators. These leave "black-box" holes in the RTL, for which some functionality must be described in order to complete the FPGA implementation. Some of this functionality is provided automatically by Certify, which can extract the required RTL from the ASIC library itself. Synopsys DesignWare instantiations are dealt with automatically in a similar way.

Another FPGA-hostile facet of many ASIC designs is their complex clocking structure. Multiple clock domains, asynchronous parallel channels, and gated clocking trees will quickly overflow the global synchronous clock resources of even the largest FPGAs. Certify will automatically simplify gated or generated clock networks back to a common system clock and build the required enable signal to ensure equivalent functional behavior, so that the clocking matches the resources available inside the FPGA. An example of this is shown in Fig 1.

Fig 1. Automatic gated clock conversion.
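To see why the conversion preserves behavior, consider a single register. The cycle-level model below is a minimal sketch under stated assumptions: the gate is a simple AND of the clock and an enable that is stable for the whole cycle, and both function names are hypothetical. It compares a register driven by the gated clock with one driven by the free-running clock plus a clock enable.

```python
def simulate_gated_clock(d, en, q0=0):
    """Register clocked by (clk AND en); one list entry per clock cycle."""
    q = q0
    prev_gated = 0
    outputs = []
    for d_i, en_i in zip(d, en):
        for clk in (0, 1):              # low phase, then high phase
            gated = clk & en_i
            if prev_gated == 0 and gated == 1:
                q = d_i                 # rising edge on the gated clock
            prev_gated = gated
        outputs.append(q)
    return outputs


def simulate_clock_enable(d, en, q0=0):
    """Register on the system clock that loads d only when en is high."""
    q = q0
    outputs = []
    for d_i, en_i in zip(d, en):
        if en_i:
            q = d_i                     # every clock edge, qualified by enable
        outputs.append(q)
    return outputs


d = [1, 0, 1, 1, 0]
en = [1, 0, 0, 1, 1]
assert simulate_gated_clock(d, en) == simulate_clock_enable(d, en) == [1, 1, 1, 1, 0]
```

Both versions produce the same output sequence, but the second uses only the common system clock, which is what lets it map onto the FPGA's global clock resources.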

Implementation and verification

Fast design iterations
Implementation is a critical phase in the FPGA prototyping flow. The partitioned design may undergo many iterations as bugs are discovered and fixed, or design blocks are tweaked and re-tweaked for higher performance.

It is very important to keep this iteration loop as short as possible. However, the combination of leading-edge ASIC designs in the multi-million system gate range, and stretch performance goals to model real-world operation, can lead to lengthy synthesis and place-and-route passes. The great advantage that FPGAs offer for debugging and design exploration begins to diminish when using traditional FPGA flows and large design sizes. The answer is to use incremental implementation methods.

Xilinx's ISE 9.2i design software offers a technology called SmartCompile, which is well suited to the ASIC prototyping flow. Designed to speed up the implementation flow by a factor of 2 to 6 versus traditional flows, SmartCompile comprises three components: SmartPreview, SmartGuide, and Partitions (not to be confused with multi-FPGA partitioning).

SmartPreview allows you to halt the ISE tool flow in mid-stream to see how a particular implementation pass is proceeding. While halted, you can check key implementation information such as the number of timing violations, the timing score, or the number of constraints met so far. You can even save the intermediate design and timing reports and create a bitstream for lab debug. If the implementation is proceeding as expected, you can resume the pass; if it is not, you can cancel the run, thereby saving valuable design time.

SmartGuide delivers automated incremental design to the FPGA design flow. SmartGuide can speed up the implementation phase by 2-to-6 times depending on design size and hierarchy setup.

With SmartGuide turned on, your first full implementation run is "guided", or marked for component and route placement. Let's say after a debug session you decide to make a design tweak and change one HDL source. As you re-enter the implementation flow, SmartGuide examines the hierarchy and identifies where the design needs to be re-implemented. Where possible, SmartGuide will reuse the placement and routing that didn't change from the prior implementation pass, thereby speeding up the re-implementation flow (sometimes dramatically).
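The general idea of a guided flow, deciding what can be reused and what must be re-implemented, can be pictured with a small sketch. This is not SmartGuide's algorithm, which the tools do not expose here; it is an assumption-laden illustration in which each module is identified by name, a content hash of its HDL source stands in for "has this changed?", and all function and module names are hypothetical.

```python
import hashlib


def fingerprint(source):
    """Content hash of one module's HDL source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def plan_incremental_run(previous_fingerprints, current_sources):
    """Split modules into those whose prior results can be reused and the rest.

    previous_fingerprints maps module name to the hash from the last run.
    current_sources maps module name to the current HDL source text.
    """
    reuse, reimplement = [], []
    for name, source in sorted(current_sources.items()):
        if previous_fingerprints.get(name) == fingerprint(source):
            reuse.append(name)
        else:
            reimplement.append(name)
    return reuse, reimplement


# First run: record a fingerprint per module.
sources = {"uart": "module uart; endmodule",
           "dma": "module dma; endmodule",
           "cpu": "module cpu; endmodule"}
baseline = {name: fingerprint(text) for name, text in sources.items()}

# After a debug session, one HDL source is tweaked.
sources["dma"] = "module dma; /* fix */ endmodule"
reuse, reimplement = plan_incremental_run(baseline, sources)
assert reuse == ["cpu", "uart"]
assert reimplement == ["dma"]
```

Because the lookup is keyed by name, any such scheme only works if names stay stable from one run to the next.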

Synplicity and Xilinx have collaborated closely, as part of our Timing Closure Task Force, to enforce name consistency in both tools. This means that names remain constant from one run of synthesis and place-and-route to the next, thus ensuring the best possible guided flow results.

Some incremental tool flows can produce worse timing paths from having to route around "locked" modules, but SmartGuide has the ability to identify critical timing paths and – if necessary – free up portions of an otherwise unchanged module for re-implementation, emphasizing critical paths and keeping timing a priority.

The third component of SmartCompile is Partitions, which offers the ability to completely lock down a completed module's placement and routing. In this way, a debugged "known-good" module or piece of purchased IP can be implemented and then set aside while you concentrate on debugging your other modules, while still enjoying the benefits of an incremental implementation flow. Partitions can be locked down early in the tool flow by using Compile Point Technology within the Synplify Pro and Synplify Premier synthesis tools. The partition information is automatically passed on to ISE.

All the components of SmartCompile work directly with either Xilinx or Synplicity synthesis and can shorten the implementation flow for large designs by a factor of 2 to 6. SmartCompile delivers more time for critical module debug, thereby freeing the engineer from watching lengthy and cryptic synthesis and place-and-route runs.

Debug is typically where the majority of your time is spent on a prototyping project. The ability to debug any portion of the design quickly and accurately is critical to project success. Let us consider two methods for embedding virtual logic analyzers into the design so as to allow logic and embedded software designers to debug their FPGAs in real time. The two methods are ChipScope Pro from Xilinx and Identify Pro with TotalRecall Technology from Synplicity.

ChipScope Pro
With the ChipScope Pro system – which is available as a separately purchased option to Xilinx ISE software – design problems can be quickly found while the chip is running on the board and interacting with the rest of the system. Then, leveraging the FPGA's re-programmability, design changes can be quickly implemented and sent back to the device on board in a matter of minutes through the FPGA programming cable.

The ChipScope Pro package of tools includes a set of configurable and synthesizable software debug cores that are either instantiated into your FPGA design during HDL capture or inserted directly into the project netlist (Fig 2). Following implementation, using these cores you can view any internal signal within the FPGA.

2. ChipScope Pro core insertion options.

Signals are captured at or near system operating speed and then brought out through the programming interface, thereby freeing up pins for your design as opposed to gobbling them up for debug. ChipScope Pro is also one of the few tools that allow you to change probe points without having to re-synthesize and re-route your design. Using the ISE FPGA Editor, you can change signal probe points and then quickly reprogram your FPGA and debug a whole new set of signals in a matter of minutes.

You can analyze captured signals through the ChipScope Pro software logic analyzer. This is an advanced display and debug tool that makes logic and embedded bus analysis easy. The ChipScope Pro logic analyzer supports multiple window views, and bus plotting can be in either data-versus-time or data-versus-data formats. Capture mode lets you compare data captured after multiple trigger events; meanwhile signal filtering lets you ignore data that's not critical to your analysis, thereby saving you memory and analysis time. Using the listing viewer, you can import bus token files and view instructions in the order they occur.

To facilitate processor system debug environments that use software debuggers in addition to ChipScope Pro tools, you can share the JTAG connection to the FPGA with the ChipScope Pro analyzer.

In addition to providing data capture capabilities, the ChipScope Pro system also includes the Virtual I/O console, the interface to the industry's first real-time virtual input/output core. Through the Virtual I/O console, you can set virtual inputs and pulse trains and view output activity.

ChipScope Pro tools can run in server/client mode over a TCP/IP connection. You can sit in your office while debugging a board next door in the lab or on the other side of the world. You can share a single prototyping board in the lab with other debug engineers on your team.

Identify RTL Debugger
Going beyond the functionality of ChipScope Pro, Synplicity's Identify tool makes it possible to perform on-chip debug at multiple hierarchical points within the RTL source, and to do this without altering the source at all. Identify uses an automated instrumentation technique to create sampling, trigger, and communication logic and attach it, as required, to each FPGA forming the prototype.

Waveform views such as those seen in the ChipScope Pro logic analyzer are possible, but a significant added bonus is that the samples and triggers are overlaid onto the RTL source code using the same symbolic names as in the RTL. Thus, for example, it is possible to see the actual value of an enumerated type that encodes a state machine, as captured on the FPGA. Triggers may be set in a similar way, using the source name-space. A unique benefit is that triggers can be set for when a particular line of RTL is reached, much as a software engineer would set breakpoints in a program.

Identify Pro with TotalRecall Technology
Newly available in a superset of Identify, called Identify Pro, is Synplicity's TotalRecall full visibility debug technology (see also Programmable Logic DesignLine article 196801895, titled "How to achieve 100% visibility with FPGA-based ASIC prototypes running at real-time speeds").

Whereas ChipScope Pro and Identify rely on sample points being set in advance, Identify Pro with TotalRecall provides visibility into the entire design. This allows a testcase to be extracted automatically from the FPGA and rerun in a standalone simulator. When a trigger occurs in the FPGA, the full state of the module under test (not just certain sample points) is captured as it was many thousands of clocks before the trigger occurred.

This module state is extracted and converted for use in the user's own simulator. A significant advantage is that the testbench for the simulator is also extracted from the FPGA, so that the module under test is re-stimulated with the actual inputs it received, from the point at which the module's state was captured right up to the point at which the trigger occurred.
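The capture-and-replay principle can be illustrated with a toy model. The sketch below does not describe how TotalRecall is implemented; it assumes a hypothetical module under test (an 8-bit accumulator), a fixed-depth history buffer, and a software trigger function, all of which are invented here for illustration.

```python
from collections import deque


def step(state, value):
    """Hypothetical module under test: an 8-bit accumulator."""
    return (state + value) & 0xFF


def run_with_capture(inputs, depth, trigger):
    """Run the 'hardware' and keep the last `depth` clocks of history.

    On a trigger, return the state as it was `depth` clocks earlier, the
    inputs applied since then, and the state at the trigger.
    """
    history = deque(maxlen=depth)       # (state before the clock, input)
    state = 0
    for value in inputs:
        history.append((state, value))
        state = step(state, value)
        if trigger(state):
            start_state = history[0][0]
            stimulus = [v for _, v in history]
            return start_state, stimulus, state
    return None


def replay(start_state, stimulus):
    """'Simulator' side: restart from the captured state with the real inputs."""
    state = start_state
    trace = []
    for value in stimulus:
        state = step(state, value)
        trace.append(state)
    return trace


captured = run_with_capture([5, 10, 20, 40, 80, 100], depth=3,
                            trigger=lambda s: s > 150)
start_state, stimulus, state_at_trigger = captured
assert (start_state, stimulus) == (15, [20, 40, 80])
assert replay(start_state, stimulus) == [35, 75, 155]
assert replay(start_state, stimulus)[-1] == state_at_trigger
```

The replay reaches the same state that fired the trigger, which is the property that makes the extracted testcase useful: the failure is reproduced in an environment where every signal is visible.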

Once in the simulator, the full suite of analysis tools, including single-step, force, and freeze, can be brought to bear. The prototype is freed up for further verification tasks while the simulation takes place.

Summary – putting it all together
In conclusion, significant and continual progress in FPGA devices, synthesis and place-and-route tools, and debug capabilities has made FPGA prototyping much more accessible and useful to ASIC verification teams than ever before.

Observability and controllability have been added to the already unchallenged performance of FPGA prototypes to offer cheap, fast platforms for RTL debug and software integration in real-world test environments. If you are ready to explore the benefits of FPGA prototyping, Synplicity and Xilinx are ready to help you to prototype your project.

Xilinx offers the most advanced silicon, software, and support available in FPGA prototyping, while Synplicity has created a complete prototyping environment – called the Confirma Platform – which includes leading prototyping hardware based on Virtex-5 FPGAs and the EDA tools mentioned in this article. For more information, go to or


