Monday, January 28, 2008

Tackle team-based FPGA design


You see them at almost every user seminar or industry trade show workshop: the Methodology Managers from XYZ Corporation, who describe the system they use to help the company make sense of the intellectual property (IP) produced by its design teams. It's a daunting task. Development staff work in different time zones and across language barriers, software tool versions must be tracked and synchronized, and designs span VHDL, Verilog, C/C++, and CAD databases. All of these (and more) diminish the reusability of any design module. You can respect the role these managers play and envy the sanity they bring to the organization.
But XYZ Corporation is a large company. How does a small or midsize firm like yours grapple with this imposing housekeeping chore?

Distributed design teams happen. With the right mix of specialty tools and culture, it's becoming more practical to create and manage distributed FPGA design teams by using modular FPGA design methods that allow multiple designers to work on parts of a single design independently.

Collaborative design
The benefits of collaboration are intuitive enough. Use the best people for the job, no matter where they are. Assign more people to the project when the schedule or feature set isn't flexible (and they seldom are). Ideally, partitioning the design task also has a natural benefit: it produces FPGA building blocks (or IP modules) that can be reused in enhanced or derivative products.

Sometimes collaboration is imperative, if only because the one person knowledgeable about a particular task is in a different time zone from everyone else. Just as software teams might use a driver or GUI specialist, FPGA design teams might include a high-speed I/O specialist, a DSP designer, and a guru who knows the tools and the ins and outs of timing closure.

The scale of the designs intended for high-capacity FPGAs places a formidable, likely impossible, demand on one or two design engineers to deliver the required content. It's often necessary to assign multiple engineers to the job.

Until recently, FPGA design tools provided little support for partitioning design work across multiple engineers. Advances in FPGA design planning and place-and-route (PAR) tools, however, now support a modular design style and complement the synthesis and simulation tools that have traditionally supported design teams. Contemporary FPGA design tools are more aware of modular IP, can accommodate distributed development, and are focused on high-capacity devices.

Modular FPGA design
Modular FPGA design is an approach that enables multiple designers to work independently on parts of a single design. In this approach, a lead designer creates the top-level design; the rest of the design team works on constituent designs that will be merged into one cohesive design in the final assembly stage.

All major FPGA vendors now support a modular implementation strategy. Altera's LogicLock, Lattice Semiconductor's Block Modular Design method, and Xilinx's Modular Design Flow all provide strategies and tools that support partitioning, independent implementation, and assembly of design modules (see endnotes 1, 2, and 3).

Figure 1 illustrates an FPGA design partitioned across a team. The convention for most FPGA tools today is to allocate a branch of the design hierarchy to a team member, along with some "budget" for timing and device resources. That engineer can then establish a logical user hierarchy to whatever degree is appropriate for that design module.
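As a rough illustration, the assignment described above might be recorded as a small per-block record: a hierarchy branch, its owner, a resource budget, and a timing target. This Python sketch is hypothetical throughout; the field names, hierarchy paths, owners, and numbers are not taken from any vendor tool or file format.

```python
from dataclasses import dataclass


@dataclass
class ResourceBudget:
    """Device resources reserved for one block (illustrative units)."""
    luts: int
    registers: int
    ebrs: int = 0  # embedded block RAMs
    dsps: int = 0  # DSP elements


@dataclass
class BlockAssignment:
    """One branch of the design hierarchy handed to one team member (hypothetical)."""
    hierarchy_path: str  # hypothetical path, e.g. "top/rx/viterbi"
    owner: str
    budget: ResourceBudget
    target_mhz: float  # timing budget expressed as a clock-frequency target

    @property
    def period_ns(self) -> float:
        # Convert the frequency target into the clock period a constraint would use.
        return 1000.0 / self.target_mhz


plan = [
    BlockAssignment("top/rx/viterbi", "engineer_a", ResourceBudget(6000, 5000, ebrs=8), 125.0),
    BlockAssignment("top/rx/fft", "engineer_b", ResourceBudget(9000, 8000, ebrs=12, dsps=16), 150.0),
]

for block in plan:
    print(f"{block.hierarchy_path} -> {block.owner}: {block.period_ns:.2f} ns")
```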

Device-resource budgeting is a recent advancement in FPGA implementation tools that makes it possible for a subset of the design to be implemented independently of other design modules. A distinct advantage of this block-style flow is the ability to update the logic in any of the blocks while preserving the placement and performance of the surrounding blocks. The block-style flow enables the design tools to focus on new blocks or those that are changed. Being able to isolate and work on individual blocks shortens potentially long run times since, given a smaller design problem, PAR tools will typically produce better results sooner. Individual blocks can be implemented and updated separately, enabling quicker iterations and a more rapid path to design closure. In some cases, blocks can be reused in other designs, further leveraging resources and shortening those design cycles. From the perspective of a team collaborating on an embedded system, this makes future revisions of the platform more straightforward and faster to expand or modify.
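The bookkeeping behind such a block-style flow can be sketched as follows: store a fingerprint of each block's netlist from the last implementation run, and send only new or changed blocks back through PAR while the others keep their placement. Using SHA-256 hashes of netlist text is an assumption made for illustration; real tools track changes in their own ways.

```python
import hashlib


def fingerprint(netlist_text: str) -> str:
    """Hash a block's netlist so unchanged blocks can be recognized."""
    return hashlib.sha256(netlist_text.encode("utf-8")).hexdigest()


def blocks_to_rerun(previous: dict[str, str], netlists: dict[str, str]) -> list[str]:
    """Return blocks that are new or whose netlist changed since the last run.

    previous maps block name -> fingerprint from the last implementation;
    netlists maps block name -> current netlist text.
    """
    changed = []
    for name, text in sorted(netlists.items()):
        if previous.get(name) != fingerprint(text):
            changed.append(name)
    return changed


last_run = {"viterbi": fingerprint("netlist v1"), "fft": fingerprint("netlist v1")}
current = {"viterbi": "netlist v1", "fft": "netlist v2", "mac": "netlist v1"}
print(blocks_to_rerun(last_run, current))  # ['fft', 'mac']
```

A real flow would also need to rerun blocks whose region or interface changed; this sketch compares netlists only.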

Team-based design naturally works best for larger designs that can be easily partitioned into self-contained regions of the chip. Thorough preliminary planning, iterative experiments, and explicit, deliberate, and clear communication among design team members are all essential to ensure the partitioned designs work together in the final assembly step of the process.

Of course, partitioning the design presents challenges as well as benefits. For example, if the design architect's initial floor plan doesn't budget enough resources for a certain module, some sort of refinement loop is needed so a team member can negotiate for more room. Timing objectives also are likely to be handed down to team members and may not be as easily met as those for a design that is not floor-planned. The design architect must account for the availability of features such as high-performance embedded memories, DSP functions, and high fan-out routing spines because these anchor and size regions for each block. Given that contemporary FPGA architectures are a mixture of programmable fabric and rows of embedded functions, it can be awkward to create a block diagram that nicely accommodates the design logic. In this regard, modular design both benefits and suffers from a handmade floor plan where placement algorithms are constrained by the borders of each block.
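The refinement loop mentioned above amounts to comparing each block's budget against what synthesis actually reports and returning the difference as the request a team member brings to the architect. The resource names and figures in this sketch are hypothetical.

```python
def shortfall(budget: dict[str, int], used: dict[str, int]) -> dict[str, int]:
    """Return how many units of each resource a block needs beyond its budget."""
    return {
        resource: used[resource] - budget.get(resource, 0)
        for resource in used
        if used[resource] > budget.get(resource, 0)
    }


budget = {"luts": 6000, "registers": 5000, "ebrs": 8}
synthesized = {"luts": 6400, "registers": 4700, "ebrs": 8}

request = shortfall(budget, synthesized)
if request:
    print("Negotiate for more room:", request)  # {'luts': 400}
else:
    print("Block fits its budget")
```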

What follows is a typical procedure for modular FPGA design that's common among the leading FPGA vendors. Figure 2 illustrates a typical data flow in a modular FPGA design.

1. Partition the top-level design and synthesize design modules

Create a top-level design, along with constituent design modules, in HDL. The top-level design serves as the design documentation and is also the first opportunity to influence the performance results of the design. By using FPGA-friendly design-partitioning principles, you can dramatically reduce overall design time by simplifying the coding, synthesis, simulation, floor-planning, and optimization phases of the design. Good guidelines here include:

• Sub-blocks should be synchronous with registered outputs. Registering outputs helps the synthesis tool implement the combinatorial logic and registers into the same logic block. Registering outputs also makes the application of timing constraints easier, since it eliminates potential problems with logic optimization across design boundaries.

• Related combinatorial and arithmetic logic should be collected into the same design module. Keeping related combinatorial terms and arithmetic in the same design module allows sharing of logic hardware resources. It also allows a synthesis tool to optimize the entire critical path in a single operation.

• Separate logic with different optimization goals. Separating critical paths from non-critical paths will make logic synthesis more efficient. If one portion of a design module needs to be optimized for area and a second portion needs to be optimized for speed, those portions should be separated into two design modules.
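The first and third guidelines lend themselves to a simple review check. This sketch assumes the team has already extracted, from its synthesis reports, which outputs are driven by registers and which optimization goals each module carries; the ModuleSummary structure is hypothetical, not a tool's report format. The second guideline, grouping related logic, depends on design judgment and isn't checked here.

```python
from dataclasses import dataclass


@dataclass
class ModuleSummary:
    """Hypothetical per-module summary extracted from synthesis reports."""
    name: str
    output_drivers: dict[str, str]  # port name -> "register" or "combinational"
    optimization_goals: set[str]    # e.g. {"speed"} or {"area"}


def partition_warnings(module: ModuleSummary) -> list[str]:
    """Flag unregistered outputs and modules that mix optimization goals."""
    warnings = []
    for port, driver in sorted(module.output_drivers.items()):
        if driver != "register":
            warnings.append(f"{module.name}.{port}: output is not registered")
    if len(module.optimization_goals) > 1:
        goals = ", ".join(sorted(module.optimization_goals))
        warnings.append(f"{module.name}: mixes optimization goals ({goals}); consider splitting")
    return warnings


equalizer = ModuleSummary(
    "equalizer",
    {"data_out": "register", "valid": "combinational"},
    {"speed", "area"},
)
for warning in partition_warnings(equalizer):
    print(warning)
```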

As this top-level partitioning is underway, team members are also (in theory) modeling design modules. The order of operation isn't so important here, as long as each design module will eventually comply with the architect's top-level connections and organization. Even without a clear picture of the FPGA resources needed to implement all design modules, the team at least has the luxury of simulating and verifying the function of any combination of design modules by mixing register-transfer-level (RTL) logic with gate-level logic.

2. Create a floorplan, place blocks, and optimize

This step is the most critical one in the process. It includes area budgeting and reserving space in the top-level design for constituent blocks, determining I/O for each block along with the device's external I/Os, and determining the position of each block relative to the others.

FPGA design tools ease block floor planning with the following utilities:

• A schematic view of the RTL description helps you see the data paths of the design along with the relative order of design modules. This view makes it obvious where blocks should be located relative to external I/Os.

• An abstract graphical floor plan view is used to size and anchor blocks. Once blocks are defined to hold one or more design modules, logical connections can be shown as "flywires" superimposed on the device resource floor plan. Heavily interconnected blocks should be placed adjacent to one another, and each block should cover enough physical resources to accommodate its design logic.
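One way to see why heavily interconnected blocks belong side by side is to weight each flywire by its connection count and the Manhattan distance between block centers, then compare candidate placements. This is a simplified cost model assumed for illustration, not the metric any particular floor planner uses, and the block names, slot coordinates, and connection counts are hypothetical.

```python
from itertools import permutations

# Number of logical connections (flywires) between pairs of blocks (hypothetical).
connections = {("fft", "viterbi"): 128, ("viterbi", "mac"): 16, ("fft", "mac"): 4}

# Candidate block slots on the floor plan, as (column, row) centers in grid units.
slots = [(0, 0), (1, 0), (2, 0)]
blocks = ["fft", "viterbi", "mac"]


def wire_cost(placement: dict[str, tuple[int, int]]) -> int:
    """Sum of connection count times Manhattan distance between block centers."""
    total = 0
    for (a, b), count in connections.items():
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += count * (abs(xa - xb) + abs(ya - yb))
    return total


# Try every assignment of blocks to slots and keep the cheapest one.
best = min((dict(zip(blocks, order)) for order in permutations(slots)), key=wire_cost)
print(best, wire_cost(best))  # viterbi lands in the middle slot, cost 152
```

Brute-force search over every ordering is only practical for a handful of blocks; with three blocks it checks just six placements.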

Some systems allow you to direct which side of the block the logical interconnect will enter or exit to help define the data flow. Commonly recommended guidelines for this step include:

• Lock global logic resources like PLL/DLL-driven clocks and the external I/O plan. The floor plan should also define optimal positions for global logic such as clock drivers (whether they are programmable I/Os or embedded PLL/DLL) and external signals, with an eye toward signal types and the device's package organization. FPGAs may provide specialized I/O drivers for double-data-rate (DDR) or serializer/deserializer (SERDES) interfaces at specific locations of the device package, so you must account for these locations in the block floor plan.

• Define design module timing objectives. Performance objectives are defined by assigning physical and timing preferences in the respective FPGA synthesis tool.

• Place and orient blocks. At this point, you will budget device resources such as lookup tables (LUTs), registers, and memory or DSP elements by reserving a region (or block) of the floor plan. The resources allocated can be handled in a top-down manner based on rough estimates, or bottom-up based on the actual results produced by logic synthesis of a block. It's essential that the block's region is large enough to accommodate the design module's logic and allow for adequate I/O resources. The relative position and orientation of the blocks will depend largely on the device's internal interconnect. Modern FPGA tools provide floor-planning utilities to help visualize block interconnect and the physical resources available.
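Budgeting by region means counting what a rectangle actually encloses. The sketch below assumes a made-up device grid in which full-width rows of embedded block RAM (EBR) interrupt the programmable logic; the LUTs per tile, EBR row positions, and EBR spacing are invented numbers. It reports the margin between a module's needs and a candidate region, where a negative value means the region is too small.

```python
# Hypothetical device model: a grid of logic tiles interrupted by full-width
# rows of embedded block RAM (EBR). All numbers are illustrative only.
LUTS_PER_TILE = 8
EBR_ROWS = {4, 9, 14}  # row indices occupied by EBR instead of logic tiles
EBR_PITCH = 3          # one EBR site every third column within an EBR row


def region_resources(col0: int, row0: int, width: int, height: int) -> dict[str, int]:
    """Count the LUTs and EBRs enclosed by a rectangular region."""
    luts = ebrs = 0
    for row in range(row0, row0 + height):
        for col in range(col0, col0 + width):
            if row in EBR_ROWS:
                if col % EBR_PITCH == 0:
                    ebrs += 1
            else:
                luts += LUTS_PER_TILE
    return {"luts": luts, "ebrs": ebrs}


def region_margin(need: dict[str, int], col0: int, row0: int, width: int, height: int) -> dict[str, int]:
    """Enclosed minus required resources; negative values mean the region is too small."""
    have = region_resources(col0, row0, width, height)
    return {resource: have[resource] - need.get(resource, 0) for resource in have}


need = {"luts": 600, "ebrs": 4}
print(region_margin(need, col0=0, row0=0, width=10, height=10))  # {'luts': 40, 'ebrs': 4}
```

In this example a 10-by-10 region meets the LUT requirement with 40 LUTs to spare but, because it spans two EBR rows, encloses twice the EBRs the module needs.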

Ideally, this step is done in parallel with logic synthesis to help size the regions. This is almost always an iterative process because it's unlikely you'll get the floor plan right the first time. Some FPGA systems will automatically size regions based on logic consumption and allow blocks to include or exclude other logic, depending on resource needs.

3. Block-level PAR

This step implements each block with the top-level design constraints applied. It must be completed before final assembly can be performed, and it can proceed in parallel with the synthesis of other design modules in Step 1, but it requires that the top-level floor plan with region constraints be complete.

Successful block implementation will depend largely on the preferences assigned for area budgeting and reservation and on the I/O placement determined in the previous step. If these prove incorrect, repeat Steps 2 and 3.

The HDL design files for each block are generally synthesized into Electronic Design Interchange Format (EDIF) netlists. The FPGA software then imports the EDIF files. This step is required for all team members. The synthesis can be performed in any order.

4. Top-level assembly

In this final step, merge all the blocks into one cohesive design. The top-level design file must be configured and all blocks implemented before the design can be assembled. Successful assembly depends primarily on the floor-planning decisions made in Step 2 and on the successful implementation of all constituent blocks.
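The ordering constraints in these four steps can be captured as a dependency graph. This sketch uses Python's standard graphlib module (Python 3.9 or later) to group hypothetical flow tasks into stages whose members could run concurrently. It models a simplified reading of the flow and ignores the floor-plan iterations described in Step 2; the task and block names are illustrative.

```python
from graphlib import TopologicalSorter

blocks = ["viterbi", "fft"]  # hypothetical block names

# Each task maps to the set of tasks that must finish before it can start.
flow = {"floorplan": {"top_level_design"}}
for b in blocks:
    flow[f"synth_{b}"] = set()                      # block synthesis can run in any order
    flow[f"par_{b}"] = {"floorplan", f"synth_{b}"}  # block PAR needs its region and netlist
flow["assembly"] = {f"par_{b}" for b in blocks}     # assembly waits for every block

sorter = TopologicalSorter(flow)
sorter.prepare()
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose prerequisites are all done
    stages.append(ready)
    sorter.done(*ready)

for i, stage in enumerate(stages):
    print(f"stage {i}: {stage}")
```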

Design example
This design example illustrates how a development team charged with implementing a large communications design applied modular FPGA design. The modular approach was an attractive way to partition, implement, and stabilize a significant portion of the design.

For the team creating a reference design demonstrating an orthogonal frequency-division multiplexing (OFDM) application, the objective was to establish timing and resource use for a major design module, which implemented the Viterbi algorithm. The design module was targeted for a particular block region of the floor plan composed of programmable function units (PFUs) and embedded block RAM (EBR). The team observed that overall PAR time decreased significantly compared with running PAR on the entire flattened design. In this case, the FPGA implementation tool treated block region resources as sharable, which allowed the top-level PAR phase to take advantage of the unused resources of the region allocated for the design block. This was useful because the Viterbi module relied on both EBRs and a number of PFUs, and a side effect of rectangular block regions was that the architect had to enclose more EBRs in the region than the module actually needed. At assembly time, however, a top-level FFT block was able to use the excess EBRs.

The lesson from this team's partitioning effort was to define no more blocks than necessary, to avoid over-constraining the layout. Because only one block was defined, the data path was not locked, and the remaining "floating" logic naturally landed next to related logic in the block.

This project was a success story for the modular design technique. The team concluded that modular design is especially useful for implementing IP blocks through standalone PAR. It allowed the team to close timing on the Viterbi block in one step and on the remainder of the design logic during assembly, while the Viterbi block's PAR results stayed stable.

Going modular
New modular FPGA design techniques provide major advantages to distributed design teams. Portions of an entire design can be approached independently, allowing multiple designers to work in parallel. Working in parallel enables the application of additional resources, as necessary, to particular design modules.

Functional modules can be analyzed separately. This affords you a better vantage point from which to debug or enhance designs, because design problems can be traced to a specific portion of the design.

The timing of each constituent functional module is preserved because each module can be assigned to a particular region on the device, and the tools are constrained to use resources from that region.

The modular flow also supports performance optimization and preservation by placing modules into regions of a device's floor plan. Because modular assignments are generally hierarchical, teams have more control over the placement and performance of modules and groups of modules. Typically, a block's region size, state, width, height, and origin can be modified.

All the leading FPGA vendors provide some type of a modular design method that follows a typical procedure of partitioning, floor planning, block implementation, and assembly.

Troy Scott has been helping design, document, test, and promote EDA products for about 14 years. He is a product marketing engineer at Lattice Semiconductor Corporation. Troy holds a BSCE from Oregon Institute of Technology and a Graduate Certificate in Computer Architecture and Design from Portland State University. He welcomes feedback and can be reached at

Endnotes:
1. Altera's LogicLock:
2. Lattice Semiconductor's Block Modular Design method:
3. Xilinx's Modular Design Flow:
