Designers are continually looking for ways to architect their systems so that they provide the best computing solution and meet all application requirements. In many cases, that optimal solution calls for a field programmable gate array (FPGA), but unfortunately many designers are unfamiliar with the capabilities of these devices and with how to integrate them.
This article briefly describes design scenarios that can benefit from using FPGAs. Then, after explaining the basics of how FPGAs work, it introduces some interesting FPGA solutions and development kits.
Why use FPGAs?
Computing applications are diverse, and the best way to meet their requirements varies from one application to another. The options include off-the-shelf microprocessors (MPUs) and microcontrollers (MCUs), off-the-shelf graphics processing units (GPUs), FPGAs, and custom system-on-chip (SoC) devices. Deciding which approach to use requires a careful look at the application's requirements and considerations.
For example, when working on cutting-edge technologies such as 5G base stations, designers need to consider that the underlying standards and protocols are still evolving. This means that designers need to be able to respond quickly and efficiently to any specification changes that are beyond their control.
Similarly, they need the flexibility to respond to future changes in standards and protocols after the system has been deployed in the field. In addition, they must be able to respond to unexpected errors in system functionality or holes in system security, and to modify existing functions or add new ones, so as to extend the service life of the system.
Although a custom SoC usually provides the highest performance, this route is expensive and time-consuming. In addition, any algorithm implemented in the fabric of the chip is essentially "frozen in silicon". Given the considerations above, this inherent inflexibility becomes a problem. An alternative route is needed to find the best balance between high performance and flexibility. That route is often provided by FPGAs, by combinations of microprocessors/microcontrollers and FPGAs, or by FPGAs that include hard processor cores as part of their fabric.
What is an FPGA?
This is a difficult question to answer, because FPGAs are different things to different people. Moreover, there are many types of FPGA, each with a different combination of capabilities and functions.
The programmable fabric is the heart of any FPGA (i.e., the defining aspect of "FPGA-dom") and takes the form of an array of programmable logic blocks (Figure 1a). Each logic block is a collection of elements, including lookup tables (LUTs), multiplexers, and registers, all of which can be configured (programmed) to perform operations as needed (Figure 2).
Figure 1: The simplest FPGAs contain only programmable fabric and configurable GPIOs (a). Different architectures are formed by adding other elements to this basic fabric: SRAM blocks, PLLs, and clock managers (b); DSP blocks and SerDes interfaces (c); and hard processor cores and peripherals (d). (Image source: Max Maxfield)
Figure 2: Each programmable logic block (PLB) is a collection of elements, including lookup tables, multiplexers, and registers, all of which can be configured (programmed) to perform operations as needed. (Image source: Max Maxfield)
Many FPGAs use 4-input LUTs, each of which can be configured to implement any 4-input logic function. To better support the wide data paths used by some applications, some FPGAs provide 6-input, 7-input, or even 8-input LUTs. The output of the LUT is connected directly to one of the logic block outputs and to one of the multiplexer inputs. The other multiplexer input is connected directly to a logic block input (e). The multiplexer can be configured to select either of these inputs.
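As a rough illustration of why a 4-input LUT can implement any 4-input logic function, the following Python sketch models the LUT as 16 stored configuration bits that the four inputs simply address. The function names (`make_lut4`, `lut4_read`) are hypothetical and chosen only for this example; a real device stores these bits in configuration cells rather than in software.

```python
from typing import Callable, List


def make_lut4(func: Callable[[int, int, int, int], int]) -> List[int]:
    """Build the 16 configuration bits of a 4-input LUT from a Boolean function."""
    table = []
    for index in range(16):
        # Decode the table index into the four input bits (a is the MSB).
        a = (index >> 3) & 1
        b = (index >> 2) & 1
        c = (index >> 1) & 1
        d = index & 1
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_read(table: List[int], a: int, b: int, c: int, d: int) -> int:
    """Evaluate the LUT: the inputs address one stored configuration bit."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# "Program" one LUT as a 4-input XOR (parity) and another as (a AND b) OR (c AND d).
parity_lut = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and_or_lut = make_lut4(lambda a, b, c, d: (a & b) | (c & d))

print(lut4_read(parity_lut, 1, 0, 1, 1))  # 1
print(lut4_read(and_or_lut, 1, 1, 0, 0))  # 1
```

The same lookup code serves both functions; only the 16 stored bits differ, which is the sense in which the LUT is "configured".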
The output of the multiplexer feeds the register input. Each register can be configured as an edge-triggered flip-flop or a level-sensitive latch (although using asynchronous logic in the form of latches inside an FPGA is not recommended). The clock (or enable) of each register can be configured as active-high or active-low. Similarly, the active level of the set/reset input is configurable.
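The path from LUT to multiplexer to register can be sketched as a small behavioral model. This is a simplified sketch under stated assumptions, not a description of any vendor's logic block: the class name `LogicBlock` and its parameters are hypothetical, the register is modeled only as an edge-triggered flip-flop, and the set/reset input and latch mode are left out.

```python
class LogicBlock:
    """Toy model of one logic block: a 4-input LUT, a 2:1 multiplexer, a flip-flop."""

    def __init__(self, lut_bits, mux_selects_lut=True, rising_edge=True):
        if len(lut_bits) != 16:
            raise ValueError("a 4-input LUT needs exactly 16 configuration bits")
        self.lut_bits = list(lut_bits)          # configuration: LUT truth table
        self.mux_selects_lut = mux_selects_lut  # configuration: which mux input to use
        self.rising_edge = rising_edge          # configuration: active clock edge
        self.q = 0                              # register state
        self._prev_clk = 0                      # previous clock level, for edge detection

    def step(self, a, b, c, d, e, clk):
        """Apply inputs and a clock level; return (combinational_out, registered_out)."""
        lut_out = self.lut_bits[(a << 3) | (b << 2) | (c << 1) | d]
        # The multiplexer picks either the LUT output or the direct input e.
        mux_out = lut_out if self.mux_selects_lut else e
        rising = self._prev_clk == 0 and clk == 1
        falling = self._prev_clk == 1 and clk == 0
        # The register captures the mux output only on its configured clock edge.
        if (self.rising_edge and rising) or (not self.rising_edge and falling):
            self.q = mux_out
        self._prev_clk = clk
        return lut_out, self.q


# Configure the LUT as a 4-input AND: only table entry 15 (all inputs high) is 1.
block = LogicBlock([0] * 15 + [1])
print(block.step(1, 1, 1, 1, e=0, clk=0))  # (1, 0): no clock edge yet
print(block.step(1, 1, 1, 1, e=0, clk=1))  # (1, 1): rising edge captures the LUT output
print(block.step(0, 1, 1, 1, e=0, clk=1))  # (0, 1): register holds its value
```

The combinational output follows the inputs immediately, while the registered output changes only on the configured clock edge.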
These logic blocks can be thought of as "islands of programmable logic" floating in a "sea of programmable interconnect". The interconnect can be configured to connect any output of any logic block to any input of other logic blocks. Similarly, the primary inputs of the FPGA can be connected to the inputs of any logic block, and the output of any logic block can be used to drive the primary outputs of the device.
The primary general purpose inputs/outputs (GPIOs) are presented in banks, each of which can be configured to support a different interface standard, such as LVCMOS, LVDS, LVTTL, HSTL, or SSTL. In addition, the input impedance and the output slew rate are configurable.
The FPGA fabric can be extended to include blocks of SRAM, called block RAM (BRAM), phase locked loops (PLLs), and clock managers (Figure 1b). Digital signal processing (DSP) blocks, also known as DSP slices, can be added as well. They include configurable multipliers and configurable adders that can perform multiply-accumulate (MAC) operations (Figure 1c).
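A multiply-accumulate operation computes acc = acc + a × b on each step, which is the core of dot products and filters. The following is a minimal software sketch of that behavior; the `MacUnit` class and `dot_product` helper are hypothetical names for illustration, and a real DSP block has fixed operand and accumulator widths that this sketch ignores.

```python
from typing import Iterable


class MacUnit:
    """Toy multiply-accumulate unit: acc <- acc + a * b on every step."""

    def __init__(self) -> None:
        self.acc = 0

    def clear(self) -> None:
        """Reset the accumulator before starting a new sum."""
        self.acc = 0

    def step(self, a: int, b: int) -> int:
        """Perform one multiply-accumulate and return the running total."""
        self.acc += a * b
        return self.acc


def dot_product(samples: Iterable[int], coefficients: Iterable[int]) -> int:
    """Compute a dot product as a sequence of MAC steps, one per pair of values."""
    mac = MacUnit()
    for x, c in zip(samples, coefficients):
        mac.step(x, c)
    return mac.acc


print(dot_product([1, 2, 3, 4], [4, 3, 2, 1]))  # 4 + 6 + 6 + 4 = 20
```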
High-speed SerDes blocks, which can support gigabit serial interfaces, are another common FPGA feature. Note that not all FPGAs support all of the features above. Different FPGAs offer different feature sets aimed at different markets and applications.
The programmable fabric in an FPGA can be used to implement any required logic function or set of functions, up to and including a processor core or even multiple cores. Cores implemented in the programmable fabric are called "soft cores". In contrast, some FPGAs (commonly referred to as SoC FPGAs) contain one or more "hard core" processors, which are implemented directly in the silicon (Figure 1d). These hard processor cores may include floating point units (FPUs) and L1/L2 caches.
Similarly, peripheral interface functions (such as CAN, I2C, SPI, UART, and USB) can be implemented as soft cores in the programmable fabric, but many FPGAs implement them as hard cores in the silicon. Communication between the processor cores, the interface functions, and the programmable fabric is usually handled by high-speed buses such as AMBA and AXI.
The first FPGAs, introduced by Xilinx in 1985, contained only an 8 x 8 array of programmable logic blocks (with no RAM blocks, DSP blocks, and so on). In contrast, today's high-end FPGAs can contain hundreds of thousands of logic blocks, thousands of DSP blocks, and megabits (Mb) of RAM. In total, they may contain billions of transistors, corresponding to tens of millions of equivalent gates (for example, 2-input NAND gates).
Alternative configuration technologies
Configuration cells determine the function of the logic blocks and the routing of the interconnect. Each cell can be visualized as a 0/1 (off/on) switch. These cells are also used to configure the GPIO interface standards, input impedance, output slew rate, and so on. Depending on the FPGA, the configuration cells are implemented using one of three technologies:
Antifuse: These configuration cells are one-time programmable (OTP), which means that once the device has been programmed, there is no going back. Such devices tend to be limited to space and high-security applications. Because they sell in very small volumes, their prices are high, making them an expensive design choice.
Flash: Like antifuse-based configuration cells, flash-based cells are nonvolatile. Unlike antifuse cells, flash cells can be reprogrammed as required. Flash configuration cells are radiation tolerant, which makes these devices suitable for space applications (albeit with modifications to the upper metallization layers and the package).
SRAM: With this approach, the configuration data is stored in an external memory and loaded into the FPGA each time it is powered up (or on demand, in the case of dynamic configuration).
The advantages of FPGAs whose configuration cells are based on antifuse or flash are that they are "instant on" and consume little power. One disadvantage of these technologies is that they require extra processing steps on top of the basic CMOS process used to create the rest of the chip.
The advantages of FPGAs whose configuration cells are based on SRAM are that the cells are fabricated in the same CMOS process as the rest of the chip, and that these devices offer higher performance because they are typically one or two process generations ahead of antifuse and flash technologies. The main disadvantages are that SRAM configuration cells consume more power than antifuse and flash cells (at the same technology node), and that they are susceptible to radiation-induced single event upsets (SEUs).
For a long time, the latter disadvantage meant that SRAM-based FPGAs were considered unsuitable for aerospace applications. More recently, the industry has adopted special mitigation strategies, with the result that SRAM-based FPGAs appear alongside flash-based FPGAs on systems such as the Mars rover Curiosity.
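The article does not say which mitigation strategies are used. One widely known technique for tolerating single event upsets is triple modular redundancy (TMR), in which three copies of a value are kept and a majority voter masks a fault in any one copy. The sketch below shows bitwise majority voting under that assumption; the function names are hypothetical, and this is not a claim about what any particular FPGA or mission uses.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three words: each output bit matches at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single event upset by inverting one bit of a stored word."""
    return word ^ (1 << bit)


golden = 0b10110010

# Keep three copies and upset one bit in one of them.
copies = [golden, flip_bit(golden, 3), golden]
recovered = majority_vote(*copies)
print(recovered == golden)  # True: the single upset is outvoted

# The same bit upset in two copies defeats the voter.
double_fault = majority_vote(flip_bit(golden, 3), flip_bit(golden, 3), golden)
print(double_fault == golden)  # False
```

Voting corrects an upset in one copy of a bit but not the same bit in two copies, which is why such schemes are typically paired with periodic rewriting of the stored data.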
Providing flexibility with FPGAs
FPGAs suit a wide variety of applications. They are especially useful for implementing intelligent interface functions, motor control, algorithmic acceleration and high-performance computing (HPC), image and video processing, machine vision, artificial intelligence (AI), machine learning (ML), deep learning (DL), radar, beamforming, base stations, and communications.
A simple example is providing an intelligent interface between devices that use different interface standards or communication protocols. Consider an existing system in which an application processor is connected to a camera sensor and a display device using legacy interfaces (Figure 3a).
Figure 3: An FPGA can be used to provide an intelligent interface between devices that use different interface standards or communication protocols, thereby extending the life of existing designs based on legacy devices. (Image source: Max Maxfield)
Now suppose the creators of the system wish to upgrade the camera sensor and display device to modern products that are lighter, cheaper, and lower in power consumption. The only problem is that one or both of the new peripherals may use a modern interface standard that the original application processor (AP) does not support. Alternatively, they may use a completely different communication protocol, such as the Mobile Industry Processor Interface (MIPI). In this case, an FPGA that supports multiple I/O standards, coupled with some soft MIPI IP cores, provides a fast, low-cost, risk-free upgrade path (Figure 3b).
As another application example, consider computationally intensive tasks such as the signal processing required in radar systems or beamforming in communication base stations. Conventional processors based on von Neumann or Harvard architectures are well suited to some tasks, but they are inefficient for tasks that require the same sequence of operations to be performed repeatedly. This is because a single processor core running a single thread can execute only one instruction at a time (Figure 4a).
Figure 4: A microprocessor can execute only one instruction at a time (sequentially). In contrast, multiple functional blocks in an FPGA can execute at the same time (concurrently). In addition, FPGAs can implement appropriate algorithms in a massively parallel fashion. (Image source: Max Maxfield)
In contrast, an FPGA can perform multiple functions at the same time, and it can pipeline a series of operations to achieve much greater throughput. Similarly, rather than performing the same operation over and over as a processor would, for example 1,000 additions on 1,000 pairs of data values, an FPGA can instantiate 1,000 adders in its programmable fabric and perform the same calculation in a massively parallel fashion in a single clock cycle (Figure 4b).
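The difference between sequential, parallel, and pipelined execution can be made concrete with an idealized cycle count. The sketch below is a back-of-the-envelope model only: the function names are hypothetical, each operation is assumed to take one clock cycle, and real devices are limited by available resources, routing, and memory bandwidth.

```python
import math


def sequential_cycles(num_ops: int, cycles_per_op: int = 1) -> int:
    """A single core running one thread handles one operation after another."""
    return num_ops * cycles_per_op


def parallel_cycles(num_ops: int, num_units: int) -> int:
    """With num_units identical units, the operations are processed in batches."""
    return math.ceil(num_ops / num_units)


def pipelined_cycles(num_items: int, num_stages: int) -> int:
    """A full pipeline delivers one result per cycle after an initial fill delay."""
    if num_items == 0:
        return 0
    return num_stages + num_items - 1


def parallel_add(xs, ys):
    """Functional result of adding all pairs; each addition is independent."""
    return [x + y for x, y in zip(xs, ys)]


pairs = 1000
print(sequential_cycles(pairs))              # 1000 cycles on one sequential adder
print(parallel_cycles(pairs, num_units=1000))  # 1 cycle with 1,000 adders
print(pipelined_cycles(pairs, num_stages=4))   # 1003 cycles for a 4-stage pipeline
print(parallel_add([1, 2, 3], [10, 20, 30]))   # [11, 22, 33]
```

Because each addition is independent of the others, all of them can be done at once when enough adders are available. The pipelined case shows that a multi-stage operation still approaches one result per cycle once the pipeline is full.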
Which manufacturers make FPGAs?
This is an evolving picture. There are two major manufacturers of high-end devices offering the highest capacity and performance: Intel (which acquired Altera) and Xilinx.
Intel and Xilinx offer everything from low-end FPGAs to high-end SoC FPGAs. Another vendor that focuses almost exclusively on FPGAs is Lattice Semiconductor, which targets low-end to mid-range applications. Last but not least, Microchip Technology (through its acquisitions of Actel, Atmel, and Microsemi) now offers a variety of small to mid-sized FPGAs and low-end SoC FPGAs.
With so many product families, each offering different resources, performance, capacity, and packaging styles, selecting the best device for the task at hand can be difficult. Examples include devices from Intel, Lattice Semiconductor, and Xilinx.
How to design with FPGAs?
In the traditional FPGA design flow, engineers use a hardware description language such as Verilog or VHDL to capture the design intent. These descriptions are first simulated to verify that they behave as required, and are then passed to a synthesis tool that generates the configuration file used to configure (program) the FPGA.
Each FPGA vendor either has its own internally developed tool chain or offers a customized version of a tool from a specialist vendor. In either case, the tools are available from the FPGA vendor's website. In addition, free or low-cost versions of the full-featured tool suites may be available.
To make FPGAs more accessible to software developers, some FPGA vendors now offer high-level synthesis (HLS) tools. These tools interpret an algorithmic description of the desired behavior, captured at a high level of abstraction in C, C++, or OpenCL, and generate the input for the lower-level synthesis engine.
For designers who want to get started, many development and evaluation boards are available, each offering different functions and features. Here are three examples: DFRobot's DFR0600 development kit, which features a Xilinx Zynq-7000 SoC FPGA; Terasic Inc.'s DE10-Nano, which features an Intel Cyclone V SoC FPGA; and the ICE40HX1K-STICK-EVN evaluation board, which features Lattice Semiconductor's low-power iCE40 FPGA.
Designers who plan to use an FPGA-based PCIe daughter card to accelerate applications running on an x86 motherboard can look at products such as the Alveo PCIe daughter cards, which are also offered by Xilinx.