Off-the-shelf data acquisition and control cards with an onboard DSP CPU often perform at only a fraction of their benchmark-based expectations. As the number of channels or the sampling rate increases, these cards begin to show signs of DSP CPU overload well before reaching full channel capacity.
Present-day market pressures are forcing data acquisition and control card providers to increase the number of channels on a data acquisition card, increase the sampling rate, and, at the same time, reduce cost. Further, the actual function of the card is either determined during manufacturing or configured in-system. Sometimes designers expect the same card to perform different functions in different slots of the same system.
The use of a DSP CPU onboard, while providing the flexibility of hardware reuse across a wide range of applications, also becomes a bottleneck for board-level performance.
In this article, Shyam Chandra discusses a proposed architecture that would provide a low-cost, high-performance solution.
DSP CPU efficiency decreases with increased channel loading
As the number of channels increases, the load on the DSP engine also increases. This is due to the serial processing nature of current DSP CPUs, which are fetch/execute engines. In most applications, front-end preprocessing tasks, such as offset shifting, gain adjustment, and preliminary filtering, consume most of the DSP CPU's bandwidth. Developing this code is also difficult and time consuming because, for efficiency reasons, it is usually written in assembly language. Even though DMA (Direct Memory Access) transfers samples from each channel to different memory locations, CPU performance suffers because the DMA traffic reduces the memory bandwidth available to the CPU. The faster the sample rate, the less time is available for the processor to perform the actual DSP functions such as signal analysis, video processing, and compression.
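To make the front-end preprocessing load concrete, here is a minimal sketch of the three tasks named above applied to one frame of samples. The function name `make_preprocessor`, the offset and gain values, and the choice of a moving average as the "preliminary filter" are all assumptions for illustration, not details from the article.

```python
from collections import deque


def make_preprocessor(offset, gain, taps=4):
    """Build a per-channel preprocessing function (hypothetical helper).

    Each call applies an offset shift, a gain adjustment, and a moving
    average over the last `taps` corrected samples.
    """
    history = deque(maxlen=taps)  # oldest sample drops out automatically

    def process(raw_sample):
        corrected = (raw_sample - offset) * gain  # offset shift, then gain
        history.append(corrected)
        return sum(history) / len(history)  # preliminary filtering

    return process


# One independent preprocessor per channel; the values are illustrative only.
channels = [make_preprocessor(offset=2048, gain=0.5) for _ in range(8)]

# One frame holds a single raw sample from each of the 8 channels.
frame = [2048 + 10 * ch for ch in range(8)]
filtered = [proc(sample) for proc, sample in zip(channels, frame)]
```

Every channel repeats the same short sequence of operations on every sample. A serial fetch/execute engine has to run that sequence once per channel per sample period, which is the work an offload engine could take over.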
Because the DSP processor must switch contexts from one channel to the next, the resulting cache thrashing further reduces the available bus bandwidth. In addition to processing the data, the DSP CPU must also manage data buffering and movement, transfer the processed data to the host processor through the backplane, and handle backplane protocol management.
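A rough way to see the squeeze is to compute how many CPU cycles are available per sample. The sketch below assumes a hypothetical 200 MHz DSP and a 100 kHz sample rate per channel; neither figure comes from the article. The result is an upper bound, since it ignores DMA contention, cache misses, and backplane handling.

```python
def cycles_per_sample(cpu_hz, channels, sample_rate_hz):
    """Upper bound on CPU cycles available to process one sample."""
    return cpu_hz / (channels * sample_rate_hz)


CPU_HZ = 200_000_000      # assumed clock rate, for illustration only
SAMPLE_RATE_HZ = 100_000  # assumed per-channel sample rate

for channels in (4, 8, 16, 32):
    budget = cycles_per_sample(CPU_HZ, channels, SAMPLE_RATE_HZ)
    print(f"{channels:2d} channels: {budget:6.1f} cycles per sample")
```

Under these assumptions the budget falls from 500 cycles per sample at 4 channels to 62.5 at 32 channels, and preprocessing, buffering, and protocol management all have to fit inside it before any signal analysis begins.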
To satisfy the processing demands of an increased number of channels, DSP CPU performance would have to increase exponentially. Raising the processor operating frequency, using a more powerful processor, or both can meet the demand for increased DSP performance. Either way, the result is a much higher board cost.
This proposed architecture enables an economical increase in the number of channels and the sample rate per channel for a given DSP CPU. Using low-cost FPGAs, such as the LatticeECP and LatticeEC, as a coprocessor (offload engine) to the main DSP CPU achieves this goal by minimizing the front-end preprocessing and non-DSP operational load on the DSP CPU. The resulting design remains flexible enough to address a wide variety of applications. Because these FPGA devices provide ample local storage, it is possible to realize a programmable data-flow architecture that further enhances performance. In a data-flow architecture, the DSP performs its computation once all the operands are available, instead of performing the computation sequentially as the operands arrive.
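The data-flow idea can be sketched in software as a node that buffers operands and fires only when the full set has arrived. The `DataflowNode` class and the I/Q magnitude example are hypothetical and serve only to illustrate the firing rule. In the proposed architecture the buffering would sit in the FPGA's local storage, not in Python objects.

```python
class DataflowNode:
    """Fires its operation once every named operand has arrived."""

    def __init__(self, operand_names, operation):
        self.names = frozenset(operand_names)
        self.operation = operation
        self.operands = {}  # operands buffered so far

    def deliver(self, name, value):
        """Store one operand; return the result if the node fires, else None."""
        if name not in self.names:
            raise KeyError(f"unknown operand: {name}")
        self.operands[name] = value
        if len(self.operands) < len(self.names):
            return None  # still waiting for at least one operand
        result = self.operation(**self.operands)
        self.operands = {}  # clear the buffer for the next operand set
        return result


# Hypothetical example: magnitude of an I/Q sample pair.
magnitude = DataflowNode(["i", "q"], lambda i, q: (i * i + q * q) ** 0.5)

first = magnitude.deliver("i", 3.0)   # None: "q" has not arrived yet
second = magnitude.deliver("q", 4.0)  # fires: both operands are present
```

The consumer is invoked once per complete operand set and does not have to poll or process partial data as each operand arrives.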
The following section describes two approaches to implementing data acquisition. The first method uses an FPGA to interface with an ADC bank, generate the digital I/O interface, and manage data transfer between the ADC and DAC and the SDRAM. The second method uses an FPGA with DSP math processing abilities not only to interface the CPU bus to the ADC and DAC, but also to preprocess the acquired digital samples, acting as a DSP coprocessor.