Typical characteristics
Digital signal processing algorithms typically require a large number of mathematical operations
to be performed quickly on a set of data. Signals are converted
from analog to digital, manipulated digitally, and then converted
again to analog form, as diagrammed below. Many DSP applications
have constraints on latency; that is, for the system to work, the DSP operation must be
completed within some time constraint.

A simple digital processing system
Most general-purpose microprocessors and operating systems can
execute DSP algorithms successfully. But these microprocessors are
not suitable for application of mobile telephone and pocket PDA
systems etc. because of power supply and space limit. A specialized
digital signal processor, however, will tend to provide a
lower-cost solution, with better performance and lower latency.
The architecture of a digital signal processor is optimized
specifically for digital signal processing work. Some useful
features for optimizing DSP algorithms are outlined below.
Architecture
- Hardware modulo addressing, allowing circular buffers to be implemented without having to constantly test for
wrapping.
- A memory architecture designed for streaming data, using DMA extensively.
- Separate program and data memories (Harvard architecture)
- Special SIMD (single instruction, multiple data) operations
- Special arithmetic operations, such as fast multiply-accumulates (MACs). Many fundamental DSP algorithms, such as FIR filters or the Fast Fourier transform (FFT) depend heavily on multiply-accumulate performance.
- Bit-reversed addressing, a special addressing mode useful only for calculating FFTs
- Deliberate exclusion of a memory management unit. DSPs frequently use multi-tasking operating systems, but have no
support for virtual memory or memory protection. Operating systems that use virtual
memory require more time for context switching among processes, which increases latency.
Program flow
Memory architecture
- DSPs often use special memory architectures that are able to fetch
multiple data and/or instructions at the same time:
- Use of direct memory access
- Memory-address calculation unit
Data operations
- Saturation arithmetic, in which operations that produce overflows will accumulate at the
maximum (or minimum) values that the register can hold rather than
wrapping around (maximum+1 doesn't overflow to minimum as in many
general-purpose CPUs, instead it stays at maximum). Sometimes
various sticky bits operation modes are available.
- Fixed-point arithmetic is often used to speed up arithmetic processing
- Single-cycle operations to increase the benefits of pipelining
Instruction sets
History
Prior to the advent of stand-alone DSP chips discussed below, most
DSP applications were implemented using bit slice processors. The
AMD2901 bit slice chip with its family of components was a very
popular choice. There were reference designs from AMD, but very
often the specifics of a particular design were application
specific. These bit slice architecture would sometimes include a
peripheral multiplier chip. Examples of these multipliers were a
series from TRW including the TRW1008 and TRW1010, some of which
included an accumulator, providing the requisite
multiply-accumulate (MAC) function.
In 1978, Intel released the 2920 as an "analog signal processor".
It had an on-chip ADC/DAC with an internal signal processor, but it
didn't have a hardware multiplier and was not successful in the
market. In 1979, AMI released the S2811. It was designed as a microprocessor peripheral, and it had to be
initialized by the host. The S2811 was likewise not successful in
the market.
In 1980 the first stand-alone, complete DSPs – the NEC µPD7720 and AT&T DSP1 – were presented at the IEEE International Solid-State Circuits Conference '80. Both processors were inspired by the research
inPSTN telecommunications.
The Altamira DX-1 was another early DSP, utilizing quad integer
pipelines with delayed branches and branch prediction.
The first DSP produced by Texas Instruments (TI), the TMS32010 presented in 1983, proved to be an even bigger success. It
was based on the Harvard architecture, and so had separate
instruction and data memory. It already had a special instruction
set, with instructions like load-and-accumulate or
multiply-and-accumulate. It could work on 16-bit numbers and needed
390ns for a multiply-add operation. TI is now the market leader in
general-purpose DSPs. Another successful design was the Motorola 56000.
About five years later, the second generation of DSPs began to
spread. They had 3 memories for storing two operands simultaneously
and included hardware to accelerate tight loops, they also had an addressing unit capable of loop-addressing. Some of them operated on 24-bit variables and a typical model
only required about 21ns for a MAC (multiply-accumulate). Members
of this generation were for example the AT&T DSP16A or the
Motorola DSP56001.
The main improvement in the third generation was the appearance of
application-specific units and instructions in the data path, or
sometimes as coprocessors. These units allowed direct hardware
acceleration of very specific but complex mathematical problems,
like the Fourier-transform or matrix operations. Some chips, like
the Motorola MC68356, even included more than one processor core to
work in parallel. Other DSPs from 1995 are the TI TMS320C541 or the
TMS 320C80.
The fourth generation is best characterized by the changes in the
instruction set and the instruction encoding/decoding. SIMD and MMX
extensions were added, VLIW and the superscalar architecture
appeared. As always, the clock-speeds have increased, a 3ns MAC now
became possible.
Modern DSPs
Modern signal processors yield greater performance. This is due in
part to both technological and architectural advancements like
lower design rules, fast-access two-level cache, (E)DMA circuit and a wider bus system. Of course, not all DSPs
provide the same speed and many kinds of signal processors exist,
each one of them being better suited for a specific task, ranging
in price from about US$1.50 to US$300. A Texas Instruments C6000 series DSP clocks at 1.2 GHz and implements separate
instruction and data caches as well as an 8 MiB 2nd level cache,
and its I/O speed is rapid thanks to its 64 EDMA channels. The top
models are capable of as many as 8000 MIPS (million instructions per second), use VLIW (very long instruction word) encoding, perform eight operations per clock-cycle and are
compatible with a broad range of external peripherals and various
buses (PCI/serial/etc).
Another player at the high-end signal processor manufacturer today
is Freescale. The company provides a multi-core DSPs family MSC81xx. The
MSC81xx is based on StarCore Architecture processors. The latest
MSC8144 DSP combines four programmable SC3400 StarCore DSP cores.
Each SC3400 StarCore DSP core runs at 1 GHz. The SC3400 performed
higher than any other programmable DSP at 1 GHz on BDTIsimMark2000
results published by Berkeley Design Technology, Inc. (BDTI).
Another major signal processor manufacturer today is Analog Devices. The company provides a broad range of DSPs, but its main
portfolio is multimedia processors, such as codecs, filters and
digital-analog converters. Its SHARC-based processors range in performance from 66 MHz/198 MFLOPS (million floating-point operations per second) to 400
MHz/2400MFLOPS. Some models even support multiple multipliers andALUs, SIMD instructions and audio processing-specific components and
peripherals. Another product of the company is the Blackfin family of embedded digital signal processors, with models
like the ADSP-BF531 to ADSP-BF536. These processors combine the
features of a DSP with those of a general use processor. As a
result, these processors can run simple operating systems like μCLinux, velOSity and Nucleus RTOSwhile operating relatively efficiently on real-time data.
Most DSPs use fixed-point arithmetic, because in real world signal processing the additional range
provided by floating point is not needed, and there is a large
speed benefit and cost benefit due to reduced hardware complexity.
Floating point DSPs may be invaluable in applications where a wide
dynamic range is required. Product developers might also use
floating point DSPs to reduce the cost and complexity of software
development in exchange for more expensive hardware, since it is
generally easier to implement algorithms in floating point.
Generally, DSPs are dedicated integrated circuits, however DSP
functionality can also be realized using Field Programmable Gate Array chips.
Embedded general-purpose RISC processors are becoming increasingly
DSP like in functionality. For example, ARM Cortex-A8 has a 128-bit
wide SIMD unit that can have impressive 16- and 8-bit performance
for industry standard benchmarks.
See also
References
- ^ A. John Anderson (1994). Foundations of Computer Technology. CRC Press.
External links