Put Soft-Core Processors to Work
Want to change your processor? No problem. Larry explains how to
use the flexibility of soft-core processors to make the processor
you want.
For a while now, selecting a microcontroller for an embedded design
has been a process of deciding which offerings yield the features
you want and need. Having been involved in selecting processors for
embedded devices, I know the frustration of trying to figure out
which controllers have sufficient memory and peripherals to get the
job done. Sometimes you need one more serial port or a few more I/O
lines. Worse yet, the other option is selecting a processor with
too many features, most of which your project doesn't require.
System on Chip (SoC) devices offer some flexibility in that you can
exchange the features of the controller and peripherals. But you
are usually tied to the IP cores that the vendor offers you. And
forget about changing the peripherals or even the processor itself!
One of the greatest benefits of programmable logic is the ability
to configure the device to do what you need and want. With
offerings such as those at www.opencores.org, you can have the
processor and peripherals you want and modify them for your
application. The genesis of this idea came from studying real-time
operating systems. When you study RTOS behavior, especially context
switching, it quickly becomes apparent just how much time it takes
to make a context switch from one task to another. In a RISC-type
processor, the time it takes to save and restore the registers
becomes a significant factor in that context switch time. I asked
myself, "How could that time be reduced?"
Suppose you could have a set of registers in your processor for
each task in your software. I'm not talking about just reserving
registers 1-5 for task one, 6-10 for task two, and so on. I am
talking about having a full set of general-purpose registers for
each task. One instruction is all that would be needed to go from
one set of registers to another. Wouldn't that save a lot of time?
With that idea in mind, I set out to find a processor that I could
alter and give that kind of feature. I wanted to be able to give a
processor an arbitrary number of register banks and to use one
instruction to switch those registers around.
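To put rough numbers on the idea, here is a minimal Python sketch that compares a software save-and-restore against a single bank-switch instruction. The cycle costs (one cycle per load or store, a four-cycle pipeline refill) are assumptions chosen for illustration, not measurements from any processor.

```python
# Rough cycle model for a context switch; every cost here is an assumption.
NUM_GPRS = 31           # R1..R31; R0 is hardwired to zero on MIPS
CYCLES_PER_ACCESS = 1   # assumed: one cycle per load or store


def software_switch_cycles(num_regs=NUM_GPRS, cycles_per_access=CYCLES_PER_ACCESS):
    """Store every register of the outgoing task, then load the incoming task's."""
    return 2 * num_regs * cycles_per_access


def banked_switch_cycles(flush_penalty=4):
    """One bank-select instruction plus an assumed pipeline refill penalty."""
    return 1 + flush_penalty


print(software_switch_cycles())  # 62
print(banked_switch_cycles())    # 5
```

Under these assumptions, the banked switch costs a handful of cycles regardless of how many registers each task uses.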
FINDING THE PROCESSOR
The toughest part of this project was selecting an appropriate
soft-core processor to modify. I knew the ideal candidate would have
certain characteristics: an easily obtainable, modifiable source
(I'm more familiar with VHDL, so this is the language I was leaning
toward); a debug interface to enable me to easily examine the state
of the processor and step through code to see if the desired
changes take place; a competent software toolset, such as GCC; and
an instruction set that would enable me to add instructions
without causing problems for other parts of the processor.
I originally wanted to modify the Xilinx MicroBlaze soft-core
processor, because I had been using one at work and I was familiar
with it. I could have used the GCC toolchain and the JTAG debug
interface to monitor the program execution. However, while Xilinx
purportedly offers the source for the MicroBlaze for $5,000 (which
in itself was far too expensive), when I contacted Xilinx it
appeared that it did not actually sell or give away the source
code.
Next, I turned to www.opencores.org to find a core. At the site,
source code for IP cores is available under the GNU General Public License
(GPL) and the GNU Lesser General Public License (LGPL). There are a variety
of cores that can be modified and have freely available sources.
Some of the processor cores are still in the early stages of
development, but several are mature and even in use in commercial
products.
One processor core is the OpenRISC 1200. The OpenRISC is planned
as a series of processors, and the 1200 is one of the first real
implementations of the specification. The OpenRISC 1200 includes
many of the same features as the MicroBlaze, including GCC support
for software development, a debug interface that can go over JTAG,
and Wishbone bus master support, which enables it to communicate
with many different devices that also support Wishbone with minimal
effort.
The OpenRISC 1200 is written in Verilog, which is not a language I
am familiar with. Finding the portions of the source code that
needed to be modified was difficult, but it was not the hardest
challenge. Building a working system for my development board was
the biggest challenge. After downloading instructions for building
a system using the Wishbone bus and following the steps, I had a
system that could be built with Xilinx ISE tools. However, when I
tried to connect a debugger to the core to download a program, I
ran into problems. The gdb debugger kept complaining that I needed
either to enable or to include a debug module (even though it was
already included), and that it wasn't talking to the processor.
Instead of sorting out the problem, I abandoned the OpenRISC 1200
because I couldn't afford to spend more time trying to figure out
what I was doing wrong. After the problems with the OpenRISC 1200,
I was leery of searching for another core on www.opencores.org.
However, after another search, I found the miniMIPS project.
The miniMIPS started out as a school project by a group of French
students, and they submitted the results to www.opencores.org.
Based on the MIPS I processor, it is a 32-bit von Neumann
architecture with a five-stage pipeline (instruction extraction,
instruction decoding, execution, memory access, and update
registers). The processor normally holds 32 32-bit registers, but
one, R0, is always read as 0 and cannot be written to.
The processor has a simple interface and is written in VHDL. It
does not have GCC support as far as I can tell. Neither the
processor nor the assembler the students built comes with much
documentation. There isn't a debug interface, so I wasn't able to
use a debugger such as gdb to see the registers change. But I was
able to build a working system with the miniMIPS, which made it
the best choice available.
DEVELOPMENT SYSTEM
The development system I found was a Digilent S3BOARD Spartan-3
development board. It comes with a Spartan-3 XC3S200, 1 MB of
RAM, slide switches, push buttons, LEDs, a four-digit, seven-segment
display, and serial, VGA, and PS/2 ports. I had the board shipped
with a Spartan-3 XC3S1000 because I needed as much room as I could
get for the project.
I also ordered an expansion board with access to various pins in
case I needed to probe the I/O pins of the FPGA. For the hardware
implementation, I included several items, although I did not use them
all: the modified miniMIPS processor, the interface to the SRAM
(primarily to have visibility to the address and data lines of the
miniMIPS), a program ROM device, the four-digit, seven-segment
display, and the LEDs. With the hardware selected, I built a system
with the miniMIPS, which provided the basis for the updated core.
MODIFICATIONS
With the physical hardware selected and implemented, I started
modifying the processor. The full VHDL source code
is available on the Circuit Cellar FTP site. Note that the VHDL
source code shown in this article has had some of the comments
removed in order to make it presentable.
Let's start by looking at the register bank implementation, as
given in banc.vhd, which is posted on the Circuit Cellar FTP site.
The two versions are depicted in Figure 1. The original
implementation is on the left and the modified implementation is on
the right. The only addition is a new input, bank_sel, which decides
which register bank is being used.
Figure 1-These are changes to the register bank (banc) module, with
the addition of the bank_sel control lines. Everything else remains
the same functionally.
To create that addition, I changed the entity declaration of the
register bank module (see Listing 1). This gave it a generic
parameter to specify how many register banks it was to
implement (the default is one), and provided an 8-bit input informing it
which register bank it should be using.
I chose an 8-bit input for the bank_sel line because it seemed
reasonable to allow up to 256 register banks in an implementation.
If you go beyond 256, besides requiring an extremely large part, you
will run into issues. For instance, each thread will have little
processor time to actually accomplish anything despite how short
the context switch time may be.
Within the architecture definition of banc, I converted the
one-dimensional declaration of the registers into a two-dimensional
array, using num_banks as the limit on the array index. Each time a
register is referenced, either to read or write, the bank_sel input
indexes the group of registers (see Listing 2).
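The two-dimensional register array can be pictured with a small behavioral model. The Python sketch below is not the VHDL from banc.vhd; it only mirrors the idea of a bank count parameter and a bank_sel index, and the class and method names are hypothetical.

```python
class BankedRegisterFile:
    """Behavioral model: num_banks sets of 32 registers, each 32 bits wide."""

    def __init__(self, num_banks=1):
        # bank_sel is 8 bits wide, so at most 256 banks can be addressed.
        if not 1 <= num_banks <= 256:
            raise ValueError("num_banks must be between 1 and 256")
        self.regs = [[0] * 32 for _ in range(num_banks)]

    def read(self, bank_sel, reg):
        # R0 always reads as 0, whichever bank is selected.
        if reg == 0:
            return 0
        return self.regs[bank_sel][reg]

    def write(self, bank_sel, reg, value):
        # Writes to R0 are ignored; other values are truncated to 32 bits.
        if reg != 0:
            self.regs[bank_sel][reg] = value & 0xFFFFFFFF


rf = BankedRegisterFile(num_banks=2)
rf.write(0, 1, 10)    # R1 in bank 0
rf.write(1, 1, 20)    # R1 in bank 1 is a separate register
print(rf.read(0, 1), rf.read(1, 1))  # 10 20
```

The same register number names a different storage location in each bank, which is what lets a bank switch stand in for a full save and restore.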
Next, I wanted to create a new register that would act as the index
to the register bank array, preferably in a place that would not
require a completely new instruction to access and use it.
Fortunately, the miniMIPS has a system coprocessor that handles
interrupts and exceptions, and this module provides the necessary
features. Figure 2 shows the changes to this module. In this case,
I have added only an output, bank_sel.
Figure 2-These are the changes to the system coprocessor (syscop)
module. The register bank control line (bank_sel) is the only
addition here.
For this change I added a new register, which was accessible using
the MTC0 and MFC0 instructions. The 8-bit output bus from the
syscop provides the control signals that can then be routed to the
banc module (see Listing 3).
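As a rough illustration of that arrangement, the Python sketch below models a coprocessor register written with MTC0, read back with MFC0, and exposed as an 8-bit bank_sel output. The register number used for the bank selection register is hypothetical; the article does not state which coprocessor register the design uses.

```python
BANKSEL_REG = 16  # hypothetical CP0 register number, not taken from the design


class SysCop:
    """Behavioral model of a system coprocessor with a bank select register."""

    def __init__(self):
        self.cp0 = [0] * 32

    def mtc0(self, reg, value):
        """Move a general-purpose register value into a coprocessor register."""
        self.cp0[reg] = value & 0xFFFFFFFF

    def mfc0(self, reg):
        """Read a coprocessor register back."""
        return self.cp0[reg]

    @property
    def bank_sel(self):
        # Only the low 8 bits drive the bus routed to the register bank module.
        return self.cp0[BANKSEL_REG] & 0xFF


cop = SysCop()
cop.mtc0(BANKSEL_REG, 3)
print(cop.bank_sel)  # 3
```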
Some problems arise from switching the register banks in a
pipelined architecture. One is what to do about the instructions
that are still in the pipeline when the switch happens. If those
instructions use values from or write results to registers, data
corruption will occur. Those instructions cannot be relied on.
One solution to the problem is to insert three NOP instructions
immediately after the register switch to have nothing in the
pipeline. A second option is to use three instructions that don't
use registers, such as absolute jumps. But both of these options
rely on the software programmer to insert the solution and would be
difficult to debug.
Instead, this design takes advantage of the fact that the pipeline
is flushed under certain conditions. In the original design,
hardware interrupts, software exceptions, and a return from
interrupt instruction result in the clearing of the pipeline
stages, at which point the pipeline is reloaded. Now the syscop
module detects when the register switch is taking place and
triggers the same internal interrupt signal as the other triggers.
The trick is that the pipeline expects to see the location to use
to start loading the pipeline. Under hardware interrupts and
software exceptions, the interrupt vector register is placed on the
bus. When a return from interrupt instruction occurs, the address
saved from the interrupt is placed on the pipeline. Fortunately,
the syscop module is handed the current memory address in the
MEM_adr input. By adding four to that value, the syscop puts out
the instruction location just after the register switch
instruction. This creates the desired result of flushing the
pipeline and starting again at the next instruction.
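The choice of restart address can be summarized in a few lines. This Python sketch models only the selection logic described above; apart from MEM_adr, the event and parameter names are hypothetical.

```python
def restart_address(event, mem_adr, vector, saved_pc):
    """Pick the address the pipeline reloads from after a flush."""
    if event in ("interrupt", "exception"):
        return vector                      # interrupt vector register
    if event == "return_from_interrupt":
        return saved_pc                    # address saved when the interrupt hit
    if event == "bank_switch":
        return (mem_adr + 4) & 0xFFFFFFFF  # instruction after the switch
    raise ValueError(f"unknown flush event: {event}")


print(hex(restart_address("bank_switch", 0x100, 0x80, 0x40)))  # 0x104
```

Adding four works because every instruction in this architecture is one 32-bit word.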
What do you do with the bank selection register when an interrupt
or exception occurs? While this could be made configurable, the
design forces the selection register to 0 when an interrupt or
exception is triggered. This provides the means to use bank 0 as a
supervisory register bank. All other tasks use the other banks (see
Listing 4).
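A minimal Python sketch of that policy, with hypothetical names, shows the selection register dropping to bank 0 whenever an interrupt or exception is taken:

```python
class BankSelect:
    """Bank selection register that is forced to 0 on interrupts and exceptions."""

    def __init__(self):
        self.bank_sel = 0

    def switch(self, bank):
        """Model of software writing the selection register."""
        self.bank_sel = bank & 0xFF

    def take_interrupt(self):
        """Hardware forces the supervisory bank; the old value is simply overwritten."""
        self.bank_sel = 0


sel = BankSelect()
sel.switch(3)          # a task running in bank 3
sel.take_interrupt()   # handler now runs in bank 0
print(sel.bank_sel)    # 0
```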
The pack_mips.vhd file, which is posted on the Circuit Cellar FTP
site, provides a package of the interfaces to the various
components. Therefore, it had to be changed to accommodate the new
banc and syscop interfaces. Only those components are in Listing 5,
along with the top-level miniMIPS component. The minimips.vhd file
provides the top-level definition and behavior of the processor (see
Listing 6 and Listing 7).
The primary changes here involve creating the bank_sel bus for the
banc and syscop modules and connecting the bus to those components.
Most of the changes are purely internal to the CPU. In a system
that already used the original processor, no physical
changes to the architecture would be required.
Figure 3 shows the changes to the interface. In this case, I routed
the bank_sel signals out of the processor so I could monitor them
outside of the FPGA. I did this for debug purposes, but it takes up
pins that might be needed for other purposes and means this
processor isn't a direct, drop-in replacement. The source
code, which is posted on the Circuit Cellar FTP site, has this
change, but you'll want to remove the change when debug is
complete. Under normal circumstances, this interface would not be
necessary. The basic system that was described earlier provided the
basis for a system with the new processor.
Figure 3-Under normal circumstances, the processor on the right
would be the same as the one on the left. The monitor lines were
added so I could see the action of the bank selection control
lines. They can be removed in normal usage.
HARDWARE
As I described earlier, the hardware design for the proof of
concept consisted of the modified miniMIPS processor, together with
the following components: an SRAM interface (unused by the software,
it is used to trace signals), a UART (also not used by the
software), an LED display to show which register bank is in use, and
a four-digit, seven-segment LED display to show the current results
of the thread's calculations.
The development board has a 50-MHz clock source, but I slowed the
system clock to almost 24 Hz so that I could see the behavior of
the system.
For the Spartan-3 XC3S1000, I had a maximum of 7,680 slices
available. Starting at one register bank, I built the system and
examined the use of the part. Then, I incremented the register
banks and regenerated. I was able to get eight register banks
implemented on my device. Table 1 shows some of the results.
The other item of interest in the results is how the usage
increased as the number of banks increased. Going from one bank to
two increased the logic usage by 25%. Going to five banks took
utilization up to almost 99%.
I generated bitstreams with up to eight register banks. However, the
software did not perform reliably and there were many anomalies in
the behavior of the system that I cannot explain without better
debugging ability.
SOFTWARE
I had to write a program that would illustrate the concept as
simply as possible because I did not have a debugger (either the
original or the updated one) available for this processor. The
maximum number of register banks I was able to use was five;
therefore, I selected a program that would use all five banks to
increment a counter by different amounts and display that counter
on the seven-segment LED display.
The code is displayed in Listing 8. Lines 1 to 25 act as the
initialization of the system. The MTC0 instruction is used to
switch the bank control register, thus selecting which register
bank is in use. Let's review how each register bank is set up. R1
is the main counter, which is initialized to 0 for all tasks. R2 is
the increment counter, which is added to R1 to get a new total. For
task 0, the count is 1. For task 1, the count is 2. Finally, I get
to task 4, which has the increment count of 5. R3 holds the number
of the next task to be run, sort of like a linked list in
registers. Each task, except task 4, holds the same number as its
increment count. Task 4 needs to loop back to task 0, so 0 is
placed in R3.
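The per-bank setup can be written out as a table. The Python sketch below is a model of the initialization described above, not the assembly from Listing 8, and the function name is hypothetical.

```python
NUM_TASKS = 5


def init_banks(num_tasks=NUM_TASKS):
    """Build one 32-entry register set per task, as the init code is described."""
    banks = []
    for task in range(num_tasks):
        regs = [0] * 32
        regs[1] = 0                       # R1: running total
        regs[2] = task + 1                # R2: increment (1 for task 0 ... 5 for task 4)
        regs[3] = (task + 1) % num_tasks  # R3: next task; the last one wraps to 0
        banks.append(regs)
    return banks


for task, regs in enumerate(init_banks()):
    print(task, regs[1], regs[2], regs[3])
```

The R3 column forms the circular linked list: 1, 2, 3, 4, and then 0 for task 4.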

In line 28, I switch to task 0 (which was linked from task four), and
the work begins. On line 29, the current counter is sent to the LED
display, which is mapped to 0xFFFFFF80 in the address space. The
-128 is sign extended in the processor when the address is
generated. On line 30, the task initializes a loop counter to 5, as
each task will do its work five times before moving on to the next
task. On line 31, the task's total counter, R1, is added to the
increment count in R2 and placed back in R1. The result is
output again to the LED display on line 32. On lines 33 and 34, the
task's loop counter is decremented, and I compare it with R0 to see
if I have completed all of my work for this time period. If I
haven't, I go back three instructions to line 31. Otherwise, I will
fall through.
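The display address follows from how a MIPS store forms its effective address: the 16-bit offset is sign extended and added to the base register. A short Python sketch, assuming the base register is R0 (which always reads as 0):

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, imm):
    """Base register plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm)) & 0xFFFFFFFF


print(hex(effective_address(0, -128)))  # 0xffffff80
```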
The task reaches line 35 once it has made all five passes through its
work. It then jumps to line 28, where the registers
are switched and the work begins again. The register switch clears
the pipeline, so the next three instructions still work, whereas if
the processor hadn't cleared the pipeline, I would have ended up
with a mixture of the prior set's inputs going into the current
set's registers.
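To see what the display should show, the task loop can be modeled in a few lines of Python. This is a behavioral sketch of the program as described, not a translation of Listing 8; the function name and the dictionary layout are hypothetical, and the line numbers in the comments refer to the description above.

```python
def run(num_tasks=5, passes=5, task_switches=5):
    """Simulate the round-robin counter tasks, one register set per task."""
    banks = [{"R1": 0, "R2": t + 1, "R3": (t + 1) % num_tasks}
             for t in range(num_tasks)]
    display = []
    bank = 0                                # line 28: first switch selects task 0
    for _switch in range(task_switches):
        regs = banks[bank]
        display.append(regs["R1"])          # line 29: show the current total
        for _pass in range(passes):         # lines 30 to 34: five passes
            regs["R1"] += regs["R2"]        # line 31: total += increment
            display.append(regs["R1"])      # line 32: show the new total
        bank = regs["R3"]                   # line 35, then 28: follow the link
    return display, banks


shown, final_banks = run()
print(shown[:6])                         # [0, 1, 2, 3, 4, 5]
print([b["R1"] for b in final_banks])    # [5, 10, 15, 20, 25]
```

Each task resumes from its own total when it is next selected, because its registers were never saved or overwritten.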
The assembler that came with the miniMIPS processor did not appear
to have any documentation, so it was difficult to figure out how to
write the assembly language to generate a program. Rather than
having to work through that, and because I was going to need the
resulting program in a ROM space in the design anyway, I decided to
write a Perl program to "assemble" my source file into a VHDL source
file. I won't go into the details of the Perl program here, but it
is in the source code files posted on the Circuit Cellar FTP site.
Note that not all of the miniMIPS instructions are supported.
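The general approach can be sketched in Python rather than Perl. The encoders below use the standard MIPS I instruction formats, on the assumption that the miniMIPS follows them; the coprocessor register number and the layout of the generated VHDL lines are hypothetical and do not reproduce the output of the posted Perl program.

```python
def encode_mtc0(rt, rd):
    """MTC0 rt, rd: COP0 opcode 0x10 with the MT sub-opcode 0x04 in the rs field."""
    return (0x10 << 26) | (0x04 << 21) | (rt << 16) | (rd << 11)


def encode_addu(rd, rs, rt):
    """ADDU rd, rs, rt: SPECIAL opcode 0 with function code 0x21."""
    return (rs << 21) | (rt << 16) | (rd << 11) | 0x21


def encode_addiu(rt, rs, imm):
    """ADDIU rt, rs, imm: opcode 0x09 with a 16-bit immediate."""
    return (0x09 << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def to_vhdl_rom(words):
    """Format machine words as entries of a VHDL array aggregate (assumed layout)."""
    lines = [f'    {i} => x"{w:08X}",' for i, w in enumerate(words)]
    lines.append('    others => x"00000000"')
    return "\n".join(lines)


program = [
    encode_addiu(2, 0, 1),   # R2 = 1
    encode_addu(1, 1, 2),    # R1 = R1 + R2
    encode_mtc0(3, 16),      # write R3 to CP0 register 16 (hypothetical number)
]
print(to_vhdl_rom(program))
```

Generating the ROM contents directly keeps the program and the hardware description in one build step, which suits a design where the program lives in the FPGA anyway.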
DESIGN LIMITATIONS
The design has a few limitations that need to be addressed before
it can be used in a full system intended for production. Interrupts
and exceptions currently trigger the register bank selection to
bank 0. This was deliberate to ensure that a guaranteed register
bank is used when these events occur. However, the previous
value of the bank selection register is not preserved. Because the
hardware does not save the state of the system at the moment of the
interrupt, software has to restore that state before returning from
the interrupt. One possible solution to this problem would be to
have a separate portion of the BANKSEL register devoted to which
register bank is used for interrupts and exceptions. The processor
knows when interrupts occur, so it would be easy for it to switch to
that portion of the register and not modify the other portion.
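One way to picture that proposal is a selection register with two fields, one for the running task and one for interrupt handling. The Python sketch below is a hypothetical model of the suggestion, not something implemented in the posted design.

```python
class SplitBankSelect:
    """Bank selection with separate task and interrupt fields (proposed, not built)."""

    def __init__(self, irq_bank=0):
        self.task_bank = 0          # field written by software when switching tasks
        self.irq_bank = irq_bank    # field naming the bank used by handlers
        self.in_interrupt = False

    def switch(self, bank):
        self.task_bank = bank & 0xFF

    def take_interrupt(self):
        # Hardware selects the interrupt field; the task field is left untouched.
        self.in_interrupt = True

    def return_from_interrupt(self):
        self.in_interrupt = False

    @property
    def active_bank(self):
        return self.irq_bank if self.in_interrupt else self.task_bank


sel = SplitBankSelect()
sel.switch(3)
sel.take_interrupt()
print(sel.active_bank)   # 0
sel.return_from_interrupt()
print(sel.active_bank)   # 3
```

Because the task field survives the interrupt, the handler would not need to know which task it interrupted in order to return to it.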
The miniMIPS does not appear to distinguish between user and
supervisory modes, so it cannot restrict certain instructions from
executing in user mode. This means that a hostile piece of code
would be able to switch to another thread's registers and change
them, including the link return register R31. This is not a
particularly desirable feature, but it may not be that much of an
issue in certain embedded environments.
You may want to reduce logic usage by implementing the registers in
either distributed RAM or block RAM as dual-port memory. This might
increase the number of registers that can be implemented, but it
may not be able to keep single-cycle accesses required for the
pipeline implementation.
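A quick capacity estimate shows why block RAM is attractive. The Python sketch below assumes an 18-Kbit block RAM with 16 Kbit of data storage, as found in the Spartan-3 family, and treats the `copies` parameter as a hypothetical way to account for duplicating the contents to obtain a second read port.

```python
REGS_PER_BANK = 32
BITS_PER_REG = 32
BLOCK_RAM_DATA_BITS = 16 * 1024  # assumed: 18-Kbit block, 16 Kbit of it data


def banks_per_block_ram(copies=1):
    """Register banks that fit in one block RAM if each bank is stored `copies` times."""
    bits_per_bank = REGS_PER_BANK * BITS_PER_REG   # 1,024 bits per bank
    return BLOCK_RAM_DATA_BITS // (bits_per_bank * copies)


print(banks_per_block_ram())          # 16
print(banks_per_block_ram(copies=2))  # 8
```

This counts storage only; it says nothing about whether the read timing would still suit the pipeline.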
THE FUTURE
This project is a good starting point for expanding current
soft-core processors and giving them more flexibility. Doing so
enables real-time systems to meet their timing requirements. By
removing or reducing the need to save and restore registers, you
can reduce the context switch time. You will save more time for
processing.
Larry Standage (lstandage@gmail.com) holds an M.S. in embedded
engineering and a B.S.E. in computer systems engineering from
Arizona State University. He has been working on and off in the
embedded technology industry for the last 16 years. In his spare
time between being a father to five children and teaching high
school kids in a religious setting, Larry likes to play chess, read
a book or two, and catch up on shows stored on his TiVo.
PROJECT FILES
To download code, go to
ftp://ftp.circuitcellar.com/pub/Circuit_Cellar/2009/223.
RESOURCES
Digilent, Inc., "S3 Starter Board Schematics,"
www.digilentinc.com/Data/Products/S3BOARD/S3BOARD-sch.pdf.
J. Frenzel, "MIPS Instruction Reference," University of Idaho,
Department of Electrical and Computer Engineering.
miniMIPS Overview,
www.opencores.org/projects.cgi/web/minimips/overview.
Xilinx, Inc., "Spartan-3 FPGA Family Data Sheet," DS099, 2008.
---, "Spartan-3 Starter Kit Board User Guide," UG130, 2005.
SOURCE
S3BOARD Spartan-3 Development board and Spartan-3 XC3S1000 starter
board
Digilent, Inc.
www.digilentinc.com