Processor Modification  2009-02-07 15:39
Put Soft-Core Processors to Work

Want to change your processor? No problem. Larry explains how to use the flexibility of soft-core processors to make the processor you want.

For a while now, selecting a microcontroller for an embedded design has been a process of deciding which offerings yield the features you want and need. Having been involved in selecting processors for embedded devices, I know the frustration of trying to figure out which controllers have sufficient memory and peripherals to get the job done. Sometimes you need one more serial port or a few more I/O lines. Worse yet, the other option is selecting a processor with too many features, most of which your project doesn't require.

System on Chip (SoC) devices offer some flexibility in that you can exchange the features of the controller and peripherals. But you are usually tied to the IP cores that the vendor offers you. And forget about changing the peripherals or even the processor itself! One of the greatest benefits of programmable logic is the ability to configure the device to do what you need and want. With offerings such as those at www.opencores.org, you can have the processor and peripherals you want and modify them for your application.

The genesis of this idea came from studying real-time operating systems. When you study RTOS behavior, especially context switching, it quickly becomes apparent just how much time it takes to make a context switch from one task to another. In a RISC-type processor, the time it takes to save and restore the registers becomes a significant factor in that context switch time. I asked myself, "How could that time be reduced?"

Suppose you could have a set of registers in your processor for each task in your software. I'm not talking about just reserving registers 1-5 for task one, 6-10 for task two, and so on. I am talking about having a full set of general-purpose registers for each task. One instruction is all that would be needed to go from one set of registers to another. Wouldn't that save a lot of time?
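A rough way to see the potential saving is to count instructions. The sketch below assumes a MIPS-style processor with 32 general-purpose registers, one of them hard-wired to zero, and a conventional software context switch that uses one store and one load per writable register. It ignores scheduler overhead and memory wait states, and the function names are illustrative only.

```python
# Rough instruction-count comparison; every figure here is an illustrative assumption.
WRITABLE_REGS = 31  # 32 general-purpose registers minus a hard-wired zero register


def software_switch_instructions(regs=WRITABLE_REGS):
    """One store per register to save the old task, one load per register for the new one."""
    return 2 * regs


def banked_switch_instructions():
    """A single instruction selects a different register bank."""
    return 1


saved = software_switch_instructions() - banked_switch_instructions()
print(software_switch_instructions(), banked_switch_instructions(), saved)
```

Under these assumptions a software switch costs 62 register-transfer instructions against one for the banked design, before any memory latency is counted.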

With that idea in mind, I set out to find a processor that I could alter and give that kind of feature. I wanted to be able to give a processor an arbitrary number of register banks and to use one instruction to switch those registers around.

FINDING THE PROCESSOR
The toughest part of this project was selecting an appropriate soft-core processor to modify. I knew the ideal candidate would have certain characteristics: an easily obtainable, modifiable source (I'm more familiar with VHDL, so this is the language I was leaning toward); a debug interface to enable me to easily examine the state of the processor and step through code to see if the desired changes take place; a competent software toolset, such as GCC; and an instruction set that would enable me to add instructions without causing problems for other parts of the processor.

I originally wanted to modify the Xilinx MicroBlaze soft-core processor, because I had been using one at work and I was familiar with it. I could have used the GCC toolchain and the JTAG debug interface to monitor the program execution. However, while Xilinx purportedly offers the source for the MicroBlaze for $5,000 (which in itself was far too expensive), when I contacted Xilinx it appeared that it did not actually sell or give away the source code.

Next, I turned to www.opencores.org to find a core. At the site, source code for IP cores is available under the GNU General Public License (GPL) and the GNU Lesser General Public License (LGPL). There are a variety of cores that can be modified and have freely available sources. Some of the processor cores are still in the early stages of development, but several are mature and even in use in commercial products.
 
One processor core is the OpenRISC 1200. The OpenRISC is planned as a series of processors, and the 1200 is one of the first real implementations of the specification. The OpenRISC 1200 includes many of the same features as the MicroBlaze, including GCC support for software development, a debug interface that can go over JTAG, and Wishbone bus master support, which enables it to communicate with many different devices that also support Wishbone with minimal effort.

The OpenRISC 1200 is written in Verilog, which is not a language I am familiar with. Finding the portions of the source code that needed to be modified was difficult, but it was not the hardest challenge. Building a working system for my development board was the biggest challenge. After downloading instructions for building a system using the Wishbone bus and following the steps, I had a system that could be built with Xilinx ISE tools. However, when I tried to connect a debugger to the core to download a program, I ran into problems. The gdb debugger kept complaining that I needed either to enable or to include a debug module (even though it was already included), and that it wasn't talking to the processor.

Instead of sorting out the problem, I abandoned the OpenRISC 1200 because I couldn't afford to spend more time trying to figure out what I was doing wrong. After the problems with the OpenRISC 1200, I was leery of searching for another core on www.opencores.org. However, after another search, I found the miniMIPS project.

The miniMIPS started out as a school project by a group of French students, and they submitted the results to www.opencores.org. Based on the MIPS I processor, it is a 32-bit von Neumann architecture with a five-stage pipeline (instruction extraction, instruction decoding, execution, memory access, and update registers). The processor normally holds 32 32-bit registers; but one, R0, is always read as 0 and cannot be written to.

The processor has a simple interface and is written in VHDL. It does not have GCC support as far as I can tell. Neither the processor nor the assembler the students built comes with much documentation. There isn't a debug interface, so I wasn't able to use a debugger such as gdb to see the registers change. But I was able to build a system with the miniMIPS, and a working system made it the best selection.

DEVELOPMENT SYSTEM
The development system I found was a Digilent S3BOARD Spartan-3 development board. It comes with a Spartan-3 XC3S200, 1 MB of RAM, slide switches, push buttons, LEDs, a four-digit, seven-segment display, and serial, VGA, and PS/2 ports. I had the board shipped with a Spartan-3 XC3S1000 because I needed as much room as I could get for the project.

I also ordered an expansion board with access to various pins in case I needed to probe the I/O pins of the FPGA. For the hardware implementation, I included several items, although I did not use them all: the modified miniMIPS processor, the interface to the SRAM (primarily to have visibility to the address and data lines of the miniMIPS), a program ROM device, the four-digit, seven-segment display, and the LEDs. With the hardware selected, I built a system with the miniMIPS, which provided the basis for the updated core.

MODIFICATIONS
With the physical hardware selected and implemented, I started modifying the processor. The full VHDL source code is available on the Circuit Cellar FTP site. Note that the VHDL source code shown in this article has had some of the comments removed in order to make it presentable.

Let's start by looking at the register bank implementation, as given in banc.vhd, which is posted on the Circuit Cellar FTP site. The two versions are depicted in Figure 1. The original implementation is on the left and the modified implementation is on the right. The only addition is a new input, bank_sel, which decides which register bank is being used.

Figure 1: These are changes to the register bank (banc) module, with the addition of the bank_sel control lines. Everything else remains the same functionally.

To create that addition, I changed the entity declaration of the register bank module (see Listing 1). This gave it a generic parameter to specify how many register banks it was to implement (default is one), and provided an 8-bit input, informing it which register bank it should be using.

I chose an 8-bit input for the bank_sel line, because it seemed reasonable to allow up to 256 register banks in an implementation. If you go beyond 256, besides requiring an extremely large part, you will run into issues. For instance, each thread will have little processor time to actually accomplish anything despite how short the context switch time may be.

Within the architecture definition of banc, I converted the one-dimensional declaration of the registers into a two-dimensional array, using num_banks as the limit on the array index. Each time a register is referenced, either to read or write, the bank_sel input indexes the group of registers (see Listing 2).
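The indexing scheme can be pictured with a small behavioral model. This is a Python sketch, not the VHDL from banc.vhd: the class and method names are hypothetical, and only num_banks, bank_sel, and the R0-reads-as-zero rule come from the text.

```python
class BankedRegisterFile:
    """Behavioral model: num_banks banks of 32 32-bit registers; R0 always reads 0."""

    NUM_REGS = 32
    MASK = 0xFFFFFFFF

    def __init__(self, num_banks=1):
        if not 1 <= num_banks <= 256:  # bank_sel is 8 bits wide
            raise ValueError("num_banks must be between 1 and 256")
        self.num_banks = num_banks
        # Two-dimensional array: first index is the bank, second is the register.
        self.regs = [[0] * self.NUM_REGS for _ in range(num_banks)]

    def read(self, bank_sel, index):
        if index == 0:
            return 0  # R0 is hard-wired to zero in every bank
        return self.regs[bank_sel][index]

    def write(self, bank_sel, index, value):
        if index != 0:  # writes to R0 are ignored
            self.regs[bank_sel][index] = value & self.MASK


rf = BankedRegisterFile(num_banks=2)
rf.write(0, 1, 111)
rf.write(1, 1, 222)
print(rf.read(0, 1), rf.read(1, 1))
```

The same register number holds independent values in each bank. What the real hardware does when bank_sel exceeds num_banks is not described in the article; this model simply raises an IndexError.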

Next, I wanted to create a new register that would act as the index to the register bank array, preferably in a place that would not require a completely new instruction to access and use it. Fortunately, the miniMIPS has a system coprocessor that handles interrupts and exceptions, and this module provides the necessary features. Figure 2 shows the changes to this module. In this case, I have added only an output, bank_sel.

Figure 2: These are the changes to the system coprocessor (syscop) module. The register bank control line (bank_sel) is the only addition here.

For this change I added a new register, which was accessible using the MTC0 and MFC0 instructions. The 8-bit output bus from the syscop provides the control signals that can then be routed to the banc module (see Listing 3).

Some problems arise from switching the register banks in a pipelined architecture. One is what to do about the instructions that are still in the pipeline when the switch happens. If those instructions use values from or write results to registers, data corruption will occur. Those instructions cannot be relied on.

One solution to the problem is to insert three NOP instructions immediately after the register switch to have nothing in the pipeline. A second option is to use three instructions that don't use registers, such as absolute jumps. But both of these options rely on the software programmer to insert the solution and would be difficult to debug.

Instead, this design takes advantage of the fact that the pipeline is flushed under certain conditions. In the original design, hardware interrupts, software exceptions, and a return from interrupt instruction result in the clearing of the pipeline stages, at which point the pipeline is reloaded. Now the syscop module detects when the register switch is taking place and triggers the same internal interrupt signal as the other triggers. The trick is that the pipeline expects to see the location to use to start loading the pipeline. Under hardware interrupts and software exceptions, the interrupt vector register is placed on the bus. When a return from interrupt instruction occurs, the address saved from the interrupt is placed on the pipeline. Fortunately, the syscop module is handed the current memory address in the MEM_adr input. By adding four to that value, the syscop puts out the instruction location just after the register switch instruction. This creates the desired result of flushing the pipeline and starting again at the next instruction.

What do you do with the bank selection register when an interrupt or exception occurs? While this could be made configurable, the design forces the selection register to 0 when an interrupt or exception is triggered. This provides the means to use bank 0 as a supervisory register bank. All other tasks use the other banks (see Listing 4).
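The two rules described for the syscop (a bank switch flushes the pipeline and resumes at MEM_adr + 4, and an interrupt or exception forces bank 0) can be summarized in a toy model. This is a Python sketch rather than the syscop VHDL: the class, its method names, and the interrupt vector value are assumptions, while bank_sel and MEM_adr are the names used in the text.

```python
class SysCopModel:
    """Toy model of the bank-select behavior described above (not the actual VHDL)."""

    def __init__(self, interrupt_vector=0x00000040):
        self.bank_sel = 0
        self.interrupt_vector = interrupt_vector  # hypothetical vector address

    def write_bank_sel(self, value, mem_adr):
        """Model an MTC0 write to the bank register; returns (flush, next_pc)."""
        self.bank_sel = value & 0xFF  # 8-bit bank selection
        # Restart fetching at the instruction just after the switch instruction.
        return True, (mem_adr + 4) & 0xFFFFFFFF

    def interrupt(self):
        """Model an interrupt or exception; returns (flush, next_pc)."""
        self.bank_sel = 0  # bank 0 acts as the supervisory bank
        return True, self.interrupt_vector


cop = SysCopModel()
print(cop.write_bank_sel(3, mem_adr=0x100), cop.bank_sel)
print(cop.interrupt(), cop.bank_sel)
```

In both cases the pipeline is flushed. Only the restart address differs: the next sequential instruction for a bank switch, the interrupt vector for an interrupt.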


The pack_mips.vhd file, which is posted on the Circuit Cellar FTP site, provides a package of the interfaces to the various components. Therefore, it had to be changed to accommodate the new banc and syscop interfaces. Only those components are in Listing 5, along with the top-level miniMIPS component. The minimips.vhd file provides the top-level definition and behavior of the processor (see Listing 6 and Listing 7).

The primary changes here involve creating the bank_sel bus for the banc and syscop modules and connecting the bus to those components. Most of the changes are purely internal to the CPU. In a system that has a processor that was already being used, no physical changes to the architecture would have been required.






Figure 3 shows the changes to the interface. In this case, I routed the bank_sel signals out of the processor so I could monitor them outside of the FPGA. I did this for debug purposes, but it takes up pins that might be needed for other purposes and means this processor is no longer a direct, drop-in replacement. The source code, which is posted on the Circuit Cellar FTP site, has this change, but you'll want to remove the change when debug is complete. Under normal circumstances, this interface would not be necessary. The basic system that was described earlier provided the basis for a system with the new processor.

Figure 3: Under normal circumstances, the processor on the right would be the same as the one on the left. The monitor lines were added so I could see the action of the bank selection control lines. They can be removed in normal usage.

HARDWARE
As I described earlier, the hardware design for the proof of concept consisted of the modified miniMIPS processor, together with the following components: an SRAM interface (unused by the software, it is used to trace signals), a UART (also not used by the software), an LED display to show which register bank is in use, and a four-digit, seven-segment LED display to show the current results of the thread's calculations.
 
The development board has a 50-MHz clock source, but I slowed the system clock to almost 24 Hz so that I could see the behavior of the system.
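The article does not say how the clock was divided. One possibility consistent with "almost 24 Hz" is taking the most significant bit of a free-running 21-bit binary counter as the system clock; the counter width is an assumption, and the arithmetic is just a sanity check.

```python
F_IN_HZ = 50_000_000  # board oscillator frequency, from the text
DIVIDER_BITS = 21     # assumed counter width; not stated in the article

# The top bit of an n-bit counter completes one cycle every 2**n input clocks.
f_out_hz = F_IN_HZ / 2 ** DIVIDER_BITS
print(round(f_out_hz, 2))
```

With these numbers the divided clock comes out at about 23.84 Hz.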

For the Spartan-3 XC3S1000, I had a maximum of 7,680 slices available. Starting at one register bank, I built the system and examined the use of the part. Then, I incremented the register banks and regenerated. I was able to get eight register banks implemented on my device. Table 1 shows some of the results.

The other item of interest in the results is how the usage increased as the number of banks increased. Going from one bank to two increased the logic usage by 25%. Going to five banks took utilization up to almost 99%.

I generated bitstreams with up to eight register banks. However, the software did not perform reliably and there were many anomalies in the behavior of the system that I cannot explain without better debugging ability.

SOFTWARE
I had to write a program that would illustrate the concept as simply as possible because I did not have a debugger (either the original or the updated one) available for this processor. The maximum number of register banks I was able to use was five; therefore, I selected a program that would use all five banks to increment a counter by different amounts and display that counter on the seven-segment LED display.

The code is displayed in Listing 8. Lines 1 to 25 act as the initialization of the system. The MTC0 instruction is used to switch the bank control register, thus selecting which register bank is in use. Let's review how each register bank is set up. R1 is the main counter, which is initialized to 0 for all tasks. R2 is the increment counter, which is added to R1 to get a new total. For task 0, the count is 1. For task 1, the count is 2. Finally, I get to task 4, which has the increment count of 5. R3 holds the number of the next task to be run, sort of like a linked list in registers. Each task, except task 4, holds the same number as its increment count. Task 4 needs to loop back to task 0, so 0 is placed in R3.

In line 28, I switch to task 0 (which was linked from task four), and the work begins. On line 29, the current counter is sent to the LED display, which is mapped to 0xFFFFFF80 in the address space. The -128 is sign extended in the processor when the address is generated. On line 30, the task initializes a loop counter to 5, as each task will do its work five times before moving on to the next task. On line 31, the task's total counter, R1, is added to the increment count in R2 and placed back in R1. The result is output again to the LED display on line 32. On lines 33 and 34, the task's loop counter is decremented, and I compare it with R0 to see if I have completed all of my work for this time period. If I haven't, I go back three instructions to line 31. Otherwise, I will fall through.
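The display address follows from ordinary 16-bit sign extension of the offset. The short check below assumes the store uses R0 (always zero) as its base register, so the effective address is just the extended offset; the helper name is illustrative.

```python
def sign_extend_16(imm):
    """Sign-extend a 16-bit immediate to 32 bits, returned as an unsigned value."""
    imm &= 0xFFFF
    if imm & 0x8000:  # bit 15 set means the value is negative
        imm -= 0x10000
    return imm & 0xFFFFFFFF


base = 0  # assumed base register R0, which always reads as zero
address = (base + sign_extend_16(-128)) & 0xFFFFFFFF
print(hex(address))
```

This prints 0xffffff80, matching the address given for the LED display.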

The task reaches line 35 when all five passes through its work are finished. So, it will jump to line 28, where the registers are switched and the work begins again. The register switch clears the pipeline, so the next three instructions still work; whereas if the processor hadn't cleared the pipeline, I would have ended up with a mixture of the prior set's inputs going into the current set's registers.
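The behavior of the test program can be mimicked at a high level. The sketch below is a Python model of what the text describes, not a translation of Listing 8: each dictionary stands in for one register bank, and the loop counter is kept in a Python variable rather than a register.

```python
NUM_TASKS = 5
LOOPS_PER_SLICE = 5

# One register bank per task: R1 = running total, R2 = increment, R3 = next task.
banks = [{"R1": 0, "R2": t + 1, "R3": (t + 1) % NUM_TASKS} for t in range(NUM_TASKS)]


def run(slices, start_bank=0):
    """Run the given number of time slices; return every value sent to the display."""
    displayed = []
    bank = start_bank
    for _ in range(slices):
        regs = banks[bank]            # the register bank switch
        displayed.append(regs["R1"])  # show the current counter
        for _ in range(LOOPS_PER_SLICE):
            regs["R1"] = (regs["R1"] + regs["R2"]) & 0xFFFFFFFF
            displayed.append(regs["R1"])
        bank = regs["R3"]             # follow the link to the next task
    return displayed


out = run(NUM_TASKS)
print([b["R1"] for b in banks])
```

After one full round, each task has advanced its own counter by five times its increment, and no task has touched another task's registers.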

The assembler that came with the miniMIPS processor did not appear to have any documentation, so it was difficult to figure out how to write the assembly language to generate a program. Rather than having to work through that, and because I was going to need the resulting program in a ROM space in the design anyway, I decided to write a Perl program to "assemble" my source file into a VHDL source file. I won't go into the details of the Perl program here, but it is in the source code files posted on the Circuit Cellar FTP site. Note that not all of the miniMIPS instructions are supported.
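To give a flavor of what such a translator has to do, here is a minimal Python sketch (not the author's Perl program) that packs two standard MIPS I instructions into 32-bit words. The field layouts and the ADDU and SW encodings are standard MIPS; the choice of instructions and the ROM line format printed at the end are assumptions for illustration.

```python
def encode_r_type(funct, rd, rs, rt, shamt=0):
    """R-type layout: opcode 0 | rs | rt | rd | shamt | funct."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i_type(opcode, rt, rs, imm):
    """I-type layout: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ADDU_FUNCT = 0x21  # standard MIPS function code for ADDU
SW_OPCODE = 0x2B   # standard MIPS opcode for SW

words = [
    encode_r_type(ADDU_FUNCT, rd=1, rs=1, rt=2),     # addu $1, $1, $2
    encode_i_type(SW_OPCODE, rt=1, rs=0, imm=-128),  # sw $1, -128($0)
]
for address, word in enumerate(words):
    print(f'{address} => x"{word:08X}",')  # hypothetical ROM table line format
```

The two words produced are 0x00220821 and 0xAC01FF80. A real translator would also need labels, branch offsets, and the coprocessor instructions used for the bank switch.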

DESIGN LIMITATIONS
The design has a few limitations that need to be addressed before it can be used in a full system intended for production. Interrupts and exceptions currently force the register bank selection to bank 0. This was deliberate to ensure that a guaranteed register bank is used when these events occur. However, the previous bank selection value is not preserved. Because the state of the system when the interrupt occurred is not truly preserved, the previous state has to be restored prior to returning from the interrupt. One possible solution to this problem would be to have a separate portion of the BANKSEL register devoted to which register bank is used for interrupts and exceptions. The processor knows when interrupts occur, so it would be easy for it to switch to that portion of the register and not modify the other register portion.
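The proposed fix can be pictured as follows. This is a hypothetical Python sketch of the idea only: the field names, the in_interrupt flag, and the way the return from interrupt is modeled are all assumptions, since the article does not describe an implementation.

```python
class SplitBankSel:
    """Sketch of the proposed fix: separate task and interrupt bank fields."""

    def __init__(self, interrupt_bank=0):
        self.task_bank = 0                    # written by the scheduler
        self.interrupt_bank = interrupt_bank  # bank reserved for handlers
        self.in_interrupt = False

    @property
    def active_bank(self):
        """The bank that would be driven onto bank_sel."""
        return self.interrupt_bank if self.in_interrupt else self.task_bank

    def enter_interrupt(self):
        self.in_interrupt = True   # task_bank is left untouched

    def return_from_interrupt(self):
        self.in_interrupt = False  # the interrupted task's bank is selected again


sel = SplitBankSel()
sel.task_bank = 3
sel.enter_interrupt()
during = sel.active_bank
sel.return_from_interrupt()
print(during, sel.active_bank)
```

Because the task field is never overwritten, the handler does not have to save and restore the interrupted task's bank number in software.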

The miniMIPS does not appear to distinguish between user and supervisor modes, so it cannot restrict certain instructions from executing in user mode. This means that a hostile piece of code would be able to switch to another thread's registers and change them, including the link return register R31. This is not a particularly desirable feature, but it may not be that much of an issue in certain embedded environments.

You may want to reduce logic usage by implementing the registers in either distributed RAM or block RAM as dual-port memory. This might increase the number of registers that can be implemented, but it may not be able to keep single-cycle accesses required for the pipeline implementation.

THE FUTURE
This project is a good starting point for expanding current soft-core processors and giving them more flexibility. Doing so enables real-time systems to meet their timing requirements. By removing or reducing the need to save and restore registers, you can reduce the context switch time. You will save more time for processing.

Larry Standage (lstandage@gmail.com) holds an M.S. in embedded engineering and a B.S.E. in computer systems engineering from Arizona State University. He has been working on and off in the embedded technology industry for the last 16 years. In his spare time between being a father to five children and teaching high school kids in a religious setting, Larry likes to play chess, read a book or two, and catch up on shows stored on his TiVo.

PROJECT FILES
To download code, go to ftp://ftp.circuitcellar.com/pub/Circuit_Cellar/2009/223.

RESOURCES
Digilent, Inc., "S3 Starter Board Schematics," www.digilentinc.com/Data/Products/S3BOARD/S3BOARD-sch.pdf.

J. Frenzel, "MIPS Instruction Reference," University of Idaho, Department of Electrical and Computer Engineering.

miniMIPS Overview, www.opencores.org/projects.cgi/web/minimips/overview.

Xilinx, Inc., "Spartan-3 FPGA Family Data Sheet," DS099, 2008.

---, "Spartan-3 Starter Kit Board User Guide," UG130, 2005.

SOURCE
S3BOARD Spartan-3 Development board and Spartan-3 XC3S1000 starter
board
Digilent, Inc.
www.digilentinc.com