Customize an OS for a Powerful Embedded MCU
Naubert's ArmExe is an RTOS for the Luminary Micro Stellaris ARM
Cortex-M3 family of microcontrollers and Keil's μVision
environment. The multitasking preemptable executive uses task
priorities, separate task and kernel stacks, and execution
privileges. It handles all interrupts with priorities, supports
nested interrupts, and implements binary semaphores and events as
the resource-sharing mechanisms.
When I first received my Luminary Micro evaluation board for the
Luminary Micro Designstellaris2006 contest, I couldn't believe how
much power the little Stellaris chip had. The demonstration program
that came with the LM3S811 evaluation board displayed an impressive
game in a low-cost, 32-bit platform. Thirty-two bits? Yes, 32 bits
in a microcontroller! The Luminary Micro Stellaris ARM, with its
first implementation of ARM's new Cortex-M3 and its Thumb-2
instruction set, is a 32-bit core designed to substitute for 8- or
16-bit microcontrollers in embedded applications. It introduced me
to a world of new possibilities of applications that would be
difficult to develop with lower-end chips. But I feared this
architecture because I didn't know too much about ARM. In
addition, to take advantage of such a powerful microcontroller, I
needed an operating system. So, I decided to build my own real-time
operating system (RTOS). I call it the "ArmExe."
The ArmExe is a compact RTOS designed to take advantage of all of
the features of the Luminary Micro Cortex-M3 family of
microcontrollers and ARM's μVision development environment.
This multitasking preemptable OS supports up to 254 tasks with 25
priorities each, different task and kernel stacks, execution
privileges, interrupt handlers with priorities, and binary
semaphores and events as the resource-sharing mechanisms.
The ArmExe was highly integrated with the μVision environment. The
C libraries were tailored to work in Thread Safe mode, and I used
several of the compiler features, including SVC calls, weak
functions, and code optimization declarations. In addition, I used
the μVision Configuration Wizard for easy kernel configuration.
In this article, I will describe the features of the ArmExe and how
it was created. The concepts here will help you design a new OS for
almost any powerful embedded microcontroller.
RTOS FEATURES
The ArmExe was designed as a platform for developing
multithreaded applications on the new Cortex-M3 architecture. The
ArmExe kernel permits interrupt nesting with a single interface to
the user, while other RTOSes require a different set of system
calls when the code is executing in an interrupt handler. Using the
same interface makes the ArmExe much easier to use. It also takes
advantage of the new Cortex-M3 interrupt optimizations for nested
interrupts. When an interrupt arrives, most of the registers are
pushed onto the stack automatically. If a new, higher-priority
interrupt arrives during this interrupt handling and before the
beginning of the interrupt code, the Cortex-M3 microcontroller
won't need to push a new set of registers because they are
unchanged. It will simply jump to the higher-priority interrupt
handler and save only the return pointer. The ArmExe recognizes
these events and permits the nesting of the priority-based
interrupts, which will also simplify the user coding.
The Stellaris processor, with its Cortex-M3 architecture, was
designed to be used with an OS. The clever designers included two
stack pointers: one for a privileged or handler mode and another for
the user or task mode. By using the CPU privileged access and the
two stack pointers, the ArmExe simplified the context save and
restore routines, achieving fast task switching. The use of the
privileged modes also improves the kernel's security by not
allowing some instructions to be used in Task mode. For instance,
it is not possible to disable interrupts while in Task mode. The
ArmExe kernel always runs in Handler mode with full access to all
CPU resources, but user code runs in protected Task mode. If the
user code has to perform a privileged operation, it must do it
through a call to an OS function. The Stellaris and other
higher-end CPUs have specific instructions for system calls that
can switch from User mode to Kernel mode and make the kernel regain
full control of its resources.
The ArmExe offers dynamically changing interrupt priorities. The
interrupt priorities are heavily used in multithreading
applications to guarantee the right interrupt nesting and control
resource-sharing access. Many systems disable interrupts to access
shared resources and run their interrupt code with interrupts
disabled. The ArmExe is different. There are no
interrupt-disabled code regions. Critical sections are handled by
raising the CPU priority. Interrupts that arrive with a lower
priority than the CPU cannot interrupt the current flow. An
interrupt handler code can be interrupted by higher-priority
interrupts without affecting the task scheduler. Of course,
interrupt code shouldn't invoke blocking kernel calls because they
will block the kernel forever. This is a normal programming
consideration in RTOSes. The SVC call, which is the only way to
call the kernel, is set in the highest possible interrupt priority.
When the kernel is executing, nobody can interrupt it. The
interrupt handler kernel code is protected, raising the running
priority to avoid switching. It is lowered to the corresponding
interrupt priority only when the user interrupt handler is called.
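The priority-raising scheme described above can be modeled off-target. The following is a minimal sketch, not ArmExe code: it assumes the Cortex-M convention that a numerically lower value means a more urgent priority, and the names cpu_priority, raise_priority(), restore_priority(), and irq_may_preempt() are hypothetical. On real hardware the comparison is made by the NVIC rather than by software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: lower number = more urgent, as on the Cortex-M3. */
#define PRIO_KERNEL  0u     /* most urgent: nothing preempts the kernel */
#define PRIO_THREAD  255u   /* least urgent: ordinary task code         */

static uint8_t cpu_priority = PRIO_THREAD;

/* Enter a critical section by raising the CPU priority. Returns the
   previous level so that sections can nest; never lowers the level. */
static uint8_t raise_priority(uint8_t level)
{
    uint8_t previous = cpu_priority;
    if (level < cpu_priority) {
        cpu_priority = level;
    }
    return previous;
}

/* Leave the critical section by restoring the saved level. */
static void restore_priority(uint8_t previous)
{
    cpu_priority = previous;
}

/* An interrupt is taken only if it is strictly more urgent than the CPU. */
static bool irq_may_preempt(uint8_t irq_priority)
{
    return irq_priority < cpu_priority;
}
```

In this model, a section entered with raise_priority(32) holds off an interrupt at level 64 but still admits one at level 16, which is the behavior the text contrasts with globally disabling interrupts.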
The ArmExe also implements task priorities, which are handled
entirely in the kernel code. Ready tasks with a higher priority
will preempt any lower-priority task that is running. If preemption
is enabled, tasks with the same priority will time slice on each
systick interrupt. If preemption is disabled, the currently running
task must block itself before the kernel schedules the next same or
higher-priority ready task.
Finally, two user-level resource-sharing control mechanisms were
implemented in the ArmExe: mutexes and events. Mutexes are a
mechanism to control exclusive access. A task must request the
ownership of a mutex to have exclusive access to an associated
resource. Only one task can be the owner of a mutex at a time. If
another task wants to access the same resource, it first has to
request the same mutex. The kernel will block this task because
another task has the mutex ownership. The blocked task will not
have CPU time until the first task releases the mutex. Once this
happens, the blocked task will gain ownership to the mutex and will
have exclusive access to the shared resource.
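The ownership rule can be illustrated with a small host-side model. This is a sketch under stated assumptions rather than the ArmExe implementation: the type mutex_t, the functions mutex_init(), mutex_request(), and mutex_release(), the fixed-size waiter array, and the first-come, first-served handoff order are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_TASK      (-1)
#define MAX_WAITERS  8

/* Hypothetical mutex record: one owner plus a FIFO of blocked task IDs. */
typedef struct {
    int owner;                 /* task ID of the owner, or NO_TASK */
    int waiters[MAX_WAITERS];  /* blocked tasks, oldest first      */
    int waiting;               /* number of blocked tasks          */
} mutex_t;

static void mutex_init(mutex_t *m)
{
    m->owner = NO_TASK;
    m->waiting = 0;
}

/* Returns true if the caller now owns the mutex. A false result means
   the kernel would block the caller until ownership is handed over. */
static bool mutex_request(mutex_t *m, int task)
{
    if (m->owner == NO_TASK) {
        m->owner = task;
        return true;
    }
    assert(m->waiting < MAX_WAITERS);
    m->waiters[m->waiting++] = task;
    return false;
}

/* Releases the mutex and hands it to the longest-waiting task, if any.
   Returns the new owner (the task to make ready) or NO_TASK. */
static int mutex_release(mutex_t *m, int task)
{
    assert(m->owner == task);          /* only the owner may release */
    if (m->waiting == 0) {
        m->owner = NO_TASK;
        return NO_TASK;
    }
    m->owner = m->waiters[0];
    for (int i = 1; i < m->waiting; i++) {
        m->waiters[i - 1] = m->waiters[i];
    }
    m->waiting--;
    return m->owner;
}
```

Handing ownership directly to the waiting task on release, instead of merely marking the mutex free, means the unblocked task already owns the resource when it next gets CPU time.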
The ArmExe can have an unlimited number of mutexes. The ArmExe also
implements events. An event is a mechanism by which tasks block
until something happens. When the
event occurs, usually in interrupt code, the event is signaled and
all tasks that were blocked on it are unblocked.
The ArmExe was designed and compiled with the ARM μVision 3 development
environment in C and in assembly language. It uses the Luminary
Micro Stellaris driver library. It was tested with the 1003
release, but it should work with any subsequent version. The user
code can use the driver library in multiple tasks if the
ARMEXE_KEIL_LIBRARY_SAFE constant is set in the config.h
configuration file. All of the files related to this project are
posted on the Circuit Cellar FTP site.
USING ArmExe
The ArmExe is easy to use. A single kernel configuration file has to
be modified to change the kernel parameters to adjust its memory
footprint. The parameters include the maximum number of tasks, the
idle task stack size, the default stack size for user tasks, a
preemption enable/disable option, the systick period, the clock
frequency (if the thread-safe C library is needed), and the number
of required C library mutexes. Photo 1 shows the configuration
wizard of the ArmExe config.h file where these parameters can be
easily adjusted thanks to this feature of the μVision environment.
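For readers who have not used the μVision Configuration Wizard, the fragment below sketches how such a header might be annotated. Only ARMEXE_KEIL_LIBRARY_SAFE is a name taken from the article; every other macro name, range, and value is a placeholder assumed for illustration and does not reproduce the actual ArmExe config.h.

```c
/* config.h (sketch): every macro name except ARMEXE_KEIL_LIBRARY_SAFE is
   hypothetical, and all values and ranges are placeholders. */
#ifndef ARMEXE_CONFIG_H
#define ARMEXE_CONFIG_H

// <<< Use Configuration Wizard in Context Menu >>>

// <h> Kernel
//   <o> Maximum number of tasks <1-254>
#define ARMEXE_MAX_TASKS            8
//   <o> Idle task stack size in bytes <64-4096:8>
#define ARMEXE_IDLE_STACK_SIZE      128
//   <o> Default user task stack size in bytes <64-4096:8>
#define ARMEXE_DEFAULT_STACK_SIZE   256
//   <q> Enable preemption (time slicing)
#define ARMEXE_PREEMPTION           1
//   <o> Systick period in milliseconds <1-1000>
#define ARMEXE_TICK_MS              10
// </h>

// <h> Thread-safe C library
//   <q> Use the thread-safe C library
#define ARMEXE_KEIL_LIBRARY_SAFE    1
//   <o> Clock frequency in Hz
#define ARMEXE_CLOCK_HZ             50000000
//   <o> Number of C library mutexes <0-16>
#define ARMEXE_LIBRARY_MUTEXES      4
// </h>

// <<< end of configuration section >>>

#endif /* ARMEXE_CONFIG_H */
```

The wizard reads the comment annotations and edits the value that follows each one, so the file remains an ordinary C header that can also be changed by hand.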

The kernel system calls are separated into functional groups: the
Kernel Initialization and Configuration group has two system calls
to initialize the kernel and set the systick period; the Task
Management group has all task creation and destruction calls; the Task
Scheduling group encloses the calls related to task suspension,
resumption, and priorities; the Mutex Handling group has the calls
related to mutexes; the Event Handling group has the calls related
to events; the Interrupt Handling group has the calls to enable,
disable, and manage interrupts; and the Miscellaneous group has
additional utility system calls. Table 1 shows some of the most
important kernel calls. A full table is posted on the Circuit
Cellar FTP site in the file labeled ArmExe Full System Calls.doc.
Programming with the ArmExe is simple: just add a reference to the
ArmExe.h include file, program a main() routine that calls
ArmExeInit(TICK_TIME), and define the init_tasks() function to
initialize and create the tasks. Optionally, you may define
signal_error() to capture fatal kernel error messages. In the
μVision project properties, the parameter --entry Reset_Handler
must be defined in the linker options section, and
fixed address compilation must be used. The following files, which
are part of the kernel, must be added to the project: Startup.s,
SVC.s, IntHandler.s, and ArmExe.c. The Luminary Micro DriverLib
library revision 1003 or later must also be added to the project.
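The overall shape of an application can be sketched on a host machine with stand-ins for the kernel. Everything here is an assumption apart from the names ArmExeInit(), TICK_TIME, init_tasks(), and signal_error(): the signatures, the stub_task_create() helper, the three task names, and the stubbed ArmExeInit() body are hypothetical and exist only so that the control flow compiles without ArmExe.h.

```c
#include <assert.h>

/* Host-side stand-ins so that the flow compiles without the kernel.
   In a real project ArmExe.h would provide the kernel declarations;
   every signature below is an assumption made for illustration. */
#define TICK_TIME 10                 /* placeholder systick period */

typedef void (*task_entry_t)(void);

static int tasks_created = 0;
static int last_error = 0;

/* Hypothetical stand-in for the kernel's task creation call. */
static void stub_task_create(task_entry_t entry)
{
    assert(entry != 0);
    tasks_created++;
}

static void clock_task(void)   { /* would keep the time          */ }
static void display_task(void) { /* would refresh the display    */ }
static void alarm_task(void)   { /* would check and sound alarms */ }

/* Defined by the application; the kernel calls it to create the tasks. */
void init_tasks(void)
{
    stub_task_create(clock_task);
    stub_task_create(display_task);
    stub_task_create(alarm_task);
}

/* Optional hook for fatal kernel errors (parameter type assumed). */
void signal_error(int code)
{
    last_error = code;
}

/* Stand-in for the kernel entry point. The real ArmExeInit() starts the
   idle task, which in turn runs init_tasks() in a task of its own. */
static void ArmExeInit(int tick_time)
{
    (void)tick_time;
    init_tasks();
}

/* This would be main() in the real project. */
int app_main(void)
{
    ArmExeInit(TICK_TIME);
    return 0;
}
```

The point of the sketch is the division of labor: the application supplies init_tasks() and, optionally, signal_error(), while the kernel decides when they run.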
An alarm clock demo program was developed to test some of the
ArmExe's functions. As simple as it may sound, the clock program is
a three-task multithreaded application that runs flawlessly on the
EK-LM3S811 evaluation board (see Photo 2). The code is fully
documented. I invite you to download the main.c file from the
Circuit Cellar FTP site.
ArmExe INTERNALS
An RTOS simulates the concurrent execution of several tasks or
processes. This simulation is achieved by assigning a slice
of CPU time to every task that is ready for execution. A task is
said to be ready if it can run without having to wait for any
resource other than CPU time. A task is considered to be blocked if it is
waiting for a resource (e.g., waiting for an interrupt). By knowing
that a task is just waiting, instead of leaving it running in a
loop, an RTOS will hold the task from running and assign the CPU
time to other ready tasks. In a single-threaded CPU, such as a
Luminary Micro microcontroller, only one task can be in the running
state. If several tasks are ready, the highest priority task will
have the CPU. If preemption is enabled, all tasks with the same
priority will be given slices of CPU time in a round-robin fashion.
Figure 1 shows the ArmExe task states.

The first step to build an RTOS is to write routines to save and
restore the state of a running task. Saving the task state will put
the task in "hibernation." Restoring its state will make that task
run again, as if it had never been suspended. The state of a task, or its
context, comprises all of the user-level registers, the stack, and
other system status. This information is saved in a structure
called the task control block (TCB). Task switching is the transfer
of CPU time from one task to another, and it is achieved by saving the
context of the current running task and restoring the context of
the new selected ready task.
The ArmExe performs its duties as a standard RTOS: it has a task
control block (TCB structure in ArmExe.c), which defines a task. The
ArmExe TCB also has other variables, such as the task status, task
ID, and task priority. When the ArmExe needs to change context, it
will do it only from Handler mode, that is, from inside interrupt
code. When the Cortex-M3 interrupts a task, it will automatically
save the R0 through R3, R12, LR, PC, and PSR registers. They are
stored in the task stack. The ArmExe kernel then saves the remaining
R4-R11 registers on the same stack, thus saving the entire context
for that task. The task switching happens when returning from an
interrupt handler. Instead of just restoring the last pushed
registers, the kernel will restore the corresponding registers of
the new task that should run. Because all of the registers are
saved in the task stack, the corresponding task will run at the
interrupt exit by just changing the task stack pointer.
This task-switching logic is programmed in each of the three main
interrupt entry handlers: in SVC.s for the SVC kernel calls, in the
SysTick_Handler in ArmExe.c for the systick interrupt handler, and
in IntHandler.s for the rest of the IRQ handlers. Each of the
interrupt handlers can switch a task, so the entry and the exit
code are similar. The interrupt entry code increments the
ISR_counter variable. This is used to check the interrupt nesting.
If the counter returns to zero at ISR exit, the handler is recognized
as the last (outermost) interrupt handler, and thus a context restore is
forced. However, if it isn't zero, the return must be
made to another interrupt handler, in which case the context is not
restored from the thread stack. This simple logic mixed with the
right interrupt priorities is what makes the interrupt nesting
support a reality in the ArmExe. In the interrupt handler entry
code shown in Listing 1, notice how easily the context is fully
saved with only seven instructions.

The interrupt exit code is shown in Listing 2. The ISR_counter is
decremented and compared with zero. If it is zero, the interrupt handler
is exiting from the last ISR, so a context restore must be performed. In
the SVC handler, additional code was added to return the R0 value
to the calling thread because the SVC calls implement C function
calls with a return value. The actual context restore is performed
in the following six instructions. The new task TCB is retrieved
from the task_running variable. The SP of the new thread is used to
restore R4-R11 and then the routine saves the resulting stack
pointer into the PSP, or process stack pointer. At this point, returning
from the interrupt handler (with POP {PC}) will automatically
restore the rest of the registers and return to the new selected
task code.
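The nesting logic of the entry and exit code can be followed in a small C model. This is only a sketch of the counting rule described above, not a transcription of the assembler listings: ISR_counter is the name used in the article, while isr_enter() and isr_exit() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

static int ISR_counter = 0;   /* nesting depth; name taken from the article */

/* Entry code of every handler: one more handler is now active. */
static void isr_enter(void)
{
    ISR_counter++;
}

/* Exit code of every handler. Returns true when this was the outermost
   handler, so the kernel must restore a task context; false means the
   return goes to a handler that was itself interrupted. */
static bool isr_exit(void)
{
    assert(ISR_counter > 0);
    ISR_counter--;
    return ISR_counter == 0;
}
```

With one interrupt nested inside another, the inner isr_exit() returns false and the outer one returns true, so the task context is restored exactly once.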

Once the user task is run, the kernel will regain control only when
an interrupt arrives. At that point, the kernel should assess the
tasks and determine which one will run next. This is the
scheduler's job. The scheduler in the ArmExe runs in Privileged
mode. This mode is automatically set by the CPU when an interrupt
arrives. The scheduling decision normally runs at the end of each
interrupt handler, when the kernel finishes its duties and returns
the CPU to a user task. It is assumed that the interrupt could
change the status of another task so the kernel must check if a
higher priority task must run after the interrupt is processed.
The ArmExe scheduler is implemented in the schedule() routine (in
the ArmExe.c file), which will switch the task_running variable to
the highest priority ready task (which is the first element in the
ready list, because it is a priority-ordered list). The
scheduler is also called on each systick interrupt to implement
preemption. The SysTick_Handler interrupt routine will call
SysTick_Interrupt(), which calls schedule().
The schedule() routine is straightforward. It will check the next
ready task queued in the ready list and test whether its
priority is greater than that of the currently running task. If it is, that
task will become the next running task by changing the task_running
variable, and the old running task will be queued in the ready list
at its corresponding position depending on its priority. If
preemption is enabled and a task is ready with the same priority as
the currently running task, then a task switch will be forced,
producing round-robin time slicing among tasks with the same
priority.
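The decision rule can be expressed as a short host-side model. This is a sketch under assumptions, not the code in ArmExe.c: the names schedule() and task_running come from the article, while the tcb_t layout, ready_insert(), preemption_enabled, and the convention that a larger number means a higher priority are hypothetical. A task that has blocked itself is modeled by setting task_running to NULL before calling schedule().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct tcb {
    int id;
    int priority;          /* assumption: larger value = higher priority */
    struct tcb *next;      /* link in the ready list                     */
} tcb_t;

static tcb_t *ready_list = NULL;     /* sorted, highest priority first */
static tcb_t *task_running = NULL;   /* name taken from the article    */
static bool preemption_enabled = true;

/* Insert behind every task of the same or higher priority, so that equal
   priorities are served in first-come, first-served order. */
static void ready_insert(tcb_t *t)
{
    tcb_t **link = &ready_list;
    while (*link != NULL && (*link)->priority >= t->priority) {
        link = &(*link)->next;
    }
    t->next = *link;
    *link = t;
}

/* Pick the task that should own the CPU after the current interrupt. */
static void schedule(void)
{
    tcb_t *candidate = ready_list;
    if (candidate == NULL) {
        return;                               /* nothing else is ready */
    }
    bool higher = task_running == NULL ||
                  candidate->priority > task_running->priority;
    bool slice  = preemption_enabled && task_running != NULL &&
                  candidate->priority == task_running->priority;
    if (higher || slice) {
        ready_list = candidate->next;         /* dequeue the candidate */
        candidate->next = NULL;
        if (task_running != NULL) {
            ready_insert(task_running);       /* requeue the old task  */
        }
        task_running = candidate;
    }
}
```

Because the old task is reinserted behind its equal-priority peers, repeated calls with preemption enabled rotate through those peers, which gives the round-robin behavior without a separate queue.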
The ArmExe has implemented seven task states: Uninitiated, Ready,
Running, Sleeping, Suspended, Waiting on mutex, and Waiting on event
(see Figure 1). The task control blocks are stored in a static
array called tcb_array. All of the entries are initialized with the
TASK_UNINITIATED state.
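As an illustration, the seven states and the static array might be declared as follows. Only tcb_array and TASK_UNINITIATED are names from the article; the other enumerator names, the tcb_t fields, MAX_TASKS, and the helpers tcb_array_init() and tcb_find_free() are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8   /* placeholder; the real limit comes from config.h */

typedef enum {
    TASK_UNINITIATED = 0,   /* name from the article; the rest are assumed */
    TASK_READY,
    TASK_RUNNING,
    TASK_SLEEPING,
    TASK_SUSPENDED,
    TASK_WAITING_MUTEX,
    TASK_WAITING_EVENT
} task_state_t;

typedef struct {
    task_state_t state;
    int id;
    int priority;
    void *stack_pointer;    /* saved task stack pointer */
} tcb_t;

static tcb_t tcb_array[MAX_TASKS];

/* Mark every slot as free before any task is created. */
static void tcb_array_init(void)
{
    for (int i = 0; i < MAX_TASKS; i++) {
        tcb_array[i].state = TASK_UNINITIATED;
        tcb_array[i].id = 0;
        tcb_array[i].priority = 0;
        tcb_array[i].stack_pointer = NULL;
    }
}

/* Return the index of the first free slot, or -1 if the array is full. */
static int tcb_find_free(void)
{
    for (int i = 0; i < MAX_TASKS; i++) {
        if (tcb_array[i].state == TASK_UNINITIATED) {
            return i;
        }
    }
    return -1;
}
```

A static array keeps the memory footprint fixed at build time, which fits a kernel whose task limit is set in a configuration header.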
When a new task is created, it can be initialized as Ready or
Suspended. If the task is Ready, it will be queued in the ready
list so the next schedule() call considers it to run, depending on
its priority. If the task is suspended, it will remain in that
state until resumed with the ArmExeTaskResume() call. A single task
will be running at a time. If no tasks are ready to run, then the
idle task will run, which does nothing. If a task calls a kernel
function that requires the kernel to complete some operation before
returning to the task, the task will be blocked until that
operation is finished. There are several blocked states in the
current implementation of the ArmExe, including sleeping, waiting
on mutex, and waiting on event. A task that is blocked will be
queued in a list associated with the operation that must be
finished.
Sleeping tasks are queued in the sleeping list, which is a
time-ordered list of tasks that is checked every systick interrupt
to awaken tasks. If the time that a task requested to sleep passes,
the task status is changed to Ready, the task is queued again in
the ready list, and a reschedule is requested. Tasks waiting for a
mutex are queued in a list associated with the mutex. The tasks
remain blocked until the mutex is released, at which point the first
blocked task is released: it is queued in the ready list and made the
next owner of the mutex. This pattern repeats until all of the tasks
blocked on the mutex are released. Tasks waiting for an event are
queued in a list associated with the event. When the event is
signaled, all of them are moved to the ready list at once.
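The time-ordered sleeping list can be modeled in a few lines. This is a hedged sketch rather than the ArmExe code: sleeper_t, sleeping_list, tick_now, sleep_for(), and systick_wake() are hypothetical names, wake-up times are stored as absolute tick counts, and tick counter wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct sleeper {
    int id;
    uint32_t wake_tick;        /* absolute tick at which to wake up */
    struct sleeper *next;
} sleeper_t;

static sleeper_t *sleeping_list = NULL;   /* earliest wake-up first */
static uint32_t tick_now = 0;

/* Queue a task so that the list stays ordered by wake-up time. */
static void sleep_for(sleeper_t *t, uint32_t ticks)
{
    assert(t != NULL);
    t->wake_tick = tick_now + ticks;
    sleeper_t **link = &sleeping_list;
    while (*link != NULL && (*link)->wake_tick <= t->wake_tick) {
        link = &(*link)->next;
    }
    t->next = *link;
    *link = t;
}

/* Called once per systick. Only the head of the list has to be checked
   because later entries cannot expire earlier. Returns how many tasks
   were woken; the kernel would mark each one Ready and queue it. */
static int systick_wake(void)
{
    int woken = 0;
    tick_now++;
    while (sleeping_list != NULL && sleeping_list->wake_tick <= tick_now) {
        sleeper_t *t = sleeping_list;
        sleeping_list = t->next;
        t->next = NULL;
        woken++;
    }
    return woken;
}
```

Keeping the list sorted moves the cost to the sleep call, so the work done inside every systick interrupt stays small when nothing is due.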
KERNEL INITIALIZATION
The ArmExe has to be initialized by the user code before creating
the tasks. The initialization is done by calling ArmExeInit(). The
routine will perform several internal housekeeping routines,
initialize the systick counter, and start the idle task (see Figure
2).
The idle task gets CPU time when no other task is ready. It spends
that CPU time cycling in an infinite loop. But before entering
this loop, the idle task must initialize the user tasks. It creates
a new task that will run the user-defined init_tasks() routine.
This task will run with a priority value that is higher than any
user-selectable priority.
Because none of the user-created tasks can have a higher priority
than this initialization task, no task will run until the priority
is lowered. This occurs automatically when init_tasks() ends,
leaving the CPU to the user-defined tasks, starting with the
highest-priority user task. This mechanism simplifies starting the user
tasks and makes the ArmExe easier to use.
TASK INITIALIZATION
By now you should have a question. How will a task run the first
time if its context was never saved before?
Well, the kernel has to initialize the task context the first time
to make the scheduler think that the task was previously suspended.
This is performed in the task creation routine ae_task_create().
The routine will find an available task ID. It will then search for
a free task control block (TCB) slot in the tasks array and
initialize it according to the specified parameters. It then calls
the stack_init() routine, which will initialize the context as if
the task was previously suspended by an interrupt. This means that
the register values must already be pushed on the task stack in the right
order, that the stacked program status register has the Thumb bit set (as
the Stellaris CPUs can only run Thumb instructions), and that the
return address in the stack points to the start of the corresponding
task code. The LR register is set to the ArmExeTaskExit() code. So,
if the task routine ends, it will jump to the exit code by default.
Finally, the resulting stack pointer is stored in the TCB, so it
is ready to be used by the interrupt exit routine to start the task
whenever the scheduler selects this task to run.
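The initial frame can be sketched in portable C by treating the stack as an array of 32-bit words. The routine name stack_init() comes from the article, but its signature here is an assumption, as are the F_* offsets and the placement of R4-R11 directly below the eight hardware-stacked words. Addresses are passed as plain integers so that the example also runs on a 64-bit host.

```c
#include <assert.h>
#include <stdint.h>

#define XPSR_THUMB   0x01000000u  /* T bit (bit 24) of the stacked xPSR  */
#define FRAME_WORDS  16           /* 8 hardware-stacked + 8 kernel-saved */

/* Word offsets from the returned stack pointer, lowest address first.
   R4-R11 occupy words 0-7 (assumed order); the hardware frame follows. */
enum {
    F_R4 = 0,
    F_R0 = 8, F_R1, F_R2, F_R3, F_R12, F_LR, F_PC, F_XPSR
};

/* Build the frame that an interrupted task would have left behind.
   stack_top points one word past the highest usable stack word.
   Returns the value to store in the TCB as the saved stack pointer. */
static uint32_t *stack_init(uint32_t *stack_top,
                            uint32_t entry_address,
                            uint32_t exit_address)
{
    uint32_t *sp = stack_top - FRAME_WORDS;   /* the stack grows downward */

    for (int i = 0; i < FRAME_WORDS; i++) {
        sp[i] = 0;                            /* R0-R12 start out as zero */
    }
    sp[F_LR]   = exit_address;                /* taken if the task returns */
    sp[F_PC]   = entry_address & ~1u;         /* Thumb state comes from    */
    sp[F_XPSR] = XPSR_THUMB;                  /* the T bit, not PC bit 0   */
    return sp;
}
```

On the target, entry_address would be the address of the task function and exit_address that of ArmExeTaskExit(); restoring R4-R11 from the returned pointer and then performing an exception return would start the task as if it had merely been resumed.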
PROJECT COMPLETE
I finished this project with all of my goals satisfied. I learned
a lot about the new ARM Cortex-M3 architecture, how the nested
vectored interrupt controller (NVIC) worked, the details of
Cortex-M3 interrupt optimizations, how to handle privileged
instructions, user and handler stacks, and everything about
interrupt handling with priorities.
I am especially pleased that I completed a fully customizable
real-time operating system that will simplify my developments on
the Luminary Micro Stellaris family of microcontrollers. This was
my first ARM development. The result is not bad for just one month
of dedication. It certainly demonstrates how a good
architecture, the right tools, and an excellent MCU implementation
can accelerate your project development to achieve a fast time to
market.
Editor's note: Naubert Aparicio's "ArmExe" design won Honorable
Mention in the Luminary Micro Designstellaris2006 Contest. For more
information about the contest and to learn more about the winning
projects, visit www.circuitcellar.com/designstellaris2006.
Naubert Aparicio (naubert.aparicio@usa.net) is a computer science
engineer from Simon Bolivar University in Venezuela. He works as
the technology director for Quantum Business Engineering in Puerto
Rico, managing the design of mission-critical Unix systems
architectures. When Naubert has free time, he enjoys learning about
robotics, advanced digital design, and embedded systems.
PROJECT FILES
To download code, go to ftp://ftp.circuitcellar.com/pub/Circuit_Cellar/2008/218.
RESOURCES
ARM, "ARM v7-M Architecture Application Level Reference Manual," ARM
DDI 0406A-01, 2006.
---, "Cortex-M3 Technical Reference Manual," ARM DDI 0337B, 2006.
---, "Procedure Call Standard for the ARM Architecture,"
GENC-003534, 2006.
---, "RealView Compilation Tools: Assembler Guide," Version 3.0,
DUI 0336B, 2006.
---, "RealView Compilation Tools: Compiler and Libraries Guide,"
Version 3.0, DUI 0337B, 2006.
---, "RealView Compilation Tools: Linker and Utilities Guide,"
Version 3.0, DUI 0338B, 2006.
Luminary Micro, Inc., "Clocking Options for Stellaris Family
Microcontrollers," Application Note, AN01240-01, 2006.
---, "LM3S811 Microcontroller Data Sheet," DS-LM3S811-00, July
2006.
---, "Stellaris Driver Library: User's Guide," SW02034-697, 2006.
---, "Stellaris LM3S811 Evaluation Board User's Manual,"
EK-LM3S811-01, 2006.
SOURCES
μVision 3 IDE
Keil
www.keil.com
LM3S811 Microcontroller
Luminary Micro, Inc.
www.luminarymicro.com