Interrupt and Thread Safety in RAIL

In embedded software development, some of the hardest problems to debug are caused by calling non-reentrant functions in interrupt context. Hence, it's in the developer's best interest to design the application carefully to avoid these scenarios.

Doing so, however, requires sufficiently detailed knowledge of the interrupts and API functions, which is not accessible in a closed-source product like RAIL. This document aims to provide the information required to develop interrupt- and thread-safe applications in RAIL.

Thread Safety

In an application without a task scheduler, only an interrupt request can interrupt the main program. If you use a preemptive scheduler, like the scheduler available in most embedded OSes, higher priority tasks can interrupt lower priority tasks as well. Regardless, when looking at the RAIL APIs, the same concerns are present in either case:

Is it safe to interrupt this API?
Is it safe to call this API from a thread/interrupt which interrupted something?

The Event Handler

RAIL uses an event handler, which is set up by RAIL_Init(). In our examples, it's usually called sl_rail_util_on_event(). This function is called by the RAIL library, and it's almost always called from an interrupt handler. This means the event handler should be used with care:

Interrupts are disabled while the event handler is running, so the function must return quickly.
More importantly, the function might be interrupting the main loop (or some other task).

Note that the first point above is not completely true if interrupt priorities are used: in that case, only interrupts at the same and lower priorities are disabled. However, the event handler will never be interrupted by another event handler, because all RAIL interrupts must be configured at the same priority.
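To illustrate keeping the event handler short, here is a minimal sketch in which the handler only records which events occurred and the main loop picks them up later. The names app_on_rail_event(), app_take_pending_events(), app_irq_lock() and app_irq_unlock() are hypothetical, and RAIL_Events_t is redefined as a host-side stand-in so the sketch compiles without the SDK. On a real target, the lock helpers would map to the platform's interrupt disable/restore primitives.

```c
#include <stdint.h>

/* Host-side stand-in so the sketch compiles without the RAIL SDK.
   On target, this type comes from rail.h. */
typedef uint64_t RAIL_Events_t;

/* Hypothetical critical-section helpers. Here they only track nesting;
   on target they would disable and restore interrupts. */
static int irq_lock_depth = 0;
static void app_irq_lock(void)   { irq_lock_depth++; }
static void app_irq_unlock(void) { irq_lock_depth--; }

/* Events recorded by the handler and not yet processed by the main loop. */
static volatile RAIL_Events_t pending_events = 0;

/* Body of the event handler: record the events and return immediately.
   The read-modify-write relies on event handlers never interrupting
   each other. */
void app_on_rail_event(RAIL_Events_t events)
{
  pending_events |= events;
}

/* Called from the main loop: take and clear the pending events as one
   uninterrupted step, so an event arriving in between is not lost. */
RAIL_Events_t app_take_pending_events(void)
{
  app_irq_lock();
  RAIL_Events_t snapshot = pending_events;
  pending_events = 0;
  app_irq_unlock();
  return snapshot;
}
```

The main loop can then process the returned snapshot at its own pace, while the time spent in interrupt context stays small and constant.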

General Rules for the RAIL API

First, let's collect the general rules of the API. The exceptions are detailed in the following sections:

1. Calling any RAIL API from the main thread (or a single OS thread) is safe.
2. Calling any API from multiple threads is unsafe, except for DMP.
3. Calling most APIs from an interrupt handler is safe (see exceptions below).

Dynamic Multiprotocol (DMP)

In general, if you have a multi-threaded application, you should use RAIL from a single thread. The exception to this guidance is DMP, where in most cases each protocol runs in its own thread. In this scenario, using RAIL from each thread is safe, as each protocol has its own rail_handle. So, a more generalized wording of rule 2 is:

Calling any API from multiple threads is only safe if each thread has a dedicated rail_handle, and each thread only accesses RAIL with its own handle.

The few APIs that don't use rail_handle - like RAIL_GetTime(), RAIL_Sleep(), or RAIL_Wake() - can be called from any thread.
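One way to make the "one handle per thread" rule hard to violate is to give each protocol thread a context structure that owns its handle, and to pass only that context to the thread. The following is a minimal sketch under that assumption. protocol_ctx_t and protocol_handles_are_dedicated() are hypothetical names, and RAIL_Handle_t is a host-side stand-in so the sketch compiles without the SDK.

```c
#include <stdbool.h>
#include <stddef.h>

/* Host-side stand-in; on target, this type comes from rail.h. */
typedef void *RAIL_Handle_t;

/* Hypothetical per-protocol context. Each protocol thread receives a
   pointer to its own context and never touches another one. */
typedef struct {
  RAIL_Handle_t rail_handle;  /* used only by the owning thread */
} protocol_ctx_t;

/* Start-up sanity check: every context has a handle, and no two
   contexts share one. */
bool protocol_handles_are_dedicated(const protocol_ctx_t *ctxs, size_t count)
{
  for (size_t i = 0; i < count; i++) {
    if (ctxs[i].rail_handle == NULL) {
      return false;
    }
    for (size_t j = i + 1; j < count; j++) {
      if (ctxs[i].rail_handle == ctxs[j].rail_handle) {
        return false;
      }
    }
  }
  return true;
}
```

Running such a check once, before the protocol threads are created, catches a shared handle early instead of as an intermittent failure in the field.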

Interrupt Safety in General

In general, calling an API which changes the radio state (i.e., between Rx, Idle and Tx) can be risky. The simplest way to write an interrupt-safe application is to not call state changing APIs from any interrupt handler, including the RAIL event handler. This can be achieved by setting a flag or changing a state variable in the event handler instead of calling an API directly:

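#include "rail.h"

typedef enum {
  S_IDLE,
  S_START_RX,
  S_START_TX,
} state_t;

// Written by the event handler, read by the main loop
volatile state_t state;
volatile RAIL_Time_t last_event;

// Assumed to be set by the init code
static RAIL_Handle_t rail_handle;

int main(void)
{
  // init code

  state = S_START_TX;
  while (1) {
    switch (state) {
      case S_START_TX:
        // Clear the request before starting, so a request written by the
        // event handler during or after the call is not overwritten
        state = S_IDLE;
        RAIL_StartTx(rail_handle, 0, RAIL_TX_OPTIONS_DEFAULT, NULL);
        break;
      case S_START_RX:
        state = S_IDLE;
        RAIL_StartRx(rail_handle, 0, NULL);
        break;
      default:
        break;
    }
  }
  return 0;
}

void sl_rail_util_on_event(RAIL_Handle_t rail_handle, RAIL_Events_t events)
{
  // Neither RAIL_GetTime() nor RAIL_SetTxPower() changes the radio state
  last_event = RAIL_GetTime();
  if (events & RAIL_EVENTS_TX_COMPLETION) {
    state = S_START_RX;  // request only; the main loop starts Rx
    RAIL_SetTxPower(rail_handle, 200);
  }
}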

Note that some RAIL APIs were called from the event handler, but none of those were state changing APIs.

Interrupt Safety with State Changing APIs

In some (usually time-critical) cases, however, it's not possible to avoid calling state changing APIs from the event handler (or another interrupt handler). State changing APIs are not always risky: some APIs are safe as long as they don't interrupt another specific API.

Hence, each item in the following list names the API that is running first (i.e., the "interrupted" API), the API that interrupts it, and how risky that combination is. The list also includes some combinations that are "safe" but whose end result is not predictable - i.e., the radio might be in Rx or in Idle, depending on which API is called first.

Interrupting RAIL_Start<something>() with another RAIL_Start<something>() is risky, especially if they would start on different channels.
Interrupting RAIL_Idle(handle, <something>, true) with any RAIL_Start<something>() is risky.
Interrupting RAIL_Idle(handle, <something>, false) with any RAIL_Start<something>() is safe, but the end result is not predictable (i.e., the radio will either be in Idle, or start the requested operation).
Interrupting RAIL_Start<something>() with RAIL_Idle() is safe but the end result is not predictable, and might cause strange events (see the next section for details).
Interrupting RAIL_StopTxStream() with any RAIL_Start<something>() is very risky (the radio might remain in test configuration and start transmitting/receiving).
Interrupting RAIL_StopTx() is safe. Interrupting RAIL_StopTx() with RAIL_Start<something>() is safe but the end result is not predictable (i.e., the radio will either be in Idle, or start the requested operation).
Interrupting anything with RAIL_StopTx() is safe (see next section for important clarification). Interrupting RAIL_StartTx() with RAIL_StopTx(handle, RAIL_STOP_MODE_ACTIVE) is safe, but not predictable.
Interrupting anything with RAIL_StopTxStream() is safe. Interrupting RAIL_StartTxStream() with RAIL_StopTxStream() is safe but not predictable.
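As an illustration of the first item, one way to keep an interrupt handler from interrupting a RAIL_Start<something>() call in the main context with another one is a "start in progress" flag: the handler defers its request while the flag is set, and the main context runs the deferred request afterwards. This is a minimal sketch, not a definitive implementation. radio_start_rx() is a hypothetical stand-in for RAIL_StartRx(), all app_* names are hypothetical, both contexts are assumed to request the same channel, and the simulated_irq hook exists only so the behavior can be exercised on a host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in; on target, this type comes from rail.h. */
typedef void *RAIL_Handle_t;

/* Bookkeeping used only to observe the sketch on a host. */
static int  start_calls = 0;
static int  nested_start_calls = 0;
static bool inside_start = false;
static void (*simulated_irq)(void) = NULL;  /* fires once, mid-start */

/* Hypothetical stand-in for RAIL_StartRx(handle, channel, NULL). */
static void radio_start_rx(RAIL_Handle_t handle, uint16_t channel)
{
  (void)handle;
  (void)channel;
  if (inside_start) {
    nested_start_calls++;  /* a start interrupted another start */
  }
  inside_start = true;
  start_calls++;
  if (simulated_irq != NULL) {
    void (*irq)(void) = simulated_irq;
    simulated_irq = NULL;
    irq();  /* pretend an interrupt arrives in the middle of the call */
  }
  inside_start = false;
}

static volatile bool start_in_progress = false;
static volatile bool rx_start_deferred = false;

/* Main context: flag the start, then repeat it if a request was deferred. */
void app_start_rx_from_main(RAIL_Handle_t handle, uint16_t channel)
{
  do {
    start_in_progress = true;
    rx_start_deferred = false;
    radio_start_rx(handle, channel);
    start_in_progress = false;
  } while (rx_start_deferred);
}

/* Interrupt context: start directly only if no start is being interrupted. */
void app_request_rx_from_irq(RAIL_Handle_t handle, uint16_t channel)
{
  if (start_in_progress) {
    rx_start_deferred = true;
  } else {
    radio_start_rx(handle, channel);
  }
}

/* Simulated interrupt used by the host-side check. */
static void simulated_event(void)
{
  app_request_rx_from_irq(NULL, 0);
}
```

The deferred request costs some latency, so this pattern only fits cases where the interrupt-context start is not strictly time critical.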

RAIL_Idle in the Event Handler

Calling RAIL_Idle() or RAIL_StopTx(rail_handle, RAIL_STOP_MODE_ACTIVE) from the event handler might cause strange results. For example, let's say you're receiving on a channel and want to detect preambles using the events RAIL_EVENT_RX_PREAMBLE_DETECT and RAIL_EVENT_RX_PREAMBLE_LOST. The following scenario may unfold:

1. A preamble lost interrupt is received, so (at least) other radio interrupts are temporarily disabled.
2. You enter the event handler with RAIL_EVENT_RX_PREAMBLE_LOST.
3. At this point, the radio detects a preamble. The interrupt is logged, but the handler cannot run since the interrupts are masked.
4. Still in the event handler, you decide to turn off the radio with RAIL_Idle(rail_handle, RAIL_IDLE_ABORT, true).
5. Turning off the radio generates a preamble lost interrupt.
6. The radio is now off, and you return from the event handler.
7. Interrupts are enabled again, so the pending preamble detect interrupt handler starts running.
8. You enter the event handler with RAIL_EVENT_RX_PREAMBLE_DETECT and RAIL_EVENT_RX_PREAMBLE_LOST both set at the same time.

So you end up with a preamble detect event, even though the radio is off. This is usually harmless, since you always have the _LOST or _ABORTED event as well, but this demonstrates why your design must carefully consider in what order to handle events.

The easiest way to avoid this conflicted outcome is to disable the events that might cause problems when turning off the radio.
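A minimal sketch of this approach follows. On target, the two wrappers would call RAIL_ConfigEvents(rail_handle, mask, events) and RAIL_Idle(rail_handle, RAIL_IDLE_ABORT, true). Here they are replaced by hypothetical host-side stand-ins (radio_config_events() and radio_idle_abort()) that only record what was requested, and the EV_* bit positions are placeholders for the real RAIL_EVENT_* values from rail.h.

```c
#include <stdint.h>

/* Placeholder event bits. Real code uses RAIL_EVENT_RX_PREAMBLE_DETECT,
   RAIL_EVENT_RX_PREAMBLE_LOST, etc. from rail.h. */
#define EV_PREAMBLE_DETECT (1ULL << 0)
#define EV_PREAMBLE_LOST   (1ULL << 1)
#define EV_RX_PACKET       (1ULL << 2)
#define EV_PREAMBLE_BOTH   (EV_PREAMBLE_DETECT | EV_PREAMBLE_LOST)

/* Model of the currently enabled events. */
static uint64_t enabled_events = EV_PREAMBLE_BOTH | EV_RX_PACKET;

/* Records which events were still enabled when the radio was idled. */
static uint64_t events_enabled_at_idle = ~0ULL;

/* Stand-in for RAIL_ConfigEvents(): the bits selected by mask take the
   values given in events, all other bits are left unchanged. */
static void radio_config_events(uint64_t mask, uint64_t events)
{
  enabled_events = (enabled_events & ~mask) | (events & mask);
}

/* Stand-in for RAIL_Idle(rail_handle, RAIL_IDLE_ABORT, true). */
static void radio_idle_abort(void)
{
  events_enabled_at_idle = enabled_events;
}

/* Disable the preamble events first, then turn off the radio. */
void app_turn_radio_off(void)
{
  radio_config_events(EV_PREAMBLE_BOTH, 0);
  radio_idle_abort();
}

/* Re-enable the preamble events, e.g., just before starting Rx again. */
void app_reenable_preamble_events(void)
{
  radio_config_events(EV_PREAMBLE_BOTH, EV_PREAMBLE_BOTH);
}
```

The order is the point of the sketch: the events are disabled before the radio is idled, and enabled again only when the application is ready to receive.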

Another way to avoid this issue is to use RAIL_Idle(rail_handle, RAIL_IDLE_FORCE_SHUTDOWN_CLEAR_FLAGS, true), which will clear the pending interrupts. However, using RAIL_IDLE_FORCE_SHUTDOWN_CLEAR_FLAGS has other drawbacks. It forces the radio state machine into the idle state, and it might corrupt the transmit or receive FIFOs - in which case it must clear them, losing all data that might already be in there. It could also take more time to finish running than RAIL_IDLE_ABORT.

Critical Blocks

A common way to avoid interrupt safety issues is to create critical blocks in the main thread: code segments in which interrupts are disabled, so that they are never interrupted. (These are also known as atomic blocks, although on EFR32 the two terms can mean different things.) However, this can create other problems, so critical blocks should be used carefully. There's no general rule to avoid this kind of "collateral damage", but here's an example that should be avoided:

RSSI averaging is running, and just before it finishes, we interrupt it with RAIL_StartTx() which is called from a critical block. The following race condition could happen:

1. We enter the critical block, interrupts are disabled.
2. The RSSI averaging done interrupt is received, but the interrupt handler won't start since interrupts are masked.
3. RAIL_StartTx() turns off the radio, prepares it for transmit, then starts transmitting.
4. We leave the critical block, interrupts are enabled again.
5. The RSSI averaging done interrupt handler runs at this point and turns off the radio, aborting the current transmit.

One way to avoid the problem above is to clear the pending interrupts in the critical block. This can be done by using RAIL_Idle(handle, RAIL_IDLE_FORCE_SHUTDOWN_CLEAR_FLAGS, true) at the beginning of the critical block, but keep in mind the drawbacks of doing so (mentioned above). In general, it's better to avoid risky interrupts without using critical blocks in the main thread.
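The race and its workaround can be reproduced with a small host-side model. This is a toy model of the sequence described above, not RAIL code: every function in it is a hypothetical stand-in, with comments naming the RAIL behavior it models, and the interrupt is assumed to arrive right after the critical block is entered.

```c
#include <stdbool.h>

/* Toy model state. */
static bool irq_masked = false;
static bool rssi_done_pending = false;
static bool tx_active = false;

/* Models the RSSI averaging done handler, which turns off the radio. */
static void rssi_done_handler(void)
{
  tx_active = false;
}

static void enter_critical(void)
{
  irq_masked = true;
}

/* Leaving the critical block lets a pending interrupt run. */
static void exit_critical(void)
{
  irq_masked = false;
  if (rssi_done_pending) {
    rssi_done_pending = false;
    rssi_done_handler();
  }
}

/* Models the interrupt request: it is only logged while masked. */
static void rssi_done_interrupt(void)
{
  if (irq_masked) {
    rssi_done_pending = true;
  } else {
    rssi_done_handler();
  }
}

/* Models RAIL_Idle(handle, RAIL_IDLE_FORCE_SHUTDOWN_CLEAR_FLAGS, true). */
static void force_shutdown_clear_flags(void)
{
  rssi_done_pending = false;
  tx_active = false;
}

/* Models RAIL_StartTx(). */
static void start_tx(void)
{
  tx_active = true;
}

/* Runs the scenario and reports whether the transmit is still active
   after the critical block has been left. */
bool tx_survives_critical_block(bool clear_flags_first)
{
  tx_active = false;
  rssi_done_pending = false;

  enter_critical();
  rssi_done_interrupt();  /* averaging finishes while interrupts are masked */
  if (clear_flags_first) {
    force_shutdown_clear_flags();
  }
  start_tx();
  exit_critical();

  return tx_active;
}
```

In this model, tx_survives_critical_block(false) returns false because the late handler aborts the transmit, while tx_survives_critical_block(true) returns true because the pending interrupt was cleared before the transmit started.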

Using FORCE_SHUTDOWN

In the two sections above, we mentioned two use cases where RAIL_IDLE_FORCE_SHUTDOWN_CLEAR_FLAGS can be useful. In general, however, RAIL_IDLE or RAIL_IDLE_ABORT is a sufficient and preferred way to stop transmitting/receiving. Therefore, the FORCE_SHUTDOWN modes should only be used when they are really needed (as in the specific scenarios described here). For more details, see the article on the idle modes.