Audio feature generator (Si91x)#

The audio feature generator extracts mel-filterbank features from an audio signal captured by a microphone. The features are intended as input to machine learning audio classification applications.

Feature Generation#

The Mel scale replicates the behavior of the human ear, which has a higher resolution for lower frequencies and is less discriminative of the higher frequencies. To create a mel filterbank, a number of filters are applied to the signal, where the pass-band of the lower channel filters is narrow and increases towards higher frequencies.

The audio signal is split into short overlapping segments using a window function (Hamming). The Fast Fourier Transform (FFT) is applied to each segment to retrieve the frequency spectrum and then the power spectrum of the segment. The filterbank is created by applying a series of mel-scaled filters to the output. Finally, the log is applied to the output to increase the sensitivity between the lower channels.

         Audio Signal
              │
      ┌───────▼───────┐
      │   Windowing   │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │      FFT      │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │  Mel Filters  │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │      Log      │
      └───────┬───────┘
              │
              ▼
   Log-scaled Mel Filterbank

The feature array is generated by stacking filterbanks of sequential segments together to form a spectrogram. The array is sorted such that the first element is the first channel of the oldest filterbank.
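
The layout can be pictured as a row-major array of slices with the oldest slice first. The sketch below is a model of that ordering, not the library's internal code: feature_index and push_slice are hypothetical helpers, and the real implementation may use a ring buffer rather than shifting memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index of `channel` within slice `slice`, where slice 0 is the oldest. */
static size_t feature_index(size_t slice, size_t channel, size_t n_channels)
{
  return slice * n_channels + channel;
}

/* Drop the oldest slice and append new_slice as the newest one.
 * Assumes n_slices >= 1 and that features holds n_slices * n_channels values. */
static void push_slice(uint16_t *features, size_t n_slices, size_t n_channels,
                       const uint16_t *new_slice)
{
  memmove(features, features + n_channels,
          (n_slices - 1) * n_channels * sizeof(uint16_t));
  memcpy(features + (n_slices - 1) * n_channels, new_slice,
         n_channels * sizeof(uint16_t));
}
```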

Usage#

sl_ml_audio_feature_generation_init() initializes the frontend from a per-model sl_fe_audio_config_t (typically sl_fe_audio_<instance>_cfg from the fe_audio component). It also starts the microphone in streaming mode into a heap-allocated ring buffer sized from config->audio_buffer_size.

Si91x single pipeline (multi-instance configs)#

The fe_audio component may define several sl_fe_audio_<instance>_cfg symbols (one per model). The Si91x implementation still uses one set of global buffers and one FrontendState at a time. Only one pipeline may be active. After validating the new parameters, sl_ml_audio_feature_generation_init and sl_ml_audio_feature_generation_init_with_buffer tear down an existing pipeline (same as sl_ml_audio_feature_generation_deinit) and then bring up the new config, so callers can switch models with a single init call.

Instance configuration (fe_audio)#

Frontend parameters for inference come from the per-model sl_fe_audio_config_t that the fe_audio component exports (typically sl_fe_audio_<instance>_cfg in sl_fe_audio_instances.h, autogenerated from the same TensorFlow Lite flatbuffer metadata produced by the Silicon Labs converter toolchain). Pass &sl_fe_audio_<instance>_cfg to sl_ml_audio_feature_generation_init so the microfrontend, sample rate, ring buffer sizing, and quantization options match that model.

Features are generated when sl_ml_audio_feature_generation_update_features() is called. The feature generator then computes features for as many new audio segments as possible, covering the audio captured since the previous call. The new features are appended to the feature buffer, replacing the oldest features, so that the feature array always contains the most up-to-date features.
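
How many slices a single update can produce depends on how many samples have arrived. The following is a minimal sketch of that arithmetic, assuming a window of window_size samples that advances by window_step samples; count_new_slices and its parameters are illustrative names and not part of the API.

```c
#include <stddef.h>

/* Number of complete windows that fit in `available` samples when each
 * window is window_size samples long and starts window_step samples
 * after the previous one. */
static size_t count_new_slices(size_t available, size_t window_size,
                               size_t window_step)
{
  if (window_step == 0 || available < window_size) {
    return 0;
  }
  return (available - window_size) / window_step + 1;
}
```

For example, with 16 kHz audio, a 480-sample window, and a 320-sample step, 800 buffered samples yield two slices.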

Note that if the audio buffer is not large enough to hold all audio samples required to generate features between calls to sl_ml_audio_feature_generation_update_features(), audio data will simply be overwritten. The generator will not return an error. The audio buffer must therefore be configured to be large enough to store all new sampled data between updating features.
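
One rough way to reason about the required buffer size is to bound the longest gap between update calls. The helper below is a sketch under that assumption; min_audio_buffer_samples is a hypothetical name, the extra window of headroom is a design choice made for this example, and in practice audio_buffer_size comes from the generated sl_fe_audio_config_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Samples needed to hold max_gap_ms of mono audio plus one extra window
 * of window_ms, rounded up to a whole sample. */
static size_t min_audio_buffer_samples(uint32_t sample_rate_hz,
                                       uint32_t max_gap_ms,
                                       uint32_t window_ms)
{
  uint64_t total_ms = (uint64_t)max_gap_ms + window_ms;
  return (size_t)((total_ms * sample_rate_hz + 999u) / 1000u);
}
```

If the application can stall for up to 100 ms between updates at 16 kHz with a 30 ms window, this estimate gives 2080 samples; a buffer smaller than that risks silent overwrites.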

To retrieve the generated features, call one of sl_ml_audio_feature_generation_get_features_raw(), sl_ml_audio_feature_generation_get_features_quantized(), or sl_ml_audio_feature_generation_fill_tensor().

Example#

When used with TensorFlow Lite Micro, the audio feature generator can fill the network input tensor via sl_ml_audio_feature_generation_fill_tensor(). Training and inference must use the same frontend pipeline; with instance-based integration that means initializing with the sl_fe_audio_config_t generated for the deployed .tflite (same embedded FE metadata the converter wrote into the model).

#include "sl_tflite_micro_init.h"
#include "sl_fe_audio_instances.h"
#include "sl_fe_audio_si91x.h"

int main(void)
{
  // Start the microphone and frontend using the model's generated config.
  sl_ml_audio_feature_generation_init(&sl_fe_audio_my_model_cfg);

  while (1) {
    // Process the audio captured since the previous call.
    sl_ml_audio_feature_generation_update_features();

    // do_inference is an application-defined condition.
    if (do_inference) {
      sl_ml_audio_feature_generation_fill_tensor(sl_tflite_micro_get_input_tensor());
      sl_tflite_micro_get_interpreter()->Invoke();
    }

    ...

  }
}

Note that updating features and retrieving them can be performed independently. Features should be updated often enough to avoid overwriting the audio buffer, whereas they only need to be retrieved immediately before inference.

Modules#

Deprecated audio feature APIs

Typedefs#

typedef void(*

sl_ml_audio_feature_generation_mic_callback_t)(void *arg, const int16_t *data, uint32_t n_frames)

Microphone callback function type for audio feature generation.

Variables#

int16_t *

sl_ml_audio_feature_generation_audio_buffer

External audio buffer for audio feature generation (mic / DMA ring backing store).

Functions#

sl_status_t

sl_ml_audio_feature_generation_init(const sl_fe_audio_config_t *config)

Set up the microphone as an audio source for feature generation and initialize the frontend from config (allocates the audio ring buffer).

sl_status_t

sl_ml_audio_feature_generation_init_with_buffer(int16_t *buffer, int n_frames, const sl_fe_audio_config_t *config)

Initialize audio feature generation with a user-provided buffer.

sl_status_t

sl_ml_audio_feature_generation_deinit(void)

Tear down the Si91x audio feature pipeline (mic, internal feature buffer, frontend state).

sl_status_t

sl_ml_audio_feature_generation_frontend_init(const sl_fe_audio_config_t *config)

Initialize the microfrontend and feature buffers from config.

sl_status_t

sl_ml_audio_feature_generation_update_features()

Update the feature buffer with feature slices computed since the last call.

sl_status_t

sl_ml_audio_feature_generation_get_features_raw(uint16_t *buffer, size_t num_elements)

Copy the active spectrogram ring to a uint16 buffer.

sl_status_t

sl_ml_audio_feature_generation_get_features_raw_float32(float *buffer, size_t num_elements)

Copy the active spectrogram ring to a float32 buffer.

sl_status_t

sl_ml_audio_feature_generation_get_features_scaled(float *buffer, size_t num_elements, float scaler)

Copy the active spectrogram to buffer with each sample multiplied by scaler.

sl_status_t

sl_ml_audio_feature_generation_get_features_mean_std_normalized(float *buffer, size_t num_elements)

Mean- and standard-deviation-normalized copy of the active spectrogram into buffer.

sl_status_t

sl_ml_audio_feature_generation_fill_tensor(TfLiteTensor *input_tensor)

Fill a TensorFlow Lite input tensor from the active feature buffer.

int

sl_ml_audio_feature_generation_get_new_feature_slice_count()

Return the number of new or unfetched feature slices that have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.

int

sl_ml_audio_feature_generation_get_feature_buffer_size()

Return the feature buffer size.

void

sl_ml_audio_feature_generation_reset()

Reset the state of the audio feature generator.

sl_status_t

sl_ml_audio_feature_generation_activity_detected()

Query whether the activity-detection block has tripped since the last successful read.

bool

sl_ml_audio_feature_generation_is_activity_detection_enabled(void)

Return whether the runtime frontend config has activity detection enabled.

sl_status_t

sl_ml_audio_feature_generation_set_mic_callback(sl_ml_audio_feature_generation_mic_callback_t callback, void *arg)

Typedef Documentation#

sl_ml_audio_feature_generation_mic_callback_t#

typedef void(* sl_ml_audio_feature_generation_mic_callback_t)(void *arg, const int16_t *data, uint32_t n_frames)

Microphone callback function type for audio feature generation.

Parameters

Direction	Argument Name	Description
[in]	arg	User-defined argument passed to the callback.
[in]	data	Pointer to the audio data buffer.
[in]	n_frames	Number of audio frames in the buffer.
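
A callback matching this signature might look like the sketch below, which tracks the peak absolute sample value. The typedef is repeated only so the snippet compiles on its own; peak_tracker_t and peak_callback are hypothetical names. The sketch assumes one int16 sample per frame (mono audio), and because the context in which the callback runs is not stated here, the body is kept short.

```c
#include <stdint.h>

/* Repeated from the API above so this sketch is standalone. */
typedef void (*sl_ml_audio_feature_generation_mic_callback_t)(
  void *arg, const int16_t *data, uint32_t n_frames);

/* Hypothetical user state passed through `arg`. */
typedef struct {
  uint16_t peak; /* largest absolute sample value seen so far */
} peak_tracker_t;

/* Update the peak with the samples delivered in this callback. */
static void peak_callback(void *arg, const int16_t *data, uint32_t n_frames)
{
  peak_tracker_t *tracker = (peak_tracker_t *)arg;
  for (uint32_t i = 0; i < n_frames; i++) {
    int32_t sample = data[i]; /* widen so that -32768 can be negated */
    uint16_t magnitude = (uint16_t)(sample < 0 ? -sample : sample);
    if (magnitude > tracker->peak) {
      tracker->peak = magnitude;
    }
  }
}
```

Such a callback would be registered by passing peak_callback and a pointer to a peak_tracker_t to sl_ml_audio_feature_generation_set_mic_callback.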

Variable Documentation#

sl_ml_audio_feature_generation_audio_buffer#

int16_t* sl_ml_audio_feature_generation_audio_buffer

External audio buffer for audio feature generation (mic / DMA ring backing store).

Function Documentation#

sl_ml_audio_feature_generation_init#

sl_status_t sl_ml_audio_feature_generation_init (const sl_fe_audio_config_t * config)

Set up the microphone as an audio source for feature generation and initialize the frontend from config (allocates the audio ring buffer).

Parameters

Type	Direction	Argument Name	Description
const sl_fe_audio_config_t *	[in]	config	Per-model frontend and buffer parameters; must not be NULL.

If a pipeline is already active, it is torn down first (same as sl_ml_audio_feature_generation_deinit). A non-OK status from that step is the return value from sl_ml_mic_deinit() during deinit.

After the frontend is initialized, failures from sl_ml_mic_init, sl_ml_mic_start_streaming, or sl_ml_mic_process_action return that driver sl_status_t after cleaning up allocated resources.

Returns

Other non-OK sl_status_t values may be returned from the microphone / I2S driver during teardown or bring-up.

Return values

SL_STATUS_OK: Success.
SL_STATUS_INVALID_PARAMETER: config is NULL.
SL_STATUS_FAIL: Coarse validation failed (sample rate, window step/size, segment length), or microfrontend state cannot be populated.
SL_STATUS_ALLOCATION_FAILED: The heap audio ring or feature ring could not be allocated.

sl_ml_audio_feature_generation_init_with_buffer#

sl_status_t sl_ml_audio_feature_generation_init_with_buffer (int16_t * buffer, int n_frames, const sl_fe_audio_config_t * config)

Initialize audio feature generation with a user-provided buffer.

Parameters

Type	Direction	Argument Name	Description
int16_t *	[in]	buffer	Pointer to the audio buffer to use for feature generation.
int	[in]	n_frames	Number of int16 audio samples in the buffer; must equal `config->audio_buffer_size`.
const sl_fe_audio_config_t *	[in]	config	Per-model frontend and buffer parameters; must not be NULL.

If a pipeline is already active, it is torn down first (same as sl_ml_audio_feature_generation_deinit). A non-OK status from that step is the return value from sl_ml_mic_deinit() during deinit.

After the frontend is initialized, failures from sl_ml_mic_init, sl_ml_mic_start_streaming, or sl_ml_mic_process_action return that driver sl_status_t after cleaning up allocated resources.

Returns

Other non-OK sl_status_t values may be returned from the microphone / I2S driver during teardown or bring-up.

Return values

SL_STATUS_OK: Success.
SL_STATUS_INVALID_PARAMETER: config is NULL.
SL_STATUS_FAIL: Coarse validation failed (sample rate, window step/size, segment length), or microfrontend state cannot be populated.
SL_STATUS_ALLOCATION_FAILED: The feature ring buffer could not be allocated.

sl_ml_audio_feature_generation_deinit#

sl_status_t sl_ml_audio_feature_generation_deinit (void )

Tear down the Si91x audio feature pipeline (mic, internal feature buffer, frontend state).

Parameters

Type	Direction	Argument Name	Description
void	N/A

Idempotent: returns SL_STATUS_OK if the pipeline was already inactive. After sl_ml_audio_feature_generation_init, frees the heap audio buffer owned by that API. Buffers passed to sl_ml_audio_feature_generation_init_with_buffer are never freed here (caller retains ownership).

Teardown is best-effort: the internal feature buffer and frontend state are always released after sl_ml_mic_deinit() is attempted. If the microphone returns a non-OK status, the pipeline is still marked inactive and FE memory is freed; the caller may need to handle or log the mic status separately.

Returns

If the pipeline was active and sl_ml_mic_deinit() returns a value other than SL_STATUS_OK, that status is returned (feature-buffer and frontend teardown may still have completed).

Return values

SL_STATUS_OK: The microphone deinitialized cleanly, or the pipeline was already inactive.

sl_ml_audio_feature_generation_frontend_init#

sl_status_t sl_ml_audio_feature_generation_frontend_init (const sl_fe_audio_config_t * config)

Initialize the microfrontend and feature buffers from config.

Parameters

Type	Direction	Argument Name	Description
const sl_fe_audio_config_t *	[in]	config	Per-model frontend and buffer parameters; must not be NULL.

sl_ml_audio_feature_generation_update_features#

sl_status_t sl_ml_audio_feature_generation_update_features ()

Update the feature buffer with feature slices computed since the last call.

To retrieve features, call sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.

Note

Call often enough that the audio ring is not overwritten.

sl_ml_audio_feature_generation_get_features_raw#

sl_status_t sl_ml_audio_feature_generation_get_features_raw (uint16_t * buffer, size_t num_elements)

Copy the active spectrogram ring to a uint16 buffer.

Parameters

Type	Direction	Argument Name	Description
uint16_t *	[out]	buffer	Pointer to the buffer to store the feature data.
size_t	[in]	num_elements	Must equal the active feature element count (sl_ml_audio_feature_generation_get_feature_buffer_size).

Note

Overwrites the entire buffer.

sl_ml_audio_feature_generation_get_features_raw_float32#

sl_status_t sl_ml_audio_feature_generation_get_features_raw_float32 (float * buffer, size_t num_elements)

Copy the active spectrogram ring to a float32 buffer.

Parameters

Type	Direction	Argument Name	Description
float *	[out]	buffer	Pointer to the buffer to store the feature data.
size_t	[in]	num_elements	Must equal the active feature element count (sl_ml_audio_feature_generation_get_feature_buffer_size).

Note

Overwrites the entire buffer.

sl_ml_audio_feature_generation_get_features_scaled#

sl_status_t sl_ml_audio_feature_generation_get_features_scaled (float * buffer, size_t num_elements, float scaler)

Copy the active spectrogram to buffer with each sample multiplied by scaler.

Parameters

Type	Direction	Argument Name	Description
float *	[out]	buffer	Pointer to the buffer to store the scaled feature data.
size_t	[in]	num_elements	Must equal the active feature element count (sl_ml_audio_feature_generation_get_feature_buffer_size).
float	[in]	scaler	Scaling factor applied to each feature value (must be non-zero).

Computes buffer[i] = (float)feature[i] * scaler for each element in the ring.

Note

Overwrites the entire buffer.
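
The scaling rule above amounts to the loop below. This is a standalone sketch, not the library source; scale_features is a hypothetical name and the uint16 source array stands in for the internal feature ring.

```c
#include <stddef.h>
#include <stdint.h>

/* buffer[i] = (float)feature[i] * scaler for every element. */
static void scale_features(const uint16_t *feature, float *buffer,
                           size_t num_elements, float scaler)
{
  for (size_t i = 0; i < num_elements; i++) {
    buffer[i] = (float)feature[i] * scaler;
  }
}
```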

sl_ml_audio_feature_generation_get_features_mean_std_normalized#

sl_status_t sl_ml_audio_feature_generation_get_features_mean_std_normalized (float * buffer, size_t num_elements)

Mean- and standard-deviation-normalized copy of the active spectrogram into buffer.

Parameters

Type	Direction	Argument Name	Description
float *	[out]	buffer	Pointer to the buffer to store the normalized feature data.
size_t	[in]	num_elements	Must equal the active feature element count (sl_ml_audio_feature_generation_get_feature_buffer_size).

For each element: (float(value) - mean) / std. Near-constant input yields zeros in buffer.

Note

Overwrites the entire buffer.

sl_ml_audio_feature_generation_fill_tensor#

sl_status_t sl_ml_audio_feature_generation_fill_tensor (TfLiteTensor * input_tensor)

Fill a TensorFlow Lite input tensor from the active feature buffer.

Parameters

Type	Direction	Argument Name	Description
TfLiteTensor *	[in]	input_tensor	The input tensor to fill with features.

Writes packed tensor data according to input_tensor->type:

kTfLiteInt8: static or dynamically quantized int8 (per active sl_fe_audio_config_t).
kTfLiteUInt16: raw uint16 spectrogram.
kTfLiteFloat32: raw float, scaled float, or mean/std-normalized float (per active config).

Note

Overwrites the entire input tensor payload.

sl_ml_audio_feature_generation_get_new_feature_slice_count#

int sl_ml_audio_feature_generation_get_new_feature_slice_count ()

Return the number of new or unfetched feature slices that have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.

Returns

Number of unfetched feature slices (non-negative). Zero if none pending or if the pipeline is not initialized.

sl_ml_audio_feature_generation_get_feature_buffer_size#

int sl_ml_audio_feature_generation_get_feature_buffer_size ()

Return the feature buffer size.

Returns

Number of uint16 feature elements in the active spectrogram ring, or 0 if the pipeline / feature buffer is not initialized.

sl_ml_audio_feature_generation_reset#

void sl_ml_audio_feature_generation_reset ()

Reset the state of the audio feature generator.

sl_ml_audio_feature_generation_activity_detected#

sl_status_t sl_ml_audio_feature_generation_activity_detected ()

Query whether the activity-detection block has tripped since the last successful read.

Call sl_ml_audio_feature_generation_update_features periodically so the detector advances on new audio.

Note

Activity detection must be enabled in the active sl_fe_audio_config_t (see sl_ml_audio_feature_generation_is_activity_detection_enabled).
Internal trip state is cleared when this returns SL_STATUS_OK; a subsequent call returns SL_STATUS_IN_PROGRESS until new activity is detected.

sl_ml_audio_feature_generation_is_activity_detection_enabled#

bool sl_ml_audio_feature_generation_is_activity_detection_enabled (void )

Return whether the runtime frontend config has activity detection enabled.

Parameters

Type	Direction	Argument Name	Description
void	N/A

After sl_ml_audio_feature_generation_init, reflects the active sl_fe_audio_config_t.

sl_ml_audio_feature_generation_set_mic_callback#

sl_status_t sl_ml_audio_feature_generation_set_mic_callback (sl_ml_audio_feature_generation_mic_callback_t callback, void * arg)

Parameters

Type	Direction	Argument Name	Description
sl_ml_audio_feature_generation_mic_callback_t	[in]	callback	Function pointer (may be NULL to clear).
void *	[in]	arg	User argument passed to `callback`.