Audio feature generator (Si91x)#
The audio feature generator extracts mel-filterbank features from an audio signal for use in machine learning audio classification applications, with a microphone as the audio source.
Feature Generation#
The Mel scale replicates the behavior of the human ear, which has a higher resolution for lower frequencies and is less discriminative of the higher frequencies. To create a mel filterbank, a number of filters are applied to the signal, where the pass-band of the lower channel filters is narrow and increases towards higher frequencies.
The audio signal is split into short overlapping segments using a window function (Hamming). The Fast Fourier Transform (FFT) is applied to each segment to retrieve the frequency spectrum and then the power spectrum of the segment. The filterbank is created by applying a series of mel-scaled filters to the output. Finally, the log is applied to the output to increase the sensitivity between the lower channels.
      Audio Signal
            │
    ┌───────▼───────┐
    │   Windowing   │
    └───────┬───────┘
            │
    ┌───────▼───────┐
    │      FFT      │
    └───────┬───────┘
            │
    ┌───────▼───────┐
    │  Mel Filters  │
    └───────┬───────┘
            │
    ┌───────▼───────┐
    │      Log      │
    └───────┬───────┘
            │
            ▼
Log-scaled Mel Filterbank
The feature array is generated by stacking filterbanks of sequential segments together to form a spectrogram. The array is sorted such that the first element is the first channel of the oldest filterbank.
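One way to read this layout is as a row-major array indexed by slice first and channel second, with slice 0 being the oldest filterbank. The helpers below are a minimal sketch under that assumption; the names are hypothetical and not part of the API.

```c
#include <stddef.h>
#include <stdint.h>

/* Flat index of (slice, channel), where slice 0 is the oldest filterbank
 * and each slice holds n_channels consecutive values. */
size_t feature_index(size_t slice, size_t channel, size_t n_channels)
{
  return slice * n_channels + channel;
}

/* Read a single feature value from a flat feature array. */
uint16_t feature_at(const uint16_t *features, size_t slice, size_t channel,
                    size_t n_channels)
{
  return features[feature_index(slice, channel, n_channels)];
}
```

Under this reading, element 0 is channel 0 of the oldest slice and the last element is the highest channel of the newest slice.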
Usage#
sl_ml_audio_feature_generation_init() initializes the frontend from a per-model sl_fe_audio_config_t (typically sl_fe_audio_<instance>_cfg from the fe_audio component). It also starts the microphone in streaming mode into a heap-allocated ring buffer sized from config->audio_buffer_size.
Si91x single pipeline (multi-instance configs)#
The fe_audio component may define several sl_fe_audio_<instance>_cfg symbols (one per model). The Si91x implementation still uses one set of global buffers and one FrontendState at a time. Only one pipeline may be active. After validating the new parameters, sl_ml_audio_feature_generation_init and sl_ml_audio_feature_generation_init_with_buffer tear down an existing pipeline (same as sl_ml_audio_feature_generation_deinit) and then bring up the new config, so callers can switch models with a single init call.
Instance configuration (fe_audio)#
Frontend parameters for inference come from the per-model sl_fe_audio_config_t that the fe_audio component exports (typically sl_fe_audio_<instance>_cfg in sl_fe_audio_instances.h, autogenerated from the same TensorFlow Lite flatbuffer metadata produced by the Silicon Labs converter toolchain). Pass &sl_fe_audio_<instance>_cfg to sl_ml_audio_feature_generation_init so the microfrontend, sample rate, ring buffer sizing, and quantization options match that model.
The features are generated when sl_ml_audio_feature_generation_update_features() is called. The feature generator then updates the features for as many new segments of audio as possible, starting from the last time the function was called up until the current time. The new features are appended to the feature buffer, replacing the oldest features such that the feature array always contains the most up to date features.
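The replace-the-oldest behavior can be pictured as a ring of feature slices. The sketch below is a standalone model, not the Si91x implementation: the sizes, type, and function names are hypothetical, chosen only to show how a new slice overwrites the oldest one and how the ring is read back oldest-first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_SLICES   3  /* hypothetical number of slices kept */
#define N_CHANNELS 2  /* hypothetical number of mel channels */

typedef struct {
  uint16_t data[N_SLICES][N_CHANNELS];
  size_t   next;  /* slot the next slice overwrites (currently the oldest) */
} feature_ring_t;

/* Store a new slice, replacing the oldest one. */
void feature_ring_push(feature_ring_t *ring, const uint16_t *slice)
{
  memcpy(ring->data[ring->next], slice, sizeof ring->data[0]);
  ring->next = (ring->next + 1) % N_SLICES;
}

/* Copy all slices to `out`, oldest first. */
void feature_ring_copy(const feature_ring_t *ring, uint16_t *out)
{
  for (size_t i = 0; i < N_SLICES; i++) {
    size_t src = (ring->next + i) % N_SLICES;
    memcpy(&out[i * N_CHANNELS], ring->data[src], sizeof ring->data[0]);
  }
}
```

After pushing four slices into this three-slice ring, the first slice is gone and the copy returns slices two, three, and four in that order.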
Note that if the audio buffer is not large enough to hold all audio samples required to generate features between calls to sl_ml_audio_feature_generation_update_features(), audio data will simply be overwritten. The generator will not return an error. The audio buffer must therefore be configured to be large enough to store all new sampled data between updating features.
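A rough lower bound for the audio buffer can be estimated from the sample rate and the longest expected gap between update calls. The helper below is a back-of-the-envelope sketch; the function name is hypothetical, and adding one window of headroom is an assumption rather than a documented requirement.

```c
#include <stdint.h>

/* Estimate the minimum number of int16 samples the audio buffer should hold:
 * the samples that arrive during the longest gap between update calls
 * (rounded up), plus one analysis window of headroom (an assumption). */
uint32_t min_audio_buffer_samples(uint32_t sample_rate_hz,
                                  uint32_t max_update_interval_ms,
                                  uint32_t window_samples)
{
  uint64_t new_samples =
    ((uint64_t)sample_rate_hz * max_update_interval_ms + 999u) / 1000u;
  return (uint32_t)(new_samples + window_samples);
}
```

For example, at 16 kHz with updates at most 100 ms apart and a 480-sample window, the estimate is 2080 samples. The value actually used comes from the generated config, so treat this only as a sanity check on the update interval.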
To retrieve the generated features, call one of sl_ml_audio_feature_generation_get_features_raw(), sl_ml_audio_feature_generation_get_features_quantized(), or sl_ml_audio_feature_generation_fill_tensor().
Example#
When used with TensorFlow Lite Micro, the audio feature generator can fill the network input tensor via sl_ml_audio_feature_generation_fill_tensor(). Training and inference must use the same frontend pipeline; with instance-based integration that means initializing with the sl_fe_audio_config_t generated for the deployed .tflite (same embedded FE metadata the converter wrote into the model).
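#include "sl_tflite_micro_init.h"
#include "sl_fe_audio_instances.h"
#include "sl_fe_audio_si91x.h"

int main(void)
{
  // Start the microphone and frontend with the config generated for the deployed model.
  sl_ml_audio_feature_generation_init(&sl_fe_audio_my_model_cfg);

  while (1) {
    // Process the audio captured since the previous call.
    sl_ml_audio_feature_generation_update_features();

    if (do_inference) {  // application-defined condition
      // Copy the current features into the model input and run inference.
      sl_ml_audio_feature_generation_fill_tensor(sl_tflite_micro_get_input_tensor());
      sl_tflite_micro_get_interpreter()->Invoke();
    }
    ...
  }
}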
Note that updating features and retrieving them can be performed independently. Features should be updated often enough to avoid overwriting the audio buffer, while retrieval only needs to happen before inference.
Modules#
Typedefs#
sl_ml_audio_feature_generation_mic_callback_t: Microphone callback function type for audio feature generation.
Variables#
sl_ml_audio_feature_generation_audio_buffer: External audio buffer for audio feature generation (mic / DMA ring backing store).
Functions#
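sl_ml_audio_feature_generation_init: Set up the microphone as an audio source for feature generation and initialize the frontend from config (allocates the audio ring buffer).
sl_ml_audio_feature_generation_init_with_buffer: Initialize audio feature generation with a user-provided buffer.
sl_ml_audio_feature_generation_deinit: Tear down the Si91x audio feature pipeline (mic, internal feature buffer, frontend state).
sl_ml_audio_feature_generation_frontend_init: Initialize the microfrontend and feature buffers from config.
sl_ml_audio_feature_generation_update_features: Update the feature buffer with feature slices computed since the last call.
sl_ml_audio_feature_generation_get_features_raw: Copy the active spectrogram ring to a uint16 buffer.
sl_ml_audio_feature_generation_get_features_raw_float32: Copy the active spectrogram ring to a float32 buffer.
sl_ml_audio_feature_generation_get_features_scaled: Copy the active spectrogram to buffer with each sample multiplied by scaler.
sl_ml_audio_feature_generation_get_features_mean_std_normalized: Mean- and standard-deviation-normalized copy of the active spectrogram into buffer.
sl_ml_audio_feature_generation_fill_tensor: Fill a TensorFlow Lite input tensor from the active feature buffer.
sl_ml_audio_feature_generation_get_new_feature_slice_count: Return the number of new or unfetched feature slices that have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.
sl_ml_audio_feature_generation_get_feature_buffer_size: Return the feature buffer size.
sl_ml_audio_feature_generation_reset: Reset the state of the audio feature generator.
sl_ml_audio_feature_generation_activity_detected: Query whether the activity-detection block has tripped since the last successful read.
sl_ml_audio_feature_generation_is_activity_detection_enabled: Return whether the runtime frontend config has activity detection enabled.
sl_ml_audio_feature_generation_set_mic_callback: Register the optional microphone PCM callback (or clear it).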
Typedef Documentation#
sl_fe_audio_config_t#
typedef struct sl_fe_audio_config sl_fe_audio_config_t
Per-model frontend + buffer/quantization parameters (fe_audio component instance).
sl_ml_audio_feature_generation_mic_callback_t#
typedef void(* sl_ml_audio_feature_generation_mic_callback_t)(void *arg, const int16_t *data, uint32_t n_frames)
Microphone callback function type for audio feature generation.
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| void * | [in] | arg | User-defined argument passed to the callback. |
| const int16_t * | [in] | data | Pointer to the audio data buffer. |
| uint32_t | [in] | n_frames | Number of audio frames in the buffer. |
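A callback of this type might look like the sketch below, which tracks the peak absolute sample value and the number of frames seen. The typedef is repeated from this page only so the example compiles on its own (omit it when the SDK header is included). The pcm_stats_t type and callback name are hypothetical, and the loop assumes one int16 sample per frame (mono), which this page does not state.

```c
#include <stdint.h>

/* Repeated from this page so the sketch is self-contained. */
typedef void (*sl_ml_audio_feature_generation_mic_callback_t)(void *arg,
                                                              const int16_t *data,
                                                              uint32_t n_frames);

/* Hypothetical user state handed to the callback through `arg`. */
typedef struct {
  int32_t  peak;          /* largest absolute sample value seen so far */
  uint32_t total_frames;  /* running count of frames delivered */
} pcm_stats_t;

/* Example callback: update peak level and frame count.
 * Assumes one int16 sample per frame (mono). */
void pcm_stats_callback(void *arg, const int16_t *data, uint32_t n_frames)
{
  pcm_stats_t *stats = (pcm_stats_t *)arg;

  for (uint32_t i = 0; i < n_frames; i++) {
    int32_t sample = data[i];  /* widen first so -32768 negates safely */
    if (sample < 0) {
      sample = -sample;
    }
    if (sample > stats->peak) {
      stats->peak = sample;
    }
  }
  stats->total_frames += n_frames;
}
```

Such a callback would be registered with sl_ml_audio_feature_generation_set_mic_callback(pcm_stats_callback, &stats). Since it probably runs in the microphone data path, keeping it short is a sensible precaution.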
Variable Documentation#
sl_ml_audio_feature_generation_audio_buffer#
int16_t* sl_ml_audio_feature_generation_audio_buffer
External audio buffer for audio feature generation (mic / DMA ring backing store).
Function Documentation#
sl_ml_audio_feature_generation_init#
sl_status_t sl_ml_audio_feature_generation_init (const sl_fe_audio_config_t * config)
Set up the microphone as an audio source for feature generation and initialize the frontend from config (allocates the audio ring buffer).
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| const sl_fe_audio_config_t * | [in] | config | Per-model frontend and buffer parameters; must not be NULL. |
If a pipeline is already active, it is torn down first (same as sl_ml_audio_feature_generation_deinit). A non-OK status from that step is the return value from sl_ml_mic_deinit() during deinit.
After the frontend is initialized, failures from sl_ml_mic_init, sl_ml_mic_start_streaming, or sl_ml_mic_process_action return that driver sl_status_t after cleaning up allocated resources.
Returns
Other non-OK sl_status_t values may be returned from the microphone / I2S driver during teardown or bring-up.
Return values
SL_STATUS_OK: Success.
SL_STATUS_INVALID_PARAMETER: config is NULL.
SL_STATUS_FAIL: Coarse validation failed (sample rate, window step/size, segment length), or microfrontend state cannot be populated.
SL_STATUS_ALLOCATION_FAILED: The heap audio ring or feature ring could not be allocated.
sl_ml_audio_feature_generation_init_with_buffer#
sl_status_t sl_ml_audio_feature_generation_init_with_buffer (int16_t * buffer, int n_frames, const sl_fe_audio_config_t * config)
Initialize audio feature generation with a user-provided buffer.
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| int16_t * | [in] | buffer | Pointer to the audio buffer to use for feature generation. |
| int | [in] | n_frames | Number of int16 audio samples in the buffer; must equal |
| const sl_fe_audio_config_t * | [in] | config | Per-model frontend and buffer parameters; must not be NULL. |
If a pipeline is already active, it is torn down first (same as sl_ml_audio_feature_generation_deinit). A non-OK status from that step is the return value from sl_ml_mic_deinit() during deinit.
After the frontend is initialized, failures from sl_ml_mic_init, sl_ml_mic_start_streaming, or sl_ml_mic_process_action return that driver sl_status_t after cleaning up allocated resources.
Returns
Other non-OK sl_status_t values may be returned from the microphone / I2S driver during teardown or bring-up.
Return values
SL_STATUS_OK: Success.
SL_STATUS_INVALID_PARAMETER: config is NULL.
SL_STATUS_FAIL: Coarse validation failed (sample rate, window step/size, segment length), or microfrontend state cannot be populated.
SL_STATUS_ALLOCATION_FAILED: The feature ring buffer could not be allocated.
sl_ml_audio_feature_generation_deinit#
sl_status_t sl_ml_audio_feature_generation_deinit (void )
Tear down the Si91x audio feature pipeline (mic, internal feature buffer, frontend state).
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| void | N/A | | |
Idempotent: returns SL_STATUS_OK if the pipeline was already inactive. After sl_ml_audio_feature_generation_init, frees the heap audio buffer owned by that API. Buffers passed to sl_ml_audio_feature_generation_init_with_buffer are never freed here (caller retains ownership).
Teardown is best-effort: the internal feature buffer and frontend state are always released after sl_ml_mic_deinit() is attempted. If the microphone returns a non-OK status, the pipeline is still marked inactive and FE memory is freed; the caller may need to handle or log the mic status separately.
Returns
If the pipeline was active and sl_ml_mic_deinit() returns a value other than SL_STATUS_OK, that status is returned (feature-buffer and frontend teardown may still have completed).
Return values
SL_STATUS_OK: The microphone deinitialized cleanly, or the pipeline was already inactive.
sl_ml_audio_feature_generation_frontend_init#
sl_status_t sl_ml_audio_feature_generation_frontend_init (const sl_fe_audio_config_t * config)
Initialize the microfrontend and feature buffers from config.
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| const sl_fe_audio_config_t * | [in] | config | Per-model frontend and buffer parameters; must not be NULL. |
sl_ml_audio_feature_generation_update_features#
sl_status_t sl_ml_audio_feature_generation_update_features ()
Update the feature buffer with feature slices computed since the last call.
To retrieve features, call sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.
Note
Call often enough that the audio ring is not overwritten.
sl_ml_audio_feature_generation_get_features_raw#
sl_status_t sl_ml_audio_feature_generation_get_features_raw (uint16_t * buffer, size_t num_elements)
Copy the active spectrogram ring to a uint16 buffer.
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| uint16_t * | [out] | buffer | Pointer to the buffer to store the feature data. |
| size_t | [in] | num_elements | Must equal the active feature element count (sl_ml_audio_feature_generation_get_feature_buffer_size). |
Note
Overwrites the entire buffer.
sl_ml_audio_feature_generation_get_features_raw_float32#
sl_status_t sl_ml_audio_feature_generation_get_features_raw_float32 (float * buffer, size_t num_elements)
Copy the active spectrogram ring to a float32 buffer.
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| float * | [out] | buffer | Pointer to the buffer to store the feature data. |
| size_t | [in] | num_elements | Must equal the active feature element count (sl_ml_audio_feature_generation_get_feature_buffer_size). |
Note
Overwrites the entire buffer.
sl_ml_audio_feature_generation_get_features_scaled#
sl_status_t sl_ml_audio_feature_generation_get_features_scaled (float * buffer, size_t num_elements, float scaler)
Copy the active spectrogram to buffer with each sample multiplied by scaler.
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| float * | [out] | buffer | Pointer to the buffer to store the scaled feature data. |
| size_t | [in] | num_elements | Must equal the active feature element count (sl_ml_audio_feature_generation_get_feature_buffer_size). |
| float | [in] | scaler | Scaling factor applied to each feature value (must be non-zero). |
Computes buffer[i] = (float)feature[i] * scaler for each element in the ring.
Note
Overwrites the entire buffer.
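The per-element computation described for the scaled copy is equivalent to the standalone loop below. The function name is hypothetical and the sketch operates on a plain array rather than the internal spectrogram ring.

```c
#include <stddef.h>
#include <stdint.h>

/* buffer[i] = (float)features[i] * scaler for every element. */
void scale_features(const uint16_t *features, float *buffer,
                    size_t num_elements, float scaler)
{
  for (size_t i = 0; i < num_elements; i++) {
    buffer[i] = (float)features[i] * scaler;
  }
}
```

A scaler of 1.0f / 65535.0f, for instance, would map the full uint16 range onto 0.0 to 1.0; the value a given model expects comes from its training pipeline.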
sl_ml_audio_feature_generation_get_features_mean_std_normalized#
sl_status_t sl_ml_audio_feature_generation_get_features_mean_std_normalized (float * buffer, size_t num_elements)
Mean- and standard-deviation-normalized copy of the active spectrogram into buffer.
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| float * | [out] | buffer | Pointer to the buffer to store the normalized feature data. |
| size_t | [in] | num_elements | Must equal the active feature element count (sl_ml_audio_feature_generation_get_feature_buffer_size). |
For each element: (float(value) - mean) / std. Near-constant input yields zeros in buffer.
Note
Overwrites the entire buffer.
sl_ml_audio_feature_generation_fill_tensor#
sl_status_t sl_ml_audio_feature_generation_fill_tensor (TfLiteTensor * input_tensor)
Fill a TensorFlow Lite input tensor from the active feature buffer.
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| TfLiteTensor * | [in] | input_tensor | The input tensor to fill with features. |
Writes packed tensor data according to input_tensor->type:
kTfLiteInt8: static or dynamically quantized int8 (per active sl_fe_audio_config_t).
kTfLiteUInt16: raw uint16 spectrogram.
kTfLiteFloat32: raw float, scaled float, or mean/std-normalized float (per active config).
Note
Overwrites the entire input tensor payload.
sl_ml_audio_feature_generation_get_new_feature_slice_count#
int sl_ml_audio_feature_generation_get_new_feature_slice_count ()
Return the number of new or unfetched feature slices that have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.
Returns
Number of unfetched feature slices (non-negative). Zero if none pending or if the pipeline is not initialized.
sl_ml_audio_feature_generation_get_feature_buffer_size#
int sl_ml_audio_feature_generation_get_feature_buffer_size ()
Return the feature buffer size.
Returns
Number of uint16 feature elements in the active spectrogram ring, or 0 if the pipeline / feature buffer is not initialized.
sl_ml_audio_feature_generation_reset#
void sl_ml_audio_feature_generation_reset ()
Reset the state of the audio feature generator.
sl_ml_audio_feature_generation_activity_detected#
sl_status_t sl_ml_audio_feature_generation_activity_detected ()
Query whether the activity-detection block has tripped since the last successful read.
Call sl_ml_audio_feature_generation_update_features periodically so the detector advances on new audio.
Note
Activity detection must be enabled in the active sl_fe_audio_config_t (see sl_ml_audio_feature_generation_is_activity_detection_enabled).
Internal trip state is cleared when this returns SL_STATUS_OK; a subsequent call returns SL_STATUS_IN_PROGRESS until new activity is detected.
sl_ml_audio_feature_generation_is_activity_detection_enabled#
bool sl_ml_audio_feature_generation_is_activity_detection_enabled (void )
Return whether the runtime frontend config has activity detection enabled.
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| void | N/A | | |
After sl_ml_audio_feature_generation_init, reflects the active sl_fe_audio_config_t.
sl_ml_audio_feature_generation_set_mic_callback#
sl_status_t sl_ml_audio_feature_generation_set_mic_callback (sl_ml_audio_feature_generation_mic_callback_t callback, void * arg)
Register the optional microphone PCM callback (or clear it).
| Type | Direction | Argument Name | Description |
|---|---|---|---|
| sl_ml_audio_feature_generation_mic_callback_t | [in] | callback | Function pointer (may be NULL to clear). |
| void * | [in] | arg | User argument passed to callback. |