Audio Feature Generator

Description

The audio feature generator extracts mel-filterbank features from an audio signal for use in machine learning audio classification applications, using a microphone as the audio source.

Feature Generation

The mel scale replicates the behavior of the human ear, which has a higher resolution at lower frequencies and is less discriminative at higher frequencies. To create a mel filterbank, a number of filters are applied to the signal. The pass-bands of the lower-channel filters are narrow, and they widen toward higher frequencies.

The audio signal is split into short, overlapping segments using a window function (Hamming). The Fast Fourier Transform (FFT) is applied to each segment to obtain its frequency spectrum, from which the power spectrum of the segment is computed. The filterbank is then created by applying a series of mel-scaled filters to the power spectrum. Finally, the logarithm of the filter outputs is taken to increase the sensitivity between the lower channels.

         Audio Signal
              │
      ┌───────▼───────┐
      │   Windowing   │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │      FFT      │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │  Mel Filters  │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │      Log      │
      └───────┬───────┘
              │
              ▼
   Log-scaled Mel Filterbank

The feature array is generated by stacking the filterbanks of consecutive segments to form a spectrogram. The array is ordered so that the first element is the first channel of the oldest filterbank.
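Assuming each filterbank (slice) is stored contiguously and slices run from oldest to newest, channel c of slice i sits at index i × num_channels + c. The sketch below makes that layout concrete; the helper names and dimensions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of `channel` within `slice`, where slice 0 is the
 * oldest filterbank and each slice holds num_channels consecutive values. */
static size_t feature_index(size_t slice, size_t channel, size_t num_channels)
{
  return slice * num_channels + channel;
}

/* Copy the newest slice (the last row of the spectrogram) into `out`. */
static void newest_slice(const uint16_t *features, size_t num_slices,
                         size_t num_channels, uint16_t *out)
{
  for (size_t c = 0; c < num_channels; c++) {
    out[c] = features[feature_index(num_slices - 1, c, num_channels)];
  }
}
```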

Usage

sl_ml_audio_feature_generation_init() initializes the frontend for feature generation based on the configuration in sl_ml_audio_feature_generation_config.h. It also initializes the microphone and starts it in streaming mode, which places the audio samples into a ring buffer.

If used together with the Flatbuffer Converter Tool and a compatible TensorFlow Lite model, the configuration is pulled from the TensorFlow Lite model by default. Enable the configuration option SL_ML_AUDIO_FEATURE_GENERATION_MANUAL_CONFIG_ENABLE to override this behavior and use the manually configured options from the configuration header instead.
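For example, manual configuration might be enabled in the configuration header as shown below. This assumes the option is a 0/1 flag, as is common in Gecko SDK configuration headers; check the generated header for its exact form and default.

```c
// sl_ml_audio_feature_generation_config.h (excerpt; 0/1 flag semantics assumed)
// 1 = use the options in this header instead of those embedded in the model.
#define SL_ML_AUDIO_FEATURE_GENERATION_MANUAL_CONFIG_ENABLE 1
```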

The features are generated when sl_ml_audio_feature_generation_update_features() is called. The feature generator then computes features for as many new audio segments as possible, covering the audio from the previous call up to the current time. The new features are appended to the feature buffer, replacing the oldest features, so that the feature array always contains the most up-to-date features.
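One way to picture this behavior is a fixed-size spectrogram in which each update shifts the remaining slices toward the front and writes the new slices at the end, so the oldest slices fall off. The sketch below does this with memmove on an oldest-first array. It is an illustration with hypothetical names, not necessarily how the component stores its buffer internally (a ring buffer with a moving head index would avoid the copy).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append `new_count` slices (oldest first) to a spectrogram holding `num_slices`
 * slices of `num_channels` values each, discarding the oldest slices. */
static void append_slices(uint16_t *features, size_t num_slices, size_t num_channels,
                          const uint16_t *new_slices, size_t new_count)
{
  if (new_count >= num_slices) {
    /* More new data than fits: keep only the most recent num_slices slices. */
    memcpy(features, new_slices + (new_count - num_slices) * num_channels,
           num_slices * num_channels * sizeof *features);
    return;
  }
  size_t keep = num_slices - new_count;
  /* Move the newest `keep` existing slices to the front (regions may overlap). */
  memmove(features, features + new_count * num_channels,
          keep * num_channels * sizeof *features);
  /* Write the new slices after them. */
  memcpy(features + keep * num_channels, new_slices,
         new_count * num_channels * sizeof *features);
}
```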

Note that if the audio buffer is not large enough to hold all audio samples required to generate features between calls to sl_ml_audio_feature_generation_update_features(), audio data will simply be overwritten. The generator does not return an error in this case. The audio buffer must therefore be configured to be large enough to store all newly sampled data between feature updates.
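As a rough sizing check, the buffer must hold at least sample rate × longest interval between updates samples. For example, at an assumed 16 kHz sample rate with updates at most 100 ms apart, that is 1600 samples. The helper below does this arithmetic and adds an assumed margin of one analysis window (for example 480 samples, a 30 ms window at 16 kHz) for a segment that straddles two updates. Names and the margin are hypothetical.

```c
#include <stdint.h>

/* Minimum number of audio samples to buffer so that nothing is overwritten when
 * updates are at most max_interval_ms apart. frame_samples is an assumed safety
 * margin for a window that spans two updates. */
static uint32_t min_audio_buffer_samples(uint32_t sample_rate_hz,
                                         uint32_t max_interval_ms,
                                         uint32_t frame_samples)
{
  /* Round up so a partial millisecond of audio is still covered. */
  uint32_t per_interval =
      (uint32_t)(((uint64_t)sample_rate_hz * max_interval_ms + 999u) / 1000u);
  return per_interval + frame_samples;
}
```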

To retrieve the generated features, call sl_ml_audio_feature_generation_get_features_raw(), sl_ml_audio_feature_generation_get_features_quantized(), or sl_ml_audio_feature_generation_fill_tensor().

Example

When used with TensorFlow Lite Micro, the audio feature generator can fill a tensor directly through sl_ml_audio_feature_generation_fill_tensor(). However, the model must be trained with the same feature generator configuration as the one used for inference, which is set in sl_ml_audio_feature_generation_config.h.

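#include "sl_tflite_micro_init.h"
#include "sl_ml_audio_feature_generation.h"

int main(void)
{
  // Start the microphone and initialize the feature generation frontend.
  sl_ml_audio_feature_generation_init();
  ...
  while (1) {
    // Compute features for all audio captured since the previous update.
    sl_ml_audio_feature_generation_update_features();
    // do_inference: application-defined condition, e.g. a timer flag.
    if (do_inference) {
      // Quantize the features into the model input tensor and run the model.
      sl_ml_audio_feature_generation_fill_tensor(sl_tflite_micro_get_input_tensor());
      sl_tflite_micro_get_interpreter()->Invoke();
    }
    ...
  }
}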

Note that updating features and retrieving them can be performed independently. Features must be updated often enough to avoid overwriting the audio buffer, while they only need to be retrieved before each inference.

Functions

sl_status_t sl_ml_audio_feature_generation_init ()
Set up the microphone as an audio source for feature generation and initialize the frontend for feature generation.
sl_status_t sl_ml_audio_feature_generation_frontend_init ()
Initialize microfrontend according to the configuration in sl_ml_audio_feature_generation_config.h.
sl_status_t sl_ml_audio_feature_generation_update_features ()
Update the feature buffer with the missing feature slices since the last call to this function.
sl_status_t sl_ml_audio_feature_generation_get_features_raw (uint16_t *buffer, size_t num_elements)
Retrieve the features as type uint16 and copy them to the provided buffer.
sl_status_t sl_ml_audio_feature_generation_fill_tensor (TfLiteTensor *input_tensor)
Fill a TensorFlow tensor with feature data of type int8.
int sl_ml_audio_feature_generation_get_new_feature_slice_count ()
Return the number of new or unfetched feature slices that have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.
int sl_ml_audio_feature_generation_get_feature_buffer_size ()
Return the feature buffer size.
void sl_ml_audio_feature_generation_reset ()
Reset the state of the audio feature generator.

Function Documentation

sl_ml_audio_feature_generation_init()

sl_status_t sl_ml_audio_feature_generation_init ( )

Set up the microphone as an audio source for feature generation and initialize the frontend for feature generation.

Returns
SL_STATUS_OK for success
SL_STATUS_FAIL on failure

sl_ml_audio_feature_generation_frontend_init()

sl_status_t sl_ml_audio_feature_generation_frontend_init ( )

Initialize microfrontend according to the configuration in sl_ml_audio_feature_generation_config.h.

Returns
SL_STATUS_OK for success
SL_STATUS_FAIL on failure

sl_ml_audio_feature_generation_update_features()

sl_status_t sl_ml_audio_feature_generation_update_features ( )

Update the feature buffer with the missing feature slices since the last call to this function.

To retrieve the features, call sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.

Note
This function needs to be called often enough to ensure that the audio buffer isn't overwritten.
Returns
SL_STATUS_OK for success
SL_STATUS_EMPTY if no new slices were calculated

sl_ml_audio_feature_generation_get_features_raw()

sl_status_t sl_ml_audio_feature_generation_get_features_raw ( uint16_t * buffer, size_t num_elements )

Retrieve the features as type uint16 and copy them to the provided buffer.

Parameters
[out] buffer Pointer to the buffer in which to store the feature data
[in] num_elements The number of elements the buffer can hold. If this is not large enough to store the entire feature buffer, the function returns SL_STATUS_INVALID_PARAMETER.
Note
This function overwrites the entire buffer.
Returns
SL_STATUS_OK for success
SL_STATUS_INVALID_PARAMETER if num_elements is too small

sl_ml_audio_feature_generation_fill_tensor()

sl_status_t sl_ml_audio_feature_generation_fill_tensor ( TfLiteTensor * input_tensor )

Fill a TensorFlow tensor with feature data of type int8.

The int8 values are derived by quantizing the microfrontend output, which is expected to be in the range 0 to 670, to signed integers in the range -128 to 127.
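The exact quantization formula is not given here. A plausible linear mapping, assumed purely for illustration, scales the 0 to 670 range onto the 256 int8 levels, offsets by -128, and clamps values above 670. A model trained on features quantized with a different formula would need that formula instead.

```c
#include <stdint.h>

/* Assumed linear quantization of a microfrontend value (expected 0..670) to int8:
 * q = round(value * 255 / 670) - 128, with inputs above 670 clamped. */
static int8_t quantize_feature(uint16_t value)
{
  const int32_t max_in = 670;
  int32_t v = value > max_in ? max_in : (int32_t)value; /* clamp high values */
  /* Round to nearest using integer arithmetic. */
  int32_t scaled = (v * 255 + max_in / 2) / max_in;   /* 0..255 */
  return (int8_t)(scaled - 128);                       /* -128..127 */
}
```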

Parameters
[in] input_tensor The input tensor to fill with features.
Note
This function overwrites the entire input tensor.
Supports tensors of type kTfLiteInt8.
Returns
SL_STATUS_OK for success
SL_STATUS_INVALID_PARAMETER if the tensor type or size does not match the configuration

sl_ml_audio_feature_generation_get_new_feature_slice_count()

int sl_ml_audio_feature_generation_get_new_feature_slice_count ( )

Return the number of new or unfetched feature slices that have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.

Returns
The number of unfetched feature slices

sl_ml_audio_feature_generation_get_feature_buffer_size()

int sl_ml_audio_feature_generation_get_feature_buffer_size ( )

Return the feature buffer size.

Returns
Size of the feature buffer

sl_ml_audio_feature_generation_reset()

void sl_ml_audio_feature_generation_reset ( )

Reset the state of the audio feature generator.