Audio Feature Generator

Description

The audio feature generator extracts mel-filterbank features from an audio signal captured by a microphone, for use in machine learning audio classification applications.

Feature Generation

The mel scale mimics the behaviour of the human ear, which has a higher resolution at lower frequencies and is less discriminative at higher frequencies. To create a mel filterbank, a set of filters is applied to the signal: the pass-band of the lowest-channel filters is narrow, and the pass-band width increases towards higher frequencies.

The audio signal is split into short, overlapping segments using a window function (Hamming). The FFT (Fast Fourier Transform) is applied to each segment to obtain its frequency spectrum, from which the power spectrum of the segment is computed. The filterbank is created by applying a series of mel-scaled filters to the power spectrum. Finally, the logarithm of the filterbank output is taken to increase the sensitivity between the lower channel values.

         Audio Signal
              │
      ┌───────▼───────┐
      │   Windowing   │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │      FFT      │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │  Mel Filters  │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │      Log      │
      └───────┬───────┘
              │
              ▼
   Log-scaled Mel Filterbank

The feature array is generated by stacking the filterbanks of sequential segments together to form a spectrogram. The array is ordered such that the first element is the first channel of the oldest filterbank.
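The ordering described above corresponds to a row-major layout with one row (slice) per segment, oldest first. The sketch below shows the index calculation and how appending a new slice discards the oldest one; the shift with memmove is chosen for clarity and is an assumption, since the generator may well use a ring buffer internally. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Index of (slice, channel) when slice 0 is the oldest filterbank. */
static size_t feature_index(int slice, int channel, int num_channels)
{
  return (size_t)slice * (size_t)num_channels + (size_t)channel;
}

/* Append one new slice: shift all slices one row towards the front,
 * dropping the oldest, then copy the new slice into the last row. */
static void append_slice(uint16_t *features, int num_slices, int num_channels,
                         const uint16_t *new_slice)
{
  size_t row = (size_t)num_channels;
  memmove(features, features + row,
          (size_t)(num_slices - 1) * row * sizeof(uint16_t));
  memcpy(features + (size_t)(num_slices - 1) * row, new_slice,
         row * sizeof(uint16_t));
}
```

With 3 slices of 2 channels holding {1, 2, 3, 4, 5, 6}, appending {7, 8} leaves {3, 4, 5, 6, 7, 8}: the oldest slice is gone and the newest occupies the last row.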

Usage

sl_ml_audio_feature_generation_init() initializes the frontend for feature generation based on the configuration in sl_ml_audio_feature_generation_config.h. It also initializes the microphone and starts it in streaming mode, which places the audio samples into a ring buffer.

Features are not generated until sl_ml_audio_feature_generation_update_features() is called. Each call updates the features for as many new segments of audio as possible, covering the audio captured from the previous call up to the current time. The new features are appended to the feature buffer, replacing the oldest features, so that the feature array always contains the most recent features.
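How many new segments an update can process depends on how many complete windows the captured audio contains. The sketch below assumes that slice i covers samples [i * step, i * step + window), where window and step are the segment length and stride in samples; the helper and its parameters are hypothetical and not part of this API.

```c
#include <stdint.h>

/* Hypothetical helper: number of complete, not yet processed segments,
 * given the total number of samples captured so far and the number of
 * slices already computed. */
static uint32_t new_slices_available(uint32_t total_samples,
                                     uint32_t slices_done,
                                     uint32_t window, uint32_t step)
{
  if (total_samples < window) {
    return 0; /* not even one full segment yet */
  }
  uint32_t total_slices = (total_samples - window) / step + 1;
  return total_slices > slices_done ? total_slices - slices_done : 0;
}
```

For instance, with a 480-sample window and a 320-sample stride (30 ms and 20 ms at 16 kHz, illustrative values), 800 captured samples contain 2 complete segments; samples that do not yet complete a segment are processed by a later call.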

Note that if the audio buffer is not large enough to hold all the audio samples captured between calls to sl_ml_audio_feature_generation_update_features(), the audio data is silently overwritten; the generator does not return an error. The audio buffer must therefore be configured to be large enough to store all newly sampled data between feature updates.
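The required buffer size follows from the sample rate and the longest expected interval between updates. The sketch below is a rough sizing rule, assuming one extra window of samples as a margin for a segment that has been captured only partially when the update runs; the margin, the helper name and the example values are assumptions, not requirements taken from the SDK configuration.

```c
#include <stdint.h>

/* Hypothetical sizing helper: samples captured during the longest interval
 * between update calls (rounded up), plus one window as a safety margin. */
static uint32_t min_audio_buffer_samples(uint32_t sample_rate_hz,
                                         uint32_t max_interval_ms,
                                         uint32_t window_samples)
{
  uint32_t per_interval = (sample_rate_hz * max_interval_ms + 999u) / 1000u;
  return per_interval + window_samples;
}
```

At 16 kHz with updates at most every 100 ms and a 480-sample window, this gives 1600 + 480 = 2080 samples.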

To retrieve the generated features, call either sl_ml_audio_feature_generation_get_features_raw()|quantized() or sl_ml_audio_feature_generation_fill_tensor().

Example

When used with TensorFlow Lite Micro, the audio feature generator can fill a tensor directly through sl_ml_audio_feature_generation_fill_tensor(). This requires that the model was trained using the same feature generator configuration as in sl_ml_audio_feature_generation_config.h.

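#include "sl_tflite_micro_init.h"
#include "sl_ml_audio_feature_generation.h"

int main(void)
{
  // Initialize the microphone and the feature generation frontend
  sl_ml_audio_feature_generation_init();
  // ...
  while (1) {
    // Generate features for all audio captured since the previous call
    sl_ml_audio_feature_generation_update_features();
    if (do_inference) {  // do_inference: application-defined condition
      // Copy the features into the model input tensor and run the model
      sl_ml_audio_feature_generation_fill_tensor(sl_tflite_micro_get_input_tensor());
      sl_tflite_micro_get_interpreter()->Invoke();
    }
    // ...
  }
}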

Note that updating features and retrieving them can be performed independently: features must be updated often enough to avoid overwriting the audio buffer, whereas they only need to be retrieved prior to inference.

Functions

sl_status_t sl_ml_audio_feature_generation_init ()
 Sets up the microphone as an audio source for feature generation and initializes the frontend for feature generation.
 
sl_status_t sl_ml_audio_feature_generation_frontend_init ()
 Initializes microfrontend according to the configuration in sl_ml_audio_feature_generation_config.h.
 
sl_status_t sl_ml_audio_feature_generation_update_features ()
 Updates the feature buffer with the missing feature slices since the last call to this function.
 
sl_status_t sl_ml_audio_feature_generation_get_features_raw (uint16_t *buffer, size_t num_elements)
 Retrieves the features as type uint16 and copies them to the provided buffer.
 
sl_status_t sl_ml_audio_feature_generation_fill_tensor (TfLiteTensor *input_tensor)
 Fills a TensorFlow tensor with feature data of type int8.
 
int sl_ml_audio_feature_generation_get_new_feature_slice_count ()
 Returns how many new or unfetched feature slices have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.
 
int sl_ml_audio_feature_generation_get_feature_buffer_size ()
 Returns the feature buffer size.
 
void sl_ml_audio_feature_generation_reset ()
 Resets the state of the audio feature generator.
 

Function Documentation

◆ sl_ml_audio_feature_generation_init()

sl_status_t sl_ml_audio_feature_generation_init ( )

Sets up the microphone as an audio source for feature generation and initializes the frontend for feature generation.

Returns
SL_STATUS_OK for success
SL_STATUS_FAIL on failure

◆ sl_ml_audio_feature_generation_frontend_init()

sl_status_t sl_ml_audio_feature_generation_frontend_init ( )

Initializes microfrontend according to the configuration in sl_ml_audio_feature_generation_config.h.

Returns
SL_STATUS_OK for success
SL_STATUS_FAIL on failure

◆ sl_ml_audio_feature_generation_update_features()

sl_status_t sl_ml_audio_feature_generation_update_features ( )

Updates the feature buffer with the missing feature slices since the last call to this function.

To retrieve the features, call sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.

Note
This function needs to be called often enough to ensure that the audio buffer isn't overwritten.
Returns
SL_STATUS_OK for success
SL_STATUS_EMPTY if no new slices were calculated

◆ sl_ml_audio_feature_generation_get_features_raw()

sl_status_t sl_ml_audio_feature_generation_get_features_raw ( uint16_t * buffer, size_t num_elements )

Retrieves the features as type uint16 and copies them to the provided buffer.

Parameters
[out] buffer  Pointer to the buffer to store the feature data.
[in] num_elements  The number of elements the buffer can hold. If this is not large enough to store the entire feature buffer, the function returns an error.
Note
This function overwrites the entire buffer.
Returns
SL_STATUS_OK for success
SL_STATUS_INVALID_PARAMETER if num_elements is too small

◆ sl_ml_audio_feature_generation_fill_tensor()

sl_status_t sl_ml_audio_feature_generation_fill_tensor ( TfLiteTensor *  input_tensor)

Fills a TensorFlow tensor with feature data of type int8.

The int8 values are derived by quantizing the microfrontend output, expected to be in the range 0 to 670, to signed integer numbers in the range -128 to 127.
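The quantization can be pictured as a linear mapping of the range 0 to 670 onto -128 to 127 with rounding and clamping. The sketch below assumes such a linear mapping; it is not necessarily bit-exact with the SDK's implementation, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed linear mapping of a microfrontend value in [0, 670] onto
 * int8 [-128, 127], rounding to nearest and clamping out-of-range input. */
static int8_t quantize_feature(uint16_t value)
{
  int32_t scaled = ((int32_t)value * 255 + 335) / 670 - 128;
  if (scaled < -128) {
    scaled = -128;
  }
  if (scaled > 127) {
    scaled = 127;
  }
  return (int8_t)scaled;
}
```

Under this mapping, 0 becomes -128, the midpoint 335 becomes 0, 670 becomes 127, and values above 670 saturate at 127.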

Parameters
[in] input_tensor  The input tensor to fill with features.
Note
This function overwrites the entire input tensor.
Supports tensors of type kTfLiteInt8.
Returns
SL_STATUS_OK for success
SL_STATUS_INVALID_PARAMETER if the tensor type or size does not correspond with the configuration

◆ sl_ml_audio_feature_generation_get_new_feature_slice_count()

int sl_ml_audio_feature_generation_get_new_feature_slice_count ( )

Returns how many new or unfetched feature slices have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.

Returns
The number of unfetched feature slices

◆ sl_ml_audio_feature_generation_get_feature_buffer_size()

int sl_ml_audio_feature_generation_get_feature_buffer_size ( )

Returns the feature buffer size.

Returns
Size of the feature buffer

◆ sl_ml_audio_feature_generation_reset()

void sl_ml_audio_feature_generation_reset ( )

Resets the state of the audio feature generator.