Audio Feature Generator
The audio feature generator extracts mel filterbank features from an audio signal captured by a microphone, for use in machine-learning audio classification applications.
The mel scale replicates the behavior of the human ear, which resolves lower frequencies finely and is less discriminative at higher frequencies. To create a mel filterbank, a number of filters are applied to the signal; the passband of the lower-channel filters is narrow and widens towards higher frequencies.
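The Hz-to-mel mapping can be illustrated with the commonly used HTK-style formula m = 2595 * log10(1 + f/700); the exact variant used by the microfrontend may differ, so treat this as a sketch of the idea rather than the SDK's arithmetic:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse conversion, mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Filter edges spaced equally on the mel scale are narrow in Hz at low
# frequencies and wide at high frequencies, mirroring the ear's resolution.
lo, hi = hz_to_mel(125.0), hz_to_mel(7500.0)
edges_hz = [mel_to_hz(lo + i * (hi - lo) / 8) for i in range(9)]
widths = [b - a for a, b in zip(edges_hz, edges_hz[1:])]
```

The strictly increasing `widths` show why the lower channels get narrower passbands.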
The audio signal is split into short overlapping segments using a window function (Hamming). The Fast Fourier Transform (FFT) is applied to each segment to retrieve the frequency spectrum and then the power spectrum of the segment. The filterbank is created by applying a series of mel-scaled filters to the output. Finally, the log is applied to the output to increase the sensitivity between the lower channels.
```text
        Audio Signal
              │
      ┌───────▼───────┐
      │   Windowing   │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │      FFT      │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │  Mel Filters  │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │      Log      │
      └───────┬───────┘
              │
              ▼
  Log-scaled Mel Filterbank
```
The feature array is generated by stacking filterbanks of sequential segments together to form a spectrogram. The array is sorted such that the first element is the first channel of the oldest filterbank.
sl_ml_audio_feature_generation_init() initializes the frontend for feature generation based on the configuration in sl_ml_audio_feature_generation_config.h. It also initializes and starts the microphone in streaming mode, which places the audio samples into a ring-buffer.
If used together with the Flatbuffer Converter Tool and a compatible TensorFlow Lite model, the configuration is pulled from the TensorFlow Lite model by default. Set the configuration option SL_ML_AUDIO_FEATURE_GENERATION_MANUAL_CONFIG_ENABLE to override this behavior and use manually configured options from the configuration header.
The features are generated when sl_ml_audio_feature_generation_update_features() is called. The feature generator then computes features for as many new segments of audio as possible, from the last time the function was called up until the current time. The new features are appended to the feature buffer, replacing the oldest features so that the feature array always contains the most recent features.
Note that if the audio buffer is not large enough to hold all of the audio samples required to generate features between calls to sl_ml_audio_feature_generation_update_features(), audio data is silently overwritten; the generator does not return an error. The audio buffer must therefore be configured large enough to store all newly sampled data between feature updates.
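A back-of-envelope check for sizing the audio buffer; all numbers here are illustrative assumptions, not SDK defaults:

```python
# Verify that the ring buffer can hold all samples produced during the
# worst-case interval between update_features() calls.
SAMPLE_RATE_HZ = 16000       # microphone sample rate (assumed)
UPDATE_PERIOD_MS = 200       # worst-case time between updates (assumed)
AUDIO_BUFFER_SAMPLES = 4096  # configured ring-buffer capacity (assumed)

samples_per_period = SAMPLE_RATE_HZ * UPDATE_PERIOD_MS // 1000
buffer_large_enough = AUDIO_BUFFER_SAMPLES >= samples_per_period
```

If `buffer_large_enough` is false, samples would be overwritten before they are turned into features.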
To retrieve the generated features, either sl_ml_audio_feature_generation_get_features_raw(), sl_ml_audio_feature_generation_get_features_quantized(), or sl_ml_audio_feature_generation_fill_tensor() must be called.
When used with TensorFlow Lite Micro, the audio feature generator can be used to fill a tensor directly by using sl_ml_audio_feature_generation_fill_tensor(). However, the model has to be trained using the same feature generator configurations as used for inference, configured in sl_ml_audio_feature_generation_config.h.
Note that updating features and retrieving them can be performed independently. Updating features should be done often enough to avoid overwriting the audio buffer while retrieving them only needs to be done prior to inference.
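Putting the steps above together, a typical call sequence might look like the following sketch. `ready_to_run_inference` and `input_tensor` are application-defined placeholders, and error handling is elided; this is not a complete program.

```c
#include "sl_ml_audio_feature_generation.h"

void app_init(void)
{
  // Start the microphone stream and the feature-generation frontend.
  sl_ml_audio_feature_generation_init();
}

void app_process_action(void)
{
  // Call often enough that the audio ring buffer is never overrun.
  sl_ml_audio_feature_generation_update_features();

  if (ready_to_run_inference) {  // application-defined condition
    // input_tensor: the model's int8 input tensor (application-defined).
    sl_ml_audio_feature_generation_fill_tensor(input_tensor);
    // ... invoke the TensorFlow Lite Micro interpreter here ...
  }
}
```

Note how updating and retrieving are decoupled: updates run on every pass, while the tensor is filled only just before inference.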
|Set up the microphone as an audio source for feature generation and initialize the frontend for feature generation. |
|Initialize the microfrontend according to the configuration in sl_ml_audio_feature_generation_config.h. |
|Update the feature buffer with the missing feature slices since the last call to this function. |
|sl_status_t||sl_ml_audio_feature_generation_get_features_raw(uint16_t *buffer, size_t num_elements)|
|Retrieve the features as type uint16 and copy them to the provided buffer. |
|sl_status_t||sl_ml_audio_feature_generation_fill_tensor (TfLiteTensor *input_tensor)|
|Fill a TensorFlow tensor with feature data of type int8. |
|Return the number of new or unfetched feature slices that have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor. |
|Return the feature buffer size. |
|Reset the state of the audio feature generator. |
Set up the microphone as an audio source for feature generation and initialize the frontend for feature generation.
- SL_STATUS_OK for success
- SL_STATUS_FAIL on failure
Initialize the microfrontend according to the configuration in sl_ml_audio_feature_generation_config.h.
- SL_STATUS_OK for success
- SL_STATUS_FAIL on failure
Update the feature buffer with the missing feature slices since the last call to this function.
To retrieve the features, call sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.
- This function needs to be called often enough to ensure that the audio buffer isn't overwritten.
- SL_STATUS_OK for success
- SL_STATUS_EMPTY if no new slices were calculated
sl_status_t sl_ml_audio_feature_generation_get_features_raw(uint16_t *buffer, size_t num_elements)
Retrieve the features as type uint16 and copy them to the provided buffer.
buffer [out]: Pointer to the buffer where the feature data is stored.
num_elements [in]: The number of elements in the provided buffer; if this is not large enough to hold the entire feature buffer, the function returns an error.
- This function overwrites the entire buffer.
- SL_STATUS_OK for success
- SL_STATUS_INVALID_PARAMETER if num_elements is too small
sl_status_t sl_ml_audio_feature_generation_fill_tensor(TfLiteTensor *input_tensor)
Fill a TensorFlow tensor with feature data of type int8.
The int8 values are derived by quantizing the microfrontend output, expected to be in the range 0 to 670, to signed 8-bit integers in the range -128 to 127.
input_tensor [in]: The input tensor to fill with features.
- This function overwrites the entire input tensor.
- Supports tensors of type kTfLiteInt8.
- SL_STATUS_OK for success
- SL_STATUS_INVALID_PARAMETER if the tensor type or size does not correspond with the configuration
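The mapping below is one plausible linear quantization consistent with the description above (frontend range 0 to 670 onto -128 to 127); the exact arithmetic used by the SDK may differ:

```python
FRONTEND_MIN = 0
FRONTEND_MAX = 670  # expected microfrontend output range (per the docs above)

def quantize_feature(x: int) -> int:
    """Linearly map a microfrontend value in [0, 670] to an int8 in [-128, 127]."""
    x = max(FRONTEND_MIN, min(FRONTEND_MAX, x))  # clamp out-of-range values
    return (x * 255) // FRONTEND_MAX - 128       # 0 -> -128, 670 -> 127
```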
Return the number of new or unfetched feature slices that have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw or sl_ml_audio_feature_generation_fill_tensor.
- The number of unfetched feature slices
Return the feature buffer size.
- Size of the feature buffer
Reset the state of the audio feature generator.