Audio Feature Generator
The audio feature generator extracts mel-filterbank features from an audio signal, for use in machine learning audio classification applications that take a microphone as the audio source.
The mel scale replicates the behaviour of the human ear, which has a higher resolution for lower frequencies and is less discriminative of higher frequencies. To create a mel filterbank, a number of filters are applied to the signal, where the pass band of the lower channel filters is narrow and widens towards higher frequencies.
The audio signal is split into short overlapping segments using a window function (Hamming). The FFT (Fast Fourier Transform) is applied to each segment to obtain its frequency spectrum, from which the power spectrum of the segment is computed. The filterbank is created by applying the series of mel-scaled filters to the power spectrum. Finally, the log is applied to the output to increase the sensitivity between the lower channels.
```
        Audio Signal
              │
      ┌───────▼───────┐
      │   Windowing   │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │      FFT      │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │  Mel Filters  │
      └───────┬───────┘
              │
      ┌───────▼───────┐
      │      Log      │
      └───────┬───────┘
              │
              ▼
  Log-scaled Mel Filterbank
```
The feature array is generated by stacking the filterbanks of sequential segments together to form a spectrogram. The array is ordered such that the first element is the first channel of the oldest filterbank.
sl_ml_audio_feature_generation_init() initializes the frontend for feature generation based on the configuration in sl_ml_audio_feature_generation_config.h. It also initializes and starts the microphone in streaming mode which places the audio samples into a ring-buffer.
Features are first generated when sl_ml_audio_feature_generation_update_features() is called. The feature generator then updates the features for as many new segments of audio as possible, from the last time the function was called up to the current time. New features are appended to the feature buffer, replacing the oldest features, so that the feature array always contains the most up-to-date features.
Note that if the audio buffer is not large enough to hold all audio samples required to generate features between calls to sl_ml_audio_feature_generation_update_features(), audio data is simply overwritten; the generator will not return an error. The audio buffer must therefore be configured large enough to store all newly sampled data between feature updates.
When used with TensorFlow Lite Micro, the audio feature generator can fill a tensor directly via sl_ml_audio_feature_generation_fill_tensor(). This requires that the model was trained with the same feature generator configuration as in sl_ml_audio_feature_generation_config.h.
Note that updating features and retrieving them can be performed independently: updating should be done often enough to avoid overwriting the audio buffer, while retrieving only needs to be done prior to inference.
| Return type | Function | Description |
|---|---|---|
| sl_status_t | sl_ml_audio_feature_generation_init() | Sets up the microphone as an audio source for feature generation and initializes the frontend for feature generation. |
| sl_status_t | | Initializes the microfrontend according to the configuration in sl_ml_audio_feature_generation_config.h. |
| sl_status_t | sl_ml_audio_feature_generation_update_features() | Updates the feature buffer with the feature slices missing since the last call to this function. |
| sl_status_t | sl_ml_audio_feature_generation_get_features_raw(uint16_t *buffer, size_t num_elements) | Retrieves the features as type uint16 and copies them to the provided buffer. |
| sl_status_t | sl_ml_audio_feature_generation_fill_tensor(TfLiteTensor *input_tensor) | Fills a TensorFlow tensor with feature data of type int8. |
| | | Returns how many new or unfetched feature slices have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw() or sl_ml_audio_feature_generation_fill_tensor(). |
| | | Returns the feature buffer size. |
| | | Resets the state of the audio feature generator. |
Sets up the microphone as an audio source for feature generation and initializes the frontend for feature generation.
Returns:
- SL_STATUS_OK for success
- SL_STATUS_FAIL on failure
Initializes microfrontend according to the configuration in sl_ml_audio_feature_generation_config.h.
Returns:
- SL_STATUS_OK for success
- SL_STATUS_FAIL on failure
Updates the feature buffer with the feature slices missing since the last call to this function.
To retrieve the features, call sl_ml_audio_feature_generation_get_features_raw() or sl_ml_audio_feature_generation_fill_tensor().
- This function needs to be called often enough to ensure that the audio buffer isn't overwritten.

Returns:
- SL_STATUS_OK for success
- SL_STATUS_EMPTY if no new slices were calculated
sl_status_t sl_ml_audio_feature_generation_get_features_raw(uint16_t *buffer, size_t num_elements)
Retrieves the features as type uint16 and copies them to the provided buffer.
Parameters:
- buffer [in]: Pointer to the buffer to store the feature data.
- num_elements [in]: The number of elements corresponding to the size of the buffer; if this is not large enough to store the entire feature buffer, the function returns an error.
- This function overwrites the entire buffer.

Returns:
- SL_STATUS_OK for success
- SL_STATUS_INVALID_PARAMETER if num_elements is too small
sl_status_t sl_ml_audio_feature_generation_fill_tensor(TfLiteTensor *input_tensor)
Fills a TensorFlow tensor with feature data of type int8.
The int8 values are derived by quantizing the microfrontend output, expected to be in the range 0 to 670, to signed integer numbers in the range -128 to 127.
Parameters:
- input_tensor [in]: The input tensor to fill with features.
- This function overwrites the entire input tensor.
- Supports tensors of type kTfLiteInt8.

Returns:
- SL_STATUS_OK for success
- SL_STATUS_INVALID_PARAMETER if the tensor type or size does not correspond with the configuration
Returns how many new or unfetched feature slices have been updated since the last call to sl_ml_audio_feature_generation_get_features_raw() or sl_ml_audio_feature_generation_fill_tensor().

Returns:
- The number of unfetched feature slices
Returns the feature buffer size.
Returns:
- The size of the feature buffer
Resets the state of the audio feature generator.