Bird Audio Detection (DCASE 2018)

This was UKYSpeechLab’s submission to Task 3 (Bird Audio Detection) of the DCASE 2018 Challenge. The task is deceptively hard. The system must decide whether a short field recording contains any bird vocalization, and it must do so across very different acoustic domains. The training and evaluation audio come from different sensors, locations, and recording conditions: BirdVox-DCASE-20k, ff1010bird, and warblrb10k for development; Chernobyl and PolandNFC for evaluation.

The core model is a compact convolutional neural network over mel-spectrograms that outputs a bird presence/absence probability:

Baseline CNN architecture: a mel-spectrogram fed through a compact convolutional network to a bird presence/absence decision.
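As a rough illustration of the model's input, the sketch below computes a log-mel spectrogram with NumPy. It is not the authors' feature pipeline. The HTK-style mel formula, the Hann window, and the defaults `n_fft=1024`, `hop_length=512`, `n_mels=40` are assumptions chosen for illustration, and the function names are hypothetical. `hop_length` controls the temporal resolution of the features, while `n_fft` and `n_mels` control the spectral resolution. These are the parameters you would vary to trade time detail against frequency detail.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale (an assumed choice; other mel formulas exist)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft // 2 + 1 FFT bins onto n_mels bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points equally spaced in mel give each band's (low, center, high) edges.
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr, n_fft=1024, hop_length=512, n_mels=40):
    """Return an (n_mels, n_frames) log-mel spectrogram of a 1-D signal.

    hop_length sets the temporal resolution (time step = hop_length / sr seconds);
    n_fft and n_mels set the spectral resolution.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[t * hop_length : t * hop_length + n_fft] * window
        for t in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return np.log(mel_power + 1e-10).T                          # small floor avoids log(0)


# Usage: one second of a 4 kHz tone at two different resolutions.
sr = 22050
x = np.sin(2 * np.pi * 4000.0 * np.arange(sr) / sr)
coarse = log_mel_spectrogram(x, sr, hop_length=512, n_mels=40)
fine = log_mel_spectrogram(x, sr, hop_length=256, n_mels=80)
print(coarse.shape, fine.shape)  # (40, 42) (80, 83)
```

Halving the hop roughly doubles the number of frames, and doubling `n_mels` doubles the number of frequency bands, so the finer setting gives the CNN a larger, more detailed input for the same clip. A library routine such as `librosa.feature.melspectrogram` computes the same kind of representation. Which tool the original code used is not stated here.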

On top of this baseline we explored several domain tuning methods to close the gap between the training domains and the unseen evaluation domains: varying the temporal and spectral resolution of the input features, cross-dataset domain adaptation via class weighting, signal enhancement, and a multi-model variant. Our final and highest-scoring submission is a simple unweighted average ensemble of six CNN variants. The work was published at the DCASE 2018 Workshop (Liaqat et al., “Domain Tuning Methods for Bird Audio Detection”). Built in Python with TensorFlow/Keras.
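Two of these pieces can be sketched in plain Python. The averaging step follows the description above: each of the six CNN variants outputs a bird-presence probability per clip, and the ensemble score is their unweighted mean. The class-weighting function is an assumption. The page does not say how the paper derived its weights, so the sketch swaps in one common correction for label shift: weight each class by its assumed target-domain prior divided by its frequency in the training data. Function names and all numbers are hypothetical.

```python
from collections import Counter
from statistics import fmean


def prior_shift_class_weights(train_labels, target_priors):
    """Hypothetical class weighting: assumed target prior / training-set frequency.

    Returns a {class_index: weight} dict, the form Keras' Model.fit accepts
    for its class_weight argument.
    """
    counts = Counter(train_labels)
    n = len(train_labels)
    return {c: target_priors[c] / (counts[c] / n) for c in sorted(counts)}


def average_ensemble(per_model_probs):
    """Unweighted mean of per-clip bird-presence probabilities across models.

    per_model_probs holds one list of clip probabilities per model, all scoring
    the same clips in the same order.
    """
    n_clips = len(per_model_probs[0])
    if any(len(p) != n_clips for p in per_model_probs):
        raise ValueError("every model must score the same clips")
    return [fmean(clip_scores) for clip_scores in zip(*per_model_probs)]


# Usage with made-up numbers.
labels = [1] * 75 + [0] * 25           # hypothetical training set, 75% bird-positive
weights = prior_shift_class_weights(labels, {0: 0.5, 1: 0.5})
print(weights)                         # class 0 -> 2.0, class 1 -> about 0.667

six_models = [                         # hypothetical scores for 3 clips from 6 models
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.4],
    [0.7, 0.3, 0.5],
    [1.0, 0.0, 0.5],
    [0.9, 0.2, 0.6],
    [0.5, 0.4, 0.4],
]
ensemble = average_ensemble(six_models)
print(ensemble)                        # approximately [0.8, 0.2, 0.5]
```

In Keras the weight dict could be passed as `model.fit(x, y, class_weight=weights)`, which scales each training example's loss by its class weight. A practical appeal of unweighted averaging is that it has no combination weights to tune, so it needs no labeled data from the evaluation domains.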

