Underwater Audio Species Classification Using Dual-Path Deep Learning
Abstract
Underwater acoustic species classification is an important task in marine bioacoustic monitoring, ocean biodiversity assessment, and whale communication analysis. Accurate classification is difficult, however, because of background ocean noise, ship noise, sonar interference, signal attenuation, overlapping vocalizations, and variation in call structure across species. To address these challenges, this work proposes a dual-path deep learning framework for underwater audio species classification that combines spectrogram-based and raw-waveform-based feature learning. Raw underwater audio is first denoised, amplitude-normalized, and framed. The preprocessed audio is then passed through two parallel branches. The first branch converts the signal into Mel/CQT spectrogram representations and extracts deep spectral features using an EfficientNetV2 model integrated with a CBAM attention layer. The second branch processes the raw waveform with a pretrained Wav2Vec 2.0 transformer to obtain temporal acoustic embeddings. The features from both branches are concatenated and passed through a feature fusion layer, followed by temporal pooling, fully connected layers, dropout, and softmax classification. Experimental results show that the proposed model captures both time-frequency and waveform-level characteristics of underwater species calls. The model achieved an accuracy of 97.18%, precision of 96.94%, recall of 96.82%, F1 score of 96.88%, specificity of 98.41%, and an error rate of 2.82%. Comparative analysis against existing methods, including MFCC-SVM, CNN, LSTM, ResNet, two-channel fusion networks, and MT-Resformer, shows that the proposed EfficientNetV2-CBAM and Wav2Vec 2.0 fusion model achieves better classification performance. These results indicate that the dual-path fusion strategy is effective for robust underwater species identification in noisy marine acoustic environments.
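The fusion stage described in the abstract (concatenation, a fusion layer, temporal pooling, a fully connected layer, and softmax) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `fuse_and_classify`, the ReLU activation, mean pooling over time, the single output layer, and all dimensions are assumptions made for illustration. The sketch also assumes both branches emit features on the same number of frames, and it omits dropout, which acts as the identity at inference time. Random arrays stand in for the EfficientNetV2-CBAM and Wav2Vec 2.0 outputs.

```python
import numpy as np


def fuse_and_classify(spec_feats, wave_feats, w_fuse, b_fuse, w_out, b_out):
    """Illustrative fusion head (hypothetical, not the paper's code).

    spec_feats: (T, Ds) per-frame features from the spectrogram branch
    wave_feats: (T, Dw) per-frame embeddings from the waveform branch
    Returns a vector of class probabilities of shape (C,).
    """
    # Concatenate the two branches along the feature axis: (T, Ds + Dw)
    joint = np.concatenate([spec_feats, wave_feats], axis=-1)
    # Fusion layer: linear projection followed by ReLU (assumed activation): (T, H)
    fused = np.maximum(joint @ w_fuse + b_fuse, 0.0)
    # Temporal pooling: mean over the frame axis (assumed pooling type): (H,)
    pooled = fused.mean(axis=0)
    # Fully connected output layer producing one logit per class: (C,)
    logits = pooled @ w_out + b_out
    # Numerically stable softmax
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


# Toy usage with random stand-in features and weights (all sizes hypothetical)
rng = np.random.default_rng(0)
T, Ds, Dw, H, C = 10, 8, 6, 16, 4
probs = fuse_and_classify(
    rng.normal(size=(T, Ds)),
    rng.normal(size=(T, Dw)),
    rng.normal(size=(Ds + Dw, H)),
    np.zeros(H),
    rng.normal(size=(H, C)),
    np.zeros(C),
)
predicted_class = int(np.argmax(probs))
```

In a trained system the weights would be learned jointly with, or on top of, the two feature extractors; here they are random, so only the shapes and the order of operations are meaningful.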
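The six reported metrics can all be derived from a multi-class confusion matrix. The sketch below shows one common way to compute them, assuming macro averaging over classes and an F1 score taken as the harmonic mean of the averaged precision and recall. The abstract does not state which averaging scheme was used, so both choices are assumptions, as are the function name `classification_metrics` and the toy matrix. The reported F1 of 96.88% is at least consistent with the harmonic mean of the reported precision (96.94%) and recall (96.82%).

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro-averaged metrics from a square confusion matrix.

    cm[i][j] counts samples of true class i predicted as class j.
    Macro averaging is an assumption; the source does not specify it.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))

    precisions, recalls, specificities = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted as another class
        fp = sum(cm[r][k] for r in range(n)) - tp   # another class, predicted as k
        tn = total - tp - fn - fp
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        specificities.append(tn / (tn + fp) if tn + fp else 0.0)

    precision = sum(precisions) / n
    recall = sum(recalls) / n
    denom = precision + recall
    accuracy = correct / total
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
        "specificity": sum(specificities) / n,
        "error_rate": 1.0 - accuracy,
    }


# Toy 3-class confusion matrix (hypothetical counts, not data from the paper)
toy_cm = [
    [5, 1, 0],
    [0, 4, 1],
    [1, 0, 3],
]
metrics = classification_metrics(toy_cm)
```

By construction, accuracy and error rate sum to one, which matches the reported pair of 97.18% and 2.82%.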