AI Trained on Birdsong Can Recognize Whale Calls
TL;DR
Google's Perch 2.0 is a bioacoustics foundation model trained on millions of bird recordings plus vocalizations from amphibians, insects, and land mammals.
Key Points
- Surprisingly, the model also reliably identifies whale calls, even though sound propagates very differently underwater than in air.
- Google DeepMind and Google Research have spent nearly a decade on whale bioacoustics, including humpback whale detection algorithms and a multi-species model covering eight whale species.
- Perch 2.0 demonstrates that a wildlife audio foundation model can transfer cross-domain without whale-specific training.
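The cross-domain transfer described above typically follows the standard foundation-model recipe: keep the pretrained audio model frozen, extract embeddings for labeled clips, and train only a small linear classifier ("linear probe") on top. The sketch below illustrates that pipeline, not Perch's actual API; the embedding model is replaced by synthetic random vectors, and the embedding dimension of 1536 is an assumption for illustration only.

```python
# Sketch of the "frozen embeddings + linear probe" transfer recipe commonly
# used with bioacoustics foundation models. The foundation model itself is
# NOT called here: synthetic vectors stand in for its outputs, purely to
# show the shape of the pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
DIM = 1536  # hypothetical embedding size; the real model's may differ

# Stand-in embeddings for two classes. In practice, these would come from
# running labeled whale-call and background-noise clips through the frozen
# pretrained model.
whale_call = rng.normal(loc=0.5, scale=1.0, size=(100, DIM))
background = rng.normal(loc=-0.5, scale=1.0, size=(100, DIM))
X = np.vstack([whale_call, background])
y = np.array([1] * 100 + [0] * 100)

# The linear probe is the only component trained for the new (whale) domain;
# the embedding model never sees whale-specific gradients.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.score(X, y))
```

Because all domain adaptation happens in this tiny final layer, strong whale-call accuracy implies the frozen embeddings already separate the classes, which is exactly the transfer result the article describes.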
Nauti's Take
This sounds like a side finding, but it's actually the more exciting part of the story: foundation models appear to learn acoustic structures at an abstraction level that transcends the physical medium. Birds in air, whales in water – the model seems indifferent.
What has long been true for language models (transfer across languages and domains) now applies to animal vocalizations too. The real question is how far this extends: could such a model eventually classify earthquake sounds, industrial noise, or medical audio signals?
The underlying logic suggests it could.