AI Trained on Birdsong Can Recognize Whale Calls

TL;DR

Google's Perch 2.0 is a bioacoustics foundation model trained on millions of bird recordings plus vocalizations from amphibians, insects, and land mammals.

Key Points

  • Surprisingly, the model also reliably identifies whale calls, even though sound propagates very differently underwater than in air.
  • Google DeepMind and Google Research have spent nearly a decade on whale bioacoustics, including humpback whale detection algorithms and a multi-species model covering eight whale species.
  • Perch 2.0 demonstrates that a wildlife audio foundation model can transfer across domains without whale-specific training (see the sketch below).
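
To make the transfer claim concrete, here is a minimal sketch of the usual evaluation recipe: freeze the pretrained model, use it purely as an embedding extractor, and test whether whale calls of the same type already cluster in that embedding space via leave-one-out nearest-neighbor retrieval. The `embed` function is a hypothetical placeholder for whatever inference API the released checkpoint exposes; the retrieval test itself is standard.

```python
# Hypothetical sketch: do whale calls cluster in a bird-trained embedding space?
import numpy as np

def embed(waveform: np.ndarray) -> np.ndarray:
    """Placeholder: map a mono waveform to a fixed-size embedding vector."""
    raise NotImplementedError("wire up the pretrained checkpoint here")

def nn_retrieval_accuracy(clips):
    """clips: list of (waveform, call_type) pairs of labeled whale calls."""
    X = np.stack([embed(w) for w, _ in clips])        # frozen embeddings
    X /= np.linalg.norm(X, axis=1, keepdims=True)     # unit-normalize
    y = np.array([label for _, label in clips])
    sims = X @ X.T                                    # cosine similarities
    np.fill_diagonal(sims, -np.inf)                   # exclude self-matches
    nearest = sims.argmax(axis=1)                     # leave-one-out 1-NN
    return float((y[nearest] == y).mean())
```

High retrieval accuracy here would mean the bird-trained embedding already encodes whale-relevant structure, with no whale examples ever seen in training.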

Nauti's Take

This sounds like a side finding, but it's actually the more exciting part of the story: foundation models appear to learn acoustic structures at an abstraction level that transcends the physical medium. Birds in air, whales in water – the model seems indifferent.

What has long been true for language models (transfer across languages and domains) now applies to animal vocalizations too. The real question is how far this extends: could such a model eventually classify earthquake sounds, industrial noise, or medical audio signals?

The underlying logic suggests it could.

Context

A model trained on birdsong generalizing to whale calls without explicit fine-tuning is a strong signal for the maturity of audio foundation models in ecology. It means researchers with small datasets on rare species could leverage large models trained on more common animals. This dramatically lowers the barrier to biodiversity monitoring, especially in ocean ecosystems, where data collection is expensive and logistically difficult.
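
That small-data workflow is worth spelling out. The sketch below assumes a researcher has only a handful of labeled clips of a rare species: the frozen foundation model does the heavy lifting, and only a linear classifier on top gets trained. As above, `embed` is a hypothetical stand-in for the pretrained model's inference call, not a documented API.

```python
# Hypothetical sketch: few-shot detector for a rare species on frozen embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(waveform: np.ndarray) -> np.ndarray:
    """Placeholder: frozen foundation-model embedding of a mono clip."""
    raise NotImplementedError("wire up the pretrained checkpoint here")

def train_few_shot_detector(positives, negatives):
    """positives/negatives: small lists of waveforms (target species vs. other)."""
    X = np.stack([embed(w) for w in positives + negatives])
    y = np.array([1] * len(positives) + [0] * len(negatives))
    # Only this linear layer is trained; with good embeddings, a few dozen
    # labeled clips are often enough for a usable detector.
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(X, y)
    return clf

def scan(clf, recording, sr, window_s=5.0):
    """Slide a fixed window over a long recording; score each segment."""
    step = int(window_s * sr)
    windows = [recording[i:i + step]
               for i in range(0, len(recording) - step + 1, step)]
    X = np.stack([embed(w) for w in windows])
    return clf.predict_proba(X)[:, 1]  # detection probability per window
```

The design choice matters for field work: because nothing in the large model is updated, training the detector needs no GPU and minutes rather than weeks, which is exactly what makes the approach attractive for expensive-to-sample ocean deployments.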

Sources