Bats, machine learning & tequila

When you’re having a shot of tequila at your next fiesta, make sure to raise a glass to bats. Yes, bats, because they pollinate the blue agave plant used to make tequila.

The small winged animals, which make up a whopping fifth of all mammal species, do a great deal for us. They pollinate many other plants – like wild bananas – and keep insect populations in check. Some species, like the little brown bat, can eat up to 1,000 mosquitoes in a single hour. In fact, bats are one of our best defenses against the spread of mosquito-borne diseases such as Zika.

Knowing where bats live and how their populations are faring in response to our impact on the planet is clearly an important task. But they’re small, largely nocturnal and like to hide – so how do we know where the bats are?

One answer is through sound. Around 80% of bat species emit a series of acoustic pulses and use the returning echoes to navigate the nocturnal world. This is called echolocation. Although most of these pulses are too high-pitched for humans to hear, they can be picked up by devices called ultrasonic detectors.

About a decade ago, an ambitious project called the Indicator Bats (iBats) program was started in Europe to gather as many bat recordings as possible. Today, volunteers drive through the countryside of 22 countries with bat detectors attached to the roofs of their cars, collecting the acoustic information that bats leak to the world about their whereabouts. This has generated a vast amount of audio, far more than any one person could ever listen to.

The big challenge is to develop automated ways of telling us just how many bats are in those recordings, and of which species. Though each species has its own signature echolocation ‘call,’ telling a computer how to distinguish between them is complicated. Let’s take a look at some calls. We can’t hear them, so plotting them on a spectrogram (a plot of frequency against time) is one way of looking at them. For some species, the call ‘shapes’ look completely different, making them easy to tell apart, like the calls in the top image below. But within some groups of bats, like the brown bats shown in the bottom image, calls of different species can look very similar. On top of this, calls can vary a lot within a single species, so it’s hard to give a computer concrete instructions.

Spectrograms of calls of various bat species
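For readers curious how such a picture is made, here is a minimal sketch in plain Python. It assumes a synthetic call (a 5 ms sweep from 80 kHz down to 30 kHz, sampled at 500 kHz) rather than a real recording. It computes a magnitude spectrogram by cutting the signal into short windowed frames and taking a discrete Fourier transform of each one. The helper names `chirp` and `spectrogram` are illustrative, and real analysis software would use a fast FFT library instead of this slow, direct DFT.

```python
import cmath
import math

def chirp(f_start, f_end, duration, sample_rate):
    """Synthetic linear frequency sweep, a stand-in for a downward bat call."""
    n = int(duration * sample_rate)
    rate = (f_end - f_start) / duration  # sweep rate in Hz per second
    return [math.sin(2 * math.pi * (f_start * t + 0.5 * rate * t * t))
            for t in (i / sample_rate for i in range(n))]

def spectrogram(signal, frame_len, hop):
    """Magnitude short-time Fourier transform.

    Returns one list per time frame holding |DFT| for bins 0..frame_len // 2;
    bin k corresponds to k * sample_rate / frame_len Hz.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]  # Hann window reduces spectral leakage
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append([abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                               for n in range(frame_len)))
                       for k in range(frame_len // 2 + 1)])
    return frames

SAMPLE_RATE = 500_000  # 500 kHz, comfortably above twice the highest call frequency
call = chirp(80_000, 30_000, 0.005, SAMPLE_RATE)  # 5 ms sweep, 80 kHz -> 30 kHz
spec = spectrogram(call, frame_len=128, hop=128)
# Loudest frequency (Hz) in each frame
peak_hz = [max(range(len(f)), key=f.__getitem__) * SAMPLE_RATE / 128 for f in spec]
print(peak_hz)
```

Each entry of `peak_hz` is the loudest frequency in one 0.256 ms frame. It falls from roughly 78 kHz to roughly 31 kHz, tracing the downward sweep that a spectrogram image shows as a descending line.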

This is the type of problem that machine learning (ML), a branch of computer science, is cut out to solve. ML is a kind of artificial intelligence that allows algorithms to ‘learn’ from the data you give them without being explicitly programmed to do so. For discriminating between bat species, ML algorithms have proven more accurate than other computational methods and even than well-trained experts.

Let’s build a simple classifier. We’ll pick a machine learning algorithm called Random Forest, which is relatively easy to understand. To train the algorithm, we have a bunch of cleanly recorded calls from 33 European bat species.

Before we do, we’ll need to package the bat calls somewhat differently – Random Forest doesn’t know what to do with raw audio. Using simple programming tools, we can extract a set of parameters from each call. For instance, we can take the mean frequency for each millisecond time slice across a call – which you can see represented as black dots in the spectrogram below.
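The article doesn’t say exactly how the mean frequency was computed. One plausible reading, sketched below under that assumption, is an energy-weighted average of the frequency bins in each time slice of a spectrogram. The function name `mean_frequency_track` and its `threshold` parameter are hypothetical.

```python
def mean_frequency_track(spec, bin_hz, threshold=0.0):
    """Energy-weighted mean frequency (Hz) of each time slice of a spectrogram.

    spec[t][k] is the magnitude in frequency bin k during time slice t, and
    bin k is centred on k * bin_hz. Bins at or below `threshold` are ignored,
    which crudely suppresses background noise.
    """
    track = []
    for column in spec:
        power = [m * m if m > threshold else 0.0 for m in column]
        total = sum(power)
        if total == 0:
            track.append(None)  # no energy above threshold: no call in this slice
        else:
            track.append(sum(k * bin_hz * p for k, p in enumerate(power)) / total)
    return track

# Toy 3-slice spectrogram with 10 kHz bins: the energy moves down in frequency
toy = [[0, 0, 0, 1, 0],
       [0, 0, 1, 1, 0],
       [0, 1, 0, 0, 0]]
print(mean_frequency_track(toy, bin_hz=10_000))  # [30000.0, 25000.0, 10000.0]
```

Weighting by power rather than taking the single loudest bin gives a smoother track when a call’s energy is spread across neighbouring bins.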

We can get crazier and fit a curve to those dots, then calculate fancier parameters from the curve itself, like its slope or its steepest slope – stuff you might vaguely recall from high school math. Using our imagination, we came up with 34 parameters in total.
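As a sketch of the curve-fitting step, assume a quadratic curve fitted by least squares (the article doesn’t say which kind of curve was used). The code below fits the mean-frequency points and derives a few parameters from the fitted curve: start and end frequency, average slope, and steepest slope. These four are illustrative stand-ins, not the 34 parameters actually used, and the helper names are hypothetical.

```python
def polyfit(ts, fs, degree):
    """Least-squares polynomial fit; returns c with f(t) ~ c[0] + c[1]*t + c[2]*t**2 + ..."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T f, where X[i][j] = ts[i] ** j
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(f * t ** i for t, f in zip(ts, fs)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        coeffs[r] = (b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, n))) / a[r][r]
    return coeffs

def evaluate(coeffs, t):
    return sum(c * t ** j for j, c in enumerate(coeffs))

def derivative(coeffs, t):
    """Slope of the fitted curve at time t (kHz per ms if inputs are ms and kHz)."""
    return sum(j * c * t ** (j - 1) for j, c in enumerate(coeffs) if j > 0)

def curve_features(ts, fs, degree=2, grid_points=101):
    """A few illustrative call parameters derived from the fitted frequency curve."""
    coeffs = polyfit(ts, fs, degree)
    t0, t1 = min(ts), max(ts)
    grid = [t0 + (t1 - t0) * i / (grid_points - 1) for i in range(grid_points)]
    slopes = [derivative(coeffs, t) for t in grid]
    return {
        "start_freq": evaluate(coeffs, t0),
        "end_freq": evaluate(coeffs, t1),
        "mean_slope": (evaluate(coeffs, t1) - evaluate(coeffs, t0)) / (t1 - t0),
        "steepest_slope": min(slopes),  # most negative = fastest downward sweep
    }

# Made-up mean-frequency points: time in ms, frequency in kHz
ts = [0, 1, 2, 3, 4, 5]
fs = [80 - 10 * t + 0.5 * t * t for t in ts]
print(curve_features(ts, fs))  # approx. start 80, end 42.5, mean slope -7.5, steepest -10
```

A low-degree curve keeps the parameters stable; a very flexible curve would chase every wobble in the dots, which is exactly what noise produces.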

We’re not going to tell Random Forest how to use these parameters to discriminate between species. The beauty of machine learning is that the algorithms are capable of “learning” for themselves the best way to separate the data. We simply feed Random Forest the parameters, and it learns from them to build us a classifier, i.e. an algorithm capable of assigning a species identification to a call it hasn’t seen before.

We set aside 20% of our data to test the algorithm on, and feed it the parameters from the remaining 80% of the bat calls. Random Forest works on the basis of decision trees. Each decision tree is grown from a random sample of the data (in this case, bat calls) and uses the parameters we extracted to build a pathway of decisions that leads to a classification. For instance, a simple decision tree might classify every bat call with a start frequency above 100 kHz as a horseshoe bat. If the start frequency is below that, it might, say, look at the steepest slope of the call. If it’s steep, the call could belong to a brown bat, and so on. Random Forest grows many such decision trees and combines their predictions, by majority vote or by averaging, into a final classifier.
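To make the procedure concrete, here is a small Random Forest written from scratch in Python, together with an 80/20 train/test split. It is a teaching sketch, not the implementation behind the results in this article. The data are made up (two parameters, three fictional species), each tree is grown on a bootstrap sample with a random subset of parameters considered at each split, and the trees are combined by majority vote. Library implementations differ in the details; scikit-learn’s `RandomForestClassifier`, for instance, averages the trees’ predicted class probabilities instead of counting votes.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when all calls in a node belong to one species."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (parameter index, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

def build_tree(rows, labels, n_try, rng, depth=0, max_depth=8):
    """Grow one tree: inner nodes are (feature, threshold, left, right); leaves are labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: predict the most common species here
    # Each node only considers a random subset of the parameters
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    f, t = split
    go_left = [row[f] <= t for row in rows]

    def subset(keep):
        sub = [(r, y) for r, y, g in zip(rows, labels, go_left) if g == keep]
        return [r for r, _ in sub], [y for _, y in sub]

    left_rows, left_labels = subset(True)
    right_rows, right_labels = subset(False)
    return (f, t,
            build_tree(left_rows, left_labels, n_try, rng, depth + 1, max_depth),
            build_tree(right_rows, right_labels, n_try, rng, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))  # common default: sqrt(#parameters)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample of calls
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], n_try, rng))
    return forest

def predict_forest(forest, row):
    """Each tree votes; the most common species wins."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

def split_80_20(rows, labels, seed=0):
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

# Made-up data: [start frequency (kHz), steepest slope (kHz/ms)] for three fake species
rng = random.Random(42)
rows, labels = [], []
for species, (start, slope) in {"A": (110, -1), "B": (50, -15), "C": (50, -3)}.items():
    for _ in range(40):
        rows.append([rng.gauss(start, 3), rng.gauss(slope, 1.5)])
        labels.append(species)

train_x, train_y, test_x, test_y = split_80_20(rows, labels)
forest = train_forest(train_x, train_y)
accuracy = sum(predict_forest(forest, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"test accuracy: {accuracy:.2f}")
```

With such well-separated fake species the accuracy should be close to perfect; real calls, especially those of brown bats, overlap far more.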

After growing the classifier, we test it on the 20% of data we initially set aside. It correctly predicted the species in nearly 70% of cases! Notably, it performed worse for some groups of bats than for others: for example the brown bats, which, as discussed earlier, are tricky to distinguish.
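To see where a classifier struggles, it helps to break its accuracy down by group of species rather than report a single number. Below is a minimal sketch; the helper `accuracy_report` and the species and group labels are hypothetical.

```python
from collections import defaultdict

def accuracy_report(true_species, predicted_species, group_of):
    """Overall accuracy, plus accuracy within each group of species."""
    hits, totals = defaultdict(int), defaultdict(int)
    for actual, pred in zip(true_species, predicted_species):
        group = group_of[actual]
        totals[group] += 1
        hits[group] += int(actual == pred)
    overall = sum(hits.values()) / len(true_species)
    return overall, {g: hits[g] / totals[g] for g in totals}

# Hypothetical labels: one horseshoe species and two easily confused brown bat species
group_of = {"horseshoe_1": "horseshoe", "brown_1": "brown", "brown_2": "brown"}
true_species = ["horseshoe_1", "horseshoe_1", "brown_1", "brown_1", "brown_2", "brown_2"]
predicted = ["horseshoe_1", "horseshoe_1", "brown_2", "brown_1", "brown_1", "brown_2"]
overall, per_group = accuracy_report(true_species, predicted, group_of)
print(overall, per_group)  # overall 4/6; horseshoe 1.0, brown 0.5
```

A full confusion matrix (counts of each true species against each predicted species) would show more detail, such as which brown bat species are mistaken for which.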

70% is not bad for a start. When a previous research group used an artificial neural network – a different type of machine learning algorithm – for the same species of bats, it achieved 83.7% accuracy. But the trouble with artificial neural networks is that they are something of a “black box” in terms of how they work. We can test whether they are accurate, in the same way we tested the accuracy of our Random Forest algorithm. But we can’t explain very well what happens inside the box, i.e. why the network produces a particular prediction.

Random Forest, by contrast, can tell us exactly which parameters it found most useful in discriminating between species (in this case, the lowest frequency of the call). We know exactly what it’s doing. The downside is that it wasn’t quite as effective as the artificial neural network, at least for this particular data. When we tried our classifier on some real-world data collected on the island of Jersey a few years ago, it didn’t do so well. For one, it kept assigning calls to species not known to occur in that area. The program we wrote to extract features from the calls seemed to falter, too: in most cases, the fitted curve failed to follow the bat calls. The reason was simply that these new, real-world recordings were particularly noisy, unlike the clean data we had trained our algorithm on.
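The article doesn’t say how the ranking of parameters was obtained. Random Forest libraries commonly report an impurity-based importance; a model-agnostic alternative, sketched below, is permutation importance: shuffle one parameter across all calls and measure how much accuracy drops. The toy classifier and its data are hypothetical stand-ins for a trained model and real calls.

```python
import random

def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Average drop in accuracy when one parameter's values are shuffled across calls.

    A large drop means the classifier relies heavily on that parameter.
    `predict` is any function mapping a parameter vector to a species label.
    """
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(row) == y for row, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    importances = []
    for f in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in rows]
            rng.shuffle(column)  # break the link between parameter f and species
            shuffled = [row[:f] + [v] + row[f + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in for a trained model: it only looks at the lowest frequency
def toy_predict(row):
    lowest_freq_khz = row[0]  # row[1] (call duration in ms) is ignored
    return "A" if lowest_freq_khz > 90 else "B"

rows = [[100, 5], [105, 6], [98, 3], [40, 4], [45, 7], [42, 5]]
labels = ["A"] * 3 + ["B"] * 3
print(permutation_importance(toy_predict, rows, labels))
```

Because the toy model ignores call duration, shuffling that column never changes a prediction and its importance comes out as exactly zero, while shuffling the lowest frequency does cost accuracy.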

When it comes to machine learning algorithms, there’s often a trade-off between accuracy and transparency. Of course we want to know exactly how an algorithm works and why it makes the decisions it does, but sometimes we have to sacrifice transparency for accuracy.

The Jones group at the Center for Biodiversity and Environmental Research at University College London is making use of the latest trend in machine learning to classify bat calls: deep learning. These algorithms are particularly opaque and difficult to understand. They’re highly complex, but they’ve excelled in accuracy for certain tasks. The Jones group is working towards using deep learning techniques to create a new classifier for European bats.

Although these algorithms are less transparent, they hold great promise for ecology, as long as they are rigorously tested to make sure that they’re drawing the right conclusion for each bat.

The more we know about bats, the better. It seems that artificial intelligence can take us a good part of the way there.

And it might just help keep the tequila flowing.