Identifying whales by processing their tails as time series


By Lamothe Thibaud
Compiled by Flin
Source: Towards Data Science

Using integral curvature and dynamic time warping, let’s take a deep dive into sperm whale identification!


Recently, we took part in Capgemini’s Global Data Science Challenge. Working with the ACORES whale research center, the goal was to identify sperm whales and use AI to help save their lives.

For this task, we were given thousands of photos of whales taken over the past few years. In the training data set, there are on average 1.77 photos per whale, and many animals appear only once. The main idea is therefore, given a new image, to find the closest one in the existing data.

That way, if the whale has already been photographed, researchers can know when and where.

I’m proud to announce that we finished third, and that we got there using a Siamese network. However, since there are already many articles about this wonderful architecture, today I will present a more interesting and novel way to solve the problem.


This approach was designed by Weideman et al. in their paper “Integral Curvature Representation and Matching Algorithms for Identification of Dolphins and Whales”. The key steps of the method I will present today are as follows:

  • Tail extraction based on color analysis and contour detection

  • Tail processing with integral curvature (IC)

  • Tail comparison with dynamic time warping (DTW)

Disclaimer no. 1: the prediction rate is not as good as with the Siamese network, so we had to explore other solutions. But the idea is very interesting and worth sharing and understanding.

Disclaimer no. 2: as in many data science projects, data preparation is the most difficult part. To process the tail as a signal, the quality of that signal must be very good. In this article, we’ll take the time to understand all the necessary steps before signal processing.

Explore our data set and analyze images

As mentioned in the introduction, we got thousands of pictures. At first glance, a whale is a whale. All of these images look like a blue background (sky and sea) with a gray speck (tail) in the middle.

After some preliminary exploration, we began to tell different sperm whales apart, mainly by the shape of the tail, which we believe is crucial to our algorithm. What about color? Is there any interesting information in the pixel distribution?

Correlation between the amounts of each color in each picture (green vs. red, blue vs. red, green vs. blue)

Using the Bokeh visualization library, we soon found that the colors in the images are highly correlated. So we focused on the contours and tried to detect them through color changes.

Tail extraction based on color filter

The first step of tail contour detection is to extract the tail from the sky and the water. In fact, this was the most difficult part of the process.

First, we tried a contour detection algorithm. But because the sunlight keeps changing from one shot to another, the contrast varies a lot, and the results were not satisfactory.

By the way, it’s interesting to see where image algorithms fail the most, because in most cases the difference between the tail and the sea is obvious to a human.

Having said that, let’s further study the automation of color analysis and contour extraction.

Using color to extract tail

Let’s draw a grayscale image for each channel intensity (red, green, blue)

Look at the three channels of a single picture

As you can see above, for most images there is less color in the middle of the image, so it can be filtered by pixel intensity. Since tails are usually gray, they have almost the same amount of each color (R = G = B), whereas the sea and sky tend to be blue, which makes this color ideal for filtering.

Let’s see what happens when we keep only the blue channel, and only the pixels with blue_value < selected_threshold.

The maximum value of selected_threshold is 255, because that is the maximum pixel intensity.
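
The filter described above can be sketched in a few lines. This is only an illustration under assumptions: the image is taken to be a NumPy array in RGB channel order (some loaders return BGR instead), and the function name `blue_filter` is made up for this example.

```python
import numpy as np

def blue_filter(image_rgb, selected_threshold):
    """Return a binary mask: 1 where the blue channel is below the threshold."""
    # Assumption: the last axis is ordered (red, green, blue).
    blue_value = image_rgb[..., 2]
    return (blue_value < selected_threshold).astype(np.uint8)

# Tiny synthetic picture: one gray "tail" pixel among blue "sea" pixels.
image = np.array([[[20, 60, 200], [30, 30, 30]],
                  [[25, 70, 180], [10, 50, 220]]], dtype=np.uint8)
mask = blue_filter(image, selected_threshold=40)
print(mask)
```

Only the gray pixel survives the filter, because its blue value (30) is the only one below 40.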

Looking at this series of images, one might believe that extracting the tail is easy. But how do we choose the filtering threshold?

The following is an example of the results obtained on a single picture using every value from 10 to 170, in steps of ten, as the threshold.

17 different filters applied to one image, based on the intensity of blue pixels

Here are some interesting things:

  • With a very small threshold (about 10), the sea disappears, but so does the tail

  • With a small threshold (about 20), part of the tail disappears

  • With a slightly higher threshold (about 40), the extraction is very good: the whole tail is less blue than the threshold, and the whole ocean is bluer than the threshold.

  • With an intermediate threshold (about 80), the tail remains intact, but parts of the ocean start to appear

  • With a threshold close to the middle of the range (about 110), it is difficult to distinguish the sea from the tail

  • With a higher threshold (>= 140), the tail disappears completely into the sea: even the sea is not blue enough to be removed by the filter.

So it seems obvious that we should use selected_threshold = 40 and apply the filter blue_value < 40.

As you can guess, it’s not that easy. 40 is the right value for this image, given its light intensity, but that holds only for this picture. Plotting all of these thresholds on random pictures shows that the right threshold varies between 10 and 130. So how do we choose the right value?

Use bounding box to select threshold

Looking at the previous images, an idea came to mind: the right picture, with the right threshold, is the one with the largest empty area on the outside and the largest filled area on the inside. We hoped that a neural network trained on ImageNet could locate the whale in the image, and we decided to use MobileNet, based on the ImageNet classes.

A batch of extracted tails with their bounding boxes, compared with the original images

It was a good idea. As the figure shows, we can locate the tail in the picture very accurately. We can then separate the “tail inside” from the “sea outside” in almost all images.

To better understand this separation, for each image in the training set we sum the blue values of all pixels inside the bounding box, and do the same for the pixels outside the bounding box.

Then we plot each image on the following figure, with the inside result on the X axis and the outside result on the Y axis. The blue line represents x = y. What this graph tells us is: the farther a point is from the line, the easier it is to separate the tail from the ocean.

Comparison of sperm whale images by blue pixel intensity inside and outside the bounding box
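
The inside and outside sums described above could be computed along these lines. This is a sketch under assumptions: the image is an RGB NumPy array, and the bounding box format `(x_min, y_min, x_max, y_max)` with exclusive maxima is hypothetical, since the article does not say how the detector reports its boxes.

```python
import numpy as np

def blue_inside_outside(image_rgb, box):
    """Sum the blue channel inside and outside a bounding box.

    box is a hypothetical (x_min, y_min, x_max, y_max) tuple, max exclusive.
    """
    x_min, y_min, x_max, y_max = box
    # Widen the dtype so the sums cannot overflow 8-bit pixel values.
    blue = image_rgb[..., 2].astype(np.int64)
    inside = blue[y_min:y_max, x_min:x_max].sum()
    outside = blue.sum() - inside
    return int(inside), int(outside)

# Synthetic picture: blue sea (200) with a darker 2x2 tail (30) in the middle.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[..., 2] = 200
image[1:3, 1:3, 2] = 30
print(blue_inside_outside(image, (1, 1, 3, 3)))
```

Each image then becomes one (inside, outside) point on the scatter plot.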

We tried to choose the filter threshold based on the distance from the line, but this did not produce any results. After several attempts, we could not get anything from the color distribution of the picture alone, so we decided to take a brute-force approach. Instead of looking at the image and deciding on a threshold, we apply 15 filters to each image, analyze them, and then automatically select the best one for further processing.

So, for a given image, we apply 15 filters with 15 different threshold values. For each filter, we count the number of pixels inside and outside the bounding box (after filtering, each pixel value is 0 or 1, so there is no need to sum intensities). Then we normalize the results so that the numbers are independent of the image size, and plot them on a graph.

Number of pixels inside (X axis) and outside (Y axis) the bounding box, for a single picture and different filtering thresholds.

For each image, we get a curve similar to the one above, which is the mathematical translation of our earlier observations as the threshold evolves.

  • When the threshold is very small, both the tail and the sea disappear. There are no pixels inside or outside the bounding box

  • When the threshold increases, the tail appears and the value on the X axis increases.

  • This continues until the threshold lets some parts of the ocean appear, and the outside value starts to grow.

Using linear regression or derivatives, it is now easy to detect the right threshold: it is the threshold at the intersection of the two lines of the graph.

Note: the orange line is y = y_of_the_selected_threshold
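
The counting step above can be sketched as follows. Several things here are assumptions rather than the method actually used: the box format `(x_min, y_min, x_max, y_max)`, the normalization by the inside and outside areas, and above all the selection rule. Instead of a regression or a derivative, this stand-in simply keeps the largest threshold whose outside fraction stays under a hypothetical tolerance `max_outside`.

```python
import numpy as np

THRESHOLDS = list(range(10, 160, 10))  # 10, 20, ..., 150 (15 filters)

def inside_outside_fractions(image_rgb, box, thresholds=THRESHOLDS):
    """For each threshold, fraction of kept pixels inside and outside the box."""
    x_min, y_min, x_max, y_max = box
    blue = image_rgb[..., 2]
    in_box = np.zeros(blue.shape, dtype=bool)
    in_box[y_min:y_max, x_min:x_max] = True
    n_in = max(1, int(in_box.sum()))
    n_out = max(1, int((~in_box).sum()))
    fractions = []
    for threshold in thresholds:
        mask = blue < threshold
        fractions.append(((mask & in_box).sum() / n_in,
                          (mask & ~in_box).sum() / n_out))
    return np.array(fractions)

def select_threshold(fractions, thresholds=THRESHOLDS, max_outside=0.05):
    """Simplified rule: largest threshold that keeps almost no outside pixels."""
    ok = fractions[:, 1] <= max_outside
    if not ok.any():
        return thresholds[0]
    return thresholds[int(np.nonzero(ok)[0][-1])]

# Synthetic picture: sea with blue = 120, tail with blue = 35 inside the box.
image = np.zeros((10, 10, 3), dtype=np.uint8)
image[..., 2] = 120
image[3:7, 3:7, 2] = 35
fractions = inside_outside_fractions(image, (3, 3, 7, 7))
print(select_threshold(fractions))
```

On real pictures the curve is noisier, which is presumably why a regression or derivative on the two branches is more robust than a fixed tolerance.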

One last trick for tail extraction

Finally, to get the best possible image at extraction time, suppose the best threshold we computed (among 10, 20, 30, 40, …, 120, 130, 140, 150) is 80. We then apply filters at that value and at -5 / +5 around it, which gives three pictures: blue < 75, blue < 80 and blue < 85. We sum these three binary images (of 0s and 1s) and keep only the resulting pixels whose value equals 2. This acts as a final filter that removes noise around the tail. The extraction worked very well, so we decided to apply it to all images.


To summarize, the following are the assumptions we have made so far:

  • We can use a filter on the intensity of blue pixels to distinguish the tail from the ocean

  • Before filtering, we need to find a threshold for each image

  • Using bounding box is an effective way to find this threshold

After several hours of work, we finally had a very good tail extractor, which handles tails under different brightness, weather, ocean color and tail color, and gets through even the most difficult pictures.

A batch of extracted tails compared with the original images

Contour detection

Now that the tail is isolated in the picture, we move on to contour detection. Indeed, to process tails as time series, we need a signal.

For this step, we could have used OpenCV’s contour detection algorithm, but it turned out to be faster to go through the following two steps:

Step 1: use entropy to remove the noise around the tail

Entropy variation used to keep only the contour of the extracted tail

Step 2: keep the highest lit pixel of each column

Contour of the extracted tail, detected after the entropy filter

This step is very simple, with no particular complexity.
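
Step 2 can be illustrated with a short sketch. It assumes the output of step 1 is a binary mask (rows from top to bottom, 1 for tail pixels); the function name `upper_edge` and the choice to skip empty columns are assumptions made for this example.

```python
import numpy as np

def upper_edge(mask):
    """Return (columns, heights) of the highest lit pixel in each column.

    Columns without any lit pixel are skipped. Heights are measured from the
    bottom row, so a larger value means a pixel higher up in the picture.
    """
    mask = np.asarray(mask).astype(bool)
    has_pixel = mask.any(axis=0)
    # argmax on booleans gives the index of the first True from the top.
    first_row = mask.argmax(axis=0)
    heights = mask.shape[0] - 1 - first_row
    columns = np.nonzero(has_pixel)[0]
    return columns, heights[has_pixel]

mask = np.array([[0, 0, 1, 0],
                 [0, 1, 1, 0],
                 [1, 1, 1, 0]])
columns, heights = upper_edge(mask)
print(columns, heights)
```

The resulting heights, read from left to right, form the one-dimensional signal used in the rest of the pipeline.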

Integral curvature

By extracting the tail from the sea and keeping the uppermost pixel of each column, we get the trailing edge of the tail as a signal. Now that we have it, we have to deal with normalization. Indeed, the pictures all differ in size and number of pixels. In addition, the distance to the sperm whale is not always the same, and the orientation may change from one shot to the next.

An example of tail orientation: two photos of the same whale may differ

To normalize, we have to work along both axes. First, we decided to use 300 points per tail for signal comparison, so we interpolate the shortest signals and subsample the longest ones. Second, we normalize all values between 0 and 1. This results in the superposition of signals shown in the figure below.

Superposition of scaled signals
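
The two normalizations above can be sketched with NumPy. Linear interpolation through `np.interp` is an assumption here (the article does not say which interpolation was used), and `normalize_signal` is a name chosen for this example.

```python
import numpy as np

def normalize_signal(values, n_points=300):
    """Resample a signal to n_points and rescale its values to [0, 1]."""
    values = np.asarray(values, dtype=float).ravel()
    # Put both the old and the new sample positions on a common [0, 1] axis.
    old_x = np.linspace(0.0, 1.0, num=len(values))
    new_x = np.linspace(0.0, 1.0, num=n_points)
    resampled = np.interp(new_x, old_x, values)
    span = resampled.max() - resampled.min()
    if span == 0:
        return np.zeros(n_points)  # flat signal: nothing to rescale
    return (resampled - resampled.min()) / span

short = normalize_signal(np.arange(50))           # interpolated up to 300 points
long = normalize_signal(np.arange(1000.0) ** 2)   # sampled down to 300 points
print(short.shape, long.shape)
```

After this step every tail is a vector of 300 values between 0 and 1, whatever the size of the original picture.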

To solve the orientation problem, we use the integral curvature measure, which transforms the signal into another signal through local evaluation.

As described in the original paper: “it captures the local shape information of each point along the trailing edge. For a given point on the trailing edge, we place a circle with radius r at that point, and then find all the points on the trailing edge that are in the circle.”

Then, at each step, we straighten the edge of the signal inside the circle so that it is inscribed in a square.

Principle of curvature integral

Finally, we define curvature as follows:

The curvature is the ratio of the area under the curve to the total area of the square, which means that the curvature of a straight line is c = 0.5.

We thus get standardized signals that are independent of the distance between the whale and the photographer, the angle between the whale and the photographer, and the angle between the whale and the ocean.
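
To make the definition concrete, here is a simplified sketch of the idea, not the implementation from the paper or from the competition. It assumes the edge is sampled at one value per pixel column, with heights in the same pixel units as the radius, and it skips the straightening step: the square of side 2r is simply centered on each point, and the area under the curve is clipped to that square.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def integral_curvature(y, radius):
    """Simplified integral curvature of a 1-D edge signal.

    For each point, a square of side 2 * radius is centered on it and the
    result is the fraction of the square lying under the curve.
    """
    y = np.asarray(y, dtype=float).ravel()
    # Repeat the end values so that every point has a full window.
    padded = np.pad(y, radius, mode="edge")
    windows = sliding_window_view(padded, 2 * radius + 1)
    # Height of the curve above the bottom of the square, clipped to the square.
    heights = np.clip(windows - y[:, None] + radius, 0.0, 2.0 * radius)
    return heights.mean(axis=1) / (2.0 * radius)

line = np.linspace(0.0, 30.0, 100)
flat = integral_curvature(line, radius=5)

peak = -np.abs(np.arange(-20, 21)).astype(float)
tip = integral_curvature(peak, radius=5)[20]
print(flat[50], tip)
```

Away from the padded ends, a straight line gives 0.5 whatever its slope, while the tip of a peak gives less than 0.5 and the bottom of a notch would give more.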

Then, for each training and test image, we create these signals during the IC step with radii of 5, 10 and 15 pixels. We store them and use them for the final step: comparing time series.

I will skip the details of the implementation of this algorithm in this article. Once it works, we can apply it to the trailing edge and extract a signal that is free of environmental details. For one tail, the signals look like this:

Integral curvature applied to the trailing edge of a sperm whale tail with three different radius values

Now, let’s compare the signals!

Dynamic time warping

Dynamic time warping (DTW) is an algorithm that finds the best alignment between two time series. It is commonly used to measure the similarity of time series, to classify them, and to find corresponding regions between two time series.

In contrast to the Euclidean distance, the DTW distance allows different parts of the curves to be linked to each other. The algorithm works as follows:

Using the two curves, we build the distance matrix between the two series. Going from the lower left corner to the upper right corner, we compute the distance between points a_i and b_j as follows: D[i, j] = |a_i - b_j| + min(D[i-1, j-1], D[i-1, j], D[i, j-1]).

Once the distance matrix is filled, we compute the lowest-weight path from the upper right corner back to the lower left corner. To do this, at each step we move to the neighboring cell with the smallest value.

Finally, the selected path (green in the figure below) indicates which data point of sequence A corresponds to which data point of sequence B.

Implementing this basic computation is very easy. For example, here is a function that creates the distance matrix from two sequences s and t.

import numpy as np

def dtw(s, t):
    """Compute the DTW distance matrix between two time series.

    args: s and t are two numpy arrays of size (n, 1) and (m, 1)
    """
    # Flatten the inputs so that each element is a plain scalar
    s = np.asarray(s, dtype=float).ravel()
    t = np.asarray(t, dtype=float).ravel()
    n, m = len(s), len(t)
    # Instantiate the distance matrix, filled with infinity except the origin
    dtw_matrix = np.full((n + 1, m + 1), np.inf)
    dtw_matrix[0, 0] = 0
    # Compute the distance matrix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            last_min = np.min([
                dtw_matrix[i - 1, j],
                dtw_matrix[i, j - 1],
                dtw_matrix[i - 1, j - 1],
            ])
            dtw_matrix[i, j] = cost + last_min
    return dtw_matrix
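
The backtracking step described earlier, from the last cell back to the first, can be sketched as follows. The block rebuilds the matrix itself so that it runs on its own; the names `dtw_cost_matrix` and `warping_path`, and the tie-breaking between equal cells, are assumptions made for this example.

```python
import numpy as np

def dtw_cost_matrix(s, t):
    """Accumulated DTW cost matrix, padded with one row and column of inf."""
    s = np.asarray(s, dtype=float).ravel()
    t = np.asarray(t, dtype=float).ravel()
    n, m = len(s), len(t)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i, j] = abs(s[i - 1] - t[j - 1]) + min(
                d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d

def warping_path(d):
    """Walk back from the last cell to the first, following the cheapest cell."""
    i, j = d.shape[0] - 1, d.shape[1] - 1
    path = [(i - 1, j - 1)]  # indices into the original series
    while (i, j) != (1, 1):
        candidates = [(d[i - 1, j - 1], i - 1, j - 1),
                      (d[i - 1, j], i - 1, j),
                      (d[i, j - 1], i, j - 1)]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return path[::-1]

d = dtw_cost_matrix([1, 2, 3], [1, 2, 2, 3])
path = warping_path(d)
print(d[-1, -1], path)
```

In this small example the second series repeats the value 2, so the path pairs index 1 of the first series with both indices 1 and 2 of the second, and the total DTW distance is 0.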

Having said that, let’s get back to our sperm whales! Each tail of the dataset is converted into an “integral curvature signal”, and we compute the distance between all the tails to find the closest ones.

After that, when we receive a new image, we have to run it through the whole preparation pipeline: tail extraction with the blue filter, contour detection with the entropy method, and contour transformation with IC. This gives us a tensor of shape 300×1, and finally we need to compute its distance to every signal in the dataset. That is time-consuming, by the way.
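
The final lookup can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration: `closest_tails` and its `k` parameter are made-up names, the signals are synthetic stand-ins for IC signals, and the plain Python loops make the cost of comparing against a whole dataset quite visible.

```python
import numpy as np

def dtw_distance(s, t):
    """DTW distance: the last cell of the accumulated cost matrix."""
    s = np.asarray(s, dtype=float).ravel()
    t = np.asarray(t, dtype=float).ravel()
    n, m = len(s), len(t)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i, j] = abs(s[i - 1] - t[j - 1]) + min(
                d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return float(d[n, m])

def closest_tails(query, dataset, k=5):
    """Return the k (index, distance) pairs with the smallest DTW distance."""
    distances = [dtw_distance(query, signal) for signal in dataset]
    order = np.argsort(distances)[:k]
    return [(int(idx), float(distances[idx])) for idx in order]

# Synthetic stand-ins for stored signals.
x = np.linspace(0, 2 * np.pi, 50)
dataset = [np.sin(x), np.cos(x), x / (2 * np.pi)]
ranking = closest_tails(np.sin(x + 0.1), dataset, k=3)
print(ranking)
```

A slightly shifted sine is ranked closest to the stored sine, which is the behavior that makes DTW attractive for edges that are not perfectly aligned.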

Verdict: decent results! When we have two photos of the same whale, in most cases each of the two is among the 40 closest matches of the other, which is good out of 2000. However, as mentioned in the introduction, the results with the Siamese network were better (the matching picture is usually among the 5 closest), and given the time constraints of the competition we had to be selective about which approaches to investigate.

Bonus: working with half tails and half signals

We tried working with half tails, under either of the following assumptions:

  • The tail is symmetrical, which would simplify the computation.

  • The tail is asymmetrical, which would allow comparison by half tail.

Despite a lot of testing, this did not give us conclusive results. We don’t think our separation was reliable enough: we would need more time to study a better separation based on signal processing.

Final thoughts

After a tail extraction that turned out to be harder than we expected, because of the colors of the images (basically blue: ocean and sky) and the varying brightness of the images in the dataset, we applied two successive processing methods to tail recognition.

First, integral curvature is a way to normalize the signal by looking at the local variations of the curve. Then we used dynamic time warping, a distance between two curves that can find similarities between them even when they are shifted.

Unfortunately, the results were not as good as I had hoped, and we could not pursue this solution. With more time and effort, I firmly believe we could improve every step of the pipeline to get a better model. I also really enjoyed working with the concepts covered in this article.

Keeping track of all the transformations through all the steps, the different ways to implement them, and the parameters was very challenging. With a road map in hand, every step had its own difficulties, and every small success was a victory that unlocked the next step. It was very gratifying.

I find this method very interesting and completely different from the usual pre-trained CNNs. I hope you enjoyed this approach as well. If you have any questions, please feel free to contact me.

