Estimating the Volume Occupancy of Oil Storage Tanks from Satellite Images with YOLOv3


By Md. Mubasir
Translated by VK
Source: Towards Data Science

Before 1957, Earth had only one satellite: the Moon. On October 4, 1957, the Soviet Union launched the world's first artificial satellite. Since then, about 8,900 satellites from more than 40 countries have been launched.

These satellites help us with surveillance, communication, navigation, and more. Countries also use satellites to monitor the land and movements of other countries and to estimate their economic and military strength, while every country hides such information about itself.

Similarly, the global oil market is not fully transparent. Almost all oil-producing countries try to conceal their total production, consumption, and storage, both to hide the true state of their economies from the outside world and to strengthen their national defense. This opacity can pose a risk to other countries.

For this reason, several start-ups, such as Planet and Orbital Insight, keep an eye on such activities in various countries through satellite imagery. They collect satellite images of oil storage tanks and estimate the reserves.

But how can the volume of an oil storage tank be estimated from satellite images alone? It is only possible when the tank has a floating roof. This special type of tank is designed to store large quantities of petroleum products, such as crude oil or condensate. It consists of a top cover that rests directly on the surface of the oil, rising and falling as the oil level changes, which creates two shadows around it. As shown in the figure below, the shadow on the north side (the external shadow) reflects the total height of the tank, while the shadow inside the tank (the internal shadow) reflects the depth of the floating roof. The occupancy is then estimated as 1 − (internal shadow area / external shadow area).
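As a minimal sketch (not the author's exact code), the occupancy formula can be written as a small function over the two shadow areas:

```python
def occupancy_ratio(internal_shadow_area: float, external_shadow_area: float) -> float:
    """Estimate the fraction of a floating head tank that is full.

    A full tank has its roof near the top, so the internal shadow is
    small; an empty tank casts a deep internal shadow.
    """
    if external_shadow_area <= 0:
        raise ValueError("external shadow area must be positive")
    return 1.0 - internal_shadow_area / external_shadow_area

# e.g. internal shadow of 120 px and external shadow of 480 px -> 75% full
print(occupancy_ratio(120, 480))  # 0.75
```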

In this blog, we will implement from scratch, in Python with the TensorFlow 2.x framework, a complete model that estimates the occupancy of oil storage tanks from satellite images.

GitHub repository

The full code for this article can be found in this GitHub repository.

Here is the outline of this blog. We will explore each section one by one.


  1. Problem statement, dataset and evaluation metrics

  2. Existing approaches

  3. Related research work

  4. Useful blogs and research papers

  5. Our contribution

  6. Exploratory data analysis (EDA)

  7. Data augmentation

  8. Data preprocessing, augmentation and TFRecords

  9. Object detection with YOLOv3

  10. Reserve estimation

  11. Results

  12. Conclusion

  13. Future work

  14. References

1. Problem statement, dataset and evaluation metrics

Problem statement:

Detect floating head oil tanks and estimate their oil storage volume, then reassemble the image patches into a full image annotated with the storage estimates.


Dataset links:

The dataset contains annotated bounding boxes over satellite images taken from Google Earth, covering industrial areas around the world. There are 2 folders and 3 files in the dataset. Let's look at them one by one.

  • large_images: a folder containing 100 raw satellite images, each 4800×4800 pixels. All images are named in the id_large.jpg format.
  • image_patches: contains 512×512 patches generated from the large images. Each large image is split into 100 patches of 512×512, with a 37-pixel overlap between neighboring patches along both axes. The patches are named in the id_row_column.jpg format.
  • labels.json: contains the labels for all images. Labels are stored as a list of dictionaries, one per image. Images that do not contain any tanks are labeled "skip". Bounding box labels are given as the (x, y) coordinates of the four corners of the box.
  • labels_coco.json: contains the same labels as the previous file, converted to COCO format, where a bounding box is given as [x_min, y_min, width, height].
  • large_image_data.csv: contains metadata about the large image files, including the center coordinates and altitude of each image.

Evaluation metrics:

For tank detection we will use the Average Precision (AP) of each tank class and the mean Average Precision (mAP) across classes. There is no standard metric for the estimated volume of floating head tanks.

mAP is the standard evaluation metric for object detection models. A detailed description of mAP can be found in the YouTube playlist below.
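As background, AP can be computed from a ranked list of detections by accumulating precision and recall. The sketch below is a generic 11-point interpolated version, not tied to this dataset:

```python
import numpy as np

def average_precision(scores, is_true_positive, n_ground_truth):
    """Compute AP by integrating the precision-recall curve.

    scores: confidence of each detection; is_true_positive: whether the
    detection matched a ground-truth box (IoU above some threshold).
    Uses the 11-point interpolation convention.
    """
    order = np.argsort(scores)[::-1]                 # rank by confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / n_ground_truth
    # precision at recall r is the max precision at any recall >= r
    ap = 0.0
    for r in np.linspace(0, 1, 11):
        mask = recall >= r
        ap += (precision[mask].max() if mask.any() else 0.0) / 11
    return ap
```

mAP is then simply the mean of the per-class AP values.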

2. Existing approaches

Karl Heyer [1] uses RetinaNet in his repository for the tank detection task. He builds the model from scratch and applies generated anchor boxes to the dataset. This achieves an Average Precision (AP) of 76.3% for floating head tanks. He then applies shadow enhancement and pixel thresholding to calculate the volume.

As far as I know, this is the only approach available on the internet.

3. Related research work

Estimating the Volume of Oil Tanks Based on High-Resolution Remote Sensing Images [2]:

This paper presents a method for estimating oil tank capacity/volume from satellite images. To calculate the total volume of a tank, they need its height and radius. The height is computed from a geometric relationship with the length of the shadow the tank casts, but measuring the shadow length is not easy. To highlight shadows, the image is converted to the HSV (hue, saturation, value) color space, because shadows usually have high saturation there. The shadow length is then calculated by a median method based on sub-pixel subdivision positioning. Finally, the radius of the tank is obtained with the Hough transform algorithm.
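The geometric relationship between shadow length and height can be sketched as follows. This is a simplified model assuming a flat surface and a known sun elevation angle, not the paper's exact formulation:

```python
import math

def tank_height_from_shadow(shadow_length_m: float, sun_elevation_deg: float) -> float:
    """Height of a vertical structure from the length of its cast shadow.

    With the sun at elevation angle e above the horizon, a structure of
    height h casts a shadow of length h / tan(e), so h = L * tan(e).
    """
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))

# a 20 m shadow under a 45-degree sun implies a roughly 20 m tall tank
print(tank_height_from_shadow(20.0, 45.0))
```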

The related work section of this paper also cites a method for calculating building heights from satellite images.

4. Useful blogs and research papers

A Beginner’s Guide To Calculating Oil Storage Tank Occupancy With Help Of Satellite Imagery [3]:

This blog describes a service that uses satellite imagery to track crude oil storage at several geographical points of interest.

In this blog, they describe in detail how the external and internal shadows of a storage tank help us estimate its oil content. They also compare satellite images taken at a specific time and one month later, showing how the storage tanks changed over that month. This blog gives us the intuition behind the volume estimation.

A Gentle Introduction to Object Recognition With Deep Learning [4] :

This article clears up concepts that often confuse beginners in object detection. It first explains the differences between object classification, object localization, object recognition, and object detection, and then discusses some recent deep learning algorithms for object recognition.

Object classification assigns a label to an image containing a single object. Object localization draws a bounding box around one or more objects in the image. Object detection combines classification and localization, which makes it a more challenging task: first a bounding box is drawn around each object of interest (OI) through localization, and then a label is assigned to each OI through classification. Object recognition is the umbrella term covering all of the above tasks (classification, localization, and detection).

Finally, two major families of object detection models are discussed: region-based convolutional neural networks (R-CNN) and You Only Look Once (YOLO).

Selective Search for Object Recognition [5]:

In object detection, the most critical part is object localization, because classification is performed on top of it. Classification depends on the proposed regions of interest (region proposals); better localization leads to better detection. Selective search is an algorithm used for object localization in object recognition models such as R-CNN and Fast R-CNN.

First, an efficient graph-based image segmentation method generates sub-segments of the input image, and then a greedy algorithm merges smaller similar regions into larger ones. Segment similarity is based on four attributes: color, texture, size, and fill.

Region Proposal Network — A detailed view[6]:

An RPN (Region Proposal Network) is widely used for object localization because it is faster than the traditional selective search algorithm. It learns the best locations of objects from the feature map, just as a CNN learns classification from the feature map.

It is responsible for three main tasks: first, generating anchor boxes (9 anchor boxes of different shapes at each feature-map point); then classifying each anchor box as foreground or background (i.e., whether it contains an object); and finally learning shape offsets for the anchor boxes so that they fit the objects.
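A minimal numpy sketch of the first task, generating 9 anchors per feature-map point. The stride, scales, and aspect ratios here are illustrative values, not the ones from any specific paper:

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2).

    Each feature-map cell maps to a point in the input image (cell
    center * stride); 3 scales x 3 aspect ratios = 9 anchors per point.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)  # w/h = r, area = s^2
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

anchors = generate_anchors(4, 4)
print(anchors.shape)  # (144, 4) -> 4*4 points * 9 anchors each
```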

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[7]:

The Faster R-CNN model addresses the shortcomings of its two predecessors (R-CNN and Fast R-CNN) by using an RPN as the region proposal generator. Its architecture is the same as Fast R-CNN, except that it uses an RPN instead of selective search, which makes it 34 times faster than Fast R-CNN.

Real-time Object Detection with YOLO, YOLOv2, and now YOLOv3 [8]:

Before introducing the YOLO family of models, let's take a look at the TED talk by its lead researcher, Joseph Redmon.

There are many reasons why this model ranks first among object detection models, but the main one is its speed: its inference time is short enough to keep up with normal video frame rates (i.e., 25 FPS), which makes it applicable to real-time data.

Unlike other object detection models, YOLO has the following characteristics.

  • It is a single neural network model (i.e., both classification and localization are performed by the same model): it takes an image as input and directly predicts bounding boxes and a class label for each box, which means it only looks at the image once.

  • Since it convolves over the whole image rather than over region crops, it produces very few background errors.

  • YOLO learns generalizable representations of objects. When trained on natural images and tested on artwork, YOLO far outperforms top detection methods such as DPM and R-CNN. Because it is highly generalizable, it is less likely to break down when applied to new domains or unexpected inputs.

What makes YOLOv3 better than YOLOv2?

  • Take a closer look at the title of the YOLOv2 paper: "YOLO9000: Better, Faster, Stronger". Is YOLOv3 much better than YOLOv2? Well, yes: it is better, but not faster or stronger, because the complexity of the network has increased.

  • YOLOv2 uses a 19-layer Darknet architecture without residual blocks, skip connections, or upsampling, which makes it poor at detecting small objects. YOLOv3 adds these features and uses a 53-layer Darknet network pretrained on ImageNet, on top of which another 53 convolutional layers are stacked, yielding a 106-layer fully convolutional architecture.

  • YOLOv3 predicts at three different scales: first a 13×13 grid for large objects, then a 26×26 grid for medium objects, and finally a 52×52 grid for small objects.

  • YOLOv3 uses 9 anchor boxes in total, 3 per scale, selected by K-means clustering.

  • YOLOv3 performs multi-label classification of the objects detected in an image. Object confidence and class predictions are produced by logistic regression.
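The anchor selection step can be sketched by running K-means on the (width, height) pairs of the ground-truth boxes. This is a simplified Euclidean-distance version for illustration; the YOLO papers use an IoU-based distance:

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=50, seed=0):
    """Cluster (N, 2) box widths/heights into k anchor shapes."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # assign each box to its nearest center
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = wh[assign == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # sort by anchor area

# toy example: three clusters of box shapes (small, medium, large)
wh = np.vstack([np.random.default_rng(1).normal(m, 2, (50, 2))
                for m in (16, 64, 160)])
anchors = kmeans_anchors(wh, k=3)
print(anchors.round(1))
```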

5. Our contribution

Our problem statement comprises two tasks: the first is the detection of floating head tanks, the second is the extraction of shadows and the estimation of the volume of the identified tanks. The first task is object detection; the second is based on computer vision techniques. Let's describe how we solve each one.

Tank detection:

Our goal is to estimate the volume of floating head tanks. We could build a single-class object detection model, but to reduce confusion between floating head tanks and the other types of storage tanks, and to make the model robust, we propose a three-class object detection model. YOLOv3 with transfer learning is used for detection because it is easier to train on a single machine. Data augmentation is also applied to improve the metric scores.

Shadow extraction and volume estimation:

Shadow extraction involves several computer vision techniques. Since the RGB color space is not sensitive to shadows, the image is first converted to the HSV and LAB color spaces. We then enhance the shadow regions with the ratio image (l1 + l3) / (v + 1), where l1 is the first channel (L) of the LAB color space, l3 is its third channel, and v is the value channel of HSV.

The enhanced image is then filtered with the threshold 0.5 × T1 + 0.4 × T2, where T1 is the minimum pixel value and T2 is the mean pixel value. The thresholded image is then processed morphologically (i.e., noise removal, contour cleaning, etc.).

Finally, the contours of the two shadows of each storage tank are extracted, and the occupied volume is estimated with the formula above. These ideas are taken from the following notebook.
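A numpy-only sketch of the enhancement and thresholding steps, operating on channels that would normally come from OpenCV conversions such as cv2.cvtColor(img, cv2.COLOR_BGR2LAB) and cv2.COLOR_BGR2HSV. The channel inputs and the shadow polarity here are assumptions for illustration, not the notebook's exact code:

```python
import numpy as np

def shadow_mask(l1, l3, v):
    """Enhance shadows with the (l1 + l3) / (v + 1) ratio image and
    threshold it at 0.5*T1 + 0.4*T2 (T1 = min pixel, T2 = mean pixel).

    l1, l3: first and third LAB channels; v: HSV value channel;
    all float arrays of the same shape.
    """
    enhanced = (l1 + l3) / (v + 1.0)
    t1, t2 = enhanced.min(), enhanced.mean()
    thresh = 0.5 * t1 + 0.4 * t2
    # assumption: shadows (dark, so low v) come out bright in the ratio
    # image, so keep pixels above the threshold
    return enhanced > thresh

# toy 4x4 channels with one dark "shadow" corner
l1 = np.full((4, 4), 150.0); l1[:2, :2] = 20.0
l3 = np.full((4, 4), 100.0); l3[:2, :2] = 30.0
v  = np.full((4, 4), 250.0); v[:2, :2] = 5.0
print(shadow_mask(l1, l3, v).sum())  # 4 shadow pixels
```

In the real pipeline this mask would then be cleaned up morphologically before contour extraction.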

The overall pipeline of this case study is shown below.

Let's start with exploratory data analysis (EDA) of the dataset!

6. Exploratory data analysis (EDA)

Explore the labels.json file:

json_labels = json.load(open(os.path.join('data','labels.json')))
print('Number of Images: ',len(json_labels))

All labels are stored as a list of dictionaries. There are 10,000 images in total. Images that do not contain any tanks are labeled skip, while tank objects are labeled tank, tank cluster, or floating head tank. Each tank object carries the bounding box coordinates of its four corners in dictionary format.


Of the 10,000 images, 8,187 have no labels (i.e., they do not contain any tank objects). In addition, 81 images contain at least one tank cluster object, and 1,595 images contain at least one floating head tank.

From the bar chart it can be observed that of the 1,595 images containing floating head tanks, 26.45% contain only one floating head tank object. The maximum number of floating head tank objects in a single image is 34.

Explore the labels_coco.json file:

json_labels_coco = json.load(open(os.path.join('data','labels_coco.json')))
print('Number of Floating tanks: ', len(json_labels_coco['annotations']))
no_unique_img_id = set()
for ann in json_labels_coco['annotations']:
    no_unique_img_id.add(ann['image_id'])
print('Number of Images that contains Floating head tank: ', len(no_unique_img_id))

This file contains only the bounding boxes of floating head tanks, together with their image_id, stored as a list of dictionaries.

Plotting the bounding boxes:

There are three types of storage tanks:

  1. Tank (T)

  2. Tank Cluster (TC)

  3. Floating Head Tank (FHT)

7. Data augmentation

In the EDA we observed that 8,187 of the 10,000 images are of no use for training because they do not contain any objects, and only 1,595 images contain at least one floating head tank object. As we know, deep learning models need a lot of data, and insufficient data leads to poor performance.

Therefore, we first augment the data and then fit the augmented data to the YOLOv3 object detection model.

8. Data preprocessing, augmentation and TFRecords

Data preprocessing:

The object annotations are given in JSON format as 4 corner points. First, the top-left and bottom-right corners are extracted from these corners. Then all annotations belonging to a single image, together with their labels, are saved as one row of a CSV file.

Code to extract the top-left and bottom-right corners from the four corner points:

def conv_bbox(box_dict):
  """
  input: box_dict -> dictionary with the 4 corner points
  function: get the top-left and bottom-right points
  output: tuple (ymin, xmin, ymax, xmax)
  """
  xs = np.array(list(set([i['x'] for i in box_dict])))
  ys = np.array(list(set([i['y'] for i in box_dict])))
  x_min = xs.min()
  x_max = xs.max()
  y_min = ys.min()
  y_max = ys.max()
  return y_min, x_min, y_max, x_max

The CSV file will look like this

In order to evaluate the model, we will retain 10% of the images as the test set.

# Train/test split
df_train, df_test = model_selection.train_test_split(
  df,              # dataframe of annotations from the CSV file
  test_size=0.1,   # keep 10% of the images for testing
  random_state=42) # assumed seed, for reproducibility
df_train.shape, df_test.shape

Data augmentation:

We know that object detection needs a lot of data, but we only have 1,645 images for training, which is very few. To get more data, we perform data augmentation, generating new images by flipping and rotating the original ones. The augmentation code can be found in the GitHub repository below.

Seven new images are generated from each original image by performing the following operations:

  1. Flip horizontally

  2. Rotate 90 degrees

  3. Rotate 180 degrees

  4. Rotate 270 degrees

  5. Horizontal flip and 90 degree rotation

  6. Horizontal flip and 180 degree rotation

  7. Horizontal flip and 270 degree rotation

An example is shown below
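The seven operations above can be sketched with numpy alone (the matching bounding-box coordinate transforms are omitted here for brevity):

```python
import numpy as np

def augment_7(img):
    """Return the 7 flipped/rotated variants of an HxWxC image array."""
    flipped = np.fliplr(img)                     # 1. horizontal flip
    variants = [flipped]
    for k in (1, 2, 3):                          # 2-4. rotate 90/180/270 degrees
        variants.append(np.rot90(img, k))
    for k in (1, 2, 3):                          # 5-7. flip, then rotate
        variants.append(np.rot90(flipped, k))
    return variants

img = np.arange(12).reshape(2, 2, 3)
variants = augment_7(img)
print(len(variants))  # 7
```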


TFRecords:

TFRecords is TensorFlow's own binary storage format. It is useful when a dataset is too large to fit in memory: it stores the data in binary form, which can have a significant impact on training performance, since binary data is faster to copy and takes less space, and only one batch is loaded at a time during training. You can find a detailed description in the blog below.

You can also view the tensorflow document below.

Our dataset has been converted to TFRecords format. This step is not strictly necessary because our dataset is not very large, but it is done for learning purposes. If you are interested, you can find the code in my GitHub repository.
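A minimal sketch of writing and reading back one record (the feature names here are illustrative, not the ones used in the repository):

```python
import tensorflow as tf

def serialize_example(image_bytes, label):
    """Pack one image and its label into a serialized tf.train.Example."""
    feature = {
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# write one record, then read it back and parse it
with tf.io.TFRecordWriter('sample.tfrecord') as writer:
    writer.write(serialize_example(b'fake_jpeg_bytes', 2))

schema = {'image': tf.io.FixedLenFeature([], tf.string),
          'label': tf.io.FixedLenFeature([], tf.int64)}
for raw in tf.data.TFRecordDataset('sample.tfrecord'):
    parsed = tf.io.parse_single_example(raw, schema)
    print(int(parsed['label']))  # 2
```

In the real pipeline the image bytes and the per-image bounding boxes would be packed the same way, one Example per image.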

9. Object detection with YOLOv3


To train the YOLOv3 model we use transfer learning. The first step is to load the Darknet backbone weights and freeze those layers during training so their weights stay fixed.

def create_model():
    # load the pretrained 80-class model and its Darknet weights
    pret_model = YoloV3(size, channels, classes=80)
    load_darknet_weights(pret_model, 'Pretrained_Model/yolov3.weights')
    print('\nPretrained Weight Loaded')

    # build the 3-class model and copy the Darknet backbone weights into it
    model = YoloV3(size, channels, classes=3)
    model.get_layer('yolo_darknet').set_weights(
        pret_model.get_layer('yolo_darknet').get_weights())
    print('Yolo DarkNet weight loaded')

    # freeze the backbone so its weights stay fixed during training
    # (freeze_all is a helper from the yolov3-tf2 utilities)
    freeze_all(model.get_layer('yolo_darknet'))
    print('Frozen DarkNet layers')
    return model

model = create_model()

We train the model with the Adam optimizer (initial learning rate = 0.001) and apply cosine decay per epoch to reduce the learning rate. Model checkpointing saves the best weights during training, and the final weights are saved after training.

epochs = 100

optimizer = get_optimizer(
    optim_type = 'adam',
    lr = 1e-3)  # initial learning rate (parameter name may differ in the repo helper)
loss = [YoloLoss(yolo_anchors[mask], classes=3) for mask in yolo_anchor_masks]

model = create_model()
model.compile(optimizer=optimizer, loss=loss)

# TensorBoard
! rm -rf ./logs/
logdir = os.path.join("logs", datetime.now().strftime("%Y%m%d-%H%M%S"))
%tensorboard --logdir $logdir
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

callbacks = [
    EarlyStopping(monitor='val_loss', min_delta=0, patience=15, verbose=1),
    ModelCheckpoint('Weights/Best_weight.hdf5', verbose=1, save_best_only=True),
    tensorboard_callback]

history = model.fit(train_dataset,      # dataset names assumed
                    epochs=epochs,
                    callbacks=callbacks,
                    validation_data=valid_dataset)

Loss function:

Yolo loss function:

The loss function used in YOLOv3 training is quite complex. YOLO computes a separate loss at each of the three scales and sums them for backpropagation (as you can see in the code cell above, the final loss is a list of three losses). Each of them computes localization and classification losses through four components:

  1. MSE loss of the box center (x, y)
  2. Mean squared error (MSE) of the bounding box width and height
  3. Binary cross-entropy of the objectness and no-objectness scores of the bounding box
  4. Binary cross-entropy or sparse categorical cross-entropy of the multi-class prediction of the bounding box

Let's look at the loss formula used in YOLOv2.

The last three terms in YOLOv2 are squared errors, while in YOLOv3 they are replaced by cross-entropy terms. In other words, object confidence and class predictions in YOLOv3 are now made through logistic regression.

Let's look at the implementation of the YOLOv3 loss function:

def YoloLoss(anchors, classes=3, ignore_thresh=0.5):
    def yolo_loss(y_true, y_pred):
        #1. Convert all forecast outputs
        # y_pred: (batch_size, grid, grid, anchors, (x, y, w, h, obj, ...cls))
        pred_box, pred_obj, pred_class, pred_xywh = yolo_boxes(
            y_pred, anchors, classes)
        # predicted (tx, ty, tw, th)
        pred_xy = pred_xywh[..., 0:2] #x,y of last channel
        pred_wh = pred_xywh[..., 2:4] #w,h of last channel

        #2. Convert all real outputs
        # y_true: (batch_size, grid, grid, anchors, (x1, y1, x2, y2, obj, cls))
        true_box, true_obj, true_class_idx = tf.split(
            y_true, (4, 1, 1), axis=-1)

        #Convert x1, y1, x2, y2 to x, y, w, h
        # x, y = (x1 + x2)/2, (y1 + y2)/2
        # w, h = (x2 - x1), (y2 - y1)
        true_xy = (true_box[..., 0:2] + true_box[..., 2:4]) / 2
        true_wh = true_box[..., 2:4] - true_box[..., 0:2]

        #Small boxes need higher weights
        #shape-> (batch_size, grid, grid, anchors)
        box_loss_scale = 2 - true_wh[..., 0] * true_wh[..., 1]

        #3. Invert the pred box equations
        #Change (bx, by, bw, bh) to (tx, ty, tw, th)
        grid_size = tf.shape(y_true)[1]
        grid = tf.meshgrid(tf.range(grid_size), tf.range(grid_size))
        grid = tf.expand_dims(tf.stack(grid, axis=-1), axis=2)
        true_xy = true_xy * tf.cast(grid_size, tf.float32) - tf.cast(grid, tf.float32)
        true_wh = tf.math.log(true_wh / anchors)
        #some cells may have true_wh = 0; dividing by the anchors can produce inf or NaN
        true_wh = tf.where(tf.math.is_inf(true_wh),
                           tf.zeros_like(true_wh), true_wh)

        #4. Calculate all masks
        #The dimension of dimension 1 is removed from the shape of the tensor.
        #obj_mask: (batch_size, grid, grid, anchors)
        obj_mask = tf.squeeze(true_obj, -1) 
        #when the IoU exceeds the threshold, false positives are ignored
        #best_iou: (batch_size, grid, grid, anchors)
        best_iou = tf.map_fn(
            lambda x: tf.reduce_max(broadcast_iou(x[0], tf.boolean_mask(
                x[1], tf.cast(x[2], tf.bool))), axis=-1),
            (pred_box, true_box, obj_mask),
            tf.float32)
        ignore_mask = tf.cast(best_iou < ignore_thresh, tf.float32)

        #5. Calculate all losses
        xy_loss = obj_mask * box_loss_scale * \
            tf.reduce_sum(tf.square(true_xy - pred_xy), axis=-1)
        wh_loss = obj_mask * box_loss_scale * \
            tf.reduce_sum(tf.square(true_wh - pred_wh), axis=-1)
        obj_loss = binary_crossentropy(true_obj, pred_obj)
        obj_loss = obj_mask * obj_loss + \
            (1 - obj_mask) * ignore_mask * obj_loss
        #TODO: use binary_crossentropy instead
        class_loss = obj_mask * sparse_categorical_crossentropy(
            true_class_idx, pred_class)

        #6. Sum in (batch, gridx, gridy, anchors) to get = > (batch, 1)
        xy_loss = tf.reduce_sum(xy_loss, axis=(1, 2, 3))
        wh_loss = tf.reduce_sum(wh_loss, axis=(1, 2, 3))
        obj_loss = tf.reduce_sum(obj_loss, axis=(1, 2, 3))
        class_loss = tf.reduce_sum(class_loss, axis=(1, 2, 3))

        return xy_loss + wh_loss + obj_loss + class_loss
    return yolo_loss


To evaluate the model, we computed AP and mAP on the training and test data.

Test set score

get_mAP(model, 'data/test.csv')

Training set score

get_mAP(model, 'data/train.csv')


Let's see how this model performs.

10. Reserve estimation

Volume estimation is the final goal of this case study. There is no standard metric for evaluating the estimated volume, but we try to find the best threshold pixel value for each image so that the shadow regions are detected as fully as possible (measured by counting pixels).

We take the large 4800×4800 satellite image and divide it into 100 patches of 512×512, with a 37-pixel overlap between patches along both axes. The patches are named in the id_row_column.jpg format.
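A quick sketch of the patch geometry, with a hypothetical helper that maps a box detected in a patch back to coordinates in the large image. It assumes a uniform stride of 512 − 37 = 475 pixels; patches at the image edges may be handled differently:

```python
def patch_origin(row, col, patch=512, overlap=37):
    """Top-left corner of patch (row, col) in the large image,
    assuming a uniform stride of patch - overlap pixels."""
    stride = patch - overlap
    return row * stride, col * stride

def to_global(box, row, col):
    """Map a (ymin, xmin, ymax, xmax) box from patch to large-image coordinates."""
    oy, ox = patch_origin(row, col)
    ymin, xmin, ymax, xmax = box
    return ymin + oy, xmin + ox, ymax + oy, xmax + ox

# a box detected in patch (2, 3), shifted into large-image coordinates
print(to_global((10, 20, 110, 120), 2, 3))  # (960, 1445, 1060, 1545)
```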

The predictions for every patch are stored in a CSV file, and then the volume of each detected floating head tank is estimated (the code and explanation are provided as a notebook in my GitHub repository).

Finally, all image patches are merged back into one large image, with the bounding boxes labeled with the estimated volumes. Here is an example:

11. Results

The AP score for floating head tanks is 0.874 on the test set and 0.942 on the training set.

12. Conclusion

  • Very good results can be obtained even with a limited number of images.

  • Data augmentation worked well.

  • In this case study, YOLOv3 performed better than the existing RetinaNet-based approach.

13. Future work

  • The AP of floating head tanks is 87.4%, which is a high score, but we can try to improve it further.

  • We will try to generate more data to train this model.

  • We will try to train other, more accurate models such as YOLOv4 or YOLOv5 (unofficial).

14. References

[1] Oil-Tank-Volume-Estimation, by Karl Heyer, Nov 2019.

[2] Estimating the Volume of Oil Tanks Based on High-Resolution Remote Sensing Images, by Tong Wang, Ying Li, Shengtao Yu, and Yu Liu, April 2019.

[3] A Beginner's Guide To Calculating Oil Storage Tank Occupancy With Help Of Satellite Imagery, Sep 2017.

[4] A Gentle Introduction to Object Recognition With Deep Learning, May 2019.

[5] Selective Search for Object Recognition, by J.R.R. Uijlings et al., 2012.

[6] Region Proposal Network — A detailed view, by Sambasivarao K., Dec 2019.

[7] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, by Ross Girshick et al., Jan 2016.

[8] Real-time Object Detection with YOLO, YOLOv2 and now YOLOv3, by Joseph Redmon, 2015-2018.
