Converting the HRSC2016 dataset from XML format to YOLO format (download link included)

Time: 2022-5-4

Dataset introduction

Dataset background:

HRSC2016 dataset

Contains 27 types of remote sensing objects
Images extracted from Google Earth
Released by Northwestern Polytechnical University in 2016
Uses the oriented bounding box (OBB) annotation format

HRSC2016 (Liu et al., 2016) is a ship detection dataset collected by Northwestern Polytechnical University. It contains 2976 annotated ship instances in 4 categories and 19 subcategories. The paper emphasizes that it is a high-resolution dataset, with image resolution between 0.4 m and 2 m. All images come from six well-known ports and show both ships sailing at sea and ships close to the coast. Image sizes range from 300 to 1500 pixels, and most images are larger than 1000 × 600.

Dataset category description

The targets in this dataset are ships in aerial images, both offshore and nearshore. The authors organize the labels as a tree of height 3: level L1 is the class, level L2 is the category and level L3 is the type. This is similar to a biological taxonomy and is expressed as follows:

[Image: HRSC2016 three-level category tree]

Sample labeling information

HRSC2016 uses OBB (oriented bounding box) annotation. Each object comes with three kinds of annotation: a bounding box, a rotated bounding box and a pixel-based segmentation, plus additional information such as port, data source and capture time. Part of one annotation file is shown below:

sealand
  69.040297,33.070036
  1138 // image width (Img_SizeWidth)
  833 // image height (Img_SizeHeight)
  3
  1.07
  18
  100
  0
  0
  274d
  
	
	  100000008
	  100000013
	  100000008
	  0
	  0
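	  628 // bounding box: box_xmin
	  40 // bounding box: box_ymin
	  815 // bounding box: box_xmax
	  783 // bounding box: box_ymax
	  719.9324 // rotated box: center x
	  413.0048 // rotated box: center y
	  741.8246 // rotated box: width
	  172.6959 // rotated box: height
	  1.499893 // rotated box: rotation angle
	  0
	  
	  
	  713 // ship head position: x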
	  777
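
The rotated-box values in the sample can be turned into four corner points, which is the form many oriented-detection tools expect. The following is a minimal sketch, assuming the five values are (center x, center y, width, height, angle), that the angle is in radians, and that the side of length `width` runs along the angle direction. `obb_to_corners` is a hypothetical helper name, not part of any dataset tooling.

```python
import math


def obb_to_corners(cx, cy, w, h, angle):
    """Return the four corners of a rotated box as (x, y) tuples.

    Assumes `angle` is in radians and that the side of length `w` lies
    along the direction given by `angle` (an assumption, not a documented fact).
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Half-length vectors along the box's own two axes
    ux, uy = (w / 2.0) * cos_a, (w / 2.0) * sin_a
    vx, vy = -(h / 2.0) * sin_a, (h / 2.0) * cos_a
    return [
        (cx - ux - vx, cy - uy - vy),
        (cx + ux - vx, cy + uy - vy),
        (cx + ux + vx, cy + uy + vy),
        (cx - ux + vx, cy - uy + vy),
    ]


# Values taken from the sample annotation
corners = obb_to_corners(719.9324, 413.0048, 741.8246, 172.6959, 1.499893)
for x, y in corners:
    print(round(x, 1), round(y, 1))
```

Under these assumptions the corners span roughly the same region as the axis-aligned box in the sample, which is a useful plausibility check before relying on the angle convention.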

Data image example

[Image: sample image from the dataset]

Here is the conversion code:

import os
import xml.etree.ElementTree as ET

# (year, image_set) pairs; image_set names a list file under ImageSets/Main
sets = [('2007', 'test')]

# Every HRSC2016 object is written out as the single class "ship"
classes = ["ship"]


def convert(size, box):
    # size = (image width, image height); box = (xmin, xmax, ymin, ymax) in pixels.
    # Returns the YOLO box (x_center, y_center, width, height), normalized by image size.
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)


def convert_annotation(year, image_id):
    # Read one XML annotation file and write the matching YOLO-format TXT label file
    in_path = './data/VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id)
    out_path = './data/VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id)
    root = ET.parse(in_path).getroot()
    w = int(root.find('Img_SizeWidth').text)
    h = int(root.find('Img_SizeHeight').text)

    with open(out_path, 'w') as out_file:
        # iter() yields nothing for an image without objects, leaving an empty label file
        for obj in root.iter('HRSC_Object'):
            # Skip objects marked as difficult
            if int(obj.find('difficult').text) == 1:
                continue
            cls_id = classes.index('ship')
            b = (float(obj.find('box_xmin').text), float(obj.find('box_xmax').text),
                 float(obj.find('box_ymin').text), float(obj.find('box_ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join(str(a) for a in bb) + '\n')


wd = os.getcwd()

for year, image_set in sets:
    labels_dir = './data/VOCdevkit/VOC%s/labels/' % year
    os.makedirs(labels_dir, exist_ok=True)
    list_path = './data/VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)
    with open(list_path) as f:
        image_ids = f.read().strip().split()
    # Write the absolute path of every image in this set, one per line
    with open('%s_%s.txt' % (year, image_set), 'w') as list_file:
        for image_id in image_ids:
            list_file.write('%s/data/VOCdevkit/VOC%s/JPEGImages/%s.bmp\n' % (wd, year, image_id))
            convert_annotation(year, image_id)

The code above converts the HRSC2016 XML annotations into YOLO-format TXT label files.
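
As a quick sanity check, the normalization can be inverted to recover the pixel box. The sketch below restates the same arithmetic as `convert` in standalone form and applies it to the sample annotation (image 1138 × 833, box x from 628 to 815, y from 40 to 783). The names `to_yolo` and `from_yolo` are illustrative only.

```python
def to_yolo(size, box):
    """size = (img_w, img_h); box = (xmin, xmax, ymin, ymax) in pixels."""
    img_w, img_h = size
    xmin, xmax, ymin, ymax = box
    return (
        (xmin + xmax) / 2.0 / img_w,  # normalized center x
        (ymin + ymax) / 2.0 / img_h,  # normalized center y
        (xmax - xmin) / img_w,        # normalized width
        (ymax - ymin) / img_h,        # normalized height
    )


def from_yolo(size, yolo_box):
    """Invert to_yolo: return (xmin, xmax, ymin, ymax) in pixels."""
    img_w, img_h = size
    x, y, w, h = yolo_box
    cx, cy, bw, bh = x * img_w, y * img_h, w * img_w, h * img_h
    return (cx - bw / 2.0, cx + bw / 2.0, cy - bh / 2.0, cy + bh / 2.0)


size = (1138, 833)
box = (628.0, 815.0, 40.0, 783.0)

yolo = to_yolo(size, box)
print("0 " + " ".join("%.6f" % v for v in yolo))  # one YOLO label line, class 0

restored = from_yolo(size, yolo)
print(restored)
```

If the restored box differs noticeably from the original, the width and height of the image or the order of the box fields were most likely mixed up.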

Pay attention to the paths: adjust them to match your own directory layout.
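
One way to catch path mistakes early is to check the layout before running the conversion. The sketch below assumes the directory structure used by the script (`./data/VOCdevkit/VOC2007/` containing `Annotations`, `JPEGImages` and `ImageSets/Main/test.txt`). `missing_paths` is a hypothetical helper introduced for illustration.

```python
from pathlib import Path


def missing_paths(data_root, year="2007", image_set="test"):
    """Return the expected inputs that do not exist under data_root."""
    voc = Path(data_root) / "VOCdevkit" / ("VOC%s" % year)
    expected = [
        voc / "Annotations",                                 # XML annotation files
        voc / "JPEGImages",                                  # image files
        voc / "ImageSets" / "Main" / ("%s.txt" % image_set),  # list of image IDs
    ]
    return [str(p) for p in expected if not p.exists()]


# Report anything the conversion script would fail to find
for path in missing_paths("./data"):
    print("missing:", path)
```

An empty result means the inputs are where the script expects them; the `labels` directory is not checked because the script creates it.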

The dataset consists of 3 categories and 27 subcategories, with a total of 2976 targets.

In the conversion above, however, all targets are treated as a single class, ship.

Image size: 300 × 300 to 1500 × 900
Number of images: 1061 (436 in the training set, 181 in the validation set and 444 in the test set)
Number of objects: 2976

Dataset download link:

https://aistudio.baidu.com/aistudio/datasetdetail/54106

Article reference
This article is based on the original article by CSDN blogger “marlowee”, which follows the CC 4.0 BY-SA license.
Original link: https://blog.csdn.net/weixin_43427721/article/details/122057389