Getting Started with Darknet YOLO and MS COCO for Object Detection

The purpose of this post is to describe how one can easily prepare an instance of the MS COCO dataset as input for training Darknet to perform object detection with YOLO. Steps for updating relevant configuration files for Darknet YOLO are also detailed.

It is assumed that a UNIX-based OS is used; all commands and conventions in this post follow that assumption.

Preparation

Environment

Access to a computer with a modern Nvidia CUDA capable GPU is advisable for training. Darknet can utilise the GPU to enhance the speed of training and detection.

Paperspace delivers a convenient service for spinning up GPU enabled cloud VMs, billed by the hour, with minimal sign-up. If using Paperspace, this script can assist in quickly setting up pre-requisites for training Darknet with COCO.

Darknet YOLO

Joseph Redmon is the creator of the Darknet project and the main researcher behind the YOLO algorithm. He maintains the official Darknet repository on GitHub. However, it is recommended to use AlexeyAB’s fork of Darknet, as it contains improved error logging, better cross-platform compatibility and bug fixes.

After cloning the forked version of Darknet, and prior to compilation, it is necessary to modify one of the source files to use the COCO dataset of choice.
Edit src/utils.c and update the function replace_image_to_label, replacing the hardcoded substrings train2014 and val2014 with train2017 and val2017.

Assuming that the environment used for training has an Nvidia GPU:

  1. Install cuDNN to speed up training.
  2. Update the Makefile:
    1. Set the CUDNN flag to 1.
    2. Set the GPU flag to 1. This will speed up detection by utilising the GPU at runtime.

In the absence of a GPU, Darknet performance on the CPU can be improved by enabling the AVX flag, which makes use of the Advanced Vector Extensions (AVX) instruction set. Most Intel and AMD processors manufactured in 2011 or later support AVX.
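
The Makefile changes above can also be scripted. The following is a minimal sketch, assuming that the flags appear in the Makefile as plain `NAME=0` assignments on their own lines; the helper name set_makefile_flags and the sample Makefile text are illustrative and not part of Darknet.

```python
import re


def set_makefile_flags(makefile_text, values):
    """Return makefile_text with each NAME=<number> assignment set to the given value.

    Assumes every flag sits on its own line in the form NAME=0 or NAME=1.
    """
    for name, value in values.items():
        # [ \t]* rather than \s* so the match never runs across a line break.
        pattern = re.compile(
            rf"^{re.escape(name)}[ \t]*=[ \t]*\d+[ \t]*$", re.MULTILINE
        )
        makefile_text, count = pattern.subn(f"{name}={value}", makefile_text)
        if count == 0:
            raise KeyError(f"flag {name} not found in Makefile text")
    return makefile_text


# Illustrative Makefile header; check the real file before relying on this layout.
sample = "GPU=0\nCUDNN=0\nAVX=0\n"
print(set_makefile_flags(sample, {"GPU": 1, "CUDNN": 1}))
```

On a machine without a GPU, the same helper could be called with {"AVX": 1} instead.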

Darknet is now ready for compilation.

COCO Dataset

The Darknet repository bundles the script ‘scripts/get_coco_dataset.sh’, but it fetches a dated (2014) version of the MS COCO dataset along with YOLO compatible annotations, so it is unsuitable if you wish to use the latest COCO dataset. The latest COCO dataset images and annotations can be fetched from the official website.

COCO Dataset Download Section

Under “Images”, download the “2017 Train images (18GB)” and “2017 Val images (1GB)” archives.
Under “Annotations”, download the “2017 Train/Val annotations (241MB)” archive.
Alternatively, as recommended further down the download page, perform the download using gsutil.

Once the downloads have finished, create a coco/ directory containing the subdirectories images/ and annotations/.

Extract the training images archive (train2017.zip) and the evaluation images archive (val2017.zip) to coco/images/. The annotations archive should be extracted to, you guessed it, coco/annotations/.
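
The directory preparation and extraction can be sketched with the Python standard library. This is only an illustration: the helper names are hypothetical, the yolo/ subdirectory is created up front on the assumption that it will hold the converted labels later, and the archive file names in the usage comment are assumptions about how the downloads were saved.

```python
import zipfile
from pathlib import Path


def prepare_coco_dirs(root):
    """Create coco/images, coco/annotations and coco/yolo under root; return the coco/ path."""
    coco = Path(root) / "coco"
    for name in ("images", "annotations", "yolo"):
        (coco / name).mkdir(parents=True, exist_ok=True)
    return coco


def extract_archive(archive, destination):
    """Extract a zip archive into destination and return the names of its members."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
        return zf.namelist()


# Example usage (archive names are assumptions; adjust to the actual downloads):
#   coco = prepare_coco_dirs(".")
#   extract_archive("train2017.zip", coco / "images")
#   extract_archive("val2017.zip", coco / "images")
#   extract_archive("annotations_trainval2017.zip", coco / "annotations")
```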

COCO to YOLO Conversion Utility

The COCO dataset without further post-processing is incompatible with Darknet YOLO. Read the YOLO publication to learn more about the annotation format (and the YOLO algorithm itself).

In summary, a single YOLO image annotation consists of five space-separated values, an object category ID followed by four ratios:

  1. Object category ID.
  2. X coordinate, corresponding to the annotation box’s centre, and relative to the image width.
  3. Y coordinate, corresponding to the annotation box’s centre, and relative to the image height.
  4. Annotation box width, relative to the image width.
  5. Annotation box height, relative to the image height.

Each YOLO “label” file can contain a new line separated list of these annotations, for objects of various categories.

The MS COCO annotations file provides the necessary parameters for calculating these values. A specification of the data format can be found on the official website.

Special attention must be paid to the fact that the MS COCO bounding box coordinates correspond to the top-left of the annotation box. Darknet YOLO, on the other hand, expects the coordinate to be the centre point of the annotation bounding box. Therefore, the annotation box’s centre point coordinate must be calculated prior to making it relative to the complete image size.
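
The calculation described above can be sketched as follows. It assumes the COCO bounding box is given as [x, y, width, height] in pixels, with x and y marking the top-left corner; the function names are illustrative and are not taken from the “COCO to YOLO” utility.

```python
def coco_bbox_to_yolo(bbox, image_width, image_height):
    """Convert a COCO [x_min, y_min, width, height] pixel box to YOLO ratios.

    Returns (x_centre, y_centre, width, height), each relative to the image size.
    """
    x_min, y_min, box_w, box_h = bbox
    # Shift from the top-left corner to the centre before normalising.
    x_centre = (x_min + box_w / 2.0) / image_width
    y_centre = (y_min + box_h / 2.0) / image_height
    return x_centre, y_centre, box_w / image_width, box_h / image_height


def format_label_line(class_id, yolo_box):
    """Format one annotation as a line of a YOLO label file."""
    return f"{class_id} " + " ".join(f"{value:.6f}" for value in yolo_box)


box = coco_bbox_to_yolo([100, 50, 200, 100], image_width=640, image_height=480)
print(format_label_line(0, box))
# prints: 0 0.312500 0.208333 0.312500 0.208333
```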

The project “COCO to YOLO” automates the conversion and filtering of the COCO dataset. Source code, a pre-built JAR and how-to-use guide can be found on the project homepage.

Conversion

The utility needs to be executed for both the training and evaluation datasets, separately.

If training is to be performed on all classes/categories in the dataset, then execute the program as such:

Note the specification of the “all” argument, which will convert COCO annotations for all categories in the dataset. If interested in a subset of categories for training, then it is possible to replace the argument with a CSV list of valid COCO category names, e.g., “truck, person, car”.
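
COCO category IDs are not contiguous, whereas YOLO class IDs are zero-based indices into the list of names, so filtering by category also implies a remapping. The sketch below shows one way this could work, assuming that class indices follow the order of the names in the CSV argument; how the “COCO to YOLO” utility assigns indices is not stated here, and the function names are hypothetical.

```python
def parse_category_list(csv_names):
    """Split a CSV argument such as "truck, person, car" into clean names."""
    return [name.strip() for name in csv_names.split(",") if name.strip()]


def build_class_index(coco_categories, wanted_names):
    """Map COCO category IDs to zero-based YOLO class indices.

    coco_categories is the "categories" list from a COCO annotations file;
    indices are assigned in the order of wanted_names (an assumption).
    """
    id_by_name = {category["name"]: category["id"] for category in coco_categories}
    missing = [name for name in wanted_names if name not in id_by_name]
    if missing:
        raise ValueError(f"unknown COCO categories: {missing}")
    return {id_by_name[name]: index for index, name in enumerate(wanted_names)}


# A three-entry excerpt of the COCO category list, for illustration.
categories = [
    {"id": 1, "name": "person"},
    {"id": 3, "name": "car"},
    {"id": 8, "name": "truck"},
]
print(build_class_index(categories, parse_category_list("truck, person, car")))
# prints: {8: 0, 1: 1, 3: 2}
```

Annotations whose category_id is absent from the resulting mapping would simply be skipped during conversion.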

Aside: Training on a subset of categories will not, by itself, produce a tangible improvement in detection speed or accuracy; only training time is reduced. To improve detection performance, the size of the neural network should be reduced instead.

After the utility has run, the directory coco/yolo/ should contain an image_list.txt file and a YOLO label file for each image that contains an object of a desired category.
The image_list.txt file lists the images that serve as input to Darknet for training. Each image file should correspond to a single label file.

Move the contents of the output directory before re-running with the evaluation dataset.

After converting the training and evaluation sets of annotations, the coco/ directory should resemble the following file tree hierarchy:

Testing

To determine whether the converted annotations are correct prior to training, one can use Yolo_mark by AlexeyAB to visually inspect the result.

Clone and compile the project, then copy some of the label files and the corresponding images into the $YOLO_MARK_HOME/x64/Release/data/img/ directory.

Update x64/Release/data/obj.names to reflect the order of names used during conversion (or copy the contents of $DARKNET_HOME/data/coco.names if all classes were converted) and execute the application via linux_mark.sh.

YOLO Mark user interface, visualising annotation bounding boxes.
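
As a complement to visual inspection, label files can be sanity-checked programmatically. The following is a rough sketch that only verifies the format described earlier (five fields per line, a class ID within range, ratios between 0 and 1); it cannot tell whether a box is actually placed over the right object. The function name is illustrative.

```python
def validate_label_text(label_text, class_count):
    """Return a list of problems found in the contents of one YOLO label file."""
    problems = []
    for line_no, line in enumerate(label_text.splitlines(), start=1):
        if not line.strip():
            continue  # Ignore blank lines.
        fields = line.split()
        if len(fields) != 5:
            problems.append(f"line {line_no}: expected 5 fields, got {len(fields)}")
            continue
        try:
            class_id = int(fields[0])
            ratios = [float(value) for value in fields[1:]]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric field")
            continue
        if not 0 <= class_id < class_count:
            problems.append(f"line {line_no}: class ID {class_id} out of range")
        if any(not 0.0 <= ratio <= 1.0 for ratio in ratios):
            problems.append(f"line {line_no}: ratio outside [0, 1]")
    return problems


print(validate_label_text("0 0.5 0.5 0.2 0.2\n3 0.5 0.5 0.2 0.2\n", class_count=3))
# prints: ['line 2: class ID 3 out of range']
```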

Configuration

Finally, a Darknet YOLO data file needs to be created, declaring the locations of the various input files required for training. See $DARKNET_HOME/cfg/coco.data for an example. Based on the hierarchy above, the file contents would resemble the following (if three categories were used):
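
As an illustration, the sketch below renders a data file using the same keys found in $DARKNET_HOME/cfg/coco.data (classes, train, valid, names, backup). Every path in the usage example is hypothetical and must be adjusted to wherever the converted image lists and the names file were actually placed.

```python
def render_data_file(classes, train_list, valid_list, names_file, backup_dir):
    """Return the text of a Darknet data file declaring the training inputs."""
    lines = [
        f"classes = {classes}",
        f"train = {train_list}",
        f"valid = {valid_list}",
        f"names = {names_file}",
        f"backup = {backup_dir}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical locations for a three-category conversion.
text = render_data_file(
    classes=3,
    train_list="coco/yolo/train/image_list.txt",
    valid_list="coco/yolo/val/image_list.txt",
    names_file="data/coco-custom.names",
    backup_dir="backup/",
)
print(text)
```

The rendered text could then be saved as cfg/coco-custom.data, the data file name passed to the training command.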

Be aware that at the time of writing, the default neural network configuration file for COCO ($DARKNET_HOME/cfg/yolov3.cfg) is configured to train with 80 classes. If fewer or more than 80 classes are to be classified, then the neural network layers require reconfiguration:

  1. Copy the default configuration file.
  2. Update each of the classes parameter values to reflect how many categories from the dataset were included.
  3. For each convolutional layer (designated [convolutional]) located immediately before a YOLO layer (designated [yolo]), update the filters parameter. Assuming that YOLO v3 is used, the filters value is calculated as: filters = (classes + 5) * 3. For example, three classes give (3 + 5) * 3 = 24.
  4. At the top of the configuration file, under the [net] header, assign the value of 64 to batch and a value of 16 to subdivisions, for training.
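
Steps 2 and 3 can be sketched as a small text transformation. This is an illustration only, assuming the configuration file uses plain `[section]` headers and `key=value` lines; the function names are hypothetical, and batch and subdivisions are left to be edited by hand.

```python
def yolo_v3_filters(classes):
    """Apply the YOLO v3 formula: filters = (classes + 5) * 3."""
    return (classes + 5) * 3


def retarget_cfg(cfg_text, classes):
    """Set classes= in each [yolo] section and filters= in the
    [convolutional] section located immediately before it."""
    lines = cfg_text.splitlines()
    section = None
    filters_line = None  # Index of filters= in the section just before the current header.
    for index, raw in enumerate(lines):
        line = raw.strip()
        if line.startswith("["):
            if line == "[yolo]" and filters_line is not None:
                lines[filters_line] = f"filters={yolo_v3_filters(classes)}"
            # Reset on every header so only the directly preceding section counts.
            section, filters_line = line, None
            continue
        key = line.split("=", 1)[0].strip()
        if section == "[convolutional]" and key == "filters":
            filters_line = index
        elif section == "[yolo]" and key == "classes":
            lines[index] = f"classes={classes}"
    return "\n".join(lines) + "\n"


# A heavily shortened, illustrative fragment of a YOLO v3 configuration.
sample = """[convolutional]
filters=256
activation=leaky

[convolutional]
filters=255
activation=linear

[yolo]
classes=80
"""
print(retarget_cfg(sample, classes=3))
```

With three classes, only the second convolutional section changes (to filters=24) and the [yolo] section becomes classes=3; the earlier filters=256 is left untouched.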

Training

With all of those files and changes in place, initiate Darknet training: ./darknet detector train cfg/coco-custom.data cfg/yolov3-custom.cfg
