Doodle Labeller (Doodler) · "Human-In-The-Loop" machine learning for image segmentation


Warning!

Doodler is still in active development and is currently a beta version. Use at your own risk!

Please check back later, watch the GitHub repository to receive alerts, or listen to announcements on https://twitter.com/magic_walnut for the first official release.

Semi-supervised segmentation of natural scenes

There are many great tools for exhaustive (i.e. whole image) image labeling for segmentation tasks, using polygons. Examples include www.makesense.ai and https://cvat.org.

However, for high-resolution imagery with large spatial footprints and complex scenes, such as aerial and satellite imagery, exhaustive labeling using polygonal tools can be very time-consuming. This is especially true of scenes with many classes of interest, particularly when those classes cover relatively small, spatially discontinuous regions of the image.

Doodler is for rapid semi-supervised approximate segmentation of such imagery. It can reduce the time required for detailed labeling of large and complex scenes by an order of magnitude or more.

What can you use doodler to do?

Doodler is designed to do two things:

It allows you to carry out image segmentation quickly and effectively on any type of image. This might suit somebody who does not want or need to train a model to achieve the same result completely autonomously. You may only have a few images to label; Doodler is perfect for this use case.

It allows you to generate label data, quickly and effectively and on any type of image, to train other types of machine learning models for image segmentation. Given enough examples of images and their corresponding pixelwise labels, models can be trained to generate the same types of segmentations on other image collections, such as future data collections in regular image-based surveys.

Quickly generate label data to train deep learning models

Training state-of-the-art deep learning models for image segmentation can require hundreds to thousands of example label images.

For natural and other scenes, doodler can be a relatively quick (in terms of the hours you spend annotating) way to generate large numbers of label images.

Label by example

Freehand label only some of the scene, then use a model to complete the scene. Doodler is a semi-supervised tool for efficient image labeling, based on sparse examples provided by a human annotator. Those sparse annotations are used by a secondary automated process to estimate the class of every pixel in the image.

For rapid approximate image segmentation

For high-resolution imagery with large spatial footprints and complex scenes, such as aerial and satellite imagery, exhaustive labeling using polygonal tools can be prohibitively time-consuming. Doodler offers a potential alternative.

What doodler does

Doodler is a tool for "exemplative", not exhaustive, labeling.

The approach taken here is to freehand label only some of the scene, then use a model to complete the scene. Your sparse annotations are used to create an ensemble of Conditional Random Field (CRF) models, each of which develops a scene-specific model for each class and creates a dense (i.e. per pixel) label image based on the information you provide it. The outputs of the ensemble members are combined for a stable estimate.

This approach can reduce the time required for detailed labeling of large and complex scenes by an order of magnitude or more.
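
How the ensemble outputs are combined is not specified here. As a minimal sketch, assuming a per-pixel majority vote (an assumption for illustration, not necessarily what Doodler does), the combination step could look like the following. The function name `combine_label_images` and the nested-list image representation are hypothetical.

```python
from collections import Counter


def combine_label_images(label_images):
    """Combine same-sized dense label images by per-pixel majority vote.

    label_images: list of 2D nested lists of integer class IDs, one per
    ensemble member (hypothetical representation, for illustration only).
    Ties are broken in favour of the smallest class ID.
    """
    if not label_images:
        raise ValueError("need at least one label image")
    rows = len(label_images[0])
    cols = len(label_images[0][0])
    combined = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Count how many ensemble members chose each class at this pixel.
            votes = Counter(img[r][c] for img in label_images)
            # Iterate classes in ascending order so ties go to the lowest ID.
            out_row.append(max(sorted(votes), key=votes.get))
        combined.append(out_row)
    return combined


# Three hypothetical ensemble members that disagree on two pixels.
ensemble = [
    [[0, 1], [1, 1]],
    [[0, 1], [0, 1]],
    [[0, 0], [0, 1]],
]
print(combine_label_images(ensemble))
```

A vote like this smooths out pixels where a single model is unreliable, which is one plausible reading of "a stable estimate".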

`doodler.py` (and `merge.py`)

This tool is also set up to tackle image labeling in stages, using minimal annotations.

For example, you can label individual classes one at a time, then use the resulting binary label images as masks for the imagery when labeling subsequent classes. Labeling is done with the doodler.py script.
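
The masking idea can be sketched as follows. This is an illustration under stated assumptions rather than code from doodler.py: the function name `mask_labeled_pixels`, the nested-list image representation, and the fill value of 0 are all hypothetical.

```python
def mask_labeled_pixels(image, binary_label, fill_value=0):
    """Return a copy of `image` with already-labeled pixels blanked out.

    image: 2D nested list of pixel values (hypothetical representation).
    binary_label: 2D nested list of 0/1 flags from an earlier labeling stage,
    where 1 marks pixels already assigned to a class.
    """
    same_shape = len(image) == len(binary_label) and all(
        len(img_row) == len(lab_row)
        for img_row, lab_row in zip(image, binary_label)
    )
    if not same_shape:
        raise ValueError("image and binary_label must have the same shape")
    # Keep a pixel only where the earlier stage left it unlabeled.
    return [
        [fill_value if labeled else pixel
         for pixel, labeled in zip(img_row, lab_row)]
        for img_row, lab_row in zip(image, binary_label)
    ]


# Hypothetical 2x2 image and a binary label from a first labeling stage.
image = [[10, 20], [30, 40]]
first_stage = [[1, 0], [0, 1]]
print(mask_labeled_pixels(image, first_stage))
```

Blanking out finished regions means the annotator only needs to doodle on what remains in the next stage.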

Label images output by doodler.py can be merged using merge.py.
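
The exact behavior of merge.py is not described here. As a rough sketch of what merging per-class binary label images into one multi-class label image might involve, the following assumes that earlier stages take precedence where masks overlap. The function name `merge_binary_labels`, the class IDs, and the precedence rule are assumptions made for illustration.

```python
def merge_binary_labels(binary_labels, background=0):
    """Merge per-class binary masks into a single multi-class label image.

    binary_labels: list of (class_id, mask) pairs in the order the classes
    were labeled; each mask is a 2D nested list of 0/1 flags.
    Assumption: a pixel claimed by an earlier stage is not overwritten.
    """
    if not binary_labels:
        raise ValueError("need at least one binary label image")
    first_mask = binary_labels[0][1]
    merged = [[background] * len(row) for row in first_mask]
    for class_id, mask in binary_labels:
        for r, row in enumerate(mask):
            for c, flagged in enumerate(row):
                # Only fill pixels that no earlier stage has claimed.
                if flagged and merged[r][c] == background:
                    merged[r][c] = class_id
    return merged


# Hypothetical two-stage labeling: class 1 first, then class 2.
stage_one = (1, [[1, 0], [0, 0]])
stage_two = (2, [[1, 1], [0, 0]])
print(merge_binary_labels([stage_one, stage_two]))
```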

Quick Start

These brief instructions are for regular Python and command-line users.

Clone the repo:
git clone --depth 1 https://github.com/dbuscombe-usgs/doodle_labeller.git

Make a conda environment:
conda env create -f doodler.yml

Activate the conda environment:
conda activate doodler

Doodle!
python doodler.py -c config_file.json

Acknowledgements

Doodler is written and maintained by Daniel Buscombe, Marda Science.

Doodler development is funded by the U.S. Geological Survey Coastal Hazards Program, and is primarily for use by U.S. Geological Survey scientists, researchers and affiliated colleagues working on the Hurricane Florence Supplemental Project and other coastal hazards research.

Many people have contributed ideas, code fixes, and bug reports. Thanks especially to Jon Warrick, Chris Sherwood, Jenna Brown, Andy Ritchie, Jin-Si Over, Christine Kranenburg, and the rest of the Florence Supplemental team; to Evan Goldstein and colleagues at University of North Carolina Greensboro; Leslie Hsu at the USGS Community for Data Integration; and LCDR Brodie Wells, formerly of Naval Postgraduate School, Monterey.

Contributing

The software is optimized for specific types of imagery, but is highly configurable for specific purposes, and is therefore made publicly available under an MIT license in the spirit of open source ✓, open access ✓, scientific rigour ✓ and transparency ✓.

While Marda Science cannot carry out unpaid consulting on specific use cases, we encourage you to submit issues and new feature requests, and, if you find it useful and have made improvements, to contribute to its development through a pull request on https://github.com/dbuscombe-usgs/doodle_labeller.