"ML Mondays"

"ML Mondays"

  • Docs
  • Data
  • Models
  • API
  • Help
  • Blog

›Recent Posts

Recent Posts

  • From 2 geotiffs to a trained U-Net: 2D, 3D, and 4D imagery example. Part 2: Models
  • From 2 geotiffs to a trained U-Net: 2D, 3D, and 4D imagery example. Part 1: Data
  • Converting makesense.ai JSON labels to label (mask) imagery for an image segmentation project
  • Preparing a new dataset for an object recognition project
  • ML terminology, demystified

From 2 geotiffs to a trained U-Net: 2D, 3D, and 4D imagery example. Part 2: Models

October 24, 2020

Dan Buscombe

Segmentation Model

In the previous post we prepared two analysis ready datasets, the first consisting of RGB (or 3D) images and associated greyscale labels, and the second consisting of 4D (RGB + DEM) imagery and the same labels. Here I show how to use that imagery in three different workflows, first using RGB imagery, then DEM, then a combination of the two. We'll evaluate using a validation dataset, and some unseen (unaugmented) sample imagery to test the ability of the model to generalize. We see that the 4D data is slightly better than the 3D data, which is a lot better than the 2D data for this task. However, in each case the model didn't do well on the sample imagery so some troubleshooting is required. Likely, a lot more data is required; this workflow uses only 16 original image tiles; many more would be required for accurate results. However, this blog post does demonstrate that the mlmondays workflows can be adapted to different data sets, and more complicated data and classes.

RGB imagery

Model preparation

from imports import *
def get_batched_dataset(filenames):

    option_no_order = tf.data.Options()
    option_no_order.experimental_deterministic = True

    dataset = tf.data.Dataset.list_files(filenames)
    dataset = dataset.with_options(option_no_order)
    dataset = dataset.interleave(tf.data.TFRecordDataset, cycle_length=16, num_parallel_calls=AUTO)
    dataset = dataset.map(read_seg_tfrecord_dunes, num_parallel_calls=AUTO)

    dataset = dataset.cache() # This dataset fits in RAM
    dataset = dataset.repeat()
    dataset = dataset.shuffle(2048)
    dataset = dataset.batch(BATCH_SIZE, drop_remainder=True) # drop_remainder will be needed on TPU
    dataset = dataset.prefetch(AUTO) #

    return dataset

def get_training_dataset():
    return get_batched_dataset(training_filenames)

def get_validation_dataset():
    return get_batched_dataset(validation_filenames)

We need a function to seg each example record from the TFRecord shards. You'll see similar functions in mlmondays workflows for OBX and OysterNet dataset. We start by creating a dictionary to use to parse the two features (image and label pair) as binary strings. Then convert each to jpeg and scale to the range [0, 1]. If any number in the label is greater than 8, it is set to zero. Zero is being used as a NULL class for zero image pixels. Finally, the label image is converted into a one-hot stack, with 9 bands (one for each of the 8 classes and the null class).

@tf.autograph.experimental.do_not_convert
def read_seg_tfrecord_dunes(example):
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),  # tf.string = bytestring (not text string)
        "label": tf.io.FixedLenFeature([], tf.string),   # shape [] means scalar
    }
    # decode the TFRecord
    example = tf.io.parse_single_example(example, features)

    image = tf.image.decode_jpeg(example['image'], channels=3)
    image = tf.cast(image, tf.float32)/ 255.0

    label = tf.image.decode_jpeg(example['label'], channels=1)
    label = tf.cast(label, tf.uint8)

    cond = tf.greater(label, tf.ones(tf.shape(label),dtype=tf.uint8)*6)#8)
    label = tf.where(cond, tf.ones(tf.shape(label),dtype=tf.uint8)*0, label)

    label = tf.one_hot(tf.cast(label, tf.uint8), 7) #9)
    label = tf.squeeze(label)

    return image, label

From now on the code look should look familiar, if you've run through the exercises as part of mlmondays week 3 image segmentation. We define a data path to the tfrecord files, a filepath for the model weights, a file path for the training history plot. Then specify a patience for the early stopping criterion, the number of images encoded in each shard, for specification of training and validation steps per model training epoch, the validation split, and batch size.

data_path= os.getcwd()+os.sep+"data/dunes"

filepath = os.getcwd()+os.sep+'results/dunes_8class_best_weights_model.h5'

hist_fig = os.getcwd()+os.sep+'results/dunes_8class_model.png'

patience = 20

ims_per_shard = 12

VALIDATION_SPLIT = 0.6

BATCH_SIZE = 4

filenames = sorted(tf.io.gfile.glob(data_path+os.sep+'dunes3d*.tfrec'))

nb_images = ims_per_shard * len(filenames)
print(nb_images)

split = int(len(filenames) * VALIDATION_SPLIT)

training_filenames = filenames[split:]
validation_filenames = filenames[:split]

validation_steps = int(nb_images // len(filenames) * len(validation_filenames)) // BATCH_SIZE
steps_per_epoch = int(nb_images // len(filenames) * len(training_filenames)) // BATCH_SIZE
train_ds = get_training_dataset()

L = []
for k in range(12):
  plt.figure(figsize=(16,16))
  for imgs,lbls in train_ds.take(1):
    #print(lbls)
    for count,(im,lab) in enumerate(zip(imgs, lbls)):
       plt.subplot(int(BATCH_SIZE/2),int(BATCH_SIZE/2),count+1)
       plt.imshow(im)
       plt.imshow(np.argmax(lab,-1), cmap=plt.cm.bwr, alpha=0.5)#, vmin=0, vmax=7)
       #plt.imshow(lab, cmap=plt.cm.bwr, alpha=0.5, vmin=0, vmax=9)
       plt.axis('off')
       L.append(np.unique(np.argmax(lab,-1)))

  plt.show()

What unique values do we have in our augmented imagery?

print(np.round(np.unique(np.hstack(L))))

[0 1 2 3 4 5 6]
val_ds = get_validation_dataset()

Model training

Define the number of classes (9, including the null class) and target size (the encoded image's size), then create a model. Compile it, define callbacks.

nclasses=7 #9
TARGET_SIZE = 608
model = res_unet((TARGET_SIZE, TARGET_SIZE, 3), BATCH_SIZE, 'multiclass', nclasses)
# model.compile(optimizer = 'adam', loss = tf.keras.losses.CategoricalHinge(), metrics = [mean_iou])
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = [mean_iou])

earlystop = EarlyStopping(monitor="val_loss",
                              mode="min", patience=patience)

model_checkpoint = ModelCheckpoint(filepath, monitor='val_loss',
                                verbose=0, save_best_only=True, mode='min',
                                save_weights_only = True)

lr_callback = tf.keras.callbacks.LearningRateScheduler(lambda epoch: lrfn(epoch), verbose=True)

callbacks = [model_checkpoint, earlystop, lr_callback]

Fit the model, and make a plot of the model training history (loss and mean IOU).


#warmup
model.fit(train_ds, steps_per_epoch=steps_per_epoch, epochs=MAX_EPOCHS,
                      validation_data=val_ds, validation_steps=validation_steps,
                      callbacks=callbacks)

history = model.fit(train_ds, steps_per_epoch=steps_per_epoch, epochs=MAX_EPOCHS,
                      validation_data=val_ds, validation_steps=validation_steps,
                      callbacks=callbacks)

plot_seg_history_iou(history, hist_fig)

plt.close('all')
K.clear_session()

Model evaluation

Evaluate the model using the validation set. Print the average loss and IoU score.

scores = model.evaluate(val_ds, steps=validation_steps)

print('loss={loss:0.4f}, Mean IOU={iou:0.4f}'.format(loss=scores[0], iou=scores[1]))

loss=0.2917, Mean IOU=0.9542

sample_data_path = os.getcwd()+os.sep+'data/dunes/images/files'
test_samples_fig = os.getcwd()+os.sep+'dunes_sample_16class_est16samples.png'
sample_label_data_path = os.getcwd()+os.sep+'data/dunes/labels/files'

sample_filenames = sorted(tf.io.gfile.glob(sample_data_path+os.sep+'*.jpg'))
sample_label_filenames = sorted(tf.io.gfile.glob(sample_label_data_path+os.sep+'*.jpg'))

These are the same hex coolor codes as the plotly G10 colormap used to make the color label imagery, made into a custom matplotlib discrete colormap

from matplotlib.colors import ListedColormap
cmap = ListedColormap(["#000000",
      "#3366CC", "#DC3912",
      "#FF9900", "#109618",
      "#990099", "#0099C6"])#,
      # "#DD4477", "#66AA00"])

We're going to adopt a spatial filter again to remove high-frequency noise associated with jpeg compression and unpacking

from skimage.filters.rank import median
from skimage.morphology import disk

This is the same function as in the mlmondays repository, with the additional TARGET_SIZE argument

def seg_file2tensor(f, TARGET_SIZE):

    bits = tf.io.read_file(f)
    image = tf.image.decode_jpeg(bits)

    w = tf.shape(image)[0]
    h = tf.shape(image)[1]
    tw = TARGET_SIZE
    th = TARGET_SIZE
    resize_crit = (w * th) / (h * tw)
    image = tf.cond(resize_crit < 1,
                  lambda: tf.image.resize(image, [w*tw/w, h*tw/w]), # if true
                  lambda: tf.image.resize(image, [w*th/h, h*th/h])  # if false
                 )

    nw = tf.shape(image)[0]
    nh = tf.shape(image)[1]
    image = tf.image.crop_to_bounding_box(image, (nw - tw) // 2, (nh - th) // 2, tw, th)
    # image = tf.cast(image, tf.uint8) #/ 255.0

    return image

Cycle through each ground truth sample label image and create a list of those, L. Make a 4 x 4 subplot plot of the ground truth label images

L = []
plt.figure(figsize=(24,24))

for counter,(f,l) in enumerate(zip(sample_filenames, sample_label_filenames)):
    image = seg_file2tensor(f, TARGET_SIZE)
    label = seg_file2tensor(l, TARGET_SIZE)
    label = label.numpy().squeeze()

    label = median(label/255., disk(5)).astype(np.uint8)

    label[image[:,:,0]==0] = 0 #(0,0,0)

    plt.subplot(4,4,counter+1)
    name = sample_filenames[counter].split(os.sep)[-1].split('.jpg')[0]
    plt.title(name, fontsize=10)
    plt.imshow(image)

    plt.imshow(label, cmap=cmap, vmin=0, vmax=6) #8)

    plt.axis('off')
    L.append(label)

plt.savefig(test_samples_fig.replace('.png','_gt.png'),
            dpi=200, bbox_inches='tight')
plt.close('all')

These are the ground truth labels

Cycle through each sample image and use the model to estimate the label image. Compare the one-hot encoded versions of the ground truth and prediction by computing a per-sample IoU score.

IOU = []
plt.figure(figsize=(24,24))

for counter,f in enumerate(sample_filenames):
    image = seg_file2tensor(f, TARGET_SIZE)/255
    est_label = model.predict(tf.expand_dims(image, 0) , batch_size=1).squeeze()

    est_labelp = tf.argmax(est_label, axis=-1)
    l = tf.one_hot(tf.cast(L[counter], tf.uint8), 7) #9)
    iou = mean_iou_np(np.expand_dims(l.numpy(),0), np.expand_dims(est_label,0))

    plt.subplot(4,4,counter+1)
    name = sample_filenames[counter].split(os.sep)[-1].split('.jpg')[0]
    plt.title(name+' '+str(iou)[:5], fontsize=12)
    plt.imshow(image)

    plt.imshow(est_labelp, alpha=0.5, cmap=cmap, vmin=0, vmax=6) #8)

    plt.axis('off')
    del est_labelp

    IOU.append(iou)

plt.savefig(test_samples_fig,
            dpi=200, bbox_inches='tight')
plt.close('all')

These are the predictions:

As you can see, the model does well at delineating the landscape but doesn't always get the class prediction correct. IoU scores are fairly low, 0.2, 0.4. The mean is only 0.3. This, I would confidently predict, is due to a lack of data. Big neural networks like this are designed for more independent examples.

Next we'll show the workflow and performance of a model trained to predict based on DEM alone

DEM imagery

Model preparation

For 2D imagery, we'll choose the DEM (last) channel in the 4D TFRecord stacks. SO we'll use the same function as before except returning image[:,:,-1] as the dem

@tf.autograph.experimental.do_not_convert
def read_seg_tfrecord_dunes(example):
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),  # tf.string = bytestring (not text string)
        "label": tf.io.FixedLenFeature([], tf.string),   # shape [] means scalar
    }
    # decode the TFRecord
    example = tf.io.parse_single_example(example, features)

    image = tf.image.decode_png(example['image'], channels=1)
    image = tf.cast(image, tf.float32)/ 255.0

    label = tf.image.decode_jpeg(example['label'], channels=1)
    label = tf.cast(label, tf.uint8)

    cond = tf.greater(label, tf.ones(tf.shape(label),dtype=tf.uint8)*6) #8)
    label = tf.where(cond, tf.ones(tf.shape(label),dtype=tf.uint8)*0, label)

    label = tf.one_hot(tf.cast(label, tf.uint8), 7) #9)
    label = tf.squeeze(label)

    return image[:,:,-1], label

Redefine the variables for new files to contain the 2D results

data_path= os.getcwd()+os.sep+"data/dunes"

filepath = os.getcwd()+os.sep+'results/dunes2d_8class_best_weights_model.h5'

hist_fig = os.getcwd()+os.sep+'results/dunes2d_8class_model.png'

filenames = sorted(tf.io.gfile.glob(data_path+os.sep+'dunes4d*.tfrec'))

nb_images = ims_per_shard * len(filenames)
print(nb_images)

split = int(len(filenames) * VALIDATION_SPLIT)

training_filenames = filenames[split:]
validation_filenames = filenames[:split]

validation_steps = int(nb_images // len(filenames) * len(validation_filenames)) // BATCH_SIZE
steps_per_epoch = int(nb_images // len(filenames) * len(training_filenames)) // BATCH_SIZE
train_ds = get_training_dataset()
val_ds = get_validation_dataset()

Model training

Notice that the model is compiled with the input size (TARGET_SIZE, TARGET_SIZE, 1) rather than (TARGET_SIZE, TARGET_SIZE, 3) as before. Everything else is the same

model = res_unet((TARGET_SIZE, TARGET_SIZE, 1), BATCH_SIZE, 'multiclass', nclasses)
#model.compile(optimizer = 'adam', loss = tf.keras.losses.CategoricalHinge(), metrics = [mean_iou])
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = [mean_iou])

earlystop = EarlyStopping(monitor="val_loss",
                              mode="min", patience=patience)

model_checkpoint = ModelCheckpoint(filepath, monitor='val_loss',
                                verbose=0, save_best_only=True, mode='min',
                                save_weights_only = True)

We've decreased the data size, so I'm inclined to increase the learning rate a little. We'll set the parameters and redefine lrfn

start_lr = 1e-4
min_lr = start_lr
max_lr = 1e-3
rampup_epochs = 5
sustain_epochs = 0
exp_decay = .9

def lrfn(epoch):
    def lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay):
        if epoch < rampup_epochs:
            lr = (max_lr - start_lr)/rampup_epochs * epoch + start_lr
        elif epoch < rampup_epochs + sustain_epochs:
            lr = max_lr
        else:
            lr = (max_lr - min_lr) * exp_decay**(epoch-rampup_epochs-sustain_epochs) + min_lr
        return lr
    return lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay)


lr_callback = tf.keras.callbacks.LearningRateScheduler(lambda epoch: lrfn(epoch), verbose=True)

Fir the model and plot the training history as before

callbacks = [model_checkpoint, earlystop, lr_callback]

model.fit(train_ds, steps_per_epoch=steps_per_epoch, epochs=MAX_EPOCHS,
                      validation_data=val_ds, validation_steps=validation_steps,
                      callbacks=callbacks)


history = model.fit(train_ds, steps_per_epoch=steps_per_epoch, epochs=MAX_EPOCHS,
                      validation_data=val_ds, validation_steps=validation_steps,
                      callbacks=callbacks)

plot_seg_history_iou(history, hist_fig)

plt.close('all')
K.clear_session()

Model evaluation

Evaluate the model in the same way as for the 3D imagery case

scores = model.evaluate(val_ds, steps=validation_steps)

print('loss={loss:0.4f}, Mean IOU={iou:0.4f}'.format(loss=scores[0], iou=scores[1]))

loss=0.5537, Mean IOU=0.9835
sample_data_path = os.getcwd()+os.sep+'data/dunes/dems/files'

test_samples_fig = os.getcwd()+os.sep+'dunes2d_sample_16class_est16samples.png'

sample_filenames = sorted(tf.io.gfile.glob(sample_data_path+os.sep+'*.jpg'))

IOU = []
plt.figure(figsize=(24,24))

for counter,f in enumerate(sample_filenames):
    image = seg_file2tensor(f, TARGET_SIZE)/255
    est_label = model.predict(tf.expand_dims(image, 0) , batch_size=1).squeeze()

    est_labelp = tf.argmax(est_label, axis=-1)
    l = tf.one_hot(tf.cast(L[counter], tf.uint8), 7) #9)
    iou = mean_iou_np(np.expand_dims(l.numpy(),0), np.expand_dims(est_label,0))

    plt.subplot(4,4,counter+1)
    name = sample_filenames[counter].split(os.sep)[-1].split('.jpg')[0]
    plt.title(name+' '+str(iou)[:5], fontsize=8)
    plt.imshow(image, cmap=plt.cm.gray)

    plt.imshow(est_labelp, alpha=0.5, cmap=cmap, vmin=0, vmax=6) #8)

    plt.axis('off')
    del est_labelp

    IOU.append(iou)

plt.savefig(test_samples_fig,
            dpi=200, bbox_inches='tight')
plt.close('all')

Model not performing well at all on DEM data alone. A mean IoU score of around 0.1. This isn't particularly surprising; elevation is a poor descriptor of these classes alone, since marsh and beach are the same elevation, and bare and vegetated established dunes are also similar elevations.

RGB + DEM imagery

By combining RGB and DEM information together, the hope is that the model can exploit classes such as incipient foredune and iceplant that have distinct elevation zones, and make better distinctions between the other classes that differ in elevation characteristics.

Model preparation

@tf.autograph.experimental.do_not_convert
def read_seg_tfrecord_dunes(example):
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),  # tf.string = bytestring (not text string)
        "label": tf.io.FixedLenFeature([], tf.string),   # shape [] means scalar
    }
    # decode the TFRecord
    example = tf.io.parse_single_example(example, features)

    image = tf.image.decode_png(example['image'], channels=4)
    image = tf.cast(image, tf.float32)/ 255.0

    label = tf.image.decode_jpeg(example['label'], channels=1)
    label = tf.cast(label, tf.uint8)

    cond = tf.greater(label, tf.ones(tf.shape(label),dtype=tf.uint8)*6) #8)
    label = tf.where(cond, tf.ones(tf.shape(label),dtype=tf.uint8)*0, label)

    label = tf.one_hot(tf.cast(label, tf.uint8), 7) #9)
    label = tf.squeeze(label)

    return image, label
data_path= os.getcwd()+os.sep+"data/dunes"

filepath = os.getcwd()+os.sep+'results/dunes4d_8class_best_weights_model.h5'

hist_fig = os.getcwd()+os.sep+'results/dunes4d_8class_model.png'

filenames = sorted(tf.io.gfile.glob(data_path+os.sep+'dunes4d*.tfrec'))

nb_images = ims_per_shard * len(filenames)
print(nb_images)

split = int(len(filenames) * VALIDATION_SPLIT)

training_filenames = filenames[split:]
validation_filenames = filenames[:split]

validation_steps = int(nb_images // len(filenames) * len(validation_filenames)) // BATCH_SIZE
steps_per_epoch = int(nb_images // len(filenames) * len(training_filenames)) // BATCH_SIZE

train_ds = get_training_dataset()

val_ds = get_validation_dataset()

Another thing you could play with is the kernel size used in the convolutional layers of the UNet. Previously that was set to 3 by default. Below I increase that to 5, in the hope a larger receptive field will mean greater elevation-image covariation scales to be captured.

def res_unet(sz, f, flag, nclasses=1):
    inputs = tf.keras.layers.Input(sz)

    ## downsample
    e1 = bottleneck_block(inputs, f, kernel_size=(5, 5)); f = int(f*2)
    e2 = res_block(e1, f, strides=2, kernel_size=(5, 5)); f = int(f*2)
    e3 = res_block(e2, f, strides=2, kernel_size=(5, 5)); f = int(f*2)
    e4 = res_block(e3, f, strides=2, kernel_size=(5, 5)); f = int(f*2)
    _ = res_block(e4, f, strides=2, kernel_size=(5, 5))

    ## bottleneck
    b0 = conv_block(_, f, strides=1)
    _ = conv_block(b0, f, strides=1)

    ## upsample
    _ = upsamp_concat_block(_, e4)
    _ = res_block(_, f, kernel_size=(5, 5)); f = int(f/2)

    _ = upsamp_concat_block(_, e3)
    _ = res_block(_, f, kernel_size=(5, 5)); f = int(f/2)

    _ = upsamp_concat_block(_, e2)
    _ = res_block(_, f, kernel_size=(5, 5)); f = int(f/2)

    _ = upsamp_concat_block(_, e1)
    _ = res_block(_, f, kernel_size=(5, 5))

    ## classify
    if flag is 'binary':
        outputs = tf.keras.layers.Conv2D(nclasses, (1, 1), padding="same", activation="sigmoid")(_)
    else:
        outputs = tf.keras.layers.Conv2D(nclasses, (1, 1), padding="same", activation="softmax")(_)

    #model creation
    model = tf.keras.models.Model(inputs=[inputs], outputs=[outputs])
    return model

Model training

Everything the same as before except the (TARGET_SIZE, TARGET_SIZE, 4) indicating a 4th input dimension

model = res_unet((TARGET_SIZE, TARGET_SIZE, 4), BATCH_SIZE, 'multiclass', nclasses)
# model.compile(optimizer = 'adam', loss = tf.keras.losses.CategoricalHinge(), metrics = [mean_iou])
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = [mean_iou])

earlystop = EarlyStopping(monitor="val_loss",
                              mode="min", patience=patience)

model_checkpoint = ModelCheckpoint(filepath, monitor='val_loss',
                                verbose=0, save_best_only=True, mode='min',
                                save_weights_only = True)

Increase learning rate (again, we're just simulating things you could change rather than necessarily be the optimal hyperparameters)

start_lr = 1e-6 #0.00001
min_lr = start_lr
max_lr = 1e-3
rampup_epochs = 5
sustain_epochs = 0
exp_decay = .9

def lrfn(epoch):
    def lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay):
        if epoch < rampup_epochs:
            lr = (max_lr - start_lr)/rampup_epochs * epoch + start_lr
        elif epoch < rampup_epochs + sustain_epochs:
            lr = max_lr
        else:
            lr = (max_lr - min_lr) * exp_decay**(epoch-rampup_epochs-sustain_epochs) + min_lr
        return lr
    return lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay)


lr_callback = tf.keras.callbacks.LearningRateScheduler(lambda epoch: lrfn(epoch), verbose=True)

callbacks = [model_checkpoint, earlystop, lr_callback]

Fit the model

#warm start
model.fit(train_ds, steps_per_epoch=steps_per_epoch, epochs=MAX_EPOCHS,
                      validation_data=val_ds, validation_steps=validation_steps,
                      callbacks=callbacks)

history = model.fit(train_ds, steps_per_epoch=steps_per_epoch, epochs=MAX_EPOCHS,
                      validation_data=val_ds, validation_steps=validation_steps,
                      callbacks=callbacks)

plot_seg_history_iou(history, hist_fig)

plt.close('all')
K.clear_session()

Model evaluation

Evaluate the same way as previously

scores = model.evaluate(val_ds, steps=validation_steps)

print('loss={loss:0.4f}, Mean IOU={iou:0.4f}'.format(loss=scores[0], iou=scores[1]))

loss=0.2936, Mean IOU=0.9748

Almost identical to the 3D example

sample_data_path = os.getcwd()+os.sep+'data/dunes/images/files'

test_samples_fig = os.getcwd()+os.sep+'dunes4d_sample_16class_est16samples.png'

sample_filenames = sorted(tf.io.gfile.glob(sample_data_path+os.sep+'*.jpg'))
IOU = []
plt.figure(figsize=(24,24))

for counter,f in enumerate(sample_filenames):
    image = seg_file2tensor(f, TARGET_SIZE)/255

    dem = seg_file2tensor(f.replace('images','dems').replace('ortho','dem'), TARGET_SIZE)/255

    merged = np.dstack((image.numpy(), dem.numpy()[:,:,0]))

    est_label = model.predict(tf.expand_dims(merged, 0) , batch_size=1).squeeze()

    l = tf.one_hot(tf.cast(L[counter], tf.uint8), 7) #9)
    iou = mean_iou_np(np.expand_dims(l.numpy(),0), np.expand_dims(est_label,0))

    est_label = tf.argmax(est_label, axis=-1)

    plt.subplot(4,4,counter+1)
    name = sample_filenames[counter].split(os.sep)[-1].split('.jpg')[0]
    plt.title(name, fontsize=10)
    plt.imshow(dem, cmap=plt.cm.gray)

    plt.imshow(est_label, alpha=0.5, cmap=cmap, vmin=0, vmax=6) #8)

    plt.axis('off')

    IOU.append(iou)

# plt.show()
plt.savefig(test_samples_fig,
            dpi=200, bbox_inches='tight')
plt.close('all')

Again, only an IOU of 0.32 - a marginal improvement over the 3D data. But overall I conclude that 1) you can use 2D, 3D, or 4D imagery with a U-Net and get a reasonable segmentation, however 2) I hypothesize that this workflow requires much more data. I achieved similar results with 8 classes.

From 2 geotiffs to a trained U-Net: 2D, 3D, and 4D imagery example. Part 1: Data

October 23, 2020

Dan Buscombe, Jonathan Warrick, Andy Ritchie (Pacific Coastal & Marine Science Center, Santa Cruz)

In this blog post, we present an end-to-end image segmetation workflow, using a small dataset, consisting of both imagery and digital elevation model (DEM). We are exploring a few goals:

  1. Can we use high-resolution RGB imagery with a U-Net model to segment sand dune types and forms
  2. Can we do so using a small dataset, using data augmentation?
  3. What creates the best prediction; RGB imagery alone? 2D elevation alone? Or the 4D combination of RGB and elevation?

This is a long workflow, so this blog post is split into two: data, then models. This post, 'Data', covers the following workflows:

  1. Split large geoTIFF orthomosaics and DEMs into smaller tiles, using python and GDAL, for labelling and segmentation. This will create 9-bit RGB imagery and coincident 8-bit elevation data. Elevation can be negative, and is usually measured on a continuous scale, but we convert elevation values to positive relative discrete values. This relative elevation raster might be combined with the 3D imagery to create 4D raster stacks
  2. Convert imagery to jpeg format
  3. Use dash_doodler to semi-supervised image segmentation into 6 classes that relate to dune types and cover, and adjacent land cover.
  4. The dataset is small - only 16 images, each only 613x613 pixels. Ordinarily for a deep learning project you would need MUCH more data. So, for this example we will use data augmentation to expand the dataset. In your adaptations of this workflow, you shouldn't rely so much on image augmentation and have more labelled examples. We use augmentation to create 64 new examples, each 608x608 pixels, which is the closest size to 613x613 pixels that will work with the U-Net model
  5. TFRecords are created from the 3D (RGB) imagery and the 2D (greyscale) labels
  6. TFRecords are created by combining 2D DEMs with 3D RGB imagery, with 2D (greyscale) labels

Dataset

The dataset consists of a 50-cm orthomosaic and a 50-cm digital elevation model (DEM) of sand dunes in central Monterey Bay. These data were collected and processed by Jon Warrick and Andy Ritchie at the U.S Geological Survey Pacific Coastal & Marine Science Center, Santa Cruz, using Structure-from-Motion photogrammetry.

The GeoTiFF image (below, left) show incipient dunes (Northern, at the river mouth), established dunes without iceplant (Central), established dunes with iceplant (Southern). The iceplant is a non-native and has been weeded from the central portion by non-profits. You can see the iceplant in the southern section by the 'red' patches. There is a distinct difference in elevation in the incipient dunes and the established dunes (below, right).

Prepare dataset for ML labeling/training using python/GDAL

Load the libraries we need. We'll be using mostly system calls (from os.system) to GDAL libraries installed, such that the commands gdal_translate and gdal_calc.py are accessible. GDAL is cross-platform (install instructions)

import os
from glob import glob

Create RGB and DEM tiles

Define some inputs; the size of the tiles we would like to create, the files we'd like to tile, and their heights and widths

tilesize = 613

bigfiles = ['MontBay_Dunes_Ortho_50cm.tif', 'MontBay_Dunes_DEM_50cm.tif']
widths = [1226, 1226]
heights = [4930, 4930]

Create tiled geotiffs by calling gdal_translate as a system command, within a nested for-loop that first cycles through files, then position on a grid in x, then position on a grid in y, just applying the window (srcwin) flag

for k in range(len(bigfiles)):

    for i in range(0, widths[k], tilesize):
        for j in range(0, heights[k], tilesize):
            if 'Ortho' in bigfiles[k]:
                gdaltranString = "gdal_translate -of GTIFF -srcwin "+str(i)+", "+str(j)+", "+str(tilesize)+", " \
                +str(tilesize)+" "+bigfiles[k]+" ortho_"+str(i)+"_"+str(j)+".tif"
            else:
                gdaltranString = "gdal_translate -of GTIFF -srcwin "+str(i)+", "+str(j)+", "+str(tilesize)+", " \
                +str(tilesize)+" "+bigfiles[k]+" dem_"+str(i)+"_"+str(j)+".tif"
            os.system(gdaltranString)

These images are at the edge of the raster and are thus smaller than the rest. We'll delete them

os.system('rm dem*0_4904.tif')
os.system('rm ortho*0_4904.tif')

We need to add a scalar to every value in each DEM tile, to avoid negative numbers (negative elevation umbers occur in coastal data due to datum specifications). In this example the absolute elevation values are not important, only relative ones, because eventually the DEM will be discretized as an 8-bit raster with values ranging from 0 to 255. To ensure all negative numbers are not encoded 0, 40m is sufficient in this case.

voffset = 40 #vertical offset to avoid negative numbers

files = glob('dem*.tif')

for file in files:
    gdalstring = 'gdal_calc.py -A '+file+' --outfile='+file.replace('.tif', '_a0.tif')+'  --calc="(A+'+str(voffset)+')" --NoDataValue=0'
    os.system(gdalstring)

Next we'll ensure no remaining negative numbers (setting them to zero) and convert to 16-bit. In this case there are no negative numbers left, but this would be a generic way to enforce non-negativity

files = glob('dem*_a0.tif')

for file in files:
    gdalstring = 'gdal_calc.py -A '+file+' --outfile='+file.replace('_a0.tif', '_a00.tif')+'  --calc="A*(A>0)" --NoDataValue=0 --type=Int16'
    os.system(gdalstring)

The full range of elevation in the data set is 7 to 36 m. The next command will simultaneously convert the image to BYTE or 8-bit unsigned integer, which is the format we need going forward, and scale to the full range (0 to 255). And, it will convert from tiff to jpg format

files = glob('dem*_a00.tif')

for file in files:
    gdalstring = 'gdal_translate -of GTIFF -ot BYTE -scale 7 36 0 255 '+file+' '+file.replace('_a00.tif', '_a000.jpg')
    os.system(gdalstring)

Remove those intermediate files; we no longer need them

os.system('rm dem*a0.tif')
os.system('rm dem*a00.tif')

Convert each RGB image from tiff to jpg format. Actually these files won't be in jpeg encoding yet, only have the extension and bit-depth. We will correct the encoding later

files = glob('ortho*.tif')

for file in files:
    gdalstring = 'gdal_translate -of GTIFF -ot BYTE '+file+' '+file.replace('.tif', '_a000.jpg')
    os.system(gdalstring)

Then run the following bash command to enforce jpeg encoding for all the images, optionally writing new versions by profided a suffix e.g _c in front of the file extension (.jpg), such as here

for file in *.jpg; do convert $file $"${file%.jpg}_c.jpg"; done

Labeling images

We'll use the new protoype "dash doodler" tool that you can find here for rapidly annotating these 16 images. I could label the RGB imagery in doodler, but it isn't designed to accept more than 3-band imagery, and that format wouldn't be helpful (we'll see later on that RGB+DEM imagery just looks like the RGB image with the high parts whited out - not helpful, generally)

These are the 6 classes I'm interested in, written to classes.txt :

water
marsh
beach
incipient_foredune
established_dune
iceplant

Here is a video of me annotating all 16 images (actually this is a previous version where I labeled 8 classes - but you get the idea):

Doodler creates RGB and greyscale label images. RGB images are easier to see and evaluate, and they can be used to create TFRecords and segmentation workflows, like we saw in the OBX data example, but ultimately you have to decode the RGB image and that isn't necessarily easy to do. In fact that becomes hard to do when you have many classes and encode your labels as jpegs. The former because you have to use colors that tend to be increasingly similar the more classes you have, and compression and chroma subsampling used in jpeg creation isn't ideal at preserving those color values, so it becomes difficult to decode the labels. Jpeg compression isn't generally a problem for simple greyscale integer label images. That is why doodler creates greyscale images, and greyscale label imagery is what we will use going forward.

Image augmentation

We'll create a few more examples to create enough data for demonstration purposes. Ordinarily you would try to label much more data

Define image dimensions. The images are actually 613 x 613 pixels, but that won't fit within the U-Net framework without modification, so we'll resize to the nearest dimension that will, which is 608 x 608 pixels. Within the mlmondays conda environment,

from imports import *
NY = 608
NX = 608

Define two ImageDataGenerator instances with the same arguments, one for images and the other for labels (masks)

data_gen_args = dict(featurewise_center=False,
                     featurewise_std_normalization=False,
                     rotation_range=2,
                     width_shift_range=0.1,
                     height_shift_range=0.1,
                     zoom_range=0.1)
image_datagen = tf.keras.preprocessing.image.ImageDataGenerator(**data_gen_args)
dem_datagen = tf.keras.preprocessing.image.ImageDataGenerator(**data_gen_args)
mask_datagen = tf.keras.preprocessing.image.ImageDataGenerator(**data_gen_args)

Put your images somewhere and define the labels

imdir = 'data/dunes/images'

dem_path = 'data/dunes/dems'

lab_path = 'data/dunes/labels'

Note that you'll have to actually place your images inside another folder inside these respective directories, and it doesn't matter what that folder is called. The reason is that ImageDataGenerator expects imagery to be in class folders, even though in this case we are not using that functionality

Next, you'll need a function to write your images to file. We'll also use a median filter with a disk kernel (you'll need to install scikit-image to use these commands: conda install scikit-image)

from imageio import imwrite
from skimage.filters.rank import median
from skimage.morphology import disk

You likely can't read all your images into memory if you did augmentation on a real-sized dataset (i.e. bigger), so you'll likely have to do this in batches. Therefore we present the batched workflow.

The following loop will grab a new random batch of images, apply the augmentation generator to the images to create alternate versions, then writes those alternate versions of images and labels to disk. We use class_mode=None because the images don't belong to any particular class. The list L is for enumerating all the unique values of each label image, to check that no integers outside the class range are present.

L = []
i=1
for k in range(4): #create 4 more sets ....

    #set a different seed each time to get a new batch of ten
    seed = int(np.random.randint(0,100,size=1))
    img_generator = image_datagen.flow_from_directory(
            imdir,
            target_size=(NX, NY),
            batch_size=16, ## .... of 16 images ...
            class_mode=None, seed=seed, shuffle=True)

    dem_generator = image_datagen.flow_from_directory(
            dem_path,
            target_size=(NX, NY),
            batch_size=16, ## .... of 16 images ...
            class_mode=None, seed=seed, shuffle=True)

    #the seed must be the same as for the training set to get the same images
    mask_generator = mask_datagen.flow_from_directory(
            lab_path,
            target_size=(NX, NY),
            batch_size=16,  ## .... and of 16 labels
            class_mode=None, seed=seed, shuffle=True)

    #The following merges the 3 generators (and their flows) together:
    train_generator = (pair for pair in zip(img_generator, dem_generator, mask_generator))

    #grab a batch of images, dems, and label images
    x, y, z = next(train_generator)

    # write them to file and increment the counter
    for im,dem,lab in zip(x,y,z):
        imwrite(imdir+os.sep+'augimage_000'+str(i)+'.jpg', im)
        imwrite(dem_path+os.sep+'augdem_000'+str(i)+'.jpg', dem)

        lab = median(lab[:,:,0]/255.0, disk(3)).astype(np.uint8)
        lab[im[:,:,0]==0]=0        
        L.append(np.unique(lab.flatten()))
        imwrite(lab_path+os.sep+'auglabel_000'+str(i)+'_label.jpg',(lab).astype(np.uint8))
        i += 1
        print(lab.shape)

    #save memory
    del x, y, im, dem, lab
    #get a new batch

when this completes you should have 64 new images, dems and labels, that look something like this.

Images:

DEMs:

Labels:

What unique values do we have in our augmented imagery?

print(np.round(np.unique(np.hstack(L))))

[0 1 2 3 4 5 6 7 8]

Great. Zero for null, and our 1 through 6 classes

4D image creation

Now you have 64 images, dems and labels total. Next, we need to merge the images and dems together into 4-band images

But how do you make a 4-band image from a 3-band image and a 1-band image?

First, import libraries

from imageio import imread
import numpy as np

Read RGB image

im1 = imread('data/dunes/images/augimage_0001.jpg')

Read elevation and get just the first band

im2 = imread('data/dunes/dems/augdem_0001.jpg')[:,:,0]

Merge bands - creates a numpy array with 4 channels

merged = np.dstack((im1, im2))

Make this figure:

using this code:

import matplotlib.pyplot as plt
plt.subplot(131)
plt.imshow(im1); plt.axis('off'); plt.title('RGB')
plt.subplot(132)
plt.imshow(im2); plt.axis('off'); plt.title('DEM')
plt.subplot(133)
plt.imshow(merged); plt.axis('off'); plt.title('4D')
plt.savefig('4d.png', dpi=200, bbox_inches='tight'); plt.close('all')

TFRecord creation

Getting set up

With the mlmondays conda environment enabled, import some libraries:

from imports import *
from imageio import imwrite

Define our three paths:

imdir = '/media/marda/TWOTB/USGS/SOFTWARE/Projects/DL-CDI2020/3_ImageSeg/data/dunes/images'
demdir = '/media/marda/TWOTB/USGS/SOFTWARE/Projects/DL-CDI2020/3_ImageSeg/data/dunes/dems'
labdir = '/media/marda/TWOTB/USGS/SOFTWARE/Projects/DL-CDI2020/3_ImageSeg/data/dunes/labels'

And this path for the tfrecord files

tfrecord_dir = '/media/marda/TWOTB/USGS/SOFTWARE/Projects/DL-CDI2020/3_ImageSeg/data/dunes'

Define images per shard (12, so there will be SHARD=8 tfrecord files, since there are 48 images) and shard sizes

nb_images=len(tf.io.gfile.glob(imdir+os.sep+'*.jpg'))

ims_per_shard = 8

SHARDS = int(nb_images / ims_per_shard) + (1 if nb_images % ims_per_shard != 0 else 0)

shared_size = int(np.ceil(1.0 * nb_images / SHARDS))

List the images:

img_files = tf.io.gfile.glob(imdir+os.sep+'*.jpg')

Encode back to jpeg for efficient binary string storage

def recompress_seg_image(image, label):

    image = tf.cast(image, tf.uint8)
    image = tf.image.encode_jpeg(image, optimize_size=False, chroma_downsampling=False, quality=100)

    label = tf.cast(label, tf.uint8)
    label = tf.image.encode_jpeg(label, optimize_size=False, chroma_downsampling=False, quality=100)
    return image, label

3D workflow (R, G, B)

This function reads an image

def read_seg_image_and_label_dunes(img_path):
    bits = tf.io.read_file(img_path)
    image = tf.image.decode_jpeg(bits)

    dem_path = tf.strings.regex_replace(img_path, "images", "dems")
    dem_path = tf.strings.regex_replace(dem_path, "augimage", "augdem")     
    bits = tf.io.read_file(dem_path)
    dem = tf.image.decode_jpeg(bits)

    # have to use this tf.strings.regex_replace utility because img_path is a Tensor object
    lab_path = tf.strings.regex_replace(img_path, "images", "labels")
    lab_path = tf.strings.regex_replace(lab_path, "augimage", "auglabel")
    lab_path = tf.strings.regex_replace(lab_path, ".jpg", "_label.jpg")      
    bits = tf.io.read_file(lab_path)
    label = tf.image.decode_jpeg(bits, channels=1)

    label = tf.cast(label, tf.uint8)

    ## merge dem and image
    if flag is '3d':
      merged = tf.stack([image[:,:,0], image[:,:,1], image[:,:,2]], axis=2)
    elif flag is '3db':
      merged = tf.stack([image[:,:,0], image[:,:,2], dem[:,:,0]], axis=2)
    else:
      merged = tf.stack([image[:,:,0], image[:,:,1], image[:,:,2], dem[:,:,0]], axis=2)

    merged = tf.cast(merged, tf.uint8)

    return merged, label
flag='3d'
dataset = tf.data.Dataset.list_files(imdir+os.sep+'*.jpg', seed=10000) # This also shuffles the images
dataset = dataset.map(read_seg_image_and_label_dunes)
dataset = dataset.map(recompress_seg_image, num_parallel_calls=AUTO)
dataset = dataset.batch(shared_size)

Now we can write our 3D dataset out to TFRecords:

for shard, (image, label) in enumerate(dataset):
  shard_size = image.numpy().shape[0]
  filename = tfrecord_dir+os.sep+"dunes3d-" + "{:02d}-{}.tfrec".format(shard, shard_size)

  with tf.io.TFRecordWriter(filename) as out_file:
    for i in range(shard_size):
      example = to_seg_tfrecord(image.numpy()[i],label.numpy()[i])
      out_file.write(example.SerializeToString())
    print("Wrote file {} containing {} records".format(filename, shard_size))

4D workflow (R, G, B, DEM)

JPEGs can be either 1 or 4 bands, not 4 bands. So, we use a png encoder this time, which means we'll have to use a png decoder when we read the data in later for model training

def recompress_seg_image4d(image, label):

    image = tf.cast(image, tf.uint8)
    image = tf.image.encode_png(image)

    label = tf.cast(label, tf.uint8)
    label = tf.image.encode_png(label)
    return image, label
flag='4d'
dataset = tf.data.Dataset.list_files(imdir+os.sep+'*.jpg', seed=10000) # This also shuffles the images
dataset = dataset.map(read_seg_image_and_label_dunes)
dataset = dataset.map(recompress_seg_image4d, num_parallel_calls=AUTO)
dataset = dataset.batch(shared_size)

Now we can write our 4D dataset out to TFRecords:

for shard, (image, label) in enumerate(dataset):
  shard_size = image.numpy().shape[0]
  filename = tfrecord_dir+os.sep+"dunes4d-" + "{:02d}-{}.tfrec".format(shard, shard_size)

  with tf.io.TFRecordWriter(filename) as out_file:
    for i in range(shard_size):
      example = to_seg_tfrecord(image.numpy()[i],label.numpy()[i])
      out_file.write(example.SerializeToString())
    print("Wrote file {} containing {} records".format(filename, shard_size))

Converting makesense.ai JSON labels to label (mask) imagery for an image segmentation project

October 14, 2020

Dan Buscombe

Annotate images on makesense.ai

makesense.ai is pretty great and the tool I generally recommend for labeling images because it:

  • works well and has a well designed interface
  • is free and open source
  • requires no account or uploading of data.

Press 'Get started'

Load images (in my example, I am using two pictures of coins and other objects on sand). Select 'object detection' which will give you the point, line, box, and polygon toolsets

Create a list of labels (mine are 'coin', 'sand' and 'other')

Use the polygon tool to start delineating the scene, and select the label from the drop down list for each annotation

Image two (these sorts of scenes are tricky to label because sand is really a background class)

Actions > Export annotations

Export in VGG JSON format in the polygon category

This is what your JSON format file looks like

Let's read it into python and convert it into a label mask

Create label images

Load the libraries we need

import json, os, glob
from PIL import Image, ImageDraw
import numpy as np

Define a class dictionary that allows for mapping of class string names to integers. Avoid zero - that is usually reserved for null/background for a binary segmentation. 'Other' is different in this context (rulers, and other things in the scene)

class_dict = {'coin':1, 'sand':2, 'other':3}

Load the contents of the VGG JSON file downloaded from makesense.ai into the dictionary, all_labels

json_file = 'labels_my-project-name_2020-10-15-03-40-44.json'
all_labels = json.load(open(json_file))

The keys of the dictionary are the image filenames

print(all_labels.keys())

And these are the quantities defined for each image

rawfile = '20181223_133712.jpg'

print(all_labels[rawfile].keys())

This function will strip image coordinates (X and Y) of polygons, and associated class labels (L) from

def get_data(data):
    X = []; Y = []; L=[] #pre-allocate lists to fill in a for loop
    for k in data['regions']: #cycle through each polygon
        # get the x and y points from the dictionary
        X.append(data['regions'][k]['shape_attributes']['all_points_x'])
        Y.append(data['regions'][k]['shape_attributes']['all_points_y'])
        L.append(data['regions'][k]['region_attributes']['label'])
    return Y,X,L #image coordinates are flipped relative to json coordinates

Use it to extract the polygons from the first image:

X, Y, L = get_data(all_labels[rawfile])

Open an image to get its dimensions:

image = Image.open(rawfile)

nx, ny, nz = np.shape(image)

Next we need a function that will create a label image from the polygon vector data (coordinates and labels)

def get_mask(X, Y, nx, ny, L, class_dict):
    # get the dimensions of the image
    mask = np.zeros((nx,ny))

    for y,x,l in zip(X,Y,L):
        # the ImageDraw.Draw().polygon function we will use to create the mask
        # requires the x's and y's are interweaved, which is what the following
        # one-liner does
        polygon = np.vstack((x,y)).reshape((-1,),order='F').tolist()

        # create a mask image of the right size and infill according to the polygon
        if nx>ny:
           x,y = y,x
           img = Image.new('L', (nx, ny), 0)
        elif ny>nx:
           #x,y = y,x
           img = Image.new('L', (ny, nx), 0)
        else:
           img = Image.new('L', (nx, ny), 0)
        ImageDraw.Draw(img).polygon(polygon, outline=0, fill=1)
        # turn into a numpy array
        m = np.flipud(np.rot90(np.array(img)))
        try:
            mask[m==1] = class_dict[l]
        except:
            mask[m.T==1] = class_dict[l]  

    return mask

Apply it to get the label mask for the first image

mask = get_mask(X, Y, nx, ny, L, class_dict)

Next we'll define a function that we rescale our integer codes into 8-bit integer codes that span the full range. This 8-bit scaling will facilitate creation of label images that can be viewed using ordinary operating system image viewer software

def rescale(dat,mn,mx):
    '''
    rescales an input dat between mn and mx
    '''
    m = min(dat.flatten())
    M = max(dat.flatten())
    return (mx-mn)*(dat-m)/(M-m)+mn

Rescale the mask and convert it into a greyscale Image object, then save to file

mask = Image.fromarray(rescale(mask,0,255)).convert('L')

mask.save(rawfile.replace('imagery','labels'), format='PNG')

Loop through several files at once:

for rawfile in all_labels.keys():

    X, Y, L = get_data(all_labels[rawfile])

    image = Image.open(rawfile)

    nx, ny, nz = np.shape(image)

    mask = get_mask(X, Y, nx, ny, L, class_dict)

    mask = Image.fromarray(rescale(mask,0,255)).convert('L')

    mask.save(rawfile.replace('.jpg','_label.jpg'), format='PNG')    

Here are the label images:

Clearly, I'm a careless labeller. How could you make these labels better? Read on ...

Refine label images with a CRF

A CRF is a model that we will introduce and use in Week 3 and is useful for pre-processing manual labels, such as here, or post-processing model estimates.

It works by examining the label in each pixel of the label image, and assessing the likelihood of it, given the distribution of image values that it observes in the same and other classes in the scene. It is a probabilistic assessment based on both image features that it extracts, append

A CRF is not a deep learning model, or a neural network at all, but it is a network-based (or so-called graphical model). You can read more about it in this paper, where it was used as a post-processing rather than a pre-processing step.

These are the extra python libraries we need (within the mlmondays conda environment)

import pydensecrf.densecrf as dcrf
from pydensecrf.utils import create_pairwise_bilateral, unary_from_labels

Next we define a function that will use the CRF to process the label with respect to the image, and provide a new refined label

def crf_refine(label, img):
    """
    "crf_refine(label, img)"
    This function refines a label image based on an input label image and the associated image
    Uses a conditional random field algorithm using spatial and image features
    INPUTS:
        * label [ndarray]: label image 2D matrix of integers
        * image [ndarray]: image 3D matrix of integers
    OPTIONAL INPUTS: None
    GLOBAL INPUTS: None
    OUTPUTS: label [ndarray]: label image 2D matrix of integers
    """
    H = label.shape[0]
    W = label.shape[1]
    U = unary_from_labels(label,1+len(np.unique(label)),gt_prob=0.51)
    d = dcrf.DenseCRF2D(H, W, 1+len(np.unique(label)))
    d.setUnaryEnergy(U)

    # to add the color-independent term, where features are the locations only:
    d.addPairwiseGaussian(sxy=(3, 3),
                 compat=3,
                 kernel=dcrf.DIAG_KERNEL,
                 normalization=dcrf.NORMALIZE_SYMMETRIC)
    feats = create_pairwise_bilateral(
                          sdims=(100, 100),
                          schan=(2,2,2),
                          img=img,
                          chdim=2)

    d.addPairwiseEnergy(feats, compat=120,kernel=dcrf.DIAG_KERNEL,normalization=dcrf.NORMALIZE_SYMMETRIC)
    Q = d.inference(10)
    return np.argmax(Q, axis=0).reshape((H, W)).astype(np.uint8)

Now we modify the get_mask function from before with the post-processing step


def get_mask_crf(X, Y, nx, ny, L, class_dict, image):
    # get the dimensions of the image
    mask = np.zeros((nx,ny))

    for y,x,l in zip(X,Y,L):
        # the ImageDraw.Draw().polygon function we will use to create the mask
        # requires the x's and y's are interweaved, which is what the following
        # one-liner does
        polygon = np.vstack((x,y)).reshape((-1,),order='F').tolist()

        # create a mask image of the right size and infill according to the polygon
        if nx>ny:
           x,y = y,x
           img = Image.new('L', (nx, ny), 0)
        elif ny>nx:
           #x,y = y,x
           img = Image.new('L', (ny, nx), 0)
        else:
           img = Image.new('L', (nx, ny), 0)
        ImageDraw.Draw(img).polygon(polygon, outline=0, fill=1)
        # turn into a numpy array
        m = np.flipud(np.rot90(np.array(img)))
        try:
            mask[m==1] = class_dict[l]
        except:
            mask[m.T==1] = class_dict[l]

    mask = crf_refine(np.array(mask, dtype=np.int), np.array(image, dtype=np.uint8))

    return mask

And use a similar loop as before to apply this CRF processing

for rawfile in all_labels.keys():

    X, Y, L = get_data(all_labels[rawfile])

    image = Image.open(rawfile)

    nx, ny, nz = np.shape(image)

    mask = get_mask_crf(X, Y, nx, ny, L, class_dict, image)

    mask = Image.fromarray(rescale(mask/255,0,255)).convert('L')

    mask.save(rawfile.replace('.jpg','_label_crf.jpg'), format='PNG')

Here are the CRF-refined label images. Now there is no black (0) background class. The black (0) class is class 1; class 2 is 127; and class 3 is 255.

Preparing a new dataset for an object recognition project

October 11, 2020

Dan Buscombe

Formatting your own data into a format that you can use in your data processing pipeline is arguably the most difficult aspect of this largely self-guided course.

The workflows I have provided each week to create tfrecords can be adapted to many different datasets, but will require some work on your part, using my provided examples and these blog posts. I don't know of any package that helps you out for every case. Perhaps I should write one! Perhaps you should!

We're going to:

  • download some data from the internet
  • convert the labels data into a format we can use
  • write all the data to one big TFrecord file
  • write the data out in chunks to TFRecord shards

This workflow is also provided in the following script: 2_ObjRecog/conservationdrones_make_tfrecords.py

Convert images and bounding boxes into TFrecords

Prepare your dataset as images and corresponding labels

To create an example, I headed over the excellent Labeled Information Library of Alexandria: Biology and Conservation and chose the Conservation Drones dataset. Specifically, I downloaded TrainReal.zip. Each of these folders contains folders for the annotation .csv files for each video (annotations) and the individual .jpg frames in each video (images).

We're not afraid of real-world examples in ML-Mondays - I chose a particularly difficult computer vision problem. We'll see why ...

I unzipped the TrainReal data to a folder on my computer called /media/marda/TWOTB/USGS/DATA/TrainReal, which contains images (jpegs) and labels (annotations in csv format). Here is an example image. Notice it is infrared and therefore greyscale. That's ok- the models we use can cope with single-banded inputs

The associated label data for this image is

142 2   276 243 11  12  0   2   0   0
142 4   204 260 16  13  0   2   0   0
142 5   266 246 11  10  0   2   0   0
142 108 424 136 11  12  0   2   0   0
142 114 430 101 17  21  0   2   0   0
142 115 429 121 11  11  0   2   0   0

This is the MOT annotation format, with the following columns:

[frame_number], [object_id], [x], [y], [w], [h], [class], [species], [occlusion], [noise]
  • class: 0 if animals, 1 if humans
  • species: -1: unknown, 0: human, 1: elephant, 2: lion, 3: giraffe, 4: dog, 5: crocodile, 6: hippo, 7: zebra, 8: rhino. 3 and 4 occur only in real data. 5, 6, 7, 8 occur only in synthetic data.
  • occlusion: 0 if there is no occlusion, 1 if there is an occlusion (i.e., either occluding or occluded) (note: intersection over union threshold of 0.3 used to assign * occlusion; more details in paper)
  • noise: 0 if there is no noise, 1 if there is noise (note: noise labels were interpolated from object locations in previous and next frames; for more than 4 consecutive frames without labels, no noise labels were included; more details in paper)

So, based on this info, we have 6 lions in the scene, each only 10-20 pixels in size

Converting between label formats

The first thing we need to do is convert this csv format into one of

[filename], [width], [height], [class], [xmin], [ymin], [xmax], [ymax]

which is perhaps slightly more standard for deep learning workflows. You could do this manually in excel, by using w and h to compute xmax and ymax, then convert the frame_number into a filename. In this case we'll use species as class.

Honestly, every dataset has its foibles and you have to wrangle the data into one form or another, so get used to it! The python library pandas can help a lot in these situations. I won't lie - this is tricky - as I said before, the data part of any data modeling project is just as hard - if not more so - than the 'modeling' part.

Combining all csv files into one

These are the libraries we'll need:

import pandas as pd
import numpy as np
from glob import glob
import os

This is the top level directory where all the annotation csv files are

csv_folder = '/media/marda/TWOTB/USGS/DATA/TrainReal/annotations'

First we define empty lists to collate image file names, and to concatenate all label data into one list

all_label_data = []; files = []

We cycle through each csv file, read it in using pandas, and append it to all_label_data. Next we get the filename of the image that id (column zero) corresponds to. This is not an easy file naming convention to deal with ... all strings are forced to have the same length.

for f in csv_files:
    dat = np.array(pd.read_csv(f))
    all_label_data.append(dat)
    # get the file name root
    tmp = f.replace('annotations', 'images').replace('.csv','')
    # construct filenames for each annotation
    for i in dat[:,0]:
       if i<10:
          files.append(tmp+os.sep+tmp.split(os.sep)[-1]+'_000000000'+str(i)+'.jpg')
       elif i<100:
          files.append(tmp+os.sep+tmp.split(os.sep)[-1]+'_00000000'+str(i)+'.jpg')
       elif i<1000:
          files.append(tmp+os.sep+tmp.split(os.sep)[-1]+'_0000000'+str(i)+'.jpg')
       elif i<10000:
          files.append(tmp+os.sep+tmp.split(os.sep)[-1]+'_000000'+str(i)+'.jpg')
       elif i<100000:
          files.append(tmp+os.sep+tmp.split(os.sep)[-1]+'_00000'+str(i)+'.jpg')

We use numpy's vstack to concatenate the list of lists into a numpy array with the correct shape (samples x columns)

all_label_data = np.vstack(all_label_data)
files = np.vstack(files).squeeze()

print(all_label_data.shape)
# 87167 annotations, 10 columns
print(files.shape)
# 87167 filenames, 1 column

We have converted all the ids to filenames already, next we need to make xmaxs ymaxs

xmax = all_label_data[:,2] + all_label_data[:,4] #xmin + width
ymax = all_label_data[:,3] + all_label_data[:,5] #ymin + height

Next we map the integers to strings - strings are better in general than integers for class identification

# list of integers
classes = all_label_data[:,7]
# mapping from integers to strings
class_dict = {-1:'unknown',0: 'human', 1:'elephant', 2:'lion', 3:'giraffe'}
#list of strings
classes_string = [class_dict[i] for i in classes]

Make a pandas dataset so we can write it out to csv file

d = {'filename': files, 'width': all_label_data[:,4], 'height': all_label_data[:,5], 'class': classes_string,
     'xmin': all_label_data[:,2], 'ymin': all_label_data[:,3], 'xmax': xmax, 'ymax': ymax }
df = pd.DataFrame(data=d)

Interrogate the columns:

df.keys()

Print the first few examples to screen:

df.head()

Print the last few examples to screen:

df.tail()

Write to file:

df.to_csv('conservationdrones_labels.csv')

Much better! All labels are in one file, that is more manageable and easier to read (in fact, it is stand-alone)

Writing data to a single TFRecord

Define some paths and inputs

root = 'data/conservationdrones'+os.sep
output_path = root+'conservationdrones.tfrecord'
csv_input = root+'conservationdrones_labels.csv'

Initiate a TFRecordWriter object that will write the TFRecords

import tensorflow as tf
writer = tf.io.TFRecordWriter(output_path)

Each image has variable number of annotations, so just like in the SECORRA example, we split the data into groups based on filename

examples = pd.read_csv(csv_input)
print('Number of labels: %i' % len(examples))
grouped = split(examples, 'filename')

How many images?

nb_images=len(grouped)
print('Number of images: %i' % nb_images)

We need a function like create_tf_example_coco that creates a bytestring from an image and associated boundng box

The following function differs from create_tf_example_coco in that filename paths need not be specified and concatenated, and that class_dict = {'unknown':-1,'human':0,'elephant':1, 'lion':2, 'giraffe':3} is usd to convert the strings back to integers

def create_tf_example_conservationdrones(group):
    """
    create_tf_example_conservationdrones(group)
    ""
    This function creates an example tfrecord consisting of an image and label encoded as bytestrings
    The jpeg image is read into a bytestring, and the bbox coordinates and classes are collated and
    converted also
    INPUTS:
        * group [pandas dataframe group object]
        * path [tensorflow dataset]: training dataset
    OPTIONAL INPUTS: None
    OUTPUTS:
        * tf_example [tf.train.Example object]
    GLOBAL INPUTS: BATCH_SIZE
    """
    with tf.io.gfile.GFile(group.filename, 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)

    filename = group.filename.encode('utf8')

    ids = []
    areas = []
    xmins = [] ; xmaxs = []; ymins = []; ymaxs = []
    labels = []
    is_crowds = []

    #for converting back to integer
    class_dict = {'unknown':-1,'human':0,'elephant':1, 'lion':2, 'giraffe':3}

    for index, row in group.object.iterrows():
        labels.append(class_dict[row['class']])
        ids.append(index)
        xmins.append(row['xmin'])
        ymins.append(row['ymin'])
        xmaxs.append(row['xmax'])
        ymaxs.append(row['ymax'])
        areas.append((row['xmax']-row['xmin'])*(row['ymax']-row['ymin']))
        is_crowds.append(False)

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'objects/is_crowd': int64_list_feature(is_crowds),
        'image/filename': bytes_feature(filename),
        'image/id': int64_list_feature(ids),
        'image': bytes_feature(encoded_jpg),
        'objects/xmin': float_list_feature(xmins), #xs
        'objects/xmax': float_list_feature(xmaxs), #xs
        'objects/ymin': float_list_feature(ymins), #xs
        'objects/ymax': float_list_feature(ymaxs), #xs
        'objects/area': float_list_feature(areas), #ys
        'objects/id': int64_list_feature(ids), #ys
        'objects/label': int64_list_feature(labels),
    }))

    return tf_example

Now we can write out each group (bounding boxes of each image)

for group in grouped:
    tf_example = create_tf_example_conservationdrones(group)
    writer.write(tf_example.SerializeToString())

Close the writer - we are done!

writer.close()
output_path = os.path.join(os.getcwd(), output_path)
print('Successfully created the TFRecords: {}'.format(output_path))

This is a big file (1.2 GB). But not as big as the 42,684 individual files that this one file contains, which is 2.3 GB

How do we split this big TFRecord up in small chunks?

Writing data to multiple TFRecords

This time we'll create smaller files with 1000 examples per file

ims_per_shard = 1000

How many individual files would we make?

SHARDS = int(nb_images / ims_per_shard) + (1 if nb_images % ims_per_shard != 0 else 0)
print(SHARDS)

shared_size = int(np.ceil(1.0 * nb_images / SHARDS))
print(shared_size)

Create indices into grouped that will enable writing SHARDS files, each containing shared_size examples

grouped_forshards = np.lib.stride_tricks.as_strided(np.arange(len(grouped)), (SHARDS, shared_size))

Write out each group to a different TFRecord file. Update a counter to increment the file name.

counter= 0
for indices in grouped_forshards[:-1]:

    tmp = [] #create a new list containing only data in indices
    for i in indices:
        tmp.append(grouped[i])

    # modify the original filepath in a consistent way
    output_path = root+'conservationdrones.tfrecord'
    output_path = output_path.replace('.tfrecord','')+ "{:02d}-{}.tfrec".format(counter, shared_size)
    writer = tf.io.TFRecordWriter(output_path)

    # write out each example to the shard
    for group in tmp:
        tf_example = create_tf_example_conservationdrones(group)
        writer.write(tf_example.SerializeToString())

    writer.close()
    print('Successfully created the TFRecords: {}'.format(output_path))

    counter += 1

How do you know you did it right?

You should read the data back in and plot it. Get a list of the tfrec files you made:

filenames = sorted(tf.io.gfile.glob(os.getcwd()+os.sep+root+'*.tfrec'))

This dictionary and associated parsing function is what you need to decode

features = {
    'image': tf.io.FixedLenFeature([], tf.string, default_value=''),
    'objects/xmin': tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
    'objects/ymin': tf.io.FixedLenSequenceFeature([], tf.float32,allow_missing=True),
    'objects/xmax': tf.io.FixedLenSequenceFeature([], tf.float32,allow_missing=True),
    'objects/ymax': tf.io.FixedLenSequenceFeature([], tf.float32,allow_missing=True),
    'objects/label': tf.io.FixedLenSequenceFeature([], tf.int64,allow_missing=True),
}

def _parse_function(example_proto):
  # Parse the input `tf.train.Example` proto using the dictionary above.
  return tf.io.parse_single_example(example_proto, features)

Create a TFRecordDataset object from the list of files, and using the parsing function to create the dataset

dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(_parse_function)

Import a plotting library

import matplotlib.pyplot as plt

Get ten images and their associated labels and plot them

for i in dataset.take(10):
    image = tf.image.decode_jpeg(i['image'], channels=1)
    bbox = tf.numpy_function(np.array,[[i["objects/xmin"], i["objects/ymin"], i["objects/xmax"], i["objects/ymax"]]], tf.float32).numpy().T#.squeeze()
    print(len(bbox))

    ids = []
    for id in i["objects/label"].numpy():
       ids.append(class_dict[id])

    fig =plt.figure(figsize=(16,16))
    plt.axis("off")
    plt.imshow(image, cmap=plt.cm.gray)
    ax = plt.gca()

    for box,id in zip(bbox,ids):
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        patch = plt.Rectangle([x1, y1], w, h, fill=False, edgecolor=[0, 1, 0], linewidth=1)
        ax.add_patch(patch)
        ax.text(x1, y1, id, bbox={"facecolor": [0, 1, 0], "alpha": 0.4}, clip_box=ax.clipbox, clip_on=True, fontsize=5)
    plt.show()

Here's an example output. Two tiny(!) elephants in scene:

last word ...

Datasets like this are extremely difficult to acquire and in my view are just as valuable as other types of scholarly output, so I encourage you to publish your datasets (if they are F.A.I.R) and cite other's data. If you use this dataset, please consider citing the paper:

Bondi E, Jain R, Aggrawal P, Anand S, Hannaford R, Kapoor A, Piavis J, Shah S, Joppa L, Dilkina B, Tambe M. BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos.

Thanks to the authors for making it publicly available.

ML terminology, demystified

October 3, 2020

Dan Buscombe

Deep Learning (DL)

Use of large (or deep) neural networks. The primary advantage is the use of many layers that carry out automated feature extraction, rather than rules-based feature extraction, or feature engineering (research into pipelines that extract the optimal features).

Artificial Intelligence (AI)

I use this term to mean artificial general intelligence (machines that write their own programs from scratch, based on a comprehension of the task at hand). This doesn't exist in the Earth and Life sciences yet! In my opinion, conflating ML with AI is confusing and counter-productive, and it leaves no room for growth.

Model

A model of the data, if you will, or the way that data generates labels, to be more specific. The representation of what a machine learning system has learned from the training data.

Within TensorFlow, and Machine Learning more widely, the term model is overloaded:

  • The model technique/class or the specific python object (such as a Tensorflow graph) that expresses the structure of how a prediction will be computed.
  • The particular parameters (e.g. weights and biases) of that model, which are determined by training.

Regularization

The penalty on a model's complexity. Regularization helps prevent overfitting. Different kinds of regularization include:

  • L2 regularization: model penalties are squared magnitudes (model fits to the mean)
  • L1 regularization: model penalties are absolute magnitudes (model fits to the median)
  • dropout regularization
  • early stopping (this is not a formal regularization method, but can effectively limit overfitting)
  • data augmentation (again, not a formal regularization method)
  • batch normalization (see above)

Data augmentation

The process of creating transformed versions of the data. This process often, but not always, duplicates data. It's primary purpose is regularization, by providing alternative versions of the data, it learns feature representations that are invariant to location, scale, translation, etc. In ML Mondays we use augmentation in a few different ways; in Part 1 we create an augmentation pipeline that augments imagery without duplicating it, and we create and write augmented imagery to disk in Part 3.

Batch Normalization

Normalizing the input or output of the activation functions in a hidden layer. The normalization is slightly different each time, because it is dictated by the distribution of values in the batch. So therefore its effects are contingent in part to the size of the batch. Batch normalization can provide the following benefits:

  • Make neural networks more stable by protecting against outlier weights.
  • Enable higher learning rates, which in turn promotes faster training
  • Reduce overfitting.

Dropout Regularization

A form of regularization useful in training neural networks. Dropout regularization works by removing (actually, zeroing out temporarily) a random selection of a fixed number of the units in a network layer for a single gradient step. The more units dropped out, the stronger the regularization.

Early Stopping

A method for regularization that involves ending model training before training loss finishes decreasing. In early stopping, you end model training when the loss on a validation dataset starts to increase, that is, when generalization performance worsens.

Epoch

A full training pass over the entire dataset such that each example has been seen once. Thus, an epoch represents N/batch size training iterations, where N is the total number of examples.

Convergence

Informally, often refers to a state reached during training in which training loss and validation loss change very little or not at all with each iteration after a certain number of iterations. In other words, a model reaches convergence when additional training on the current data will not improve the model. In deep learning, loss values sometimes stay constant or nearly so for many iterations before finally descending, temporarily producing a false sense of convergence.

Convolutional Neural Network

A neural network in which at least one layer is a convolutional layer. A typical convolutional neural network consists of some alternate combinations of

  • convolutional layers
  • pooling layers

Followed by application of

  • dense (or fully connected) layers

Cross-Entropy

A measure of the difference between two probability distributions. A generalization of Log Loss to multi-class classification problems.

Discriminative model

A model that predicts labels from a set of one or more features. More formally, discriminative models define the conditional probability of an output given the features and weights; that is:

p(output | features, weights)

The vast majority of supervised learning models, including classification and regression models, are discriminative models.

Hyperparameter

The "knobs" that you can tweak during successive runs of training a model. For example, learning rate is a hyperparameter. Contrast with parameter, which is learned by the model during training

Loss (or cost)

A measure of how far a model's predictions are from its label. Or, a measure of how bad the model is. To determine this value, a model must define a loss function.

one-vs.-all

Given a classification problem with N possible solutions, a one-vs.-all solution consists of N separate binary classifiers—one binary classifier for each possible outcome. For example, given a model that classifies examples as animal, vegetable, or mineral, a one-vs.-all solution would provide the following three separate binary classifiers:

  • animal vs. not animal
  • vegetable vs. not vegetable
  • mineral vs. not mineral

this is the approach we take in segmentation.

Optimizer ("numerical solver")

A specific implementation of the gradient descent algorithm. TensorFlow's base class for optimizers is tf.train.Optimizer. Popular optimizers include:

  • AdaGrad, which stands for ADAptive GRADient descent.
  • Adam, which stands for ADAptive with Momentum (note that rsmprop is adam without momentum)

Supervised machine learning

Training a model from input data and its corresponding labels. Supervised machine learning is analogous to a student learning a subject by studying a set of questions and their corresponding answers. After mastering the mapping between questions and answers, the student can then provide answers to new (never-before-seen) questions on the same topic. Compare with unsupervised machine learning.

Semi-supervised learning

Training a model on data where some of the training examples have labels but others don’t. One technique for semi-supervised learning is to infer labels for the unlabeled examples, and then to train on the inferred labels to create a new model. Semi-supervised learning can be useful if labels are expensive to obtain but unlabeled examples are plentiful.

Unsupervised learning

Training a model to discover both the classes, and which class each sample represents. The 'holy grail' of Machine Learning, in many ways.

Transfer learning

Transferring information from one machine learning task to another. For example, in multi-task learning, a single model solves multiple tasks, such as a deep model that has different output nodes for different tasks. Transfer learning might involve transferring knowledge from the solution of a simpler task to a more complex one, or involve transferring knowledge from a task where there is more data to one where there is less data.

Most machine learning systems solve a single task. Transfer learning is a baby step towards artificial intelligence in which a single program can solve multiple tasks.

Tensor

The primary data structure in TensorFlow programs. Tensors are N-dimensional (where N could be very large) data structures, most commonly scalars, vectors, or matrices. The elements of a Tensor can hold integer, floating-point, or string values.

GPU "Determinism"

There is a lot of random parts to training a deep learning model, such as batches, augmentation, regularization, etc. When you run operations on several parallel threads, you typically do not know which thread will end first. It is not important when threads operate on their own data, so for example, applying an activation function to a tensor should be deterministic. But when those threads need to synchronize, such as when you compute a sum, then the result may depend on the order of the summation.

See this ML Mondays blog post for what we are doing in this course to counter these effects

"Eager execution"

A TensorFlow programming environment in which operations run immediately. By contrast, operations called in graph execution don't run until they are explicitly evaluated. Eager execution is an imperative interface, much like the code in most programming languages. Eager execution programs are generally far easier to debug than graph execution programs.

"Graph execution"

The opposite of eager execution. A TensorFlow programming environment in which the program first constructs a graph and then executes all or part of that graph. Graph execution was the default execution mode in TensorFlow 1.x. A give-away is use of tf.session

Adapted from entries in https://developers.google.com/machine-learning/glossary

Creating a Tensorflow Dataset for an image segmentation task

October 1, 2020

Dan Buscombe

Use case

You have a folder called imdir that folder contains tens to millions of jpeg files, and another, lab_path that contains tens to millions of corresponding label images, also as jpeg files. Label images are 8-bit, and encode labels as unique integer pixel values.

For the code to work, the images and label images need to be jpegs. On linux with the convert function from imagemagick, convert folder of pngs into jpegs

for file in *.png
do
convert $file $"${file%.png}.jpg"
done

Create a data pre-processing pipeline

Define a directory where you want to save your TFrecord files, called tfrecord_dir

Get a list of the images in imdir, get the number of images nb_images, and compute the size of each shard and the number of shards

images = tf.io.gfile.glob(imdir+os.sep+'*.jpg')

nb_images=len(tf.io.gfile.glob(imdir+os.sep+'*.jpg'))

SHARDS = int(nb_images / ims_per_shard) + (1 if nb_images % ims_per_shard != 0 else 0)

shared_size = int(np.ceil(1.0 * nb_images / SHARDS))s, 'filename')

Next, define a pre-processing workflow

dataset = tf.data.Dataset.list_files(imdir+os.sep+'*.jpg', seed=10000) # This also shuffles the images
dataset = dataset.map(read_image_and_label)
dataset = dataset.map(resize_and_crop_image, num_parallel_calls=AUTO)
dataset = dataset.map(recompress_image, num_parallel_calls=AUTO)
dataset = dataset.batch(shared_size)

Where the following function reads an image from file, and finds the corresponding label image and reads that in too (this example from the OBX dataset)

#-----------------------------------
def read_image_and_label(img_path):
    """
    "read_image_and_label(img_path)"
    This function reads an image and label and decodes both jpegs
    into bytestring arrays.
    This works by parsing out the label image filename from its image pair
    Thre are different rules for non-augmented versus augmented imagery
    INPUTS:
        * img_path [tensor string]
    OPTIONAL INPUTS: None
    GLOBAL INPUTS: None
    OUTPUTS:
        * image [bytestring]
        * label [bytestring]
    """
    bits = tf.io.read_file(img_path)
    image = tf.image.decode_jpeg(bits)

    # have to use this tf.strings.regex_replace utility because img_path is a Tensor object
    lab_path = tf.strings.regex_replace(img_path, "images", "labels")
    lab_path = tf.strings.regex_replace(lab_path, ".jpg", "_deep_whitewater_shallow_no_water_label.jpg")
    lab_path = tf.strings.regex_replace(lab_path, "augimage", "auglabel")
    bits = tf.io.read_file(lab_path)
    label = tf.image.decode_jpeg(bits)

    return image, label

The following function crops and image and label pair to square and resizes to a TARGET_SIZE

#-----------------------------------
def resize_and_crop_image(image, label):
    """
    "resize_and_crop_image"
    This function crops to square and resizes an image and label
    INPUTS:
        * image [tensor array]
        * label [tensor array]
    OPTIONAL INPUTS: None
    GLOBAL INPUTS: TARGET_SIZE
    OUTPUTS:
        * image [tensor array]
        * label [tensor array]
    """
    w = tf.shape(image)[0]
    h = tf.shape(image)[1]
    tw = TARGET_SIZE
    th = TARGET_SIZE
    resize_crit = (w * th) / (h * tw)
    image = tf.cond(resize_crit < 1,
                  lambda: tf.image.resize(image, [w*tw/w, h*tw/w]), # if true
                  lambda: tf.image.resize(image, [w*th/h, h*th/h])  # if false
                 )
    nw = tf.shape(image)[0]
    nh = tf.shape(image)[1]
    image = tf.image.crop_to_bounding_box(image, (nw - tw) // 2, (nh - th) // 2, tw, th)

    label = tf.cond(resize_crit < 1,
                  lambda: tf.image.resize(label, [w*tw/w, h*tw/w]), # if true
                  lambda: tf.image.resize(label, [w*th/h, h*th/h])  # if false
                 )
    label = tf.image.crop_to_bounding_box(label, (nw - tw) // 2, (nh - th) // 2, tw, th)

    return image, label

This function takes images and labels as byte strings and recodes as a 8bit jpeg

#-----------------------------------
def recompress_image(image, label):
    """
    "recompress_image"
    This function takes an image and label encoded as a byte string
    and recodes as an 8-bit jpeg
    INPUTS:
        * image [tensor array]
        * label [tensor array]
    OPTIONAL INPUTS: None
    GLOBAL INPUTS: None
    OUTPUTS:
        * image [tensor array]
        * label [tensor array]
    """
    image = tf.cast(image, tf.uint8)
    image = tf.image.encode_jpeg(image, optimize_size=True, chroma_downsampling=False)

    label = tf.cast(label, tf.uint8)
    label = tf.image.encode_jpeg(label, optimize_size=True, chroma_downsampling=False)
    return image, label

Write TFrecord shards to disk

Finally, write the dataset to tfrecord format files for 'analysis ready data' that is highly compressed and easy to share

for shard, (image, label) in enumerate(dataset):
  shard_size = image.numpy().shape[0]
  filename = tfrecord_dir+os.sep+"obx" + "{:02d}-{}.tfrec".format(shard, shard_size)

  with tf.io.TFRecordWriter(filename) as out_file:
    for i in range(shard_size):
      example = to_seg_tfrecord(image.numpy()[i],label.numpy()[i])
      out_file.write(example.SerializeToString())
    print("Wrote file {} containing {} records".format(filename, shard_size))

Image augmentation

Do you need to augment your imagery (create transformations of the image and label pairs and write them to disk)?

Define image dimensions

NY = 7360
NX = 4912

Define two ImageDataGenerator instances with the same arguments, one for images and the other for labels (masks)

data_gen_args = dict(featurewise_center=False,
                     featurewise_std_normalization=False,
                     rotation_range=5,
                     width_shift_range=0.1,
                     height_shift_range=0.1,
                     zoom_range=0.2)
image_datagen = tf.keras.preprocessing.image.ImageDataGenerator(**data_gen_args)
mask_datagen = tf.keras.preprocessing.image.ImageDataGenerator(**data_gen_args)

You likely can't read all your images into memory, so you'll have to do this in batches. The following loop will grab a new random batch of images, apply the augmentation generator to the images to create alternate versions, then writes those alternate versions of images and labels to disk

i=1
for k in range(14):

    #set a different seed each time to get a new batch of ten
    seed = int(np.random.randint(0,100,size=1))
    img_generator = image_datagen.flow_from_directory(
            imdir,
            target_size=(NX, NY),
            batch_size=10,
            class_mode=None, seed=seed, shuffle=True)

    #the seed must be the same as for the training set to get the same images
    mask_generator = mask_datagen.flow_from_directory(
            lab_path,
            target_size=(NX, NY),
            batch_size=10,
            class_mode=None, seed=seed, shuffle=True)

    #The following merges the two generators (and their flows) together:
    train_generator = (pair for pair in zip(img_generator, mask_generator))

    #grab a batch of 10 images and label images
    x, y = next(train_generator)

    # write them to file and increment the counter
    for im,lab in zip(x,y):
        imwrite(imdir+os.sep+'augimage_000'+str(i)+'.jpg', im)
        imwrite(lab_path+os.sep+'auglabel_000'+str(i)+'_deep_whitewater_shallow_no_water_label.jpg', lab)
        i += 1

    #save memory
    del x, y, im, lab
    #get a new batch

Making workflows reproducible

September 15, 2020

Dan Buscombe

Neural networks need randomization to effectively train, so this will always be a stochastic (rather than deterministic) process and loss curves will therefore always be different each time. Metrics like accuracy may also differ significantly.

However, there are some measures that can be taken that collectively attempt to ensure consistency in training.

Using TFRecords

One of the motivations for using TFRecords for data is to ensure a consistency in what images get allocated as training ad which get allocated as validation. These images are already randomized, and are not randomized further during training

Use deterministic operations

CuDNN does not guarantee reproducibility in some of its routines on GPUs, so it is not "deterministic". The operating system environment variable "TF_DETERMINISTIC_OPS" can be used to control this behaviour. Setting the environment variable TF_CUDNN_DETERMINISM=1 forces the selection of deterministic cuDNN convolution and max-pooling algorithms. When this is enabled, the algorithm selection procedure itself is also deterministic. However, selecting these deterministic options may reduce performance.

os.environ["TF_DETERMINISTIC_OPS"] = "1"

When we create a batched data set, we can set the option_no_order.experimental_deterministic to True

option_no_order = tf.data.Options()
option_no_order.experimental_deterministic = True

dataset = tf.data.Dataset.list_files(filenames)
dataset = dataset.with_options(option_no_order)

Use a seed for random number generation

Use a seed value and substantiate np.random.seed and tf.random.set_seed with it, which will subsequently apply to any numpy operations that involved random numbers

SEED=42
np.random.seed(SEED)
tf.random.set_seed(SEED)

Where possible, use a larger batch size

Larger batch sizes tend to promote more stable validation loss curves. This is usually only possible with relatively large hardware, because large batches mean larger amounts of GPU memory required.

Creating a Tensorflow Dataset for an image recognition task

September 1, 2020

Dan Buscombe

Use case

You have a folder called data, in which there are two additional folders train and test. The train folder contains hundreds to millions of jpeg files, each with a descriptive label in the file name. The test folder contains samples for which we do not yet know the label (we will use our trained model to estimate that). These are jpeg images without any descriptive label in the file name.

Problem:

We use a GPU (or TPU) to train Deep Neural Networks. To use such 'accelerated hardware' most efficiently, you should feed data fast enough to keep them busy. If your data stored as thousands of individual files (i.e. images), you may not be utilizing your GPU at maximum throughput.

Solution:

You should split your data across several larger files, and stream from multiple files in parallel

For maintaining efficient high-throughput for GPU computation

Tensorflow has strategies for this scenario:

  1. First, we batch the numerous small files in a TFRecord file
  2. Then, we use the power of tf.data.Dataset to read from multiple files in parallel.

The TFRecord file format is a record-oriented binary format. If your input data are on disk or working with large data then TensorFlow recommended using TFRecord format. You get a significant impact on the performance of your input pipeline. Binary data takes less space on disk, takes less time to copy and can be read more efficiently from disk.

import tensorflow as tf
import matplotlib.pyplot as plt
from tqdm import tqdm
from matplotlib.image import imread
import numpy as np
import os, json, random, requests, glob, math

tf.data.experimental.AUTOTUNE, or simply AUTO is used in the tf.data.Dataset API as a strategy for efficient use of your hardware. It will prompt the tf.data runtime to tune the value dynamically at runtime. This will be used here in the following contexts:

  1. set and optimize the number of parallel calls for functions that apply transformations to imagery (cropping, resizing, etc), using dataset.map calls
  2. Set and optimize the number of parallel calls for functions that load data into memory while models train, using dataset.interleave calls
  3. Set and optimize the number of parallel calls for functions that load data into asynchronously memory while models train, using dataset.prefetch calls. These use a background thread and an internal buffer to prefetch elements from the input dataset ahead of the time they are requested.

You could familiarize yourself with these topic by following this guide and then this one

os.environ["TF_DETERMINISTIC_OPS"] = "1"

SEED=42
np.random.seed(SEED)

AUTO = tf.data.experimental.AUTOTUNE

You have a folder called data, in which there are two additional folders train and test. The train folder contains hundreds to millions of jpeg files, each with a descriptive label in the file name. The test folder contains samples for which we do not yet know the label (we will use our trained model to estimate that). These are jpeg images without any descriptive label in the file name.

In this example below, we have a folder of more than 250,000 images of cats and dogs. Each jpeg file has either 'cat' or 'dog' in the file name. This workflow would apply to situations where you had more than 2 classes, as I will explain in a later post (illustrated using a different data set)

Creating TFRecords of images and discrete labels

Discrete labels in this sense means two things:

  1. There is only one label per image (known as single-label classification, to distingusih from multi-label classification where there are many labels per image)
  2. There is no overlap between labels, such that each label can be represented by a discrete number, i.e. an integer, counting up from zero.

We first have to define some variables - how many individual TFRecords per dataset -- called shards -- do we wish to make? (16)

How large are the square images we will use, in pixels? (160)

What are the classes? (cat, and dog, as a list of binary strings). Note that the class labels must be binary strings (b'binary string') rather than regular strngs ('string')

SHARDS = 16
TARGET_SIZE=160
CLASSES = [b'cat', b'dog']

Next we need to define the number of images in the train directory, nb_images, which is used to define how large each shard will be. Later, nb_images will be used for defining the number of model training and validation steps per epoch - more on that later.

nb_images=len(tf.io.gfile.glob('data/train/*.jpg'))

shared_size = math.ceil(1.0 * nb_images / SHARDS)

To create a TFRecord, we need a function that reads an image from a file, and strips the class name from the file name. This will be different for each dataset and therefore the below function would need modification for each new dataset. In the below, the label is extracted by first stripping the file separator out and removing all the file path before the file name (label = tf.strings.split(img_path, sep='/')), then taking everything before the dot (label = tf.strings.split(label[-1], sep='.')), then finally taking just the first item of the resulting list of file name parts label[0]. The image is simply read in as a binary string of bytes, then reconstructed into the jpeg format using the handy utility tf.image.decode_jpeg.

def read_image_and_label(img_path):

  bits = tf.io.read_file(img_path)
  image = tf.image.decode_jpeg(bits)

  label = tf.strings.split(img_path, sep='/')
  label = tf.strings.split(label[-1], sep='.')

  return image,label[0]

Neural nets often use imagery that is not at full resolution, due to memory limitations on GPUs. In this course, downsizing and cropping of imagery will be a fairly common practice. The function below carries out a resizing to the desired TARGET_SIZE (keeping track of which horizontal dimension is the largest), the crops to square

def resize_and_crop_image(image, label):
  w = tf.shape(image)[0]
  h = tf.shape(image)[1]
  tw = TARGET_SIZE
  th = TARGET_SIZE
  resize_crit = (w * th) / (h * tw)
  image = tf.cond(resize_crit < 1,
                  lambda: tf.image.resize(image, [w*tw/w, h*tw/w]), # if true
                  lambda: tf.image.resize(image, [w*th/h, h*th/h])  # if false
                 )
  nw = tf.shape(image)[0]
  nh = tf.shape(image)[1]
  image = tf.image.crop_to_bounding_box(image, (nw - tw) // 2, (nh - th) // 2, tw, th)
  return image, label

When a TFRecord is read back into memory, it is just a string of bytes, so the following function makes sure those bytes get encoded back into jpeg format (with no chroma subsampling)

def recompress_image(image, label):
  image = tf.cast(image, tf.uint8)
  image = tf.image.encode_jpeg(image, optimize_size=True, chroma_downsampling=False)
  return image, label

You should stuff data in a protocol buffer called Example. Example protocol buffer contains Features. The feature is a protocol to describe the data and could have three types: bytes (images), float (floating point labels), and int64 (discrete labels).

def _bytestring_feature(list_of_bytestrings):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=list_of_bytestrings))

def _int_feature(list_of_ints): # int64
  return tf.train.Feature(int64_list=tf.train.Int64List(value=list_of_ints))

def _float_feature(list_of_floats): # float32
  return tf.train.Feature(float_list=tf.train.FloatList(value=list_of_floats))

We serialize the protocol buffer to a string and write it to a TFRecords files.

def to_tfrecord(img_bytes, label):  

  class_num = np.argmax(np.array(CLASSES)==label)
  feature = {
      "image": _bytestring_feature([img_bytes]), # one image in the list
      "class": _int_feature([class_num]),        # one class in the list      
  }
  return tf.train.Example(features=tf.train.Features(feature=feature))

Get a list of files and shuffle them, then create a mapping to link them to the jpeg files so they can be read on the fly

dataset = tf.data.Dataset.list_files('data/train/*.jpg', seed=10000) # This also shuffles the images
dataset = dataset.map(read_image_and_label)

Next, create a new mapping to the resize and crop function

dataset = dataset.map(resize_and_crop_image, num_parallel_calls=AUTO)  

Finally, a mapping to the jpeg recompression function, and then set the batch which dictates how many images per shard

dataset = dataset.map(recompress_image, num_parallel_calls=AUTO)
dataset = dataset.batch(shared_size)

Okay, now the dataset is set up, we can begin writing the data to TFRecords. The following function oads image files, resizes them to a common size and then stores them across NUM_SHARDS TFRecord files. It reads from files in parallel and disregards the order of the data in favour of reading speed. Selecting each (random) pair of image and label sequentially, we call the to_tfrecord function we defined earlier to create the example record, then that is serialized to string and appended to the file. When the requisite number of images for a shard has been reached, a new shard is created.

for shard, (image, label) in enumerate(dataset):
  shard_size = image.numpy().shape[0]
  filename = "cat_dog" + "{:02d}-{}.tfrec".format(shard, shard_size)

  with tf.io.TFRecordWriter(filename) as out_file:
    for i in range(shard_size):
      example = to_tfrecord(image.numpy()[i],label.numpy()[i])
      out_file.write(example.SerializeToString())
    print("Wrote file {} containing {} records".format(filename, shard_size))

Preparing for model training

We now need some functions to read those TFRecords in and use them during model training. We wish to parallelize the data loading step as much as possible; individual images and labels are small, but there are many thousands of them, so we need to ensure both fast and also consistent rate of data throughput. We acheive this by interleaving the contents of the datasets. The number of datasets to overlap can be specified by the cycle_length argument (set to 16 here). This is where we also see many of the benefits of tf.data.experimental.AUTOTUNE, or simply AUTO is used in the tf.data.Dataset API to tune the value dynamically at runtime.

def get_batched_dataset(filenames):
  option_no_order = tf.data.Options()
  option_no_order.experimental_deterministic = False

  dataset = tf.data.Dataset.list_files(filenames)
  dataset = dataset.with_options(option_no_order)
  dataset = dataset.interleave(tf.data.TFRecordDataset, cycle_length=16, num_parallel_calls=AUTO)
  dataset = dataset.map(read_tfrecord, num_parallel_calls=AUTO)

  dataset = dataset.cache() # This dataset fits in RAM
  dataset = dataset.repeat()
  dataset = dataset.shuffle(2048)
  dataset = dataset.batch(BATCH_SIZE, drop_remainder=True) # drop_remainder will be needed on TPU
  dataset = dataset.prefetch(AUTO) #

  return dataset

We call that function for both training and validation subsets, that we will define below

def get_training_dataset():
  return get_batched_dataset(training_filenames)

def get_validation_dataset():
  return get_batched_dataset(validation_filenames)

The following function will read an individual example (random image, label pair) in a TFRecord. Extract the tf.train.Example protocol buffer messages from a TFRecord-format file. Each tf.train.Example record contains one or more “features”, and the input pipeline typically converts these features into tensors.

def read_tfrecord(example):
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),  # tf.string = bytestring (not text string)
        "class": tf.io.FixedLenFeature([], tf.int64),   # shape [] means scalar
    }
    # decode the TFRecord
    example = tf.io.parse_single_example(example, features)

    image = tf.image.decode_jpeg(example['image'], channels=3)
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.reshape(image, [TARGET_SIZE,TARGET_SIZE, 3])

    class_label = tf.cast(example['class'], tf.int32)

    return image, class_label

Here we set the batch size. This is a hyperparameter (i.e. set by you, not by model training) and its value is dictated (for the most part) by GPU memory considerations. We are using small imagery, so we can fit a relatively large batch into memory at once. We'll go for something relatively high (> say, 20)

BATCH_SIZE = 32

This bit of code just makes sure that the dataset will be read correctly from the TFRecord files. We find them all using glob pattern matching (using 'cat*.tfrec') to form an input dataset, then use the .take() command to grab 1 batch. Print the labels out to screen - they should be integers. We also print the image dimensions out to ensure they are correct

training_filenames=tf.io.gfile.glob('cat*.tfrec')

train_ds = get_training_dataset()
for imgs,lbls in train_ds.take(1):
  print(lbls)
  print(imgs.shape)

We wrote all of our images out to TFRecords - we didn't split them into test and validation sets first. That gives us more flexibility to assign train/validation subsets (called splits) here. Below we define VALIDATION_SPLIT, which is the proportion of the total data that will be used for validation. The rest will be used for training. We grab the filenames (again). These are already shuffled, but we can shuffle yet again to ensure the images really do get drawn as randomly as possible from the deck. Then we define train and validation file lists based on the split.

VALIDATION_SPLIT = 0.19

filenames=tf.io.gfile.glob('cat*.tfrec')

random.shuffle(filenames)
split = int(len(filenames) * VALIDATION_SPLIT)

training_filenames = filenames[split:]
validation_filenames = filenames[:split]

During model training, one epoch provides the model an opportunity to 'see' the entire dataset. So the number of steps per epoch is essentially the number of unique samples of your dataset divided by the batch size.

validation_steps = int(nb_images // len(filenames) * len(validation_filenames)) // BATCH_SIZE
steps_per_epoch = int(nb_images // len(filenames) * len(training_filenames)) // BATCH_SIZE  

Model training using TFRecords

To demonstrate training using this workflow, we choose a simple (so-called vanilla) model that we construct using a few convolutional filter blocks of increasing size, interspersed with MaxPooling layers, and finally a global pooling and a classifier head layer. This is very similar in design to dozens of examples you can find online using toy datasets such as this. This isn't necessarily the most optimal or powerful model for this or any other dataset, but it'll do fine for demonstration (and will actually likely to be close to optimal considering the relative simplicity of the data/problem)

model = tf.keras.Sequential([

    tf.keras.layers.Conv2D(kernel_size=3, filters=16, padding='same', activation='relu', input_shape=[TARGET_SIZE,TARGET_SIZE, 3]),
    tf.keras.layers.Conv2D(kernel_size=3, filters=32, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),

    tf.keras.layers.Conv2D(kernel_size=3, filters=64, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),

    tf.keras.layers.Conv2D(kernel_size=3, filters=128, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),

    tf.keras.layers.Conv2D(kernel_size=3, filters=256, padding='same', activation='relu'),

    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1,'sigmoid')
])

You must .compile your model before you train, giving it an optimizer, loss function and a metric to keep track of. The options below are fairly standard - we use binary_crossentropy - 'binary' because we only have two classes (otherwise you would choose 'categorical') and 'crossentropy' because this is an image recognition problem

model.compile(optimizer='Adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

Call .fit() to train the model

model.fit(get_training_dataset(), steps_per_epoch=steps_per_epoch, epochs=10,
                      validation_data=get_validation_dataset(), validation_steps=validation_steps)

Model validation

This little function will convert the integer label into a string label

get_label = lambda x : 'cat' if (x==0) else 'dog'

Below we call the validation dataset (ds=get_validation_dataset()) and plot one (ds.take(1)) batch. The figure shows 8 x 4 example images, their actual labels and their model-predicted values

fig = plt.figure(figsize=(12,28))

cnt=1

ds=get_validation_dataset()

for imgs,lbls in ds.take(1):
  predicted_classes=model.predict_classes(imgs)
  for img,lbl,cl in zip(imgs,lbls,predicted_classes):

    fig.add_subplot(8,4, cnt)
    plt.title('obs: {} / est: {}'.format(get_label(lbl),get_label(cl[0])))
    plt.imshow(img)
    plt.axis('off')
    cnt=cnt+1

We need a new function for model evaluation. This version creates batches, but doesn't repeat them (no dataset.repeat() command)

def get_eval_dataset(filenames):
  option_no_order = tf.data.Options()
  option_no_order.experimental_deterministic = False

  dataset = tf.data.Dataset.list_files(filenames)
  dataset = dataset.with_options(option_no_order)
  dataset = dataset.interleave(tf.data.TFRecordDataset, cycle_length=16, num_parallel_calls=AUTO)
  dataset = dataset.map(read_tfrecord, num_parallel_calls=AUTO)

  dataset = dataset.cache() # This dataset fits in RAM
  dataset = dataset.shuffle(2048)
  dataset = dataset.batch(BATCH_SIZE, drop_remainder=True) # drop_remainder will be needed on TPU
  dataset = dataset.prefetch(AUTO) #

  return dataset

def get_validation_eval_dataset():
  return get_eval_dataset(validation_filenames)

To get a global sense of the skill of the model, call .evaluate on the entire validation set, which will use the model for prediction on all validation images, then compare the predicted versus observed labels for each, with what metrics you used when you compiled the model before training (we used accuracy). Print the mean accuracy in percent to screen.

loss, accuracy = model.evaluate(get_validation_eval_dataset())
print('Test Mean Accuracy: ', round((accuracy)*100, 2))

Model deployment

Apply to test (unseen) sample imagery - here I have limited to BATCH_SIZE number of images just for illustration

test_filenames = glob.glob('data/test1/*.jpg')[:BATCH_SIZE]
len(test_filenames)

For prediction on raw imagery (rather than pre-processed tensors in the TFRecords file), we need a resizing and converting and normalizing function

def preprocess_image(image):
  image = tf.image.resize(image, (TARGET_SIZE, TARGET_SIZE))
  image = tf.image.convert_image_dtype(image, tf.float32)
  image = image/255.
  return image

Test using one image

im = preprocess_image(imread(test_filenames[13]))
plt.imshow(im)
predicted_classes=model.predict_classes(np.expand_dims(im,axis=0))
get_label(predicted_classes.squeeze())

Test on a whole batch

imgs = []
predicted_classes = []
for f in test_filenames:
  im = preprocess_image(imread(f))
  imgs.append(im)
  predicted_classes.append(int(model.predict_classes(np.expand_dims(im,axis=0)).squeeze().astype('int')))

Make a similar plot as before, but this time we only have the model predicted class, not the ground truth class

fig = plt.figure(figsize=(12,28))

cnt=1
for img,cl in zip(imgs,predicted_classes):
  fig.add_subplot(8,4, cnt)
  plt.title('est: {}'.format(get_label(cl)))
  plt.imshow(img)
  plt.axis('off')
  cnt=cnt+1

Converting between YOLO and PASCAL-VOC object recognition formats, and creating a Tensorflow Dataset

August 17, 2020

Dan Buscombe

This blog post walks through the (somewhat cumbersome - I won't lie!) process of converting between YOLO and PASCAL-VOC 'bounding box' annotation data formats for image recognition problems.

The files we create using makesense.ai and downloaded in YOLO format with the .txt extension can be converted to the PASCAL-VOC format with the .xml extension. This blog post shows you how to do that with python. I also show you how to convert to a generic csv format that is also sometimes used. Finally, I show you how to convert your PASCAL-VOC format data into a Tensorflow TFRecord that use Protocol buffers, which are a cross-platform, cross-language library for efficient serialization of structured data.

Resources I used

These Tensorflow instructions for how to add a dataset, as well as some more specific Tensorflow object detection workflows, and finally the Tensorflow Model Garden, which you'll use here. This and this gave some outdated advice that was nevertheless useful, if not used here. This provides more details on TFRecords and their usages.

First, dealing with 'empty' imagery/annotations

We first need to make sure there is a txt file for every image. Any missing .txt files are for images with no annotations (i.e. no people). So, we create an empty txt file with the right name if it is missing.

We only need two libraries for this:

import os, glob

And one for-loop that iterates through each folder of images (test, train, and validation in the example below). If a certain .txt file is missing, it simply creates an empty one

for cond in ['test', 'train','validation']:

    jpg = glob.glob(cond+'/*.jpg')

    for f in jpg:
       file_query = f.replace('jpg','txt').replace(cond, cond+'_labels')
       if os.path.isfile(file_query):
          pass
       else:
          print("Creating %s" % (file_query))
          with open(file_query, 'w') as fp:
             pass

Second, YOLO to PASCAL-VOC format conversion

PASCAL-VOC is a very common object recognition data format, probably more common than the YOLO format. Many example workflows will use either one of these two formats. Here we convert YOLO (.txt) format to PASCAL-VOC (.xml).

Let's set up the problem. Define an IMG_PATH containing jpg images (in the example below, called test), and a corresponding folder containing the associated .txt files (called test_labels below). This is what my file paths look like on my Linux box:

IMG_PATH = "/media/marda/TWOTB/USGS/SOFTWARE/MLMONDAYS/2_ObjRecog/test"

# txt_folder is txt file root that using makesense.ai rectbox
txt_folder = "/media/marda/TWOTB/USGS/SOFTWARE/MLMONDAYS/2_ObjRecog/test_labels"

We define a list of labels. We only have one label, person

fw = os.listdir(IMG_PATH)
# path of save xml file
save_path = '' # keep it blank

labels = ['person']
global label
label = ''

Some utilities:

def csvread(fn):
    with open(fn, 'r') as csvfile:
        list_arr = []
        reader = csv.reader(csvfile, delimiter=' ')
        for row in reader:
            list_arr.append(row)
    return list_arr

def convert_label(txt_file):
    global label
    for i in range(len(labels)):
        if txt_file[0] == str(i):
            label = labels[i]
            return label
    return label

This is the code that extract the info from a YOLO record:

def extract_coor(txt_file, img_width, img_height):
    x_rect_mid = float(txt_file[1])
    y_rect_mid = float(txt_file[2])
    width_rect = float(txt_file[3])
    height_rect = float(txt_file[4])

    x_min_rect = ((2 * x_rect_mid * img_width) - (width_rect * img_width)) / 2
    x_max_rect = ((2 * x_rect_mid * img_width) + (width_rect * img_width)) / 2
    y_min_rect = ((2 * y_rect_mid * img_height) -
                  (height_rect * img_height)) / 2
    y_max_rect = ((2 * y_rect_mid * img_height) +
                  (height_rect * img_height)) / 2

    return x_min_rect, x_max_rect, y_min_rect, y_max_rect

Loop through each file (in fw) and carry out the conversion, writing one xml format file for each txt file you have

for line in fw:
    root = etree.Element("annotation")

    # try debug to check your path
    img_style = IMG_PATH.split('/')[-1]
    img_name = line
    image_info = IMG_PATH + "/" + line
    img_txt_root = txt_folder + "/" + line[:-4]
    # print(img_txt_root)
    txt = ".txt"

    txt_path = img_txt_root + txt
    # print(txt_path)
    txt_file = csvread(txt_path)

    # read the image  information
    img_size = Image.open(image_info).size

    img_width = img_size[0]
    img_height = img_size[1]
    img_depth = Image.open(image_info).layers

    folder = etree.Element("folder")
    folder.text = "%s" % (img_style)

    filename = etree.Element("filename")
    filename.text = "%s" % (img_name)

    path = etree.Element("path")
    path.text = "%s" % (IMG_PATH)

    source = etree.Element("source")

    source_database = etree.SubElement(source, "database")
    source_database.text = "Unknown"

    size = etree.Element("size")
    image_width = etree.SubElement(size, "width")
    image_width.text = "%d" % (img_width)

    image_height = etree.SubElement(size, "height")
    image_height.text = "%d" % (img_height)

    image_depth = etree.SubElement(size, "depth")
    image_depth.text = "%d" % (img_depth)

    segmented = etree.Element("segmented")
    segmented.text = "0"

    root.append(folder)
    root.append(filename)
    root.append(path)
    root.append(source)
    root.append(size)
    root.append(segmented)

    for ii in range(len(txt_file)):
        label = convert_label(txt_file[ii][0])
        x_min_rect, x_max_rect, y_min_rect, y_max_rect = extract_coor(
            txt_file[ii], img_width, img_height)

        object = etree.Element("object")
        name = etree.SubElement(object, "name")
        name.text = "%s" % (label)

        pose = etree.SubElement(object, "pose")
        pose.text = "Unspecified"

        truncated = etree.SubElement(object, "truncated")
        truncated.text = "0"

        difficult = etree.SubElement(object, "difficult")
        difficult.text = "0"

        bndbox = etree.SubElement(object, "bndbox")
        xmin = etree.SubElement(bndbox, "xmin")
        xmin.text = "%d" % (x_min_rect)
        ymin = etree.SubElement(bndbox, "ymin")
        ymin.text = "%d" % (y_min_rect)
        xmax = etree.SubElement(bndbox, "xmax")
        xmax.text = "%d" % (x_max_rect)
        ymax = etree.SubElement(bndbox, "ymax")
        ymax.text = "%d" % (y_max_rect)

        root.append(object)

    file_output = etree.tostring(root, pretty_print=True, encoding='UTF-8')
    # print(file_output.decode('utf-8'))
    ff = open('%s.xml' % (img_name[:-4]), 'w', encoding="utf-8")
    ff.write(file_output.decode('utf-8'))

Third, create a TF-RECORD from the PASCAL-VOC data

The preferred way to carry out this procedure seems to change regularly, so it can be tricky to find out this information. Let's start by creating a new conda environment for this task, callled tf_test_py36, containing a specific version of python (my current go-to at the time of writing is 3.6 rather than the stable 3.7, because of dependency issues that can sometimes arise on Windows OS)

conda create --name tf_test_py36 python=3.6 tensorflow lxml contextlib2

and activate:

conda activate tf_test_py36

Getting Tensorflow Garden set up

Clone the Tensorflow Garden GitHub repository:

git clone https://github.com/tensorflow/models.git

Add the top-level /models folder to your system Python path.

export PYTHONPATH=$PYTHONPATH:/media/marda/TWOTB/USGS/SOFTWARE/models

Install other dependencies:

pip install --user -r official/requirements.txt
cd research
protoc object_detection/protos/*.proto --python_out=.

Create the tf-record

(Your current directory should be models/research)

This workflow is specific to the POB (People on Beaches) dataset that only has one label, so first, create a file called object_detection/data/pob_label_map.pbtxt and copy the following into it:

item {
  id: 1
  name: 'person'
}

Second, create a new file called POB_images, and copy all your jpg files and corresponding xml files into it - all together

Third, create a new file object_detection/dataset_tools/create_pob_tf_record.py and copy the following code into it. The variable num_shards is the number of pieces you'd like to create. It matters not for this dataset; we use 10.

from glob import glob
import hashlib, io, os, logging, random, re, contextlib2
from lxml import etree
import numpy as np
import PIL.Image
import tensorflow.compat.v1 as tf

from object_detection.dataset_tools import tf_record_creation_util
from object_detection.utils import dataset_util
from object_detection.utils import label_map_util

flags = tf.app.flags
flags.DEFINE_string('data_dir', '', 'Root directory to raw pet dataset.')
flags.DEFINE_string('output_dir', '', 'Path to directory to output TFRecords.')
flags.DEFINE_string('label_map_path', 'data/pet_label_map.pbtxt',
                    'Path to label map proto')
flags.DEFINE_integer('num_shards', 10, 'Number of TFRecord shards')

FLAGS = flags.FLAGS

The following is the main function that gets called to carry out the conversion. It creates a single tf.Example message (or protobuf), which is a flexible message type that represents a {"string": value} mapping

def dict_to_tf_example(data,
                       label_map_dict,
                       image_subdirectory,
                       ignore_difficult_instances=False):
  """Convert XML derived dict to tf.Example proto.

  Notice that this function normalizes the bounding box coordinates provided
  by the raw data.

  Args:
    data: dict holding PASCAL XML fields for a single image (obtained by
      running dataset_util.recursive_parse_xml_to_dict)
    label_map_dict: A map from string label names to integers ids.
    image_subdirectory: String specifying subdirectory within the
      Pascal dataset directory holding the actual image data.
    ignore_difficult_instances: Whether to skip difficult instances in the
      dataset  (default: False).

  Returns:
    example: The converted tf.Example.

  Raises:
    ValueError: if the image pointed to by data['filename'] is not a valid JPEG
  """
  img_path = os.path.join(image_subdirectory, data['filename'])
  with tf.gfile.GFile(img_path, 'rb') as fid:
    encoded_jpg = fid.read()
  encoded_jpg_io = io.BytesIO(encoded_jpg)
  image = PIL.Image.open(encoded_jpg_io)
  if image.format != 'JPEG':
    raise ValueError('Image format not JPEG')
  key = hashlib.sha256(encoded_jpg).hexdigest()

  width = int(data['size']['width'])
  height = int(data['size']['height'])

  xmins = []
  ymins = []
  xmaxs = []
  ymaxs = []
  classes = []
  classes_text = []
  truncated = []
  poses = []
  difficult_obj = []
  if 'object' in data:
    for obj in data['object']:
      difficult = bool(int(obj['difficult']))
      if ignore_difficult_instances and difficult:
        continue
      difficult_obj.append(int(difficult))

      xmin = float(obj['bndbox']['xmin'])
      xmax = float(obj['bndbox']['xmax'])
      ymin = float(obj['bndbox']['ymin'])
      ymax = float(obj['bndbox']['ymax'])

      xmins.append(xmin / width)
      ymins.append(ymin / height)
      xmaxs.append(xmax / width)
      ymaxs.append(ymax / height)
      class_name = 'person' #get_class_name_from_filename(data['filename'])
      classes_text.append(class_name.encode('utf8'))
      classes.append(label_map_dict[class_name])
      truncated.append(int(obj['truncated']))
      poses.append(obj['pose'].encode('utf8'))

  feature_dict = {
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/source_id': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
      'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
      'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
      'image/object/truncated': dataset_util.int64_list_feature(truncated),
      'image/object/view': dataset_util.bytes_list_feature(poses),
  }

  example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
  return example

This portion does the file writing (i.e. creates the .tfrecord files from the collection of tf.Example records):

def create_tf_record(output_filename,
                     num_shards,
                     label_map_dict,
                     annotations_dir,
                     image_dir,
                     examples):
  """Creates a TFRecord file from examples.

  Args:
    output_filename: Path to where output file is saved.
    num_shards: Number of shards for output file.
    label_map_dict: The label map dictionary.
    annotations_dir: Directory where annotation files are stored.
    image_dir: Directory where image files are stored.
    examples: Examples to parse and save to tf record.
  """
  with contextlib2.ExitStack() as tf_record_close_stack:
    output_tfrecords = tf_record_creation_util.open_sharded_output_tfrecords(
        tf_record_close_stack, output_filename, num_shards)
    for idx, example in enumerate(examples):
      if idx % 100 == 0:
        logging.info('On image %d of %d', idx, len(examples))
      xml_path = os.path.join(annotations_dir, 'xmls', example.split('.jpg')[0] + '.xml')

      if not os.path.exists(xml_path):
        logging.warning('Could not find %s, ignoring example.', xml_path)
        continue
      with tf.gfile.GFile(xml_path, 'r') as fid:
        xml_str = fid.read()
      xml = etree.fromstring(xml_str)
      data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation']

      try:
        tf_example = dict_to_tf_example(
            data,
            label_map_dict,
            image_dir)
        if tf_example:
          shard_idx = idx % num_shards
          output_tfrecords[shard_idx].write(tf_example.SerializeToString())
      except ValueError:
        logging.warning('Invalid example: %s, ignoring.', xml_path)

The main function reads all the jpg files in POB_images, as well as all the xml files in the same directory. Note that this could be set up differently, to read the xml files from a separate annotations_dir. The files are randomly shuffled. The number of training examples is 70% of the total, and the remaining 30% of files are used for validation. It then calls the create_tf_record function to create the .tfrecord set of 10 files.

def main(_):
  data_dir = FLAGS.data_dir
  label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path)

  logging.info('Reading from POB dataset.')
  image_dir = annotations_dir = os.path.join(data_dir, 'POB_images')

  examples_list = glob(image_dir+'/*.jpg')

  # Test images are not included in the downloaded data set, so we shall perform
  # our own split.
  random.seed(42)
  random.shuffle(examples_list)
  num_examples = len(examples_list)
  num_train = int(0.7 * num_examples)
  train_examples = examples_list[:num_train]
  val_examples = examples_list[num_train:]
  logging.info('%d training and %d validation examples.',
               len(train_examples), len(val_examples))

  train_output_path = os.path.join(FLAGS.output_dir, 'pob_train.record')
  val_output_path = os.path.join(FLAGS.output_dir, 'pob_val.record')

  # call to create the training files
  create_tf_record(
      train_output_path,
      FLAGS.num_shards,
      label_map_dict,
      annotations_dir,
      image_dir,
      train_examples)

# call again to make the validation set
  create_tf_record(
      val_output_path,
      FLAGS.num_shards,
      label_map_dict,
      annotations_dir,
      image_dir,
      val_examples)

if __name__ == '__main__':
  tf.app.run()

And finally, run the script and make your tf-record format data ...

python object_detection/dataset_tools/create_pob_tf_record.py --label_map_path=object_detection/data/POB_label_map.pbtxt  --data_dir=`pwd` --output_dir=`pwd`

This will create 10 files for the training data, and 10 for the validation set. You can now use these files to efficiently train a model using Tensoflow/Keras.

[OPTIONAL] XML to CSV

Sometimes you also see people use object annotations in csv format. Luckily we can use the xml library to help carry out the data parsing, and pandas to easily convert to a dataframe, and then to a formatted csv file

import os, glob
import pandas as pd
import xml.etree.ElementTree as ET

def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df

Then use like this:

image_path = os.path.join(os.getcwd(),'test_labels_xml')
xml_df = xml_to_csv(image_path)
xml_df.to_csv('test_labels.csv', index=None)

Trimming and decompiling a video into png image files, for use in your deep learning project

August 5, 2020

Dan Buscombe

You can access from the NOAA NOS Web Camera Applications Testbed (WebCAT) Live Cameras and Historic Feeds site. The image above shows how to view the image I will use in the demonstration below.

To extract a subsection (or trim) of the file staugustinecam.2019-01-11_0900.mp4, from 1 min 47 s to 6 min 22 s, and output it to the file staugustinecam.2019-01-11_0900_trim.mp4:

ffmpeg -i staugustinecam.2019-01-11_0900.mp4 -ss 00:01:47 -t 00:06:22 -async 1 staugustinecam.2019-01-11_0900_trim.mp4

To extract png format files from that trimmed video, writing the frame number to the file name:

ffmpeg -i staugustinecam.2019-01-11_0900_trim.mp4 staugustinecam.2019-01-11_0900_%d.png

Make a directory:

mkdir staugustinecam_2019-01-11_0900

Move all of the png files into that directory:

mv *.png staugustinecam_2019-01-11_0900

Make a bash script so you can execute the decompiling (only) sequence of commands on an arbitrary mp4 file from the command line. First, open a new file called nano decompile_mp4.sh:

nano decompile_mp4.sh

and write or copy the following into it:

echo "decompiling video file $1, and moving png frames into $2, in T minus 5 seconds ..."
sleep 5s
ffmpeg -i $1 $2_%d.png
mkdir $2
mv *.png $2

To exit nano:

Ctrl+X (for exit) Y (for yes) enter (for save)

Use it like this:

bash decompile_mp4.sh myvid.mp4 myframes

To extract 1 frame per minute, modify to:

echo "decompiling video file $1 - one frame every minute, and moving png frames into $2, in T minus 5 seconds ..."
sleep 5s
ffmpeg -i $1 -vf fps=1/60 $2_%d.png
mkdir $2
mv *.png $2

Let's say you've carried out the above procedure on lots of videos and you have lots of folders containing frames. Now you want to randomly move some images from each directory into three separate folders that you'll use in your deep learning project, called test, train, and validate (it is also common to use test for the purposes of both testing and validating)

mkdir train
mkdir test
mkdir validate

To move 10 random files to each from a directory called my_directory containing png frames:

cd my_directory
shuf -n 10 -e * | xargs -i mv {} ../train/
shuf -n 10 -e * | xargs -i mv {} ../test/
shuf -n 10 -e * | xargs -i mv {} ../validate/

or to generalize:

for direc in st*/
do
cd $direc
shuf -n 10 -e * | xargs -i mv {} ../train/
shuf -n 10 -e * | xargs -i mv {} ../test/
shuf -n 10 -e * | xargs -i mv {} ../validate/  
cd ..
done

In the above, I say st*/ to list only directories beginning with st, rather than listing all directories (/*), which would include test, train, and validate and would not work

Finally, to convert all png files to jpg files, use:

mogrify -format jpg *.png
"ML Mondays"
Internal links
DocsDataHelp
Community
Stack OverflowUSGS Community for Data Integration (CDI)USGS Remote Sensing Coastal Change Projectwww.danielbuscombe.com
More
BlogGitHubStar
Follow @magic_walnut
Marda Science
Copyright © 2020 Marda Science, LLC