Image classification using Machine Learning could have many useful applications for daily tasks and has the potential to make our lives easier.
Convolutional Neural Networks (CNNs) are one of the most common techniques for image classification, but they can sometimes behave like black boxes: we do not really know which characteristics of the images are triggering the responses. Thus, there is a need to improve our understanding of how the model operates. This greater understanding can lead to better model accuracy and also help avoid biases in the datasets.
Regarding biases in CNN models and their importance, there is a very interesting urban legend involving the US Army and a tank detection program that can be read here: https://pyimagesearch.com/2020/03/09/grad-cam-visualize-class-activation-maps-with-keras-tensorflow-and-deep-learning/
Grad-CAM can help us shine some light into the black box by showing which pixels of the image the model found important when picking the class.
This project will:
- Develop Machine Learning programs to perform image classification on cutlery
- Implement Grad-CAM programs to highlight the areas of the image that were the most influential for image classification
You can choose between training your own model or taking one that has already been trained on Edge Impulse following this link. If you take the Edge Impulse route you can go directly to the Running Grad-CAM step.
The Machine Learning model will output one of four possible categories:
- Background
- Fork
- Knife
- Spoon
Grad-CAM will help us by highlighting which pixels of the image the model found important when making the classification.
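As a quick reference, this is a sketch of the formulation from the original Grad-CAM paper by Selvaraju et al. (note that the code used later in this project takes the absolute value instead of the ReLU):

% Grad-CAM weights: average the gradients of the class score y^c
% over each feature map A^k of the last convolutional layer
\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}
% Heatmap: weighted sum of the feature maps, passed through a ReLU
L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)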
What You'll Need
- Optional: Edge Impulse account (edgeimpulse.com)
The first step in every ML project is to get the data, which in this case means images. The bigger and less biased the dataset is, the better the model will perform.
For this project I picked up my mobile phone and took 75 pictures of each of the 4 categories (300 images in total). In my case, each image has a size of 4000x2250 pixels and will later be shrunk to lighten the training load.
Once you have finished taking the pictures, download them to your computer.
It's always good practice to tag the pictures with their class name and number them to make the following steps easier, for example as in the sketch below.
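A minimal renaming sketch (not part of the original project files; it assumes every .jpg in the current folder belongs to the same class and that you set CLASS_NAME by hand before running):

import os
CLASS_NAME = "fork"  # hypothetical example, change per class
# Number the photos of one class, e.g. "fork 001.jpg", "fork 002.jpg", ...
jpgs = sorted(f for f in os.listdir() if f.lower().endswith(".jpg"))
for i, old_name in enumerate(jpgs, start=1):
    new_name = "{} {:03d}.jpg".format(CLASS_NAME, i)
    os.rename(old_name, new_name)
    print(old_name, "->", new_name)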
Sampling biases
It's important to be extremely careful while taking the samples, because a biased dataset could seriously affect model performance in its real-world application. In this case, there are a few biases that I considered acceptable:
- All the images have the same background
- I only used one sort of fork, knife and spoon
- Photographs were taken from the top
The model will perform poorly with a different background, another spoon design, or images taken from the side, for example.
Data manipulation
Once you have your images tagged on your computer, you can run JPG Image editor - Resizing images.py to reduce their size by ten times (in my case from 4000x2250 to 400x225) and convert them to grayscale.
Note 1: I'm assuming that the images are in JPG format
Note 2: Save and run this JPG Image editor - Resizing images.py file in a folder that only contains the images to be shrunk.
import os
from PIL import Image
from PIL import ImageOps

# Go through every .jpg file in the current folder
for filename in os.listdir():
    name = filename[:-4]
    ext = filename[-4:]
    if ext.lower() == ".jpg":
        orig = Image.open(filename)
        width, height = orig.size
        print(filename, width, height)
        newsize = (width // 10, height // 10)  # width and height reduced 10 times
        orig = orig.resize(newsize)
        raw = ImageOps.grayscale(orig)         # convert to grayscale (single channel)
        fn = "raw " + name + ext               # prefix the new file so the original stays unchanged
        raw.save(fn)
Once the code has run you should have the same images but at a smaller size. Note that a new file was created for each image while the originals remain unchanged.
We'll continue by grouping the images of the same class into different folders (see the sketch after this paragraph). This will make the following step easier to run and to understand.
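A short script like this one can do the grouping automatically (again a minimal sketch, not one of the project files; it assumes the resized filenames follow the pattern produced earlier, e.g. "raw fork 001.jpg", with the class name as the second word):

import os
import shutil

# Move each resized image into a folder named after its class.
for f in os.listdir():
    if f.lower().endswith(".jpg") and f.startswith("raw "):
        class_name = f.split(" ")[1]            # e.g. "fork"
        os.makedirs(class_name, exist_ok=True)  # create the class folder if needed
        shutil.move(f, os.path.join(class_name, f))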
Generating a dataset with image information
It's time to generate an Excel file where we will store the image location in one column and the image class name in a second column.
Images location and classes.py will go through the folders that are in the same path and record each image's location and its class according to its folder name. After it has gone through all the images it will generate an Excel file called Image Classification database.xlsx.
import os
import pandas as pd

rows = []  # Collect one row per image: file path and its category (taken from the folder name)
cd = os.getcwd()
for fd in os.listdir(cd):
    folderpath = os.path.join(cd, fd)
    # Only walk the class folders, skipping scripts, saved models and spreadsheets
    if os.path.isdir(folderpath):
        for fl in os.listdir(folderpath):
            rows.append({'filename': os.path.join(folderpath, fl), 'category': fd})

df = pd.DataFrame(rows).dropna()
print("Number of images = {}".format(len(df)))
print(df['category'].value_counts())  # Quick look at the dataframe
with pd.ExcelWriter('Image Classification database.xlsx', mode='w') as writer:
    df.to_excel(writer)
This excel file, along with the images, will be used to train the Machine Learning model.
In this step we are going to write our own code to train our own CNN model. As a reference I'm using the following code from Kaggle: https://www.kaggle.com/code/roy2004/cnn-waste-classification-from-jpg-op-3
Model summary:
- Goes through a dataset that stores the path to each image and its category
- Definition and training of the model
- Comparison of the predicted classes with the actual ones
- Saving the model in a .h5 file that we will use later on for running Grad-CAM
import os
from random import randint
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
from PIL import Image
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential # Importing models from Keras
from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler
from tensorflow.keras.layers import Dense, InputLayer, Dropout, Conv1D, Conv2D, Flatten, Reshape, MaxPooling1D, MaxPooling2D, BatchNormalization, TimeDistributed
from tensorflow.keras.optimizers import Adam
df = pd.read_excel('Image Classification database.xlsx', index_col=0) # Import a table with location to specific images and its category
df['category'].value_counts().plot.bar()
print (df['category'].value_counts())
df_train=df.sample(frac=0.8,replace=False) # Randomly sample 80% of data from dataframe for training. Replace = False to prevent repeat sampling
df_valid=df.drop(df_train.index.values) # The rest 20% is the validation images
df_train['category'].value_counts().plot.bar()
print (df_train['category'].value_counts())
df_valid['category'].value_counts().plot.bar()
print(df_valid['category'].value_counts())
#Image.open(random.choice(df_train['filename'])).show()
FAST_RUN = False # True if you want to quickly test your model (training for 3 epochs). False for a full train (100 epochs).
epochs = 3 if FAST_RUN else 100
IMAGE_WIDTH = 400 # Enter the width and height of images
IMAGE_HEIGHT = 225
IMAGE_SIZE = (IMAGE_WIDTH, IMAGE_HEIGHT)
IMAGE_CHANNELS = 1 # 3 if RGB. 1 if Grayscale
batch_size = 32
d = 0.1 # Dropout rate
# Use when training on pre-trained weights
START_EPOCH = 0 # if fresh train, enter 0
Transfer = False
Pretrained_Link = os.getcwd() + "/model.h5"
#rdm = randint(0,len(df_train['filename']))
#sample = df_train['filename'].iloc[rdm]
#pic = Image.open(sample)
#pic.show()
classes_values = ["background", "fork", "knife", "spoon" ]
classes = len(classes_values)
# Create Keras Sequential Model
model = Sequential()
model.add(Conv2D(32, kernel_size=3, activation='relu', kernel_constraint=tf.keras.constraints.MaxNorm(1), padding='same'))
model.add(MaxPooling2D(pool_size=2, strides=2, padding='same'))
model.add(Conv2D(16, kernel_size=3, activation='relu', kernel_constraint=tf.keras.constraints.MaxNorm(1), padding='same', name = "last_conv2d"))
model.add(MaxPooling2D(pool_size=2, strides=2, padding='same'))
model.add(Flatten())
model.add(Dropout(0.25))
model.add(Dense(classes, activation='softmax', name='y_pred'))
if Transfer:
    model.load_weights(Pretrained_Link)
opt = Adam(learning_rate=0.0005, beta_1=0.9, beta_2=0.999)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
#model.summary()
earlystop = EarlyStopping(patience=10,restore_best_weights=True)
LR_START = .001 # Learning rate (LR) schedule for TPU, GPU and CPU
LR_MIN = 1e-6
LR_EXP_DECAY = .94
# Define a Learning Rate function on epoch that will decrease exponentially.
def lrfn(epoch):
    lr = (LR_START - LR_MIN) * LR_EXP_DECAY ** (epoch + START_EPOCH) + LR_MIN
    return lr
lr_callback = LearningRateScheduler(lrfn, verbose=True)
rng = [i for i in range(START_EPOCH, epochs + START_EPOCH)] # Visualize the change in learning rate
y = [lrfn(x) for x in rng]
#plt.plot(rng, y)
#plt.show()
print("Learning rate schedule: {:.3g} to {:.3g}".format(y[0], y[-1]))
total_train = df_train.shape[0] # Total number of images for training
total_validate = df_valid.shape[0] # Total number of images for validation
print("Training: {}, Validation: {}".format(total_train,total_validate))
train_datagen = ImageDataGenerator(rotation_range=15, rescale=1./255, shear_range=0.1, horizontal_flip=True, vertical_flip=True)
train_generator = train_datagen.flow_from_dataframe(df_train, "", x_col='filename', y_col='category', target_size=IMAGE_SIZE, class_mode='categorical', batch_size=batch_size, color_mode = "grayscale") # According to the dataframe, pull images one by one from image directory
validation_datagen = ImageDataGenerator(rescale=1./255) # Validation doesn't need much data Augmentation
validation_generator = validation_datagen.flow_from_dataframe(df_valid, "", x_col='filename', y_col='category', target_size=IMAGE_SIZE, class_mode='categorical', batch_size=batch_size, color_mode = "grayscale") # According to the dataframe, pull images one by one from image directory
history = model.fit(train_generator, batch_size=batch_size, epochs=epochs, validation_data=validation_generator, validation_steps=total_validate//batch_size, steps_per_epoch=total_train//batch_size, callbacks=[earlystop, lr_callback])
model.save("model.h5") # Save Model in h5 format (old one). New models are saved as SaveModel
#model.save("model_raw") # Save Model
test_df = df.sample(frac = 0.3) # Randomly select 30% of data (sampled from the full dataframe, so it can include training images)
nb_samples = test_df.shape[0] # Number of testing samples
test_gen = ImageDataGenerator(rescale=1./255) # Test generator in the same fashion of the train/validation generators
test_generator = test_gen.flow_from_dataframe(test_df, "", x_col='filename', y_col='category', class_mode=None, target_size=IMAGE_SIZE, batch_size=batch_size, shuffle=False, color_mode = "grayscale")
predict = model.predict(test_generator, steps=int(np.ceil(nb_samples/batch_size))) # predict_generator is deprecated; model.predict works with generators
test_df['pred_category'] = np.argmax(predict, axis=-1)
label_map = dict((v,k) for k,v in train_generator.class_indices.items())
test_df['pred_category'] = test_df['pred_category'].replace(label_map)
test_df["background"] = predict[:, [0]]
test_df["fork"] = predict[:, [1]]
test_df["knife"] = predict[:, [2]]
test_df["spoon"] = predict[:, [3]]
submission_df = test_df.copy()
with pd.ExcelWriter('Summary.xlsx', mode='w') as writer:
    submission_df.to_excel(writer)
Please note that IMAGE_WIDTH = 400, IMAGE_HEIGHT = 225 and IMAGE_CHANNELS = 1 match my image specifications (IMAGE_CHANNELS is 1 because the images are grayscale; if you are using colour images, IMAGE_CHANNELS should be 3). If you are unsure about your own values, a quick check is shown below.
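A small sketch for that check ("raw fork 001.jpg" is just a hypothetical filename, use one of your own resized files):

from PIL import Image

# Confirm the values for IMAGE_WIDTH, IMAGE_HEIGHT and IMAGE_CHANNELS
img = Image.open("raw fork 001.jpg")
print(img.size)   # (width, height), e.g. (400, 225)
print(img.mode)   # "L" = grayscale (1 channel), "RGB" = colour (3 channels)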
Once the model has finished training (this might take a while depending on the size of your image dataset, image dimensions, colour channels, batch size, learning rate, etc.) it should create the .h5 file that we need for running Grad-CAM.
In this case, the CNN model's performance is really poor (accuracy is around 50%). This might be explained by the small dataset and the image size, which don't allow the model to quickly recognize patterns.
It's really interesting to mention that in a previous project where I used Edge Impulse Transfer Learning on the exact same image dataset, the model's accuracy was above 90%. Thus, transfer learning has a huge positive impact on model performance; a rough sketch of what such a model looks like in Keras is shown below.
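This is only an illustration of the idea, not the Edge Impulse implementation; the MobileNetV2 base, the 96x96 RGB input size and the head layers are all assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models

# Minimal transfer-learning sketch: a MobileNetV2 base pre-trained on ImageNet,
# frozen, with a small classification head on top.
# Note: MobileNetV2 expects 3-channel (RGB) inputs, unlike the grayscale pipeline above.
base = tf.keras.applications.MobileNetV2(input_shape=(96, 96, 3),  # assumed input size
                                         include_top=False,
                                         weights='imagenet')
base.trainable = False                       # keep the pre-trained features frozen
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.1),
    layers.Dense(4, activation='softmax')    # background, fork, knife, spoon
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.0005),
              loss='categorical_crossentropy', metrics=['accuracy'])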
Finally, we are ready to run Grad-CAM to understand how the CNN is reasoning.
As a guide I'll be using the following code that I took from GitHub. It's important to mention that this Grad-CAM code only works for CNN projects and does not currently work for Transfer Learning projects created on Edge Impulse. That's why we are running Grad-CAM on this lower-accuracy project instead of on the better one.
import PIL
import cv2
import numpy as np
import tensorflow as tf
import os
from tensorflow import keras
from keras import activations, layers, models, backend
from skimage.transform import resize
import matplotlib.pyplot as plt
LABELS = ["background", "fork", "knife", "spoon"] # Labels
IMAGE_PATH = r"your\image\path" # Change this based on your image sample
TRUE_LABEL = "yourimageclass"
# If you wrote your own model, these should match your image size
# If you are importing from Edge Impulse go to Image resolution (Edge Impulse project > Impulse design > Image data)
WIDTH = 400
HEIGHT = 225
true_idx = LABELS.index(TRUE_LABEL) # Find index of true label in label list
model = tf.keras.models.load_model("model.h5") # Load model file
model.summary()
img = PIL.Image.open(IMAGE_PATH) # Load image
img = img.convert('L') # Convert the image to grayscale
img = np.asarray(img) # Convert the image to a Numpy array
img = resize(img, (WIDTH, HEIGHT), anti_aliasing=True) # Resize the image and normalize the values (to be between 0.0 and 1.0); the channel dimension is added below
print("Actual label:", TRUE_LABEL) # Show the ground-truth label
plt.imshow(img, cmap='gray', vmin=0.0, vmax=1.0) # Display image (make sure we're looking at the right thing)
plt.show()
# The Keras model expects images in a 4D array with dimensions (sample, height, width, channel)
img_0 = img.reshape(img.shape + (1,)) # Add extra dimension to the image (placeholder for color channels)
images = np.array([img_0]) # Keras expects more than one image (in Numpy array), so convert image(s) to such array
print(images.shape) # Print dimensions of inference input
preds = model.predict(images) # Inference
# Print out predictions
for i, pred in enumerate(preds[0]):
    print(LABELS[i] + ": " + str(pred))
model.layers[-1].activation = None # For either algorithm, we need to remove the Softmax activation function of the last layer
# Based on: https://github.com/keisen/tf-keras-vis/blob/master/tf_keras_vis/saliency.py
def get_saliency_map(img_array, model, class_idx):
    img_tensor = tf.convert_to_tensor(img_array) # Gradient calculation requires input to be a tensor
    # Do a forward pass of model with image and track the computations on the "tape"
    with tf.GradientTape(watch_accessed_variables=False, persistent=True) as tape:
        tape.watch(img_tensor) # Compute (non-softmax) outputs of model with given image
        outputs = model(img_tensor, training=False)
        score = outputs[:, class_idx] # Get score (predicted value) of the requested class
    grads = tape.gradient(score, img_tensor) # Compute gradients of the score with respect to the input image
    grads_disp = [np.max(g, axis=-1) for g in grads] # Finds max value in each color channel of the gradient (should be grayscale for this demo)
    grad_disp = grads_disp[0] # There should be only one gradient heatmap for this demo
    grad_disp = tf.abs(grad_disp) # The absolute value of the gradient shows the effect of change at each pixel. Source: https://christophm.github.io/interpretable-ml-book/pixel-attribution.html
    heatmap_min = np.min(grad_disp) # Normalize to between 0 and 1 (use epsilon, a very small float, to prevent divide-by-zero error)
    heatmap_max = np.max(grad_disp)
    heatmap = (grad_disp - heatmap_min) / (heatmap_max - heatmap_min + tf.keras.backend.epsilon())
    return heatmap.numpy()
saliency_map = get_saliency_map(images, model, true_idx) # Generate saliency map for the given input image
plt.imshow(saliency_map, cmap='magma', vmin=0.0, vmax=1.0) # Draw map
plt.show()
idx = 0 # Overlay the saliency map on top of the original input image
ax = plt.subplot()
ax.imshow(images[idx,:,:,0], cmap='gray', vmin=0.0, vmax=1.0)
ax.imshow(saliency_map, cmap='magma', alpha=0.25)
plt.show()
### This function comes from https://keras.io/examples/vision/grad_cam/
def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    # First, we create a model that maps the input image to the activations of the last conv layer as well as the output predictions
    grad_model = tf.keras.models.Model([model.inputs], [model.get_layer(last_conv_layer_name).output, model.output])
    # Then, we compute the gradient of the top predicted class for our input image with respect to the activations of the last conv layer
    with tf.GradientTape() as tape:
        last_conv_layer_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]
    grads = tape.gradient(class_channel, last_conv_layer_output) # This is the gradient of the output neuron (top predicted or chosen) with regard to the output feature map of the last conv layer
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2)) # This is a vector where each entry is the mean intensity of the gradient over a specific feature map channel
    # We multiply each channel in the feature map array by "how important this channel is" with regard to the top predicted class, then sum all the channels to obtain the heatmap class activation
    last_conv_layer_output = last_conv_layer_output[0]
    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)
    heatmap = tf.abs(heatmap) # The absolute value of the gradient shows the effect of change at each pixel. Source: https://christophm.github.io/interpretable-ml-book/pixel-attribution.html
    heatmap_min = np.min(heatmap) # Normalize to between 0 and 1 (use epsilon, a very small float, to prevent divide-by-zero error)
    heatmap_max = np.max(heatmap)
    heatmap = (heatmap - heatmap_min) / (heatmap_max - heatmap_min + tf.keras.backend.epsilon())
    return heatmap.numpy()
# We need to tell Grad-CAM where to find the last convolution layer
#for layer in model.layers:
# print(layer, layer.name) # Print out the layers in the model
last_conv_layer = None # Go backwards through the model to find the last convolution layer
for layer in reversed(model.layers):
    if 'conv' in layer.name:
        last_conv_layer = layer.name
        break
# Give a warning if the last convolution layer could not be found
if last_conv_layer is not None:
    print("Last convolution layer found:", last_conv_layer)
else:
    print("ERROR: Last convolution layer could not be found. Do not continue.")
heatmap = make_gradcam_heatmap(images, model, last_conv_layer) # Generate class activation heatmap
plt.imshow(heatmap, cmap='magma', vmin=0.0, vmax=1.0) # Draw map
plt.show()
# Overlay the saliency map on top of the original input image
big_heatmap = cv2.resize(heatmap, dsize=(HEIGHT, WIDTH), interpolation=cv2.INTER_CUBIC) # The heatmap is a lot smaller than the original image, so we upsample it
idx = 0 # Draw original image with heatmap superimposed over it
ax = plt.subplot()
ax.imshow(images[idx,:,:,0], cmap='gray', vmin=0.0, vmax=1.0)
ax.imshow(big_heatmap, cmap='magma', alpha=0.25)
plt.show()
You'll need to modify IMAGE_PATH = r"your\image\path" to the path of the image you are using for testing, and TRUE_LABEL = "yourimageclass" to that image's class.
The same code should also work for the Edge Impulse .h5 file that you can download from the project's Dashboard.
Keep in mind that if you are using an Edge Impulse project you might need to change the model's file name and the image size in Grad-CAM for CNN model - h5.py; one way to check what the downloaded model expects is sketched below.
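A small sketch for that check (it assumes the file is named model.h5 and sits in the working directory):

import tensorflow as tf

# Inspect a downloaded .h5 to see the input size it expects before setting WIDTH and HEIGHT
model = tf.keras.models.load_model("model.h5")
print(model.input_shape)  # e.g. (None, rows, cols, channels)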
As we can see, the model's poor accuracy also shows in how evenly the predicted percentages are spread between the classes. Just one of the images has a confidence above 90%.
In some images we can see that Grad-CAM gives importance to the shadow of my hands: the model fails to identify the object and instead relates the shadow to the class. This is a good example of how a biased dataset can lead to inaccurate model predictions in the real world.
With these samples and insights we now have a much deeper understanding of how the model is making its decisions, and that allows us to re-train it.
If we re-train the model with new images that take this learning into account, we should see an improvement in accuracy, because we will have helped it to better separate the classes and to better identify the features that make each utensil what it is.
Summary
We've been able to develop a CNN Machine Learning model for image classification.
Then we took its .h5 file and used Grad-CAM to shine some light on how the model makes its decisions.
This better understanding of the model's behaviour can later help us improve its performance, for example by taking more pictures for the cases where it made wrong classifications.