Azure Custom Vision : Object detection for retraining

Alibek Jakupov
Oct 17, 2020
2 min read

Updated: Nov 19, 2021

In one of our previous articles we discussed the need to limit the region being analyzed for better performance. For instance, if we classify gloves, it is better to limit the region of interest to the hands; if we classify glasses, to the head or even the eye region. OpenCV has plenty of built-in facilities for detecting common objects, such as a face, a profile or eyes.

But what if we have an unusual object that is already present in our labeled training set, and there is no way to crop it automatically? In my case, I had such an object and about 3k training images that had already been manually labelled. I tried defining a common position and then cropping particular cases manually, and I tried extracting the background, but I would have had to either create a deep-learning model for that or use a mask, and both solutions were just too complicated for me. Finally, I decided to train a simple object detector using Azure Custom Vision without any particular expectations and, surprisingly, it worked.

Consequently, I find it important to share the code with you, as there may be another rookie developer like me who is struggling to get their computer vision solution up and running. There is really no deep stuff, only some basic (but reusable) code snippets that you may use in your projects. Up we go!

Train the object detector

This part is relatively simple. You can do everything using Azure's UI and then launch training. In short, you need to:

Create Custom Vision resources
Create a new project
Select Object Detection under Project Types
Choose training images
Upload and tag images
Click and drag a rectangle around the object in your image
Train the detector

Use the object detector

The main trick for me was to parse the JSON output correctly, and I spent quite some time looking for clear documentation. So here you are: a completely reusable code snippet that you can copy and paste wherever you want. The only things you need to provide are the endpoint, the prediction key, the project ID and the published iteration name, but that should not be a challenge.

import os
import cv2
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

INPUT_FOLDER = "spoons"
INPUT_SUB_FOLDERS = ["silver_spoon"]
OUTPUT_FOLDER = os.path.join("output", "silver_spoon")
prediction_key = "<your-prediction-key>"
ENDPOINT = "https://<resource-name>.cognitiveservices.azure.com/"
project_id = "<your-project-id>"
publish_iteration_name = "<your-iteration>"

# Now there is a trained endpoint that can be used to make a prediction
prediction_credentials = ApiKeyCredentials(
    in_headers={"Prediction-key": prediction_key})
predictor = CustomVisionPredictionClient(ENDPOINT, prediction_credentials)

# make sure the output folder exists before counting or writing files
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

for sub_folder in INPUT_SUB_FOLDERS:
    for image_file in os.listdir(os.path.join(INPUT_FOLDER, sub_folder)):
        full_path = os.path.join(INPUT_FOLDER, sub_folder, image_file)
        print(full_path)
        frame = cv2.imread(full_path)
        if frame is None:
            # not a file OpenCV can read as an image, skip it
            continue

        # the prediction client takes binary image data (a file stream), not a NumPy array
        with open(full_path, "rb") as image_data:
            results = predictor.detect_image_with_no_store(
                project_id, publish_iteration_name, image_data)

        for prediction in results.predictions:
            if prediction.probability > 0.9:
                # bounding box values are fractions of the image width and height
                x = int(prediction.bounding_box.left * frame.shape[1])
                y = int(prediction.bounding_box.top * frame.shape[0])

                # w and h are the box width and height in pixels
                w = int(prediction.bounding_box.width * frame.shape[1])
                h = int(prediction.bounding_box.height * frame.shape[0])

                print(x, y, w, h)

                cropped_spoon = frame[y:y+h, x:x+w]
                # to not erase previously saved photos, image name = number of photos already in the folder
                image_counter = len([name for name in os.listdir(OUTPUT_FOLDER)
                                     if os.path.isfile(os.path.join(OUTPUT_FOLDER, name))])

                # save image to the dedicated folder (folder name = label)
                cv2.imwrite(os.path.join(OUTPUT_FOLDER, str(image_counter) + '.png'),
                            cropped_spoon)
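
The part that is easiest to get wrong is turning the detector's bounding boxes into pixel coordinates: the service returns left, top, width and height as fractions of the image size, not as pixels. Here is a minimal sketch of that conversion as a standalone helper. The function names `box_to_pixels` and `confident_boxes` are my own, not part of the Azure SDK, and the `SimpleNamespace` objects only stand in for the items of `results.predictions` so that the example runs without a Custom Vision resource.

```python
from types import SimpleNamespace


def box_to_pixels(box, image_width, image_height):
    """Convert a normalized box (left, top, width, height in 0..1)
    to pixel corners (x1, y1, x2, y2), clamped to the image borders."""
    x1 = int(box.left * image_width)
    y1 = int(box.top * image_height)
    x2 = int((box.left + box.width) * image_width)
    y2 = int((box.top + box.height) * image_height)
    # keep the corners inside the image, in case a box spills over an edge
    x1 = min(max(x1, 0), image_width)
    x2 = min(max(x2, 0), image_width)
    y1 = min(max(y1, 0), image_height)
    y2 = min(max(y2, 0), image_height)
    return x1, y1, x2, y2


def confident_boxes(predictions, image_width, image_height, threshold=0.9):
    """Return pixel boxes for predictions above the probability threshold."""
    return [
        box_to_pixels(p.bounding_box, image_width, image_height)
        for p in predictions
        if p.probability > threshold
    ]


# Hypothetical stand-ins for the objects found in results.predictions
example_predictions = [
    SimpleNamespace(
        probability=0.97,
        bounding_box=SimpleNamespace(left=0.25, top=0.5, width=0.5, height=0.25),
    ),
    SimpleNamespace(
        probability=0.40,
        bounding_box=SimpleNamespace(left=0.0, top=0.0, width=0.5, height=0.5),
    ),
]

boxes = confident_boxes(example_predictions, image_width=800, image_height=600)
print(boxes)
```

With an OpenCV image, each box can then be cropped with `frame[y1:y2, x1:x2]`, where `image_width` is `frame.shape[1]` and `image_height` is `frame.shape[0]`.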

The logic is quite simple.

We've got a spoons database containing silver and golden spoons. But the images are too large to be classified correctly, so it makes sense to crop out only the spoons, keep their tag, and put them in an output folder named after that tag. The script therefore has to be launched twice: the second time with the input sub-folder and the output folder changed to golden_spoon.
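
If you would rather not edit the constants between runs, one option is to derive the input and output folders from a list of labels and loop over them. This is only a sketch of the idea: `folder_pairs` is a hypothetical helper of mine, and the folder names simply mirror the ones used in this article.

```python
import os


def folder_pairs(input_root, output_root, labels):
    """Pair each label's input folder with an output folder of the same name."""
    return [
        (os.path.join(input_root, label), os.path.join(output_root, label))
        for label in labels
    ]


pairs = folder_pairs("spoons", "output", ["silver_spoon", "golden_spoon"])
for input_folder, output_folder in pairs:
    # the detection and cropping loop would run here, once per label
    print(input_folder, "->", output_folder)
```

Each cropped image then lands in the folder named after its label, so both labels are processed in a single run.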

With these cropped images you can relaunch your classification experiment, and the results should be much better.

I hope you find it useful.
