Alibek Jakupov

Azure Custom Vision: Object detection for retraining

Updated: Nov 19, 2021

In one of our previous articles we discussed the necessity of limiting the region being analyzed to get better performance. For instance, if we classify gloves, it is better to limit the region of interest to the hands; if we classify glasses, to the head or even the eye region. OpenCV has plenty of built-in facilities for detecting common objects such as a face, a profile or eyes.

But what if we have an unusual object, which is already present in our labeled training set, and there is no way to crop it automatically? In my case, I had such an object and about 3k training images that had already been manually labelled. I tried defining a common position and then cropping the particular cases manually, and I tried extracting the background, but I would have had to either create a deep-learning model for that or use a mask, and both solutions were just too complicated for me. Finally, I decided to train a simple object detector using Azure Custom Vision without any particular expectation and, surprisingly, it worked out.

Consequently, I find it important to share the code with you, as there may be another rookie developer like me who is struggling to get a computer vision solution up and running. There is really no deep stuff, only some basic (but reusable) code snippets that you may use in your projects. Up we go!

Train the object detector

This is a relatively simple part to do. You can do everything using Azure's UI, and then launch training. In short, you need to:

  1. Create Custom Vision resources

  2. Create a new project

  3. Select Object Detection under Project Types

  4. Choose training images

  5. Upload and tag images

  6. Click and drag a rectangle around the object in your image

  7. Train the detector

Use the object detector

The main trick for me was to parse the JSON output correctly, and I spent quite some time looking for clear documentation. So here you are: a completely reusable code snippet that you can copy and paste wherever you want. The only things you need to provide are the endpoint, the prediction key, the project ID and the published iteration name, but that should not be a challenge.

import os
import cv2
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

INPUT_FOLDER = "spoons"
INPUT_SUB_FOLDERS = ["silver_spoon"]
OUTPUT_FOLDER = "output/silver_spoon"
WHITE = (255, 255, 255)
prediction_key = "<your-prediction-key>"
ENDPOINT = "https://<resource-name>"
project_id = "<your-project-id>"
publish_iteration_name = "<your-iteration>"

# Now there is a trained endpoint that can be used to make a prediction
prediction_credentials = ApiKeyCredentials(
    in_headers={"Prediction-key": prediction_key})
predictor = CustomVisionPredictionClient(ENDPOINT, prediction_credentials)

# make sure the output folder exists before counting or writing files
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

for sub_folder in INPUT_SUB_FOLDERS:
    for image_file in os.listdir(os.path.join(INPUT_FOLDER, sub_folder)):
        full_path = os.path.join(INPUT_FOLDER, sub_folder, image_file)
        frame = cv2.imread(full_path)
        if frame is None:
            # not a readable image, skip it
            continue
        # the SDK expects the raw image bytes, not the decoded OpenCV array
        with open(full_path, "rb") as image_data:
            results = predictor.detect_image_with_no_store(
                project_id, publish_iteration_name, image_data)
        for prediction in results.predictions:
            if prediction.probability > 0.9:
                # bounding box values are fractions of the image size
                x = int(prediction.bounding_box.left * frame.shape[1])
                y = int(prediction.bounding_box.top * frame.shape[0])
                w = int(prediction.bounding_box.width * frame.shape[1])
                h = int(prediction.bounding_box.height * frame.shape[0])

                print(x, y, w, h)
                cropped_object = frame[y:y + h, x:x + w]
                # to not erase previously saved photos, counter (image name) = number of photos in the folder
                image_counter = len([name for name in os.listdir(OUTPUT_FOLDER)
                                     if os.path.isfile(os.path.join(OUTPUT_FOLDER, name))])

                # save image to the dedicated folder (folder name = label)
                cv2.imwrite(os.path.join(OUTPUT_FOLDER, str(image_counter) + '.png'),
                            cropped_object)
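
The part of the script above that is easiest to get wrong is the box arithmetic, because Custom Vision returns the bounding box as fractions of the image size rather than as pixels. Here is a minimal sketch of that conversion on its own, assuming the response has been turned into a plain dictionary shaped like the prediction endpoint's JSON (`predictions`, `probability`, `tagName`, `boundingBox`). The helper names `to_pixel_box` and `confident_boxes` and the sample values are made up for illustration and are not part of the SDK.

```python
import json


def to_pixel_box(box, image_width, image_height):
    """Convert a normalized box (fractions of the image) to pixel corners."""
    x1 = int(box["left"] * image_width)
    y1 = int(box["top"] * image_height)
    x2 = x1 + int(box["width"] * image_width)
    y2 = y1 + int(box["height"] * image_height)
    # Clamp to the image borders so the crop never runs outside the frame.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(image_width, x2), min(image_height, y2)
    return x1, y1, x2, y2


def confident_boxes(response, image_width, image_height, threshold=0.9):
    """Keep predictions above the threshold and return (tag, pixel box) pairs."""
    return [
        (p["tagName"], to_pixel_box(p["boundingBox"], image_width, image_height))
        for p in response["predictions"]
        if p["probability"] > threshold
    ]


# Hypothetical response for an 800x600 image (values invented for the example).
SAMPLE = json.loads("""
{
  "predictions": [
    {"probability": 0.97, "tagName": "silver_spoon",
     "boundingBox": {"left": 0.25, "top": 0.5, "width": 0.5, "height": 0.25}},
    {"probability": 0.42, "tagName": "silver_spoon",
     "boundingBox": {"left": 0.0, "top": 0.0, "width": 0.1, "height": 0.1}}
  ]
}
""")

if __name__ == "__main__":
    for tag, (x1, y1, x2, y2) in confident_boxes(SAMPLE, 800, 600):
        print(tag, x1, y1, x2, y2)
```

With corner coordinates like these, the crop on an OpenCV image is `frame[y1:y2, x1:x2]` (rows first, then columns).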

The logic is quite simple.

We have a spoons database containing silver and golden spoons. The images are too large to be classified correctly as they are, so it makes sense to crop out only the spoons and save them to an output folder named after their tag. The script therefore needs to be launched twice, the second time with the input sub-folder and the output folder changed to golden_spoon.
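
If editing the constants between runs gets tedious, both labels could also be handled in one pass. The sketch below is only one possible way to organize it, assuming the same layout as above (one sub-folder per label under the input folder, and one output folder per label). The helpers `label_folders` and `next_output_path` are hypothetical names introduced here.

```python
from pathlib import Path


def label_folders(input_root, output_root, labels):
    """Yield (label, input_dir, output_dir), creating output_dir if needed."""
    for label in labels:
        output_dir = Path(output_root) / label
        output_dir.mkdir(parents=True, exist_ok=True)
        yield label, Path(input_root) / label, output_dir


def next_output_path(output_dir, extension=".png"):
    """Return a numbered file path that does not clash with saved files."""
    output_dir = Path(output_dir)
    # Start from the number of files already in the folder, as in the script.
    index = sum(1 for entry in output_dir.iterdir() if entry.is_file())
    candidate = output_dir / f"{index}{extension}"
    # If files were deleted in between, the count may collide; move forward.
    while candidate.exists():
        index += 1
        candidate = output_dir / f"{index}{extension}"
    return candidate


if __name__ == "__main__":
    labels = ["silver_spoon", "golden_spoon"]
    for label, input_dir, output_dir in label_folders("spoons", "output", labels):
        print(label, input_dir, next_output_path(output_dir))
```

Inside the detection loop, the result of `next_output_path(output_dir)` would be converted with `str(...)` and passed to `cv2.imwrite` in place of the hand-built file name.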

With these cropped images you can relaunch your classification experiment, and the results should be much better.

Hope it will be useful.
