Alibek Jakupov

Azure Custom Vision: Run multiple models simultaneously in real time

Updated: Nov 19, 2021

Azure Custom Vision makes it possible to train your own classifier from only a handful of images. The wizardry behind it is transfer learning, which lets us build upon the features and concepts learned during the training of a base model. In other words, we cut off the final dense layer, which predicts the class labels of the original base model, and replace it with a new dense layer that predicts the class labels of our new task.

TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks. It is used for both research and production at Google. TensorFlow was developed by the Google Brain team for internal Google use. It was released under the Apache License 2.0 on November 9, 2015.

Quote from Wikipedia

TensorFlow is one of the most popular machine learning libraries and allows implementing powerful AI solutions, from training to inference. It provides stable Python and C APIs. TensorFlow computations are expressed as stateful dataflow graphs, and in most cases developers use one graph per session. However, what if we want to use multiple models in a single application without losing performance? In this article you will find some tips and tricks that may be useful when implementing edge solutions. Up we go!


First of all, let's have a look at the classical way of using TensorFlow. A typical scenario for using a pre-trained model with TensorFlow is the following:

1. Initialize a graph of operations

to upload trained model

import tensorflow as tf
# graph of operations to upload trained model
graph_def = tf.compat.v1.GraphDef()

2. Import the TensorFlow graph

for instance from a binary file

# import the TensorFlow graph; 'rb' mode opens the file for reading in binary mode
with open('skin_model.pb', mode='rb') as f:
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def=graph_def, name='')

3. Define output layer, input node and predicted tag

output_layer = 'loss:0'
input_node = 'Placeholder:0'
predicted_tag = 'Predicted Tag'

4. Initialize a session

with tf.compat.v1.Session() as sess:
    prob_tensor = sess.graph.get_tensor_by_name(output_layer)

5. Get the input size of the model

Still inside the same "with"

    # Get the input size of the model
    input_tensor_shape = sess.graph.get_tensor_by_name(input_node).shape.as_list()
    network_input_size = input_tensor_shape[1]

6. Run session

get the prediction with its probability

    # np is NumPy (import numpy as np)
    predictions = sess.run(prob_tensor, {input_node: [augmented_image]})
    # get the highest probability label
    highest_probability_index = np.argmax(predictions)
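The index on its own is not very readable, so a small helper can turn the raw output into a label and a probability. This is only a minimal sketch: `top_prediction` and the `labels` list are hypothetical names that are not part of the exported model, and the helper assumes the output holds one probability per class, in the same order as the labels.

```python
import numpy as np


def top_prediction(predictions, labels):
    """Return (label, probability) for the most likely class.

    `predictions` is the array returned by the session for one image,
    e.g. with shape (1, n_classes). `labels` is a hypothetical list of
    class names in the same order as the model outputs.
    """
    probabilities = np.asarray(predictions, dtype=float).ravel()
    if probabilities.size != len(labels):
        raise ValueError("expected one probability per label")
    index = int(np.argmax(probabilities))
    return labels[index], float(probabilities[index])


# Made-up probabilities for one image and three made-up classes.
label, probability = top_prediction([[0.1, 0.7, 0.2]], ["cat", "dog", "bird"])
print(label, probability)
```

The size check is there because feeding the wrong label list to a model would otherwise silently produce a plausible but wrong tag.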

If you have already tried using multiple graphs with the same session, you have probably seen several Stack Overflow posts where a user wonders how to run multiple graphs in a single Session using the TensorFlow API.

In most cases the answer is:

Each Session can only have a single Graph

The problem is that we do not want to close the session: if our application makes predictions in real time, re-opening the session for every prediction may be very costly in terms of overall performance. However, if we naively create multiple sessions and run the predictions, all the probabilities come out the same.

But where there is a will, there is a way. In today's article we are going to train three different models using Azure Custom Vision, export them to TensorFlow and run them in the same application simultaneously.

Train Azure Custom Vision Models

At this stage there should be no problem. Simply upload your images, tag them and launch the training. Important: before starting your project, be sure to make it ‘exportable’, i.e. select one of the compact domains.

Generate a TensorFlow model for each project

This step should not be too complicated. After the training ends (usually it’s a matter of a few seconds), go to the Performance tab and click the Export button. In the dialog, choose TensorFlow and download the model, as we are going to use it in our Python application. After downloading the models, rename them so they are easy to tell apart. In this experiment we will simply call them "first", "second" and "third".

And here is the tip: we will use tf.graph_util.import_graph_def. According to the documentation:

name: (Optional.) A prefix that will be prepended to the names in graph_def. Note that this does not apply to imported function names. Defaults to "import".

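Thus, by adding this prefix, it is possible to tell the models apart: each model's nodes get their own namespace, so the output of the first model becomes first/loss:0, the output of the second becomes second/loss:0, and so on.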
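The effect of the prefix can be tried out without any exported model. In the sketch below, two toy graphs built in code stand in for the .pb files: `make_toy_graph_def` is a hypothetical helper, and its node names `Placeholder` and `loss` are chosen only to mirror the names used in this article. Both graphs are imported into one graph under different prefixes and evaluated by a single session.

```python
import numpy as np
import tensorflow as tf


def make_toy_graph_def(scale):
    """Build a stand-in 'model': output = input * scale."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf.compat.v1.placeholder(tf.float32, shape=[None, 2], name="Placeholder")
        tf.multiply(x, scale, name="loss")
    return graph.as_graph_def()


# Import both toy models into the same graph, each under its own prefix.
combined = tf.Graph()
with combined.as_default():
    tf.import_graph_def(make_toy_graph_def(2.0), name="first")
    tf.import_graph_def(make_toy_graph_def(10.0), name="second")

op_names = [op.name for op in combined.get_operations()]

x = np.array([[1.0, 3.0]], dtype=np.float32)
with tf.compat.v1.Session(graph=combined) as sess:
    # One run call evaluates both models on the same input.
    first_out, second_out = sess.run(
        ["first/loss:0", "second/loss:0"],
        {"first/Placeholder:0": x, "second/Placeholder:0": x},
    )

print(first_out, second_out)
```

Listing `op_names` is also a quick way to check which tensor names actually exist in a graph when `get_tensor_by_name` raises a KeyError.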

Add some code

So, first we define the model parameters: classes, input .pb files, etc. Important: as we trained all our models with Azure Custom Vision, they share the same configuration, so the network input size is the same for all of them.

# List of classes
first_LABELS = ['f_class1', 'f_class2']
second_LABELS = ['s_class1', 's_class2']
third_LABELS = ['t_class1', 't_class2']


# Exported models
first_MODEL_FILENAME = './src/ai/models/first_model.pb'
second_MODEL_FILENAME = './src/ai/models/second_model.pb'
third_MODEL_FILENAME = './src/ai/models/third_model.pb'
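Because all the models share the same input size, an image only has to be prepared once and can then be fed to every model. The sketch below is an assumption-laden stand-in for that step: `prepare_image` is a hypothetical helper, the input size of 224 is made up, and the center crop with nearest-neighbour resize is not necessarily the preprocessing used by the sample code shipped with the exported model.

```python
import numpy as np


def prepare_image(image, network_input_size):
    """Center-crop to a square, then resize with nearest-neighbour sampling."""
    height, width = image.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    square = image[top:top + side, left:left + side]
    # Pick evenly spaced source rows and columns (nearest-neighbour resize).
    picks = np.arange(network_input_size) * side // network_input_size
    return square[picks][:, picks]


frame = np.zeros((300, 400, 3), dtype=np.uint8)  # dummy 400x300 colour frame
adapted_image = prepare_image(frame, 224)        # 224 is an assumed input size
print(adapted_image.shape)
```

In practice the real input size should be read from the model's input tensor rather than hard-coded.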

Now initialize graphs and sessions by adding corresponding prefixes:

first_graph_def = tf.compat.v1.GraphDef()
second_graph_def = tf.compat.v1.GraphDef()
third_graph_def = tf.compat.v1.GraphDef()

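# Import the TF graph : first
with open(first_MODEL_FILENAME, 'rb') as first_file:
    first_graph_def.ParseFromString(first_file.read())
# The nodes are added to the default graph under the 'first/' prefix.
# Note: import_graph_def returns None here, so first_graph is None.
first_graph = tf.import_graph_def(first_graph_def, name='first')

# Import the TF graph : second
with open(second_MODEL_FILENAME, 'rb') as second_file:
    second_graph_def.ParseFromString(second_file.read())
second_graph = tf.import_graph_def(second_graph_def, name='second')

# Import the TF graph : third
with open(third_MODEL_FILENAME, 'rb') as third_file:
    third_graph_def.ParseFromString(third_file.read())
third_graph = tf.import_graph_def(third_graph_def, name='third')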

# These names are part of the model and cannot be changed.
first_output_layer = 'first/loss:0'
first_input_node = 'first/Placeholder:0'

second_output_layer = 'second/loss:0'
second_input_node = 'second/Placeholder:0'

third_output_layer = 'third/loss:0'
third_input_node = 'third/Placeholder:0'

# initialize probability tensors
# (graph=None makes each session use the default graph, which holds all prefixed models)
first_sess = tf.compat.v1.Session(graph=first_graph)
first_prob_tensor = first_sess.graph.get_tensor_by_name(first_output_layer)

second_sess = tf.compat.v1.Session(graph=second_graph)
second_prob_tensor = second_sess.graph.get_tensor_by_name(second_output_layer)

third_sess = tf.compat.v1.Session(graph=third_graph)
third_prob_tensor = third_sess.graph.get_tensor_by_name(third_output_layer)
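The per-model variables above all follow one naming pattern, so they can also be generated instead of typed out. The following is a minimal sketch of that idea: `tensor_names` and the `MODELS` registry are hypothetical names, and the sketch assumes every exported model uses the `Placeholder` and `loss` node names shown above.

```python
def tensor_names(prefix):
    """Return (input_node, output_layer) for a model imported under `prefix`."""
    return f"{prefix}/Placeholder:0", f"{prefix}/loss:0"


# Hypothetical registry: prefix -> model file and labels, mirroring the variables above.
MODELS = {
    "first": {"filename": "./src/ai/models/first_model.pb",
              "labels": ["f_class1", "f_class2"]},
    "second": {"filename": "./src/ai/models/second_model.pb",
               "labels": ["s_class1", "s_class2"]},
    "third": {"filename": "./src/ai/models/third_model.pb",
              "labels": ["t_class1", "t_class2"]},
}

# Attach the prefixed tensor names to each entry.
for prefix, config in MODELS.items():
    config["input_node"], config["output_layer"] = tensor_names(prefix)

print(MODELS["second"]["input_node"], MODELS["second"]["output_layer"])
```

Adding a fourth model then only requires one more entry in the registry.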

And now predict (inside your while loop, or inside the "get" routine if you are developing a Flask server).

first_predictions, = first_sess.run(
    first_prob_tensor, {first_input_node: [adapted_image]})
first_highest_probability_index = np.argmax(first_predictions)

second_predictions, = second_sess.run(
    second_prob_tensor, {second_input_node: [adapted_image]})
second_highest_probability_index = np.argmax(second_predictions)

third_predictions, = third_sess.run(
    third_prob_tensor, {third_input_node: [adapted_image]})
third_highest_probability_index = np.argmax(third_predictions)

Graph and session initialization takes 3-5 seconds per model. Once that is done, running predictions inside a loop takes 200-300 milliseconds per model.
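Figures like these depend on the hardware, so it is worth measuring them on your own machine. Here is a minimal timing sketch, where `timed` is a hypothetical helper and the `time.sleep` call merely stands in for a real prediction:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock time in milliseconds under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = (time.perf_counter() - start) * 1000.0


timings = {}
with timed("first_model", timings):
    time.sleep(0.01)  # stand-in for one prediction with the first model

print(timings)
```

Wrapping the initialization and each prediction call separately shows where the time actually goes.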


Hope this was useful. Enjoy!


5 comments

Apr 21, 2021

Here's my entire code, pls help.

def Initialize():
    ageCategories = []
    genders = []

    # These are set to the default names from exported models, update as needed.
    ageModelFilename = "models/age/model.pb"
    ageLabelsFilename = "models/age/labels.txt"
    genderModelFilename = "models/gender/model.pb"
    genderLabelsFilename = "models/gender/labels.txt"

    # Create a list of labels.
    with open(ageLabelsFilename, 'rt') as lf:
        for l in lf:
            ageCategories.append(l.strip())

    with open(genderLabelsFilename, 'rt') as lf:
        for l in lf:
            genders.append(l.strip())

    ageGraphDef = tf.compat.v1.GraphDef()
    genderGraphDef = tf.compat.v1.GraphDef()

    # Import the TF graph
    try:
        with open(ageModelFilename, 'rb') as f:
            ageGraphDef.ParseFromString(f.read())
            ageGraph = tf.import_graph_def(ageGraphDef, name='age')

        with open(genderModelFilename, 'rb') as f:
            genderGraphDef.ParseFromString(f.read())
            genderGraph = tf.import_graph_def(genderGraphDef, name='gender')
    except Exception as e:
        logger.LogError('Failed to load weights')

    # Get the input size of the model
    with tf.compat.v1.Session(graph=ageGraph) as ageSess:
        ageSess.graph.get_tensor_by_name('Placeholder:0')


Apr 21, 2021
Replying to

Hi, at:

ageSess.graph.get_tensor_by_name('Placeholder:0')

you are missing your prefix. Your input node should be like:

ageSess.graph.get_tensor_by_name('age/Placeholder:0')

Same for gender.

The rest seems correct, so give it a try!


Apr 20, 2021

In my case age is first model and gender is second model.

Getting error in this line:

ageProbTensor = ageSession.graph.get_tensor_by_name(ageOutputLayer)

Error: KeyError: "The name 'age/loss:0' refers to a Tensor which does not exist. The operation, 'age/loss', does not exist in the graph."

Any help would be appreciated, thanks.

Apr 21, 2021
Replying to


ageGraphDef = tf.compat.v1.GraphDef()
genderGraphDef = tf.compat.v1.GraphDef()

# Import the TF graph
with open(ageModelFilename, 'rb') as f:
    ageGraphDef.ParseFromString(f.read())
    ageGraph = tf.import_graph_def(ageGraphDef, name='age')

with open(genderModelFilename, 'rb') as f:
    genderGraphDef.ParseFromString(f.read())
    genderGraph = tf.import_graph_def(genderGraphDef, name='gender')
