SSDLite + MobileNet V2 Object Detection with OpenCV DNN

I recently found out that OpenCV has a Deep Neural Network (DNN) module which is purely CPU based, with no GPU required. So I thought to give it a try. I searched online and came across this Real-time Object Detection on Raspberry Pi tutorial, which removed my doubts about the DNN's biggest nightmare: training on a dataset. I explored further and found this MobileNet SSD Object Detection using OpenCV 3.4.1 DNN module tutorial, and now it was pretty clear that using deep learning is not as scary or heavy as I once thought. As soon as my fear of heavy resource requirements was gone, I dove in without any further thought.

Install OpenCV 4

First things first: because I was using the old Python 2.7 and a very old version of OpenCV, I found out I needed to update my OpenCV version. I tried updating with pip, but for some reason I was not able to get a newer version. I searched, found this Stack Overflow link, and updated OpenCV successfully with this command.

pip install opencv-contrib-python==4.1.0.25

Download Pretrained Model

The next thing for me was to look for some pretrained models. I found out that there are a lot of pretrained models available with different kinds of DNN architectures. The most popular ones are YOLO, SSD, and MobileNet, as well as Faster R-CNN. But I was looking for a model that is extremely small and lightweight, so the obvious choice was MobileNet. Then, while exploring the TensorFlow Model Zoo, I found SSDLite+MobileNet-V2 trained on the COCO dataset[1]. So I downloaded this model along with a few others. Surprisingly, this model was pretty light: just a 34 MB file at the time of download. But as soon as I extracted it, I got frustrated, because…

Generate missing PBTXT File

After reading the documentation and a few other tutorials like [2], I was aware that I needed two kinds of files. One is the frozen graph file with the extension “.pb”, which I found in the downloaded model, but the second file, with the extension “.pbtxt”, was missing. So I had to search a little more, and I found out that a few models have already been converted and tested with OpenCV by its users, which is why those files are available to download on the OpenCV GitHub page. To generate the related .pbtxt file, I had to run a few scripts available in the OpenCV GitHub repo. So I downloaded these two files.

tf_text_graph_ssd.py was required because I was using the SSD version. So I downloaded it, passed the paths of the .pb and .config files to tf_text_graph_ssd.py, and within a second the .pbtxt file was generated.

Finding Correct Classes Labels

Now, before actually starting to test the model, I came across this message from the tf_text_graph_ssd.py script, which says that there are 90 classes in the model. I was a little surprised and started looking for the class labels for the COCO dataset, but found only 80 labels. It was quite confusing and frustrating.

[Image: Python IDLE logs for SSDLite+MobileNet V2 object detection using OpenCV]

Again, after a bit of research on Google, I came across this Stack Overflow thread dealing with the same problem. I started looking for the TensorFlow version of the class labels and gladly found it in the TensorFlow GitHub repo as well. It was just the correct order with the missing classes: because COCO released only 80 classes, a few classes were omitted. The full label list in the correct order is available in the official COCO repository on GitHub. Now all that was left was to generate the correct labels. Here is the complete list of 91 labels:

COCO_labels = { 0: 'background',
    1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle',
    5: 'airplane', 6: 'bus', 7: 'train', 8: 'truck', 9: 'boat',
    10: 'traffic light', 11: 'fire hydrant', 12: 'street sign', 13: 'stop sign', 14: 'parking meter',
    15: 'bench', 16: 'bird', 17: 'cat', 18: 'dog', 19: 'horse', 20: 'sheep', 21: 'cow', 22: 'elephant',
    23: 'bear', 24: 'zebra', 25: 'giraffe', 26: 'hat', 27: 'backpack', 28: 'umbrella', 29: 'shoe',
    30: 'eye glasses', 31: 'handbag', 32: 'tie', 33: 'suitcase', 34: 'frisbee', 35: 'skis',
    36: 'snowboard', 37: 'sports ball', 38: 'kite', 39: 'baseball bat', 40: 'baseball glove',
    41: 'skateboard', 42: 'surfboard', 43: 'tennis racket', 44: 'bottle', 45: 'plate',
    46: 'wine glass', 47: 'cup', 48: 'fork', 49: 'knife', 50: 'spoon', 51: 'bowl', 52: 'banana',
    53: 'apple', 54: 'sandwich', 55: 'orange', 56: 'broccoli', 57: 'carrot', 58: 'hot dog', 59: 'pizza',
    60: 'donut', 61: 'cake', 62: 'chair', 63: 'couch', 64: 'potted plant', 65: 'bed', 66: 'mirror',
    67: 'dining table', 68: 'window', 69: 'desk', 70: 'toilet', 71: 'door', 72: 'tv', 73: 'laptop',
    74: 'mouse', 75: 'remote', 76: 'keyboard', 77: 'cell phone', 78: 'microwave', 79: 'oven', 80: 'toaster',
    81: 'sink', 82: 'refrigerator', 83: 'blender', 84: 'book', 85: 'clock', 86: 'vase', 87: 'scissors',
    88: 'teddy bear', 89: 'hair drier', 90: 'toothbrush', 91: 'hair brush'}

And here is the list after removing the unreleased classes, but keeping the original label IDs:

COCO_labels = { 0: 'background',
    1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle',
    5: 'airplane', 6: 'bus', 7: 'train', 8: 'truck', 9: 'boat',
    10: 'traffic light', 11: 'fire hydrant', 13: 'stop sign', 14: 'parking meter',
    15: 'bench', 16: 'bird', 17: 'cat', 18: 'dog', 19: 'horse', 20: 'sheep', 21: 'cow', 22: 'elephant',
    23: 'bear', 24: 'zebra', 25: 'giraffe', 27: 'backpack', 28: 'umbrella',
    31: 'handbag', 32: 'tie', 33: 'suitcase', 34: 'frisbee', 35: 'skis',
    36: 'snowboard', 37: 'sports ball', 38: 'kite', 39: 'baseball bat', 40: 'baseball glove',
    41: 'skateboard', 42: 'surfboard', 43: 'tennis racket', 44: 'bottle',
    46: 'wine glass', 47: 'cup', 48: 'fork', 49: 'knife', 50: 'spoon', 51: 'bowl', 52: 'banana',
    53: 'apple', 54: 'sandwich', 55: 'orange', 56: 'broccoli', 57: 'carrot', 58: 'hot dog', 59: 'pizza',
    60: 'donut', 61: 'cake', 62: 'chair', 63: 'couch', 64: 'potted plant', 65: 'bed',
    67: 'dining table', 70: 'toilet', 72: 'tv', 73: 'laptop',
    74: 'mouse', 75: 'remote', 76: 'keyboard', 77: 'cell phone', 78: 'microwave', 79: 'oven', 80: 'toaster',
    81: 'sink', 82: 'refrigerator', 84: 'book', 85: 'clock', 86: 'vase', 87: 'scissors',
    88: 'teddy bear', 89: 'hair drier', 90: 'toothbrush' }
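Because the 80-class table has gaps in its IDs, a dictionary lookup with a fallback avoids a KeyError for the omitted IDs. Here is a small sketch; the get_class_label helper and the trimmed dictionary are just for illustration:

```python
# Trimmed copy of the label table, just for this example
COCO_labels = {0: 'background', 1: 'person', 2: 'bicycle', 13: 'stop sign'}

def get_class_label(class_id, classes):
    # Class IDs come out of the network as floats, so cast to int first;
    # .get() falls back to 'unknown' for IDs missing from the 80-class table.
    return classes.get(int(class_id), 'unknown')

print(get_class_label(1.0, COCO_labels))   # person
print(get_class_label(12, COCO_labels))    # unknown ('street sign' was omitted)
```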

Load and Run DNN

Loading a pre-trained model and running the DNN is very simple in OpenCV. It is just a three-step process. First, load the model using the cv2.dnn.readNetFromTensorflow function; the word “Tensorflow” here is because we are using a model trained with the TensorFlow framework. Second, create a blob from your input image; make sure the blob is 300×300, because that is the input size the model was trained on. Finally, pass that blob as input to the model using the model.setInput() method, and model.forward() will do a forward pass and return the detections.

All we have to do is call the following code:

import cv2

# Load the frozen TensorFlow graph together with the generated .pbtxt
model = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'ssdlite_mobilenet_v2_coco_2018_05_09.pbtxt')

image = cv2.imread("test6.jpg")
image = cv2.resize(image, (300, 300))  # resize frame for prediction

# OpenCV loads images as BGR; swapRB=True gives the model the channel order it expects
model.setInput(cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True))
output = model.forward()
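For context, blobFromImage converts an H×W×3 image into a 1×3×H×W floating-point “blob” (NCHW order), optionally resizing and swapping the R and B channels on the way. A rough NumPy-only sketch of that reordering, using a dummy image invented for illustration:

```python
import numpy as np

# A dummy 300x300 3-channel image standing in for the real frame
image = np.zeros((300, 300, 3), dtype=np.uint8)

# Move channels first and add a batch dimension: HWC -> 1 x C x H x W
blob = image.transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)
print(blob.shape)   # (1, 3, 300, 300)
```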

Parse the result

Now that we have our detections in the output variable, we can parse them with a simple for loop like this[2]:

for detection in output[0, 0, :, :]:
    confidence = detection[2]
    if confidence > .5:
        class_id = detection[1]
        print(str(class_id) + " " + str(confidence) + " " + getClassLabel(class_id, COCO_labels))

This code will print the detections found in your image.
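For reference, the forward pass returns a 4-D array of shape (1, 1, N, 7), where each of the N rows is [batch_id, class_id, confidence, x1, y1, x2, y2] with coordinates normalized to the 0–1 range. The same filtering loop over a made-up tensor (all values here are invented for illustration):

```python
import numpy as np

# Fake detection output in the DNN module's layout:
# shape (1, 1, N, 7), rows = [batch_id, class_id, confidence, x1, y1, x2, y2]
output = np.array([[[
    [0.0, 1.0, 0.92, 0.1, 0.2, 0.5, 0.9],   # confident detection, kept
    [0.0, 3.0, 0.30, 0.0, 0.0, 0.2, 0.2],   # below the 0.5 threshold, skipped
]]])

kept = [int(d[1]) for d in output[0, 0, :, :] if d[2] > .5]
print(kept)   # [1]
```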

Draw bounding boxes and label Detected Objects

Finally, we need to draw bounding boxes around our detected objects and put a label on each box to show which class it belongs to. We can draw the bounding boxes with the following code (inside the same loop):

        # detection[3:7] holds the normalized box corners (x1, y1, x2, y2)
        x = detection[3] * 300
        y = detection[4] * 300
        w = detection[5] * 300
        h = detection[6] * 300
        cv2.rectangle(image, (int(x), int(y)), (int(w), int(h)), (0, 255, 0), thickness=5)

The reason for the constant 300 multiplied with [x, y, w, h] is that our image is 300×300 in size and the network returns coordinates normalized between 0 and 1. If that is not the case, we need to multiply x and w by the image width, and y and h by the image height.
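Carrying out that multiplication on a hypothetical detection row (the values are invented) for a 640×480 image:

```python
# Hypothetical detection row: [batch_id, class_id, confidence, x1, y1, x2, y2]
detection = [0.0, 1.0, 0.87, 0.10, 0.20, 0.60, 0.80]
im_w, im_h = 640, 480   # original image width and height

x = int(detection[3] * im_w)   # 64
y = int(detection[4] * im_h)   # 96
w = int(detection[5] * im_w)   # 384  (right edge of the box, not its width)
h = int(detection[6] * im_h)   # 384  (bottom edge of the box, not its height)
print(x, y, w, h)
```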

Final Code:

Here is the complete code used for this tutorial

import cv2
import numpy as np


def getClassLabel(class_id, classes):
    for key, value in classes.items():
        if class_id == key:
            return value

COCO_labels = { 0: 'background',
    1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle',
    5: 'airplane', 6: 'bus', 7: 'train', 8: 'truck', 9: 'boat',
    10: 'traffic light', 11: 'fire hydrant', 13: 'stop sign', 14: 'parking meter',
    15: 'bench', 16: 'bird', 17: 'cat', 18: 'dog', 19: 'horse', 20: 'sheep', 21: 'cow', 22: 'elephant',
    23: 'bear', 24: 'zebra', 25: 'giraffe', 27: 'backpack', 28: 'umbrella',
    31: 'handbag', 32: 'tie', 33: 'suitcase', 34: 'frisbee', 35: 'skis',
    36: 'snowboard', 37: 'sports ball', 38: 'kite', 39: 'baseball bat', 40: 'baseball glove',
    41: 'skateboard', 42: 'surfboard', 43: 'tennis racket', 44: 'bottle',
    46: 'wine glass', 47: 'cup', 48: 'fork', 49: 'knife', 50: 'spoon', 51: 'bowl', 52: 'banana',
    53: 'apple', 54: 'sandwich', 55: 'orange', 56: 'broccoli', 57: 'carrot', 58: 'hot dog', 59: 'pizza',
    60: 'donut', 61: 'cake', 62: 'chair', 63: 'couch', 64: 'potted plant', 65: 'bed',
    67: 'dining table', 70: 'toilet', 72: 'tv', 73: 'laptop',
    74: 'mouse', 75: 'remote', 76: 'keyboard', 77: 'cell phone', 78: 'microwave', 79: 'oven', 80: 'toaster',
    81: 'sink', 82: 'refrigerator', 84: 'book', 85: 'clock', 86: 'vase', 87: 'scissors',
    88: 'teddy bear', 89: 'hair drier', 90: 'toothbrush' }


model = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'ssdlite_mobilenet_v2_coco_2018_05_09.pbtxt')

image = cv2.imread("test3.jpg")
im_h, im_w, _ = image.shape
#image = cv2.resize(image,(300,300)) # resize frame for prediction

model.setInput(cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True))
output = model.forward()

for detection in output[0, 0, :, :]:
    confidence = detection[2]
    if confidence > .5:
        class_id = detection[1]
        class_label = getClassLabel(class_id, COCO_labels)
        # scale the normalized box corners back to the original image size
        x = int(detection[3] * im_w)
        y = int(detection[4] * im_h)
        w = int(detection[5] * im_w)
        h = int(detection[6] * im_h)
        cv2.rectangle(image, (x, y), (w, h), (0, 255, 0), thickness=5)
        cv2.putText(image, class_label, (x, y + 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 255), 3, cv2.LINE_AA)
        print(str(class_id) + " " + str(confidence) + " " + class_label)


cv2.imshow('image', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

References:

[1] T.-Y. Lin et al., “Microsoft COCO: Common Objects in Context.”

[2] R. Deep, “Real-Time Object Detection on Raspberry Pi Using OpenCV DNN,” Medium, 23-Oct-2018. [Online]. Available: https://heartbeat.fritz.ai/real-time-object-detection-on-raspberry-pi-using-opencv-dnn-98827255fa60. [Accessed: 16-Apr-2020].‌

By Abdul Rehman

My name is Abdul Rehman and I love to do research in Embedded Systems, Artificial Intelligence, Computer Vision, and Engineering-related fields. With 10+ years of experience in research and development in embedded systems, I have touched a lot of technologies, including web development and mobile application development. Now, with the help of a social presence, I like to share my knowledge and document everything I have learned and am still learning.

