Methods and Applications
OpenCV (Open Source Computer Vision Library) is an open source library that helps users perform computer vision tasks. Released in 2017, version 3.3 of OpenCV overhauled its deep neural network module. Today, organizations running Convolutional Neural Networks (CNNs) and other neural network-based computer vision architectures are using OpenCV.
This article describes OpenCV and its deep learning module, outlines the role of OpenCV in deep learning for computer vision, and walks you through the OpenCV deep learning execution process, including step-by-step OpenCV tutorials.
OpenCV (Open Source Computer Vision Library) is an open source library used to perform computer vision tasks. It offers over 2,500 computer vision algorithms, ranging from classic statistical algorithms to modern machine learning-based techniques such as neural networks. OpenCV boasts a community of almost 50,000 developers and over 18 million downloads.
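To give a feel for the classic, non-deep-learning side of the library, here is a minimal sketch that loads an image and runs Canny edge detection (the file paths are placeholders):

import cv2

# load an image and run a classic OpenCV algorithm (Canny edge detection)
# "input.jpg" and "edges.jpg" are placeholder paths
image = cv2.imread("input.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite("edges.jpg", edges)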
OpenCV is used by huge companies like Google, Yahoo, Microsoft and Intel, by research bodies and governments, and also by startups and individual users. While it used to be difficult to learn and use, its usability and documentation are gradually improving.
OpenCV applications include:
Systems, languages and frameworks supported
In 2017, OpenCV released version 3.3 and overhauled its Deep Neural Network (DNN) module; OpenCV is now widely used to run Convolutional Neural Networks (CNNs) and other neural network-based computer vision architectures.
Let’s clarify the role of OpenCV in a deep learning computer vision project: OpenCV is not used to train the neural networks. Instead, it loads models that were trained in frameworks such as TensorFlow or Caffe and runs them for inference on images and video.
OpenCV deep learning execution process: load a pre-trained model, convert the input image to a blob, run a forward pass through the network, and post-process the outputs.
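As an illustration, here is a minimal sketch of that process using OpenCV's dnn module (the model file names below are placeholders, not files shipped with OpenCV):

import cv2

# load a network that was trained in another framework (Caffe in this example);
# "deploy.prototxt" and "model.caffemodel" are placeholder file names
net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "model.caffemodel")

# convert the input image to a blob and run a forward pass
image = cv2.imread("input.jpg")
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(300, 300))
net.setInput(blob)
outputs = net.forward()
# "outputs" holds the network's raw predictions, ready for post-processing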
In the remainder of this article, we’ll summarize two excellent tutorials that will help you learn to use OpenCV with deep neural networks, using the Mask R-CNN and classic CNN architectures.
Mask R-CNN is an extension of the Faster R-CNN architecture (see our in-depth guide on using Faster R-CNN with TensorFlow). It works by identifying Regions of Interest (ROI) within an image and then focusing the classification process on those regions. This is a deep learning image segmentation technique.
The Mask R-CNN process is as follows: the network proposes candidate Regions of Interest, classifies each region and refines its bounding box, and then predicts a pixel-wise segmentation mask for each detected object.
The following steps are summarized—see the full tutorial by Adrian Rosebrock. The tutorial uses OpenCV and Mask R-CNN to classify objects within images, using the COCO dataset with 90 image classes.
Prerequisite: Before following the tutorials below, install OpenCV on your workstation.
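If you don't have OpenCV installed yet, one common way to get the Python bindings (which include the dnn module) is via pip:

$ pip install opencv-python
$ python -c "import cv2; print(cv2.__version__)"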
Load the COCO dataset class labels that were used to train the Mask R-CNN network (remember, networks used in OpenCV must be pre-trained):
# "args" holds the script's parsed command-line arguments (see the command at the end of this section)
labelsPath = os.path.sep.join([args["mask_rcnn"], "object_detection_classes_coco.txt"])
LABELS = open(labelsPath).read().strip().split("\n")
Get the Mask R-CNN weights and model configuration that were produced during training, and load the model:
weightsPath = os.path.sep.join([args["mask_rcnn"], "frozen_inference_graph.pb"])
configPath = os.path.sep.join([args["mask_rcnn"], "mask_rcnn_inception_v2_coco_2018_01_28.pbtxt"])
net = cv2.dnn.readNetFromTensorflow(weightsPath, configPath)
Load an image into a blob and do a pass through the neural network. Run ROI alignment and get the bounding box coordinates, then for each object in the image, perform pixel-wise image segmentation.
image = cv2.imread(args["image"])
(H, W) = image.shape[:2]
blob = cv2.dnn.blobFromImage(image, swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
(boxes, masks) = net.forward(["detection_out_final", "detection_masks"])
end = time.time()
For each object extracted in the ROI align stage, get the predicted class, and if confidence is high enough, compute the bounding box coordinates relative to the size of the image.
for i in range(0, boxes.shape[2]):
    classID = int(boxes[0, 0, i, 1])
    confidence = boxes[0, 0, i, 2]
    if confidence > args["confidence"]:
        clone = image.copy()
        box = boxes[0, 0, i, 3:7] * np.array([W, H, W, H])
        (startX, startY, endX, endY) = box.astype("int")
        boxW = endX - startX
        boxH = endY - startY
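The mask and ROI used in the next snippets are extracted at this point in the full tutorial. A condensed sketch of that step (args["threshold"] is the tutorial's mask threshold parameter):

# still inside the loop: take the predicted mask for this class, resize it to the
# box size, threshold it into a boolean mask, and grab the matching ROI
mask = masks[i, classID]
mask = cv2.resize(mask, (boxW, boxH), interpolation=cv2.INTER_NEAREST)
mask = (mask > args["threshold"])
roi = clone[startY:endY, startX:endX]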
Now convert the mask from a boolean to an integer with values 0-255, and show the extracted ROI with its mask:
if args["visualize"] > 0: visMask = (mask * 255).astype("uint8") instance = cv2.bitwise_and(roi, roi, mask=visMask) cv2.imshow("ROI", roi) cv2.imshow("Mask", visMask) cv2.imshow("Segmented", instance)
Extract only the masked region of the ROI, randomly select a color to visualize it, and create a transparent overlay:
roi = roi[mask]
color = random.choice(COLORS)
blended = ((0.4 * color) + (0.6 * roi)).astype("uint8")
clone[startY:endY, startX:endX][mask] = blended
Finally, draw the bounding box on the image, together with the predicted label and probability, and show the output image:
color = [int(c) for c in color]
cv2.rectangle(clone, (startX, startY), (endX, endY), color, 2)
text = "{}: {:.4f}".format(LABELS[classID], confidence)
cv2.putText(clone, text, (startX, startY - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
cv2.imshow("Output", clone)
cv2.waitKey(0)
Here is a command you can use to execute the OpenCV code above and generate a visualization of the image:
$ python mask_rcnn.py --mask-rcnn mask-rcnn-coco --image images/example_01.jpg
See the full tutorial for example output images, the complete code, and details on how to do the same thing for video streams in OpenCV.
This tutorial uses a classic Convolutional Neural Network to classify images of sign language letters from the Sign Language MNIST dataset. OpenCV will be used to apply the pre-trained CNN to live video. We use Google Colab as the deep learning environment.
Test data will be live streaming video from a webcam – our model will identify letters in sign language based on live footage.
These steps are summarized—see the full tutorial by Arshad Kazi.
Download the Sign Language MNIST dataset here, load it into Colab, and visualize some of the images:
# the Sign Language MNIST data ships as CSV files, so it is loaded with pandas
# rather than keras.datasets; the file names below are assumed
import pandas as pd

train_df = pd.read_csv("sign_mnist_train.csv")
test_df = pd.read_csv("sign_mnist_test.csv")
y_train, X_train = train_df["label"], train_df.drop("label", axis=1)
y_test, X_test = test_df["label"], test_df.drop("label", axis=1)

display(X_train.info())
display(X_test.info())
display(X_train.head(n=2))
display(X_test.head(n=2))
Create images from the X_train and X_test pixel values by reshaping each row of 784 pixels into a 28×28 array, then one-hot encode the labels and add a channel dimension:
import numpy as np

X_train = np.array(X_train.iloc[:, :])
X_train = np.array([np.reshape(i, (28, 28)) for i in X_train])
X_test = np.array(X_test.iloc[:, :])
X_test = np.array([np.reshape(i, (28, 28)) for i in X_test])

# one-hot encode the labels (26 letter classes)
num_classes = 26
y_train = np.array(y_train).reshape(-1)
y_test = np.array(y_test).reshape(-1)
y_train = np.eye(num_classes)[y_train]
y_test = np.eye(num_classes)[y_test]

# add a channel dimension: (samples, 28, 28, 1)
X_train = X_train.reshape((27455, 28, 28, 1))
X_test = X_test.reshape((7172, 28, 28, 1))
We use a CNN model with two Conv2D and MaxPooling layers, followed by fully connected layers. Define the model in Keras, compile it, train it, and check accuracy on the test set:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten

classifier = Sequential()
classifier.add(Conv2D(filters=8, kernel_size=(3,3), strides=(1,1), padding='same', input_shape=(28,28,1), activation='relu', data_format='channels_last'))
classifier.add(MaxPooling2D(pool_size=(2,2)))
classifier.add(Conv2D(filters=16, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
classifier.add(Dropout(0.5))
classifier.add(MaxPooling2D(pool_size=(4,4)))
classifier.add(Dense(128, activation='relu'))  # placed before Flatten, as in the tutorial
classifier.add(Flatten())
classifier.add(Dense(26, activation='softmax'))

classifier.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, epochs=50, batch_size=100)

accuracy = classifier.evaluate(x=X_test, y=y_test, batch_size=32)
print("Accuracy: ", accuracy[1])
Save the trained model and upload it to Google Drive, so you can download it to your computer:
# "drive" is a PyDrive GoogleDrive instance authenticated earlier in the Colab notebook
classifier.save('CNNmodel.h5')
weights_file = drive.CreateFile({'title' : 'CNNmodel.h5'})
weights_file.SetContentFile('CNNmodel.h5')
weights_file.Upload()
drive.CreateFile({'id': weights_file.get('id')})
Create a “window” in OpenCV to take the input from our webcam. The input should be converted to 28×28 grayscale, because this is how we trained our model.
Here is how to capture the image from the webcam:
def main():
    # open the webcam once, outside the loop, rather than on every iteration
    cam_capture = cv2.VideoCapture(0)
    while True:
        _, image_frame = cam_capture.read()
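The full tutorial also displays the captured frame in an OpenCV window, with a rectangle marking where to place your hand. A minimal sketch of that display step, run inside the capture loop (the coordinates here are assumptions that match the 300×300 crop below):

# draw a box marking the hand region and show the frame
cv2.rectangle(image_frame, (300, 300), (600, 600), (255, 255, 0), 2)
cv2.imshow("Frame", image_frame)
cv2.waitKey(1)  # gives the window time to refresh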
Crop, convert to grayscale, blur and resize:
# crop_image() is a small helper defined in the full tutorial
im2 = crop_image(image_frame, 300, 300, 300, 300)
image_grayscale = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)
image_grayscale_blurred = cv2.GaussianBlur(image_grayscale, (15,15), 0)
im3 = cv2.resize(image_grayscale_blurred, (28,28), interpolation=cv2.INTER_AREA)
Expand dimensions to 1x28x28x1:
im4 = np.resize(im3, (28, 28, 1))
im5 = np.expand_dims(im4, axis=0)
To predict a letter from an input image, the model works with integer class indices rather than letters (0 = A, 1 = B, and so on).
Pass the input image into the classifier:
def keras_predict(model, image):
    data = np.asarray(image, dtype="int32")
    pred_probab = model.predict(data)[0]
The softmax output gives a probability for each letter; select the letter with the highest probability:
    # continuing keras_predict()
    pred_class = list(pred_probab).index(max(pred_probab))
    return max(pred_probab), pred_class
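To turn the predicted class index back into a letter, a small helper (not from the tutorial, just a sketch of the mapping described above) could look like this:

import string

def index_to_letter(pred_class):
    # hypothetical helper: 0 -> "A", 1 -> "B", ...
    # note: Sign Language MNIST has no samples for J or Z, which require motion
    return string.ascii_uppercase[pred_class]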
That’s it! Your model can now be used to read sign language in live video footage.
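As a wrap-up, here is a rough sketch of what the live loop could look like end to end, assuming the model saved above (CNNmodel.h5) and the keras_predict and index_to_letter helpers defined earlier:

import cv2
import numpy as np
from keras.models import load_model

model = load_model('CNNmodel.h5')  # the model trained and saved above
cam_capture = cv2.VideoCapture(0)

while True:
    _, image_frame = cam_capture.read()
    # preprocess the frame the same way as during training: crop, grayscale, blur, resize
    roi = image_frame[300:600, 300:600]  # stand-in for crop_image() from the tutorial
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (15, 15), 0)
    small = cv2.resize(blurred, (28, 28), interpolation=cv2.INTER_AREA)
    batch = np.expand_dims(np.resize(small, (28, 28, 1)), axis=0)

    probability, pred_class = keras_predict(model, batch)
    print(index_to_letter(pred_class), probability)

    cv2.imshow("Frame", image_frame)
    if cv2.waitKey(25) & 0xFF == ord('q'):  # press "q" to quit
        break

cam_capture.release()
cv2.destroyAllWindows()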
In this article we explained how to use OpenCV to run pre-trained deep learning algorithms, specifically Convolutional Neural Networks (CNNs), on images and live video footage.
When you start working on computer vision projects, processing and generating predictions for real images, audio and video, you’ll run into some practical challenges:
Tracking experiment progress, source code, and hyperparameters across multiple computer vision experiments. CNNs can have many different architectures and modifications. Testing each variation will require running and tracking large numbers of experiments.
Running experiments across multiple machines—computer vision algorithms are computationally intensive to train, and also to apply to large numbers of images using OpenCV. Most projects will require multiple machines or GPU hardware. Provisioning these machines and distributing experiments efficiently can be difficult.
Managing training data—OpenCV projects usually involve live video, and training sets can get huge, up to gigabytes or petabytes of data. Copying this data to training machines and replacing it each time you tweak your dataset and neural network can be very time consuming.
MissingLink is a deep learning platform that can help you automate these operational aspects of neural networks, so you can concentrate on building winning experiments and running them with OpenCV.
Learn more about the MissingLink platform.
MissingLink is the most comprehensive deep learning platform to manage experiments, data, and resources more frequently, at scale and with greater confidence.
Request your personal demo to start training models faster