6 Simple Steps to Build Your Own Computer Vision Models with Python

Does it have to be so difficult to build your own computer vision models?

Computer vision is a mix of programming, modeling and mathematics and is sometimes difficult to grasp. Python is a language very commonly used for AI and machine learning, but it has some peculiarities that take getting used to. The OpenCV framework, which is the easiest way to start with computer vision, packs over 2800 algorithms and can be a bit overwhelming at first.

In this article, we’ll offer a surprisingly simple way to get started with computer vision using Python and OpenCV. After going through these steps, you’ll be able to build apps like:

  • An image similarity app
  • A search engine app
  • A face detection app
  • An object detection app

Let’s Get Started: What Are the 6 Steps?

Once you’ve done these six things in OpenCV with Python, you should be able to confidently build basic computer vision models and move on to building your own AI applications:

  1. Installing OpenCV
  2. Understanding color models and drawing figures on images
  3. Learning edge detection
  4. Learning face detection
  5. Learning contour detection and eye detection
  6. Moving from detecting faces in an image to detecting them in video from a webcam

In this article we’ll explain steps 1 and 2, which will give you a head start with OpenCV.

Coming up soon are additional articles which will explain OpenCV edge detection, OpenCV face detection, OpenCV contours and eye detection, and OpenCV webcam detection.

Brief Definitions

Let’s start with definitions to get us on the same page.

What is computer vision?

Computer vision is a field of artificial intelligence that enables machines to see, identify and process images much as humans do. It is the automated extraction of information from images: anything from 3D models, camera position, and object detection and recognition to grouping and searching image content.

What is Python?

Python is the language most commonly used today to build and train computer vision models. It was designed to be easy to use and quick to learn, has an accessible syntax, and offers many libraries and frameworks for machine learning and deep learning, including NumPy, scikit-learn, OpenCV, TensorFlow and PyTorch.

Step 1: Installing OpenCV

Here’s how to install OpenCV:

pip install opencv-python
# If you also need the extra "contrib" modules, install this package instead (not both):
# pip install opencv-contrib-python

After you finish the installation, try importing the package in your Python code. If you manage to run the code without an error, you’re good to go.

import cv2

For more details, see the official documentation.

Step 2: Understanding Color Models and Drawing Figures on Images

It’s important to understand color models and how to transform images between them. Different color models may be more appropriate for different computer vision problems. You should convert your source images into the color model that will make it easiest for the model to identify the required elements in the image.

A color model describes all possible colors as combinations of a few primary colors. There are two types of models:

  • Additive models use light to represent colors in computer screens. The primary colors in an additive model are red, green and blue (RGB). Here is how to convert an image from BGR (the default model in OpenCV) to RGB:
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  • Subtractive models use ink to print colors on paper. The primary colors they use are cyan, magenta, yellow and black (CMYK).

Computer vision uses three main color models:

  • A grayscale model represents images by the intensity of black and white. It has only one color channel. Here is how to convert an image to grayscale in OpenCV:
import matplotlib.pyplot as plt
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(img_gray, cmap = 'gray')

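  • HSV stands for hue, saturation and value. Value measures the brightness of a color, from black up to the fully bright color.
  • HSL stands for hue, saturation and lightness. Lightness runs from black, through the pure color, to white. OpenCV names this model HLS and stores its channels in the order hue, lightness, saturation. Here is how to convert an image to HSV and HLS: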
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
img_hls = cv2.cvtColor(img, cv2.COLOR_BGR2HLS)

Another important skill is drawing things on your source images. This is often required in computer vision, for example to visualize the bounding boxes of objects you detect in your image.

How to draw a rectangle on an image in OpenCV

First copy the image, then use the cv2.rectangle() function to set the two corners that define the rectangle.

img_copy = img.copy()
cv2.rectangle(img_copy, pt1 = (700, 560), pt2 = (850, 630), 
              color = (255, 0, 0), thickness = 5)

How to draw an object interactively on an image

First you need to define a callback function. OpenCV calls it on every mouse event and passes it the event type and the cursor position.

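def draw_circle(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        # Left click: small filled circle (thickness = -1 fills the shape)
        cv2.circle(img, center = (x, y), radius = 5,
                   color = (87, 184, 237), thickness = -1)
    elif event == cv2.EVENT_RBUTTONDOWN:
        # Right click: larger circle outline
        cv2.circle(img, center = (x, y), radius = 10,
                   color = (87, 184, 237), thickness = 1)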

Next, load the image and create the window. The cv2.setMouseCallback() function connects the window to the callback you defined above:

img = cv2.imread('map.png')
cv2.namedWindow(winname = 'my_drawing')
cv2.setMouseCallback('my_drawing', draw_circle)

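Finally, run the window's display loop. It redraws the image every 10 ms and exits when you press Esc (key code 27). Call cv2.destroyAllWindows() after the loop to close the window:

while True:
    cv2.imshow('my_drawing', img)
    if cv2.waitKey(10) & 0xFF == 27:
        break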

That’s it! You’ve made your first steps in OpenCV and are able to perform basic operations with images.

Stay tuned for the next articles in this series to learn the really cool stuff, including edge detection, face detection and live object detection from a webcam.

Running OpenCV Projects in the Real World

In this article, we explained how to take your first steps with OpenCV and Python to create computer vision models.

When you start working on computer vision projects, processing and generating predictions for real images, audio and video, you’ll run into some practical challenges:

  • Tracking experiments

    You need to track experiment progress, hyperparameters, image datasets and source code across multiple experiments. Computer vision models can have many different structures and variations. Testing many variations to see what works will require you to run and track possibly thousands of experiments.

  • Running experiments across multiple machines

    Computer vision algorithms are computationally intensive, especially if you need to run them on large numbers of images or video footage in OpenCV. Production-scale projects will require multiple machines and GPU hardware, and distributing the work efficiently will be a challenge.

  • Managing training datasets

    OpenCV projects usually involve large numbers of images, video files or live video, and training sets can range from gigabytes to petabytes. Copying this data to machines and replacing it each time you tweak your dataset or computer vision models can be very time consuming.


MissingLink is a deep learning platform that can help you automate these operational aspects of neural networks, so you can concentrate on building winning experiments and running them with OpenCV.
