Computer Vision is an interdisciplinary field in Artificial Intelligence that enables machines to derive and analyze information from imagery (images and videos) and other forms of visual input. Computer Vision imitates the human eye: models are trained to perform various functions with the help of cameras, algorithms, and data rather than optic nerves and a visual cortex. Computer vision has significant real-world applications, including facial recognition, self-driving cars, and predictive analysis. In self-driving cars, Computer Vision (CV) allows the car’s computer to make sense of the visual input from the car’s cameras and other sensors. Across industries, CV is used for many tasks, such as x-ray analysis in healthcare, quality control in manufacturing, and predictive maintenance in construction, to name a few.

Beyond recognition alone, other methods of analysis include:

  • Video motion analysis, which uses computer vision to estimate the velocity of objects in a video, or of the camera itself.

  • Image segmentation, in which algorithms partition an image into multiple regions; we will discuss this at length later in this article.

  • Scene reconstruction, which creates a 3D model of a scene from input images or video and is popular on social media.

  • Image restoration, in which noise such as blur is removed from photos using Machine Learning based filters.

Scikit-Image

A great tool for this is scikit-image, a Python package dedicated to image processing. We’ll be using it throughout the article, so to follow along you can install it with one of the commands below:

pip install scikit-image

# For Conda-based distributions
conda install -c conda-forge scikit-image

Basics for Scikit-image

Before getting into image segmentation, we will familiarize ourselves with the scikit-image ecosystem and how it handles images.

Importing images from the skimage library

The skimage data module contains some built-in example data sets, which are generally stored in jpeg or png format. We will use matplotlib, an excellent visualization library in Python for 2D plots of arrays, to plot the images. The common cases are listed below, with a sketch of each after the list.

  • Importing a grayscale image

  • Importing a colored image

  • Importing images from an external source
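
Below is a minimal sketch of all three cases, using skimage’s built-in sample images; the file path in the last call is a hypothetical placeholder (io.imread() also accepts URLs):

import matplotlib.pyplot as plt
from skimage import data, io

# Importing a grayscale image from the built-in data module
gray_image = data.camera()

# Importing a colored image from the built-in data module
color_image = data.astronaut()

# Importing an image from an external source; uncomment and point at
# a real file or URL (the path below is only a placeholder)
# external_image = io.imread('path/to/your_image.png')

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(gray_image, cmap='gray')
axes[1].imshow(color_image)
plt.show()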

Various factors affect the methods used to process images, among them color, format, and even size. Higher-contrast images would need more advanced tools.

  • Loading multiple images

A ValueError will be raised when concatenating an ImageCollection whose images don’t have identical shapes.
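
A minimal sketch, assuming a hypothetical images/ folder of PNG files:

from skimage import io

# Lazily load every PNG in a (hypothetical) folder
collection = io.ImageCollection('images/*.png')
print(len(collection), 'images loaded')

# Stacking the collection into one array raises ValueError
# if the images do not all share the same shape
stack = io.concatenate_images(collection)
print(stack.shape)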

  • Saving images
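
As a sketch, using a built-in image and a hypothetical output filename:

from skimage import data, io

img = data.astronaut()

# Save the image to disk; the format is inferred from the extension
io.imsave('astronaut.png', img)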

Converting image format

The RGB color model is an additive color model in which the red, green, and blue primary colors of light are added together at different intensities to reproduce a broad array of colors. RGB is the most common color model used today; every television or computer monitor uses it to display images.

  • RGB to Grayscale

To apply filters and other processing techniques, the expected input is a two-dimensional array, i.e. a monochrome image. This works for basic segmentation, although it would not be ideal for high-contrast images. The rgb2gray() function of the skimage package is used to convert a 3-channel RGB image to a one-channel monochrome image.
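
A minimal sketch using the built-in astronaut image:

import matplotlib.pyplot as plt
from skimage import data
from skimage.color import rgb2gray

rgb_image = data.astronaut()
gray_image = rgb2gray(rgb_image)  # (H, W, 3) uint8 -> (H, W) floats in [0, 1]

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(rgb_image)
axes[0].set_title('RGB')
axes[1].imshow(gray_image, cmap='gray')
axes[1].set_title('Grayscale')
plt.show()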

  • RGB to HSV

The HSV (Hue, Saturation, Value) color model is designed to more closely resemble how human vision perceives color. HSV is great for editing because it separates the lightness variations from the hue and saturation variations. The rgb2hsv() function is used to convert an RGB image to HSV format.
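
A minimal sketch, plotting the hue and value channels separately:

import matplotlib.pyplot as plt
from skimage import data
from skimage.color import rgb2hsv

rgb_image = data.astronaut()
hsv_image = rgb2hsv(rgb_image)

# Channel 0 is hue, channel 1 is saturation, channel 2 is value
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(hsv_image[:, :, 0], cmap='hsv')
axes[0].set_title('Hue channel')
axes[1].imshow(hsv_image[:, :, 2], cmap='gray')
axes[1].set_title('Value channel')
plt.show()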

Image Segmentation

Image Segmentation is the process of splitting an image into multiple layers, represented by an intelligent, pixel-wise mask. Simply put, it is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics (for example, color, intensity, or texture). Splitting a picture into a collection of image objects with comparable properties is often the first stage in image analysis. In this article, we will cover image segmentation by thresholding, using supervised and unsupervised algorithms.

Thresholding

This is a simple way of segmenting objects from the background by choosing pixels with intensities above or below a certain threshold value. It is a way to create a binary image from a grayscale or full-color image, and is typically done in order to separate “object” or foreground pixels from background pixels to aid further processing.

Supervised Learning

This type of segmentation requires external input that includes things like setting a threshold, converting formats, and correcting external biases.

Segmentation by Thresholding — Manual Input

For this part, an external pixel value ranging from 0 to 255 is used to separate the picture from the background. The intensity value for each pixel is a single value for a gray-level image, or three values for a color image. The result is a modified picture marking which pixels are above or below the specified threshold, as we will see below. To implement this thresholding, we first normalize the image from 0–255 to 0–1. A threshold value is then fixed, and for each pixel the comparison is made: if it evaluates to true we store the result as 1, otherwise 0.
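
A minimal sketch, assuming a threshold of 0.5 chosen by eye:

import matplotlib.pyplot as plt
from skimage import data
from skimage.color import rgb2gray

# rgb2gray also normalizes intensities from 0-255 to floats in [0, 1]
gray_image = rgb2gray(data.astronaut())

threshold = 0.5  # manually chosen value; tune per image
binary_image = gray_image > threshold  # True (1) above, False (0) below

plt.imshow(binary_image, cmap='gray')
plt.axis('off')
plt.show()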

This globally binarized image can be used to detect edges as well as analyze contrast and color differences.

Active Contour Segmentation

An active contour is a segmentation approach that uses energy forces and constraints to separate the pixels of interest from the rest of the picture for further processing and analysis. It is a dynamic approach that works by minimizing an energy function composed of internal and external terms: the internal energy depends on the shape of the contour itself and keeps it smooth, while the external energy is derived from the image data and pulls the contour toward features such as edges, allowing the initial outline to deform into the shapes of objects in the picture. Active contour segmentation, also called snakes, is initialized using a user-defined contour or line around the area of interest; this contour then slowly contracts as it is attracted or repelled by light and edges. The snakes model is popular in computer vision, and snakes are widely used in applications like object tracking, shape recognition, segmentation, edge detection, and stereo matching.

In the example below, after importing the necessary libraries, we convert our image from the scikit-image package to grayscale. Then we plot it and draw a circle around the astronaut’s head to initialize the snake. The active_contour() function performs the segmentation by fitting a snake to image features. A Gaussian filter is also applied to denoise the image. Regarding the parameters alpha and beta, higher values of alpha will make the snake contract faster, while higher values of beta make the snake smoother.
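
A sketch along the lines of the scikit-image gallery example; the circle’s center and radius are chosen to enclose the astronaut’s head, and the parameter values are illustrative:

import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from skimage.color import rgb2gray
from skimage.filters import gaussian
from skimage.segmentation import active_contour

img = rgb2gray(data.astronaut())

# Initialize the snake as a circle around the astronaut's head
s = np.linspace(0, 2 * np.pi, 400)
rows = 100 + 100 * np.sin(s)
cols = 220 + 100 * np.cos(s)
init = np.array([rows, cols]).T

# Denoise with a Gaussian filter, then fit the snake
snake = active_contour(gaussian(img, sigma=3), init,
                       alpha=0.015, beta=10, gamma=0.001)

fig, ax = plt.subplots(figsize=(7, 7))
ax.imshow(img, cmap='gray')
ax.plot(init[:, 1], init[:, 0], '--r', lw=3)   # initial contour
ax.plot(snake[:, 1], snake[:, 0], '-b', lw=3)  # fitted snake
ax.axis('off')
plt.show()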

Chan-Vese Segmentation

The Chan-Vese segmentation algorithm is designed to segment objects without clearly defined boundaries. This well-known iterative method splits a picture into two groups with the lowest intra-class variance. The implementation in scikit-image is only suitable for grayscale images. Some of the parameters used are lambda1, lambda2, and mu. The typical values for lambda1 and lambda2 are 1; however, if the ‘background’ is very different from the segmented object in terms of distribution, these values should differ from each other (for example, a uniform black image with figures of varying intensity). Typical values for mu are between 0 and 1, though higher values can be used when dealing with shapes that have very ill-defined contours. The algorithm also returns a list of values corresponding to the energy at each iteration, which can be used to adjust the parameters discussed above.

In the example below, we begin by using rgb2gray to convert our image to grayscale. The chan_vese() function is used to segment objects whose boundaries are not clearly defined. With extended output it returns a tuple of three values: the segmentation mask, the final level set, and the list of energies at each iteration. We then plot the original image, the segmentation, and the evolution of the energy.
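
A sketch following that recipe; parameter values are illustrative, and the keyword for the iteration cap is max_num_iter in recent scikit-image versions (max_iter in older ones):

import matplotlib.pyplot as plt
from skimage import data
from skimage.color import rgb2gray
from skimage.segmentation import chan_vese

img = rgb2gray(data.astronaut())

# extended_output=True returns (segmentation, level set, energies)
segmentation, level_set, energies = chan_vese(
    img, mu=0.25, lambda1=1, lambda2=1, tol=1e-3,
    max_num_iter=200, dt=0.5, init_level_set='checkerboard',
    extended_output=True)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].imshow(img, cmap='gray')
axes[0].set_title('Original image')
axes[1].imshow(segmentation, cmap='gray')
axes[1].set_title(f'Chan-Vese segmentation ({len(energies)} iterations)')
axes[2].plot(energies)
axes[2].set_title('Evolution of energy')
plt.tight_layout()
plt.show()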

Unsupervised Learning

This type of image segmentation requires no user input. Consider an image so large that it is not feasible to consider all its pixels simultaneously. In such cases, unsupervised segmentation can break the image down into several sub-regions, so instead of millions of pixels you have tens to hundreds of regions. You may still be able to tweak certain settings to obtain the desired output.

SLIC (Simple Linear Iterative Clustering)

The SLIC algorithm utilizes K-means, a machine learning clustering algorithm, under the hood. It takes in all the pixel values of the image and tries to separate them into the given number of sub-regions.

SLIC works well with color, so we do not need to convert the image to grayscale. We will set each sub-region to the average color of that region, which makes the result look like an image decomposed into areas of similar color. label2rgb() replaces each discrete label with the average interior color.
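
A minimal sketch, asking for roughly 100 regions (the start_label keyword is available in recent scikit-image versions):

import matplotlib.pyplot as plt
from skimage import data
from skimage import segmentation as seg
from skimage.color import label2rgb

img = data.astronaut()

# Cluster pixels in combined color and (x, y) space into ~100 sub-regions
labels = seg.slic(img, n_segments=100, compactness=10, start_label=1)

# Replace each label with the average interior color of its region
averaged = label2rgb(labels, img, kind='avg', bg_label=0)

plt.imshow(averaged)
plt.axis('off')
plt.show()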

Mark Boundaries

This technique produces an image with highlighted borders between labeled areas, where the picture was segmented using the SLIC method.

In the example below, we segment the image into 100 regions with compactness = 1; this segmented image acts as the labeled array for the mark_boundaries() function, which returns an image with the boundaries between labeled regions highlighted.
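
A sketch of this, reusing SLIC with compactness set to 1:

import matplotlib.pyplot as plt
from skimage import data
from skimage import segmentation as seg

img = data.astronaut()
labels = seg.slic(img, n_segments=100, compactness=1, start_label=1)

# Overlay the boundaries between labeled regions on the original image
plt.imshow(seg.mark_boundaries(img, labels))
plt.axis('off')
plt.show()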

Felzenszwalb’s Segmentation

Felzenszwalb’s method uses minimum-spanning-tree clustering as the machine-learning algorithm behind the scenes. Felzenszwalb doesn’t let us specify the exact number of clusters that the image will be partitioned into; it will run and generate as many clusters as it thinks are appropriate for the given scale, or zoom factor, of the image. This may be used to isolate features and identify edges.

In the example below, the seg.felzenszwalb() function is used to compute Felzenszwalb’s efficient graph-based image segmentation. The parameter scale determines the level of observation, and sigma is used to smooth the picture before segmentation. Scale is the sole way to control the quantity of generated segments as well as their size; the size of individual segments within a picture can change drastically depending on local contrast. This is useful for confining individual features, foreground isolation, and noise reduction, and can help analyze an image more intuitively.
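
A sketch with illustrative parameter values, drawing the resulting region boundaries:

import matplotlib.pyplot as plt
from skimage import data
from skimage import segmentation as seg

img = data.astronaut()

# Graph-based segmentation; scale controls segment size, sigma pre-smooths
segments = seg.felzenszwalb(img, scale=100, sigma=0.5, min_size=50)

plt.imshow(seg.mark_boundaries(img, segments))
plt.axis('off')
plt.show()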

We can calculate the number of unique regions the image was partitioned into.
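
For instance, continuing from the segments array above:

import numpy as np

# Each unique label corresponds to one region
print(f'{np.unique(segments).size} unique regions')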

Let’s recolor the image using label2rgb() just like we did with the SLIC algorithm.
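
A sketch, again continuing from segments; bg_label is set to -1 so that the region labeled 0 is not treated as background:

import matplotlib.pyplot as plt
from skimage.color import label2rgb

# Paint each region with its average interior color
recolored = label2rgb(segments, img, kind='avg', bg_label=-1)

plt.imshow(recolored)
plt.axis('off')
plt.show()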

It is similar to a posterized image, which is essentially just a reduction in the number of colors.

Conclusion

Image segmentation is a vital step in image processing, and it is actively researched, with applications ranging from traffic and video surveillance to medical imaging. In this article, we have gone over image segmentation techniques using only the scikit-image module; you could attempt some of these methods with other libraries like OpenCV as well. It is, however, worth mentioning that many state-of-the-art image segmentation techniques today are based on deep learning.