In this tutorial we will understand an important concept called "Selective Search" in object detection. We will also share the OpenCV code in C++ and Python.
An object recognition algorithm identifies which objects are present in an image. It takes the entire image as input and outputs the class labels and class probabilities of the objects present in that image. For example, a class label could be "dog" and the associated class probability could be 97%.
On the other hand, an object detection algorithm not only tells you what objects are present in the image, but also generates bounding boxes (x,y,width,height) to show the position of the objects in the image.
At the heart of every object detection algorithm is an object recognition algorithm. Suppose we train an object recognition model that identifies dogs in image patches. This model will tell you whether an image contains a dog or not. It does not tell you where the object is located.
To localize the object, we have to select sub-regions (patches) of the image and then apply the object recognition algorithm to these image patches. The location of the objects is given by the location of the image patches where the class probability returned by the object recognition algorithm is high.
The most straightforward way to generate smaller patches is called the sliding window approach. However, the sliding window approach has several limitations. These limitations are overcome by a class of algorithms called "region proposal" algorithms. Selective Search is one of the most popular region proposal algorithms.
In the sliding window approach, we slide a box or window over the image to select a patch and classify each image patch covered by the window using the object recognition model. It is an exhaustive search for objects over the entire image. Not only do we have to search all possible locations in the image, we also have to search at different scales, because object recognition models are typically trained at a specific scale (or range of scales). This results in classifying tens of thousands of image patches.
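To make the cost concrete, here is a minimal sketch of sliding a fixed-size window over an image pyramid. The window size, stride, and scale factor are arbitrary values chosen for illustration, and the classification step (where the recognition model would run on each patch) is left out:

import cv2

def sliding_windows(im, win=64, stride=16, scale=0.75):
    """Yield (x, y, patch) for every window position at every pyramid level."""
    while min(im.shape[:2]) >= win:
        for y in range(0, im.shape[0] - win + 1, stride):
            for x in range(0, im.shape[1] - win + 1, stride):
                yield x, y, im[y:y + win, x:x + win]
        # shrink the image; a fixed window on a smaller image is
        # equivalent to a larger window on the original image
        im = cv2.resize(im, None, fx=scale, fy=scale)

# counting the patches shows why this is expensive:
im = cv2.imread("input.jpg")
print(sum(1 for _ in sliding_windows(im)))  # thousands of patches, even for a small image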
The problem doesn't end here. The sliding window approach works well for objects with a fixed aspect ratio, such as faces or pedestrians. But images are 2D projections of 3D objects, and object features such as aspect ratio and shape vary significantly with the angle from which the image is taken. The sliding window approach becomes very computationally expensive when we have to search over multiple aspect ratios as well.
The problems discussed so far can be solved using region proposal algorithms. These methods take an image as input and output bounding boxes corresponding to all the patches in the image that are most likely to be objects. These region proposals can be noisy, overlapping, and may not contain the object perfectly, but among them there will be a proposal that is very close to the actual object in the image. We can then classify these proposals using the object recognition model. The region proposals with high probability scores are the locations of the objects.
Region proposal algorithms use segmentation to identify prospective objects in an image. In segmentation, we group adjacent regions that are similar to each other based on some criteria such as color or texture; segmentation groups pixels into a much smaller number of segments. Therefore, the final number of proposals generated is many times smaller than with the sliding window approach, which reduces the number of image patches we have to classify. These generated region proposals come in different scales and aspect ratios.
An important property of a region proposal method is a very high recall. This is just a fancy way of saying that the regions containing the objects we are looking for must appear in our list of region proposals. To achieve this, our list of region proposals may end up including many regions that contain no object at all. In other words, it is okay for the region proposal algorithm to produce a lot of false positives, as long as it catches all the true positives. Most of these false positives will be rejected by the object recognition algorithm. Detection time goes up when there are more false positives, and accuracy is affected slightly. But having a high recall is still a good idea, because the alternative of missing the regions that contain the actual objects severely hurts the detection rate.
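As a rough sketch of how recall is typically measured: a ground-truth box counts as recalled if at least one proposal overlaps it with an intersection-over-union (IoU) above some threshold. The 0.5 threshold and the (x, y, w, h) box format below are assumptions made for illustration:

def iou(a, b):
    # intersection over union of two (x, y, w, h) boxes
    ix = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    iy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    if ix <= 0 or iy <= 0:
        return 0.0
    inter = ix * iy
    return inter / float(a[2] * a[3] + b[2] * b[3] - inter)

def recall(gt_boxes, proposals, thresh=0.5):
    # fraction of ground-truth boxes covered by at least one proposal
    hit = sum(1 for g in gt_boxes if any(iou(g, p) >= thresh for p in proposals))
    return hit / float(len(gt_boxes))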
Various region proposal methods have been proposed, such as:
- Objectness
- Constrained Parametric Min-Cuts for Automatic Object Segmentation
- Category-Independent Object Proposals
- Randomized Prim
- Selective Search
Among all these region proposal methods, Selective Search is the most commonly used because it is fast and has a very high recall.
What is Selective Search?
Selective Search is a region proposal algorithm used in object detection. It is designed to be fast with a very high recall. It is based on computing hierarchical groupings of similar regions based on color, texture, size, and shape compatibility.
Selective Search starts by over-segmenting the image based on the intensity of the pixels, using the graph-based segmentation method of Felzenszwalb and Huttenlocher. The output of the algorithm is shown below. The image on the right contains segmented regions represented in solid colors.
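This initial over-segmentation is also exposed in opencv-contrib, so you can reproduce an image like the one above yourself. A short sketch follows; the parameter values sigma, k, and min_size are illustrative guesses, not the values Selective Search uses internally:

import cv2
import numpy as np

im = cv2.imread("input.jpg")
# graph-based Felzenszwalb-Huttenlocher segmentation from opencv-contrib
gs = cv2.ximgproc.segmentation.createGraphSegmentation(sigma=0.5, k=300, min_size=100)
labels = gs.processImage(im)  # int32 map assigning a segment id to every pixel

# paint every segment with a random solid color, as in the figure above
rng = np.random.default_rng(0)
colors = rng.integers(0, 256, size=(labels.max() + 1, 3), dtype=np.uint8)
cv2.imshow("Oversegmentation", colors[labels])
cv2.waitKey(0)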
Can we use the segmented parts in this image as region proposals? The answer is no, and there are two reasons why we cannot:
- Most actual objects in the original image contain two or more segmented parts.
- Region proposals for occluded objects, such as the plate covered by the cup or the cup filled with coffee, cannot be generated with this method.
If we try to address the first problem by merging similar adjacent regions even further, we end up with one segmented region covering two objects.
Perfect segmentation is not our goal here. We just want to predict many region proposals such that some of them have very high overlap with the actual objects.
Selective Search uses the oversegments from Felzenszwalb and Huttenlocher's method as an initial seed. An oversegmented image looks like this.
The Selective Search algorithm takes these oversegments as initial input and performs the following steps:
- Add all bounding boxes corresponding to the segmented parts to the list of region proposals
- Group adjacent segments based on similarity
- Go to step 1
At each iteration, larger segments are formed and added to the list of region proposals. Hence, we create region proposals from smaller segments to larger segments in a bottom-up approach. This is what we mean by computing "hierarchical" segmentations using Felzenszwalb and Huttenlocher's oversegments.
This image shows the first, middle and last step of the hierarchical segmentation process.
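The grouping loop above can be sketched in a few lines of Python. In this toy version, regions are simply (x, y, w, h) boxes, "merging" takes their union box, and the similarity function is supplied by the caller; the real algorithm merges pixel segments, only considers adjacent pairs, and updates similarities incrementally, so treat this strictly as a sketch of the control flow:

def union_box(a, b):
    # smallest box containing boxes a and b, both in (x, y, w, h) form
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)

def hierarchical_group(segments, similarity):
    regions = list(segments)
    proposals = list(segments)      # step 1: every initial segment is a proposal
    while len(regions) > 1:
        # step 2: find the most similar pair and merge it
        pairs = [(i, j) for i in range(len(regions)) for j in range(i + 1, len(regions))]
        i, j = max(pairs, key=lambda p: similarity(regions[p[0]], regions[p[1]]))
        merged = union_box(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged)    # step 3: repeat with the merged region
    return proposals

# tiny usage example with a size-based similarity
boxes = [(0, 0, 10, 10), (12, 0, 8, 8), (0, 12, 6, 6)]
print(hierarchical_group(boxes, lambda a, b: -abs(a[2] * a[3] - b[2] * b[3])))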
Let's see how to calculate the similarity between two regions.
Selective Search uses 4 similarity measures based on color, texture, size, and shape compatibility.
Color similarity
A 25-bin color histogram is calculated for each color channel of a region, and the histograms of all channels are concatenated into a single 25 × 3 = 75-dimensional color descriptor.
The color similarity of two regions is based on histogram intersection and is calculated as:

$$s_{color}(r_i, r_j) = \sum_{k=1}^{n} \min(c_i^k, c_j^k)$$

where $c_i^k$ is the value of the $k^{th}$ bin of the color descriptor of region $r_i$.
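A sketch of both the descriptor and the similarity; the L1 normalization of the histograms is an assumption made here, since the text above only fixes the bin count:

import numpy as np

def color_descriptor(pixels, bins=25):
    # pixels: (N, 3) array with the BGR values of one region
    hists = [np.histogram(pixels[:, c], bins=bins, range=(0, 256))[0] for c in range(3)]
    desc = np.concatenate(hists).astype(float)  # 25 x 3 = 75 dimensions
    return desc / desc.sum()                    # L1-normalize so regions are comparable

def s_color(c_i, c_j):
    # histogram intersection: sum of element-wise minima
    return float(np.minimum(c_i, c_j).sum())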
Texture similarity
The texture features are calculated by extracting Gaussian derivatives in 8 orientations for each channel. A 10-bin histogram is calculated for each orientation and for each color channel, resulting in a feature descriptor of 10 x 8 x 3 = 240 dimensions.
The texture similarity of two regions is also computed using histogram intersection:

$$s_{texture}(r_i, r_j) = \sum_{k=1}^{n} \min(t_i^k, t_j^k)$$

where $t_i^k$ is the value of the $k^{th}$ bin of the texture descriptor of region $r_i$.
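A sketch of the texture descriptor. First-order Gaussian derivatives are steerable, so the derivative at angle theta can be combined from the x and y derivatives; using a Sobel filter on a Gaussian-smoothed patch as the derivative, the value of sigma, and letting numpy pick the bin range are all assumptions made for this sketch:

import cv2
import numpy as np

def texture_descriptor(patch, bins=10, sigma=1.0):
    # patch: H x W x 3 BGR region; returns a 10 x 8 x 3 = 240-dimensional descriptor
    smooth = cv2.GaussianBlur(patch.astype(np.float32), (0, 0), sigma)
    hists = []
    for c in range(3):                          # per color channel
        ix = cv2.Sobel(smooth[:, :, c], cv2.CV_32F, 1, 0)
        iy = cv2.Sobel(smooth[:, :, c], cv2.CV_32F, 0, 1)
        for k in range(8):                      # 8 orientations
            theta = k * np.pi / 8
            d = np.cos(theta) * ix + np.sin(theta) * iy
            hists.append(np.histogram(d, bins=bins)[0])
    desc = np.concatenate(hists).astype(float)
    return desc / desc.sum()

def s_texture(t_i, t_j):
    return float(np.minimum(t_i, t_j).sum())    # same histogram intersection as s_color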
Size similarity
Size similarity encourages smaller regions to merge early. It ensures that region proposals are formed at all scales and in all parts of the image. If this similarity measure were not taken into account, a single region would keep swallowing up all its smaller neighbors one by one, and region proposals at multiple scales would be generated only at that one location. Size similarity is defined as:

$$s_{size}(r_i, r_j) = 1 - \frac{size(r_i) + size(r_j)}{size(im)}$$

where $size(im)$ is the size of the image in pixels.
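As code this is a one-liner; the example numbers below are arbitrary, but they show how two small regions score close to 1 and therefore merge early:

def s_size(size_ri, size_rj, size_im):
    # sizes are pixel counts; regions small relative to the image score near 1
    return 1.0 - (size_ri + size_rj) / float(size_im)

print(s_size(100, 150, 200 * 300))      # 0.9958... -> merged early
print(s_size(25000, 30000, 200 * 300))  # 0.0833... -> merged late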
Shape compatibility
Shape compatibility measures how well two regions $r_i$ and $r_j$ fit into each other. If $r_i$ fits into $r_j$, we want to merge them to fill the gaps; if they do not even touch each other, they should not be merged. Shape compatibility is defined as:

$$s_{fill}(r_i, r_j) = 1 - \frac{size(BB_{ij}) - size(r_i) - size(r_j)}{size(im)}$$

where $size(BB_{ij})$ is the size of the bounding box around $r_i$ and $r_j$.
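In code, with the bounding-box area passed in precomputed (a minimal sketch, mirroring the formula above):

def s_fill(size_ri, size_rj, size_bb, size_im):
    # size_bb: area of the tight bounding box around both regions; if the two
    # regions fill their joint bounding box well, the numerator is small and
    # the similarity is close to 1
    return 1.0 - (size_bb - size_ri - size_rj) / float(size_im)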
Final similarity
The final similarity between two regions is defined as a linear combination of the four similarities above:

$$s(r_i, r_j) = a_1 s_{color}(r_i, r_j) + a_2 s_{texture}(r_i, r_j) + a_3 s_{size}(r_i, r_j) + a_4 s_{fill}(r_i, r_j)$$

where $r_i$ and $r_j$ are two regions or segments in the image, and $a_i \in \{0, 1\}$ denotes whether the similarity measure is used or not.
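Putting it together, using the helper sketches above and representing a region as a dict holding its descriptors and size (a representation chosen for this sketch, not OpenCV's internal one):

def s_final(ri, rj, size_bb, size_im, a=(1, 1, 1, 1)):
    # a[i] in {0, 1} switches each similarity measure on or off
    return (a[0] * s_color(ri["color"], rj["color"])
            + a[1] * s_texture(ri["texture"], rj["texture"])
            + a[2] * s_size(ri["size"], rj["size"], size_im)
            + a[3] * s_fill(ri["size"], rj["size"], size_bb, size_im))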
OpenCV's Selective Search implementation gives thousands of region proposals, sorted in decreasing order of objectness. For clarity, we share the results with the top 200-250 boxes drawn over the image. In general, 1000-1200 proposals are good enough to get all the correct region proposals.
Let's take a look at how we can use the selective search-based segmentation implemented in OpenCV.
Selective Search: C++
The code below is a C++ tutorial for selective search with OpenCV. Please read the comments to understand the code.
#include "opencv2/ximgproc/segmentation.hpp"#include "opencv2/highgui.hpp"#include "opencv2/core.hpp"#include "opencv2/imgproc.hpp"#include <iostream>#include <ctime>using from cv namespace;using namespace cv::ximgproc::segmentation;static void help() { std::cout << std::endl << "Usage:" << std::endl << ". /search input_image ( f | q)" << std::endl << "f=fast, q=quality" << std::endl << "Use l to show fewer rectangles, m to show more rectangles, q to end" < < std : :endl;}int main(int argc, char** argv) { // If image path and f/q are not passed as // command line arguments, exit and display help message if (argc < 3) { help() ; return -1; } // speed up with multithreading setUseOptimized(true); setNumThreads(4); // Read image Mat im = imread(argv[1]); // Resize the image int newHeight = 200; int newWidth = im.cols*newHeight/im.rows; resize(im, im, Size(newwidth, newheight)); // create a selective search segmentation object with the default parameters Ptr<SelectiveSearchSegmentation> ss = createSelectiveSearchSearchSegmentation(); // sets the input image on which we will do the segmentation ss->setBaseImage(im); // switch to a fast but selective search method with low recovery if (argv[2][0] == 'f') { ss->switchToSelectiveSearchFast(); } // switch to high fetch but slow selective search method else if (argv[2][0] == 'q') { ss->switchToSelectiveSearchQuality(); } // If the argument is neither f nor q, print the help message else { help(); return -2; } // Perform selective search segmentation on the input image std::vector<Rect> rects; ss->process(rects); std::cout << "Total number of suggested regions: " << rects.size() << std::endl; // Number of region suggestions to show int numShowRects = 100; // increase/decrease the total number of // suggested reasons for display int increment = 50; while(1) { // create a copy of the original image Mat imOut = im.clone(); // iterate over all region suggestions for(int i = 0; i < rects.size(); i++) { if (i < numShowRects) { rectangle(imOut, rects[i], Scalar(0, 255, 0 ) ) ; } else { rest; } } // Show output imshow("Output", imOut); // registers the pressed key int k = waitKey(); // m is pressed if (k == 109) { // increase the total number of rectangles to be displayed by increment numShowRects += increment; } // l is pressed else if (k == 108 && numShowRects > increment) { // decreases the total number of rectangles to display per increment numShowRects -= increment; } // q is pressed else if (k == 113) { break; } } returns 0;}
Selective Search: Python
The code below is a Python tutorial for Selective Search using OpenCV 3.3. Note the bug alert for OpenCV 3.2 mentioned after the code block. Please read through the comments to understand the code.
#!/usr/bin/env python
'''
Usage:
    ./ssearch.py input_image (f|q)
    f=fast, q=quality
Use "l" to display less rects, "m" to display more rects, "q" to quit.
'''

import sys
import cv2

if __name__ == '__main__':
    # If image path and f/q is not passed as command
    # line arguments, quit and display help message
    if len(sys.argv) < 3:
        print(__doc__)
        sys.exit(1)

    # speed-up using multithreads
    cv2.setUseOptimized(True)
    cv2.setNumThreads(4)

    # read image
    im = cv2.imread(sys.argv[1])
    # resize image
    newHeight = 200
    newWidth = int(im.shape[1]*200/im.shape[0])
    im = cv2.resize(im, (newWidth, newHeight))

    # create Selective Search Segmentation Object using default parameters
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()

    # set input image on which we will run segmentation
    ss.setBaseImage(im)

    # switch to fast but low recall Selective Search method
    if (sys.argv[2] == 'f'):
        ss.switchToSelectiveSearchFast()
    # switch to high recall but slow Selective Search method
    elif (sys.argv[2] == 'q'):
        ss.switchToSelectiveSearchQuality()
    # if argument is neither f nor q print help message
    else:
        print(__doc__)
        sys.exit(1)

    # run selective search segmentation on input image
    rects = ss.process()
    print('Total Number of Region Proposals: {}'.format(len(rects)))

    # number of region proposals to show
    numShowRects = 100
    # increment to increase/decrease total number
    # of region proposals to be shown
    increment = 50

    while True:
        # create a copy of original image
        imOut = im.copy()

        # iterate over all the region proposals
        for i, rect in enumerate(rects):
            # draw rectangle for region proposal till numShowRects
            if (i < numShowRects):
                x, y, w, h = rect
                cv2.rectangle(imOut, (x, y), (x+w, y+h), (0, 255, 0), 1, cv2.LINE_AA)
            else:
                break

        # show output
        cv2.imshow("Output", imOut)

        # record key press
        k = cv2.waitKey(0) & 0xFF

        # m is pressed
        if k == 109:
            # increase total number of rectangles to show by increment
            numShowRects += increment
        # l is pressed
        elif k == 108 and numShowRects > increment:
            # decrease total number of rectangles to show by increment
            numShowRects -= increment
        # q is pressed
        elif k == 113:
            break

    # close image show window
    cv2.destroyAllWindows()
Bug alert: There was a bug in the Python bindings for Selective Search, which was fixed in this commit. So the Python code works with OpenCV 3.3.0 but not with OpenCV 3.2.0.
If you don't want to compile OpenCV 3.3.0 and you have the build folder of OpenCV 3.2.0 that you compiled earlier, you can fix this bug as well.
If you look at the GitHub commit, it is just a small change. You have to change line #239 in the file
opencv_contrib-3.2.0/modules/ximgproc/include/opencv2/ximgproc/segmentation.hpp
// from
CV_WRAP virtual void process(std::vector<Rect>& rects) = 0;
// to
CV_WRAP virtual void process(CV_OUT std::vector<Rect>& rects) = 0;
Now rebuild your OpenCV 3.2.0. If you have a build folder where OpenCV was previously built, running the make command will only build that module.
Subscribe & Download Code
If you liked this article and would like to download the code (C++ and Python) and the example images used in this post, click here. Alternatively, sign up for a free Computer Vision resource guide. In our newsletter, we share OpenCV tutorials and examples written in C++/Python, as well as computer vision and machine learning algorithms and news.