In this article, you will gain hands-on experience with instance segmentation using PyTorch and Mask R-CNN. Image segmentation is one of the major application areas of deep learning and neural networks. One of the most popular image segmentation techniques where we apply deep learning is semantic segmentation. In semantic segmentation, we mask each class in an image with a single color mask; therefore, different classes have different color masks. For more on semantic segmentation, see one of my previous articles. But in this article, we will focus on instance segmentation in deep learning with PyTorch and Mask R-CNN.
Take a look at the image below to get a better idea of instance segmentation.

Figure 1 shows how each person in the image on the left has a mask of a different color, even though they all belong to the person class. Likewise, all the sheep are masked with different colors.
In this article, we will try to apply instance segmentation and get results similar to the ones above.
So what will we learn in this article?
- We will not train our own instance segmentation model in this tutorial. Instead, we will use the Mask R-CNN model that PyTorch provides, pre-trained on the COCO dataset.
- We will start by learning a bit more about the Mask R-CNN model. In particular, we will get to know the input and output format of the model.
- Then we dive into the coding part with a very detailed explanation.
- Finally, we will test the Mask R-CNN deep learning model by applying it to images.
The PyTorch Mask R-CNN deep learning model
Before we get into the input and output format of the Mask R-CNN model, let's see what it actually does and how it does it.
How the Mask R-CNN model works
Let's briefly explore how Mask R-CNN works and how it approaches instance segmentation in deep learning.
We know that in semantic segmentation, each class of an image has a unique color mask. But with instance segmentation, each instance of a class has a different color.
How do we do that, then? In simple words, we detect every object present in an image, get its bounding box, classify the object inside the bounding box, and mask it with a unique color. Therefore, instance segmentation is a combination of object detection and image segmentation. It sounds simple, but in practice and training it can become complicated very quickly. The Mask R-CNN model uses this same procedure.

What you see in Figure 2 is an example of instance segmentation. You can see that each object is detected and then a color mask is applied to it.
In fact, Mask R-CNN combines the well-known Faster R-CNN deep learning object detector with image segmentation. We will not go into the technical details of the model here. But I highly recommend reading the original Mask R-CNN paper. And if you want to learn more about image segmentation in general, I highly recommend reading one of my previous articles on image segmentation. It covers many general topics like evaluation metrics, some important papers, and application areas of deep learning-based image segmentation.
We don't have to worry too much about all the technical details of building such a model. We will use a pre-trained model provided by PyTorch. Therefore, it is much more beneficial to know more about the input and output format of a pre-trained model, which will help us with inference and coding.
The input and output format of the PyTorch Mask R-CNN model
The pre-trained Mask R-CNN model that PyTorch provides has a ResNet-50 FPN backbone.
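As a quick preview, loading this pre-trained model takes a single call. The following is a minimal sketch using the torchvision API of the PyTorch 1.6 era, where the `pretrained` argument downloads the COCO weights; we will do exactly this later in the tutorial.

```python
import torchvision

# load Mask R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()  # set to inference mode
```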
The model expects batched images for inference, and all the pixel values must be within the range `[0, 1]`. So the input format for the model is `[N, C, H, W]`. Here, `N` is the number of images, or the batch size, `C` is the color channel dimension, and `H` and `W` are the height and width of the image. This is quite simple and also the typical PyTorch format.
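As a minimal sketch of getting an image into this format, assuming a PIL image as the starting point (`example.jpg` is a placeholder file name):

```python
from PIL import Image
from torchvision.transforms import functional as F

img = Image.open('example.jpg').convert('RGB')  # placeholder file name
tensor = F.to_tensor(img)    # shape [C, H, W], pixel values scaled to [0, 1]
batch = tensor.unsqueeze(0)  # shape [N, C, H, W] with N = 1
print(batch.shape, batch.min().item(), batch.max().item())
```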
The model outputs a lot of content. Remember that it is a combination of object detection and image segmentation. During inference, the model outputs a list of dictionaries containing the resulting tensors. Formally, it is a `List[Dict[Tensor]]`. The following are the contents, which I have taken from the PyTorch models page.
- `boxes` (`FloatTensor[N, 4]`): the predicted boxes in `[x1, y1, x2, y2]` format, with values of `x` between `0` and `W` and values of `y` between `0` and `H`.
- `labels` (`Int64Tensor[N]`): the predicted labels for each image.
- `scores` (`Tensor[N]`): the scores of each prediction.
- `masks` (`UInt8Tensor[N, 1, H, W]`): the predicted masks for each instance, in the `0-1` range. To obtain the final segmentation masks, the soft masks can be thresholded, typically with a value of 0.5 (`mask >= 0.5`).
So the dictionary contains four keys: `boxes`, `labels`, `scores`, and `masks`. These keys hold the resulting tensors as values. And keep in mind that we need to consider only the mask values that are greater than or equal to 0.5.
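To make this concrete, here is a short sketch of inspecting the outputs, continuing from the input sketch above (the key names come from the torchvision documentation; note that `N` in the output shapes is the number of detections, not the batch size):

```python
import torch

with torch.no_grad():
    outputs = model(batch)  # List[Dict[Tensor]], one dict per input image

output = outputs[0]
print(output['boxes'].shape)   # [N, 4] boxes in [x1, y1, x2, y2] format
print(output['labels'].shape)  # [N] class indices into the COCO names list
print(output['scores'].shape)  # [N] confidence scores, sorted high to low
print(output['masks'].shape)   # [N, 1, H, W] soft masks in the 0-1 range

# final binary masks are obtained by thresholding the soft masks at 0.5
binary_masks = output['masks'] >= 0.5
```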
I hope the above details clear up some of the technicalities. If not, things will become much clearer once we actually write the code. Coding and applying the Mask R-CNN model to images will help us understand how it works even better. So let's move on.
Project directory structure
Here we will go over the directory structure of the project. I hope you follow the same structure as in this tutorial so that you can continue without any difficulty. Below is the directory structure that we will follow.
```
├───input
│   image1.jpg
│   image2.jpg
│   image3.jpg
│
├───outputs
│
└───src
    coco_names.py
    mask_rcnn_images.py
    utils.py
```
So we have three folders.
- The `input` folder contains the images on which we will test the Mask R-CNN model.
- The `outputs` folder will contain all the segmented images after they have passed through the Mask R-CNN model.
- And finally, we have the `src` folder, which will contain the Python scripts.
You can use any images of your choice to run inference with the Mask R-CNN model. However, if you want to use the same images as in this tutorial, you can download the zipped input file below. The photos are from Pixabay.
After downloading the files, extract them to the main directory of the project.
Libraries we need
PyTorch is the only big library we need for this tutorial. I used PyTorch 1.6 for this project. So you can go ahead and download PyTorch if you haven't already.
All other libraries are common machine vision and deep learning libraries that you probably already have. If not, you can install them along the way.
Instance segmentation with PyTorch and Mask R-CNN
Starting from this section, we will begin writing the code for instance segmentation on images with PyTorch and Mask R-CNN.
Let's start by defining all the COCO dataset class names in a Python script.
The class names of the COCO dataset
We will keep all the class names separate from the rest of the Python code to keep things clean.
Create a `coco_names.py` script inside the `src` folder and put the following list in it.
```python
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A',
    'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard',
    'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard',
    'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass', 'cup', 'fork',
    'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
    'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
    'potted plant', 'bed', 'N/A', 'dining table', 'N/A', 'N/A', 'toilet', 'N/A',
    'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
    'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book', 'clock', 'vase',
    'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
```
That's all we need for this Python script. We will import it wherever we need it.
Writing some helper functions for instance segmentation
Now, let's set up the utility script that will help us a lot in this tutorial. It contains all the important functions, such as forward passing the image through the model and applying the segmentation masks to the image.
Things will become clearer as we write the code. So let's dive right in.
All of this code will go into the `utils.py` script inside the `src` folder.
Below are the imports we need.
```python
import cv2
import numpy as np
import random
import torch

from coco_names import COCO_INSTANCE_CATEGORY_NAMES as coco_names
```
Note that we import the `COCO_INSTANCE_CATEGORY_NAMES` list from `coco_names.py` as `coco_names`.
We have a total of 91 classes for segmentation and detection. And we want each instance of each class to have a different color mask. In short, we want every detected object to have its own color.
We need to generate a different RGB tuple for each of the detected objects in an image. The following simple line of code does that for us.
```python
# this will help us create a different color for each class
COLORS = np.random.uniform(0, 255, size=(len(coco_names), 3))
```
We can use the colors generated above in the OpenCV drawing functions.
Function to get the outputs
We will write a simple function to get the outputs from the model after inference. This function provides us with all the output tensors that we need for properly visualizing the results. Let's call this function `get_outputs()`.
The definition of the function follows.
```python
def get_outputs(image, model, threshold):
    with torch.no_grad():
        # forward pass the image through the model
        outputs = model(image)

    # get all the scores
    scores = list(outputs[0]['scores'].detach().cpu().numpy())
    # indices of the scores that are above the threshold
    thresholded_preds_indices = [scores.index(i) for i in scores if i > threshold]
    thresholded_preds_count = len(thresholded_preds_indices)
    # get the masks by thresholding the soft masks at 0.5
    masks = (outputs[0]['masks'] > 0.5).squeeze().detach().cpu().numpy()
    # discard the masks of objects that scored below the threshold
    masks = masks[:thresholded_preds_count]
    # get the bounding boxes, in (x1, y1), (x2, y2) format
    boxes = [[(int(i[0]), int(i[1])), (int(i[2]), int(i[3]))]
             for i in outputs[0]['boxes'].detach().cpu()]
    # discard the bounding boxes that scored below the threshold
    boxes = boxes[:thresholded_preds_count]
    # get the class labels
    labels = [coco_names[i] for i in outputs[0]['labels']]
    return masks, boxes, labels
```
The `get_outputs()` function accepts three input parameters. The first is the input `image`, the second is the Mask R-CNN `model`, and the third is the `threshold` value. The threshold is a predefined value below which we discard all detections, in order to avoid too many false positives. Let's go over the code step by step.
- First, we get the `outputs` by forward passing the image through the model inside a `torch.no_grad()` block. This gives us a list containing one dictionary.
- Then we get all the `scores` from the dictionary and load them onto the CPU.
- Next, we build `thresholded_preds_indices`. It contains the index values of all the `scores` that are above the `threshold` that we provide.
- Getting the length of the above list as `thresholded_preds_count` helps us extract only the masks and bounding boxes that correspond to those high-scoring detections.
- After that, we get the `masks` by thresholding the soft mask values at 0.5.
- The next line discards all the masks whose detections did not score above the threshold. We only keep the masks that are at least above the threshold.
- Then we extract the bounding boxes in `(x1, y1), (x2, y2)` format and, similar to the masks, discard all the low-scoring box detections.
- Finally, we get all the label names by mapping the output `labels` indices to the `coco_names` list, and we return the `masks`, `boxes`, and `labels`.
I hope you have understood the above steps. Try going through them once more and they will surely become clear.
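To illustrate the score-threshold logic in isolation, here is a tiny, self-contained sketch with made-up scores (the model returns detections sorted by score in descending order, which is why slicing by the count works):

```python
# made-up example scores, in the descending order the model returns them
scores = [0.99, 0.97, 0.40, 0.10]
threshold = 0.965

thresholded_preds_indices = [scores.index(s) for s in scores if s > threshold]
thresholded_preds_count = len(thresholded_preds_indices)
print(thresholded_preds_count)  # 2 -> only the first two detections are kept
```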
Apply the segmentation and draw the bounding boxes
Once we have the labels, masks, and bounding boxes, we can apply the color masks to the objects and also draw the bounding boxes.
Again, we will write a very simple function for this. The function is `draw_segmentation_map()`, and it accepts four input parameters. They are `image`, `masks`, `boxes`, and `labels`. The `image` is the original image on which we apply the resulting `masks` and draw the bounding boxes around the detected objects. The `labels` will help us put the class name on top of each object.
The definition of the function follows.
```python
def draw_segmentation_map(image, masks, boxes, labels):
    alpha = 1   # weight of the original image
    beta = 0.6  # transparency (weight) of the segmentation map
    gamma = 0   # scalar added to each sum
    # convert the original PIL image into NumPy format and then
    # from RGB to the OpenCV BGR format (once, before the loop)
    image = np.array(image)
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    for i in range(len(masks)):
        red_map = np.zeros_like(masks[i]).astype(np.uint8)
        green_map = np.zeros_like(masks[i]).astype(np.uint8)
        blue_map = np.zeros_like(masks[i]).astype(np.uint8)
        # apply a random color to each object's mask positions
        color = COLORS[random.randrange(0, len(COLORS))]
        red_map[masks[i] == 1], green_map[masks[i] == 1], blue_map[masks[i] == 1] = color
        # combine the color channels into a single segmentation map
        segmentation_map = np.stack([red_map, green_map, blue_map], axis=2)
        # overlay the segmentation map on the image
        cv2.addWeighted(image, alpha, segmentation_map, beta, gamma, image)
        # draw the bounding box around the object
        cv2.rectangle(image, boxes[i][0], boxes[i][1], color=color,
                      thickness=2)
        # put the label text above the object
        cv2.putText(image, labels[i], (boxes[i][0][0], boxes[i][0][1] - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, color,
                    thickness=2, lineType=cv2.LINE_AA)
    return image
```
- First, we have `alpha`, `beta`, and `gamma`. Here, `alpha` and `beta` define the weights of the original image and the segmentation map when we overlay the segmentation map on the image. `gamma` is the scalar that is added to each sum, and keeping it at 0 is optimal in almost all cases. For more details, see the OpenCV documentation.
- Before the loop starts, we convert the original image from the PIL format to the NumPy format, and then to the OpenCV BGR color format. Doing this once, outside the loop, avoids flipping the color channels back and forth on every iteration.
- Then we start a `for` loop over the number of masks that we have.
- Inside the loop, we define three NumPy arrays of all zeros whose dimensions match those of the current mask.
- Next, we get a random color tuple from the `COLORS` list and apply that color at the mask positions of the current object. The three NumPy arrays now contain the color values instead of being all black.
- We then stack the `red_map`, `green_map`, and `blue_map` to get the complete segmentation map for the current object.
- After that, we combine the image and the segmentation map. Basically, we overlay the segmentation map on the original image with a weight of 0.6. This gives us a translucent map on top of the image, and we can easily infer which object is actually there.
- The last two drawing calls draw the bounding box and put the class name on top of each detected object.
- Finally, we return the resulting image.
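For reference, `cv2.addWeighted()` computes a simple per-pixel weighted sum, which is why `alpha` and `beta` act as the weights of the two images:

```python
# cv2.addWeighted computes, per pixel:
#     dst = src1 * alpha + src2 * beta + gamma
# with alpha = 1, beta = 0.6, gamma = 0 this becomes:
#     dst = image + 0.6 * segmentation_map  (saturated to the 0-255 range)
```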
The two functions above are the most important parts of this tutorial. If you have followed along this far, the rest of the article is quite easy to follow.
Apply Mask R-CNN to images
Now let's write the code to apply the Mask R-CNN model to images of our choice. This part is going to be quite easy, as we have already written most of our logic in the `utils.py` script.
All of this code will go into the `mask_rcnn_images.py` file.
Let's start with the imports we need.
```python
import torch
import torchvision
import cv2
import argparse

from PIL import Image
from utils import draw_segmentation_map, get_outputs
from torchvision.transforms import transforms as transforms
```
We will provide the path to the input image using command line arguments. So now let's define our argument parser.
```python
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--input', required=True,
                    help='path to the input data')
parser.add_argument('-t', '--threshold', default=0.965, type=float,
                    help='score threshold for discarding detections')
args = vars(parser.parse_args())
```
We also have an optional threshold argument in the code block above. By default, we discard all detections with a score lower than 0.965. You can increase or decrease the value if you want. Note, however, that increasing the value too much may cause objects to go undetected, while setting it too low can lead to many false positives.
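For example, once the script is complete, you will be able to override the default threshold from the command line like this:

python mask_rcnn_images.py --input ../input/image1.jpg --threshold 0.9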
Prepare the model and define the transformation
The next step is to prepare our Mask R-CNN model.
```python
# initialize the model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True,
                                                           progress=True,
                                                           num_classes=91)
# set the computation device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# load the model onto the computation device and set it to eval mode
model.to(device).eval()
```
First, we initialize the model. Note that we set the `pretrained` argument to `True`. Then we load the model onto the computation device and put it in `eval()` mode. Although a GPU is not strictly necessary since we are only working with images, it is better if you have one.
The following block of code defines the transformations that we apply to the images.
```python
# transform to convert the image to a tensor
transform = transforms.Compose([
    transforms.ToTensor()
])
```
We simply convert the images to tensors. We do not need to apply any other transformations to the images before feeding them to the Mask R-CNN model, since the torchvision detection models handle resizing and normalization internally.
Read image and apply instance segmentation
We provide the path to the image as a command line argument. So, let's read the image path from there. The following code block reads the image and applies instance segmentation to it using the Mask-R-CNN model.
```python
image_path = args['input']
image = Image.open(image_path).convert('RGB')
# keep a copy of the original image for the OpenCV functions and applying the masks
orig_image = image.copy()

# transform the image
image = transform(image)
# add a batch dimension
image = image.unsqueeze(0).to(device)

masks, boxes, labels = get_outputs(image, model, args['threshold'])

result = draw_segmentation_map(orig_image, masks, boxes, labels)

# visualize the image
cv2.imshow('Segmented image', result)
cv2.waitKey(0)

# set the save path
save_path = f"../outputs/{args['input'].split('/')[-1].split('.')[0]}.jpg"
cv2.imwrite(save_path, result)
```
- First, we capture the image path and read the image using PIL. We also keep a copy of the original, untransformed image.
- Then we apply the transform to the image and add an extra batch dimension with `unsqueeze(0)` before loading it onto the computation device.
- Next, we call the `get_outputs()` function from the `utils` script. Here we feed the transformed image to the Mask R-CNN model, and it returns the `masks`, `boxes`, and `labels`.
- After that, we call the `draw_segmentation_map()` function, which overlays the segmentation masks for each object on the original image.
- We then show the resulting image on the screen.
- Finally, we build the `save_path` from the original input path (derived as shown below) and save the resulting image to disk.
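As a concrete walk-through of the save path logic, using `image1.jpg` from the `input` folder as the example:

```python
# assuming args['input'] = '../input/image1.jpg':
# '../input/image1.jpg'.split('/')[-1]  -> 'image1.jpg'
# 'image1.jpg'.split('.')[0]            -> 'image1'
# so save_path becomes '../outputs/image1.jpg'
```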
This is all the code we need to apply the Mask R-CNN deep learning instance segmentation model to images. Now we are all set to execute our code and see the results.
Run the mask_rcnn_images.py file
Let's see how well the Mask R-CNN model can detect and segment objects in images. If you are using the downloaded images, make sure to unzip the file and extract its contents into the input folder. It is perfectly fine if you want to use your own images as well.
Let's start with the first image from the `input` folder. Open your terminal/command prompt, `cd` into the `src` directory of the project, and enter the following command.
python mask_rcnn_images.py --input ../input/image1.jpg
The resulting segmented image is shown below.

The model seems to be working very well. Along with all the people in the image, it is also able to detect and segment the laptop and the potted plant. Still, the Mask R-CNN model is not able to segment the hand of the woman in the middle completely. All the other detections and segmentations look pretty good, though.
Now let's try something that doesn't contain people.
python mask_rcnn_images.py --input ../input/image2.jpg

In Figure 4, we can see that the Mask R-CNN model is very good at detecting and segmenting elephants. It is even able to detect and segment the partially visible elephant on the left.
So far everything has worked well. Now let's look at a case where the Mask R-CNN model fails to some extent. Let's try the model on the third image.
python mask_rcnn_images.py --input ../input/image3.jpg

Figure 5 shows some of the major shortcomings of the Mask R-CNN model. It struggles when it has to segment a group of people who are very close together. Interestingly, the detections are all correct. But the model is not able to segment the boy next to the soldier, the boy on the far right, or the soldier's leg properly. So it fails to segment well when objects are too close to each other.
You can also try more images if you want and share your results in the comments section.
Summary and conclusion
In this article, you learned about instance segmentation in deep learning. You gained hands-on experience applying instance segmentation to images using PyTorch and the Mask R-CNN model. I hope you have learned something new from this tutorial.
If you have any doubts, thoughts, or suggestions, please leave them in the comments section. I will surely address them.
You can contact me through the Contact section. You can also find me on LinkedIn and Twitter.