Instance segmentation with PyTorch and Mask R-CNN (2023)

In this article, you will gain hands-on experience with instance segmentation using PyTorch and Mask R-CNN. Image segmentation is one of the major application areas of deep learning and neural networks. One of the most popular image segmentation techniques where we apply deep learning is semantic segmentation. In semantic segmentation, we mask all the pixels of a class in an image with a single color, so different classes get different color masks. For more information on semantic segmentation, see one of my previous articles. In this article, however, we will focus on instance segmentation in deep learning with PyTorch and Mask R-CNN.

Take a look at the image below to get a better idea of instance segmentation.

[Figure 1: instance segmentation of people and sheep, with a different color mask for each instance]

Figure 1 shows that each person in the image on the left has a differently colored mask, even though they all belong to the person class. Likewise, each sheep is masked with a different color.

In this article, we will apply instance segmentation and try to get results similar to the ones above.

So what will we learn in this article?

  • We will not train our instance segmentation model in this tutorial. Instead, we will use the PyTorch Mask R-CNN model that was pre-trained on the COCO dataset.
  • We will start by learning a little more about the Mask R-CNN model. In particular, we will look at the input and output format of the model.
  • Then we will dive into the coding part with a detailed explanation.
  • Finally, we will test the Mask R-CNN deep learning model by applying it to images.

The PyTorch Mask R-CNN Deep Learning Model

Before we get into the input and output format of the Mask R-CNN model, let's see what it actually does and how it does it.

How the Mask R-CNN model works

Let's briefly explore how Mask R-CNN works and how it approaches instance segmentation.

We know that in semantic segmentation, each class of an image has a unique color mask. But with instance segmentation, each instance of a class has a different color.

How do we do that, then? In simple terms, we detect every object present in the image, get its bounding box, classify the object inside the bounding box, and mask it with a unique color. Instance segmentation is therefore a combination of object detection and image segmentation. It sounds simple, but in practice, and especially during training, it can get complicated very quickly. Mask R-CNN follows this same procedure.
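To make the difference between the two kinds of segmentation concrete, here is a minimal NumPy sketch (not part of the tutorial's code) that turns per-object boolean masks into a semantic map, where each pixel stores a class index, and an instance map, where each pixel stores a per-object ID. The function name `build_maps` and the toy 4x4 masks are hypothetical.

```python
import numpy as np


def build_maps(masks, labels):
    """Build semantic and instance maps from per-object masks.

    masks:  boolean array of shape [N, H, W], one mask per detected object
    labels: sequence of N class indices (e.g. COCO category indices)
    """
    _, height, width = masks.shape
    semantic_map = np.zeros((height, width), dtype=np.int64)  # 0 = background
    instance_map = np.zeros((height, width), dtype=np.int64)  # 0 = background
    for obj_id, (mask, label) in enumerate(zip(masks, labels), start=1):
        semantic_map[mask] = label   # all objects of a class share one value
        instance_map[mask] = obj_id  # every object gets its own value
    return semantic_map, instance_map


# Two separate "person" objects (COCO index 1) in a 4x4 image
masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, :2, :2] = True
masks[1, 2:, 2:] = True
semantic_map, instance_map = build_maps(masks, labels=[1, 1])
print(np.unique(semantic_map))  # [0 1]
print(np.unique(instance_map))  # [0 1 2]
```

Both people end up with the same value in the semantic map but different values in the instance map, which is exactly what the differently colored masks in Figure 1 visualize.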

[Figure 2: an example of instance segmentation]

What you see in Figure 2 is an example of instance segmentation. Each object is detected, and then a colored mask is applied to it.

Mask R-CNN is essentially a combination of the well-known Faster R-CNN object detector and image segmentation: it extends Faster R-CNN with a branch that predicts a segmentation mask for each detected object. We will not go into the technical details of the model here, but I highly recommend reading the original Mask R-CNN paper. And if you want to learn more about image segmentation in general, I recommend reading one of my previous articles on image segmentation. It covers many general topics such as evaluation metrics, some key papers, and application areas of deep learning-based image segmentation.

We don't need to worry too much about the technical details of building such a model, because we will use a pre-trained model provided by PyTorch. It is therefore more useful to understand the input and output format of the pre-trained model, which will help us with inference and coding.

The input and output format of the PyTorch Mask R-CNN model

The pre-trained Mask R-CNN model that PyTorch (torchvision) provides uses a ResNet-50 FPN backbone.


The model expects a batch of images for inference, and all pixel values must be in the range [0, 1]. So the input format for the model is [N, C, H, W], where N is the number of images (the batch size), C is the number of color channels, and H and W are the height and width of the image. (Strictly speaking, the torchvision documentation describes the input as a list of [C, H, W] tensors; a stacked [N, C, H, W] tensor is iterated the same way.) This is quite simple and follows the usual PyTorch format.
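As a rough illustration of this input format, the NumPy sketch below mimics what the tutorial later does with `transforms.ToTensor()` and `unsqueeze(0)`: it converts an H x W x C `uint8` image into a C x H x W float array in [0, 1] and adds a leading batch dimension. The helper `to_model_input` is hypothetical; in the actual script, PyTorch does this work.

```python
import numpy as np


def to_model_input(image_hwc):
    """Convert an HxWxC uint8 image into a 1xCxHxW float32 array in [0, 1]."""
    chw = image_hwc.transpose(2, 0, 1)        # HWC -> CHW
    scaled = chw.astype(np.float32) / 255.0   # [0, 255] -> [0, 1]
    return scaled[np.newaxis, ...]            # add the batch dimension (N = 1)


image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
batch = to_model_input(image)
print(batch.shape)  # (1, 3, 480, 640)
```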

The model produces quite a lot of output. Remember that it is a combination of object detection and image segmentation. During inference, the model returns a list of dictionaries containing the resulting tensors, one dictionary per input image. Formally, it is a List[Dict[Tensor]]. The following descriptions are taken from the PyTorch (torchvision) models documentation.

  • boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with values of x between 0 and W and values of y between 0 and H.
  • labels (Int64Tensor[N]): the predicted class labels for each detection.
  • scores (Tensor[N]): the score of each prediction.
  • masks (FloatTensor[N, 1, H, W]): the predicted masks for each instance, in the 0-1 range. To obtain the final segmentation masks, the soft masks can be thresholded, generally with a value of 0.5 (mask >= 0.5).

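So each dictionary contains four keys: boxes, labels, scores, and masks, whose values are the resulting tensors. Note that N here is the number of detected objects, not the batch size. Also keep in mind that we need to threshold the soft masks, treating values greater than or equal to 0.5 as part of the object.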
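To see how these four keys fit together, here is a small NumPy sketch that post-processes a mock output dictionary shaped like the one described above: it keeps the detections whose score exceeds a threshold and binarizes the soft masks at 0.5. All values are made up for illustration, and the real model returns PyTorch tensors rather than NumPy arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mock output for one image: N = 3 detections on a 100x100 image (made-up values)
output = {
    "boxes": np.array([[10, 10, 50, 80], [60, 5, 90, 40], [0, 0, 5, 5]], dtype=np.float32),
    "labels": np.array([1, 20, 1], dtype=np.int64),  # 1 = person, 20 = sheep in COCO
    "scores": np.array([0.99, 0.97, 0.30], dtype=np.float32),
    "masks": rng.random((3, 1, 100, 100)).astype(np.float32),  # soft masks in [0, 1]
}

threshold = 0.9
keep = output["scores"] > threshold              # one boolean per detection
boxes = output["boxes"][keep]                    # [N_kept, 4]
labels = output["labels"][keep]                  # [N_kept]
binary_masks = output["masks"][keep, 0] >= 0.5   # [N_kept, H, W], channel axis dropped

print(boxes.shape, labels.tolist(), binary_masks.shape)  # (2, 4) [1, 20] (2, 100, 100)
```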

I hope the above details clear up some of the technicalities. If not, things will become much clearer once we write the code: coding and applying the Mask R-CNN model to images will help us understand how it works even better. So let's move on.

Project directory structure

Let's take a look at the directory structure of the project. I recommend following the same structure as this tutorial so that you can follow along without any difficulty. Below is the directory structure that we will use.

├───input
│       image1.jpg
│       image2.jpg
│       image3.jpg
│
├───outputs
│
└───src
        coco_names.py
        mask_rcnn_images.py
        utils.py

So we have three folders.

  • The input folder contains the images on which we will test the Mask R-CNN model.
  • The outputs folder will contain all the segmented images after they pass through the Mask R-CNN model.
  • Finally, the src folder contains the Python scripts.

You can use any images of your choice for inference with the Mask R-CNN model. However, if you want to use the same images as this tutorial, you can download the zipped input folder below. The photos are from Pixabay.

After downloading the files, extract them to the main directory of the project.

Libraries we need

PyTorch (along with torchvision) is the only major library we need for this tutorial. I used PyTorch 1.6 for this project, so go ahead and install PyTorch if you haven't already.

All the other libraries, such as OpenCV and NumPy, are common computer vision and deep learning libraries that you probably already have. If not, you can install them along the way.

Instance segmentation with PyTorch and Mask R-CNN

From this section onward, we will write the code for instance segmentation of images with PyTorch and Mask R-CNN.

Let's start by defining all the COCO dataset class names in a Python script.


The COCO dataset class names

We keep all the class names separate from the rest of the Python code to keep our code clean.

Create a coco_names.py script inside the src folder and put the following list in it.

COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

That's all we need for this Python script. We will import it wherever we need it.

Writing some helper functions for instance segmentation

Now let's set up the utility script, which will help us a lot throughout the tutorial. It contains all the important functions, such as passing the image through the model and applying the segmentation masks to the image.

Things will become clearer as we write the code. So let's go right in.

All of this code goes into the utils.py script inside the src folder.

Below are the imports we need.

import cv2
import numpy as np
import random
import torch

from coco_names import COCO_INSTANCE_CATEGORY_NAMES as coco_names

Note that we import COCO_INSTANCE_CATEGORY_NAMES from the coco_names.py script and alias it as coco_names.

The list has 91 entries in total for segmentation and detection, including the background class and the 'N/A' placeholders. We want every detected object to get its own color mask, even when several objects belong to the same class.

For this, we need a set of RGB tuples to color the detected objects. The following line of code generates one random color per entry in the class list.

# this will help us create a different color for each class
COLORS = np.random.uniform(0, 255, size=(len(coco_names), 3))

We can use the colors generated above in the OpenCV drawing functions.
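Here is a small sketch of one way to use such a color array. It assumes a seeded NumPy generator (an addition for reproducibility, not in the original) and converts a row to a tuple of Python ints, which is a safe format for the color argument of OpenCV drawing functions. The `color_for_label` helper and the seed value are hypothetical.

```python
import numpy as np

NUM_CLASSES = 91  # length of the COCO_INSTANCE_CATEGORY_NAMES list

# one random RGB color per class index, values in [0, 255)
rng = np.random.default_rng(seed=42)
COLORS = rng.uniform(0, 255, size=(NUM_CLASSES, 3))


def color_for_label(label_index):
    """Return the color for a class index as a tuple of Python ints."""
    return tuple(int(c) for c in COLORS[label_index])


color = color_for_label(1)  # e.g. the 'person' class
print(len(color), all(0 <= c <= 255 for c in color))  # 3 True
```

Indexing COLORS by class index gives every object of a class the same color; the tutorial's drawing function instead picks a random row for each object, which is what makes the instances visually distinct.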

Function to get the outputs

We will write a simple function to get the outputs of the model after inference. This function gives us all the output tensors we need to visualize the results properly. Let's call this function get_outputs().

The definition of the function follows.

def get_outputs(image, model, threshold):
    with torch.no_grad():
        # forward pass of the image through the model
        outputs = model(image)

    # get all the scores
    scores = list(outputs[0]['scores'].cpu().numpy())
    # indices of the scores that are above the threshold
    thresholded_preds_indices = [i for i, score in enumerate(scores) if score > threshold]
    thresholded_preds_count = len(thresholded_preds_indices)
    # get the masks; squeeze only the channel dimension so that
    # a single detection still keeps its leading N dimension
    masks = (outputs[0]['masks'] >= 0.5).squeeze(1).cpu().numpy()
    # discard masks for objects below the threshold
    # (torchvision returns the detections sorted by descending score)
    masks = masks[:thresholded_preds_count]
    # get the bounding boxes in (x1, y1), (x2, y2) format
    boxes = [[(int(i[0]), int(i[1])), (int(i[2]), int(i[3]))]
             for i in outputs[0]['boxes'].cpu()]
    # discard bounding boxes below the threshold
    boxes = boxes[:thresholded_preds_count]
    # get the class labels, keeping only the thresholded detections
    labels = [coco_names[i] for i in outputs[0]['labels'].cpu().tolist()]
    labels = labels[:thresholded_preds_count]
    return masks, boxes, labels

The get_outputs() function accepts three input parameters. The first is the input image, the second is the Mask R-CNN model, and the third is the threshold value. The threshold is a predefined score below which we discard all outputs to avoid too many false positives. Let's go through the code step by step.

  • First, we get the outputs by passing the image through the model inside torch.no_grad(). This gives us a list containing one dictionary per image.
  • Next, we get all the scores from the dictionary and move them to the CPU.
  • Then we compute thresholded_preds_indices, which contains the indices of all the scores that are above the threshold we provide.
  • The length of that list, thresholded_preds_count, helps us keep only the masks and bounding boxes of those detections.
  • Then we get the masks, keeping the pixels whose values are greater than or equal to 0.5.
  • Next, we discard all the masks that do not pass the score threshold, keeping only the masks of detections above it.
  • We then get the bounding boxes in (x1, y1), (x2, y2) format and discard the low-score box detections, just as we did with the masks.
  • Then we get all the label names by mapping the label indices to the coco_names list.
  • Finally, we return the masks, boxes, and labels.

If any of the above steps are unclear, go through them once more alongside the code and they should make sense.
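The count-and-slice approach in get_outputs() assumes that the detections are sorted by score in descending order, which is how torchvision's detection models return them. The plain-Python sketch below, using made-up scores and labels, compares it with a boolean filter that does not depend on that ordering.

```python
scores = [0.998, 0.991, 0.970, 0.412, 0.105]  # hypothetical, sorted descending
labels = ["person", "person", "laptop", "chair", "cup"]
threshold = 0.965

# approach used in get_outputs(): count the scores above the threshold, then slice
count = len([s for s in scores if s > threshold])
sliced_labels = labels[:count]

# order-independent alternative: filter each detection by its own score
filtered_labels = [lab for lab, s in zip(labels, scores) if s > threshold]

print(count, sliced_labels)            # 3 ['person', 'person', 'laptop']
print(sliced_labels == filtered_labels)  # True
```

Both give the same result here because the scores are sorted; with unsorted scores only the filtering version would still be correct.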


Apply segmentation and draw a bounding box

Once we have the labels, masks, and bounding boxes, we can apply the colored masks to the objects and draw the bounding boxes.

Again, we will write a simple function for this. The draw_segmentation_map() function accepts four input parameters: image, masks, boxes, and labels. The image is the original image on which we overlay the resulting masks and draw bounding boxes around the detected objects. The labels help us put the class name on top of each object.

The definition of the function follows.

def draw_segmentation_map(image, masks, boxes, labels):
    alpha = 1
    beta = 0.6  # transparency for the segmentation map
    gamma = 0  # scalar added to each sum
    # convert the original PIL image into NumPy format (once, before the loop)
    image = np.array(image)
    # convert from RGB to OpenCV BGR format
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    for i in range(len(masks)):
        red_map = np.zeros_like(masks[i]).astype(np.uint8)
        green_map = np.zeros_like(masks[i]).astype(np.uint8)
        blue_map = np.zeros_like(masks[i]).astype(np.uint8)
        # pick a random color for each object (plain ints work with OpenCV)
        color = tuple(int(c) for c in COLORS[random.randrange(0, len(COLORS))])
        red_map[masks[i] == 1], green_map[masks[i] == 1], blue_map[masks[i] == 1] = color
        # combine all the channel maps into a single segmentation map
        segmentation_map = np.stack([red_map, green_map, blue_map], axis=2)
        # overlay the segmentation map on the image
        image = cv2.addWeighted(image, alpha, segmentation_map, beta, gamma)
        # draw the bounding box around the object
        cv2.rectangle(image, boxes[i][0], boxes[i][1], color=color, thickness=2)
        # put the label text above the object
        cv2.putText(image, labels[i], (boxes[i][0][0], boxes[i][0][1] - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, color,
                    thickness=2, lineType=cv2.LINE_AA)
    return image
  • First, we define alpha, beta, and gamma. Here, alpha and beta define the weights of the original image and the segmentation map when we overlay the segmentation map on the image. gamma is a scalar added to each sum, and keeping it at 0 works well in almost all cases. For more details, see the OpenCV documentation for addWeighted().
  • We convert the original image from the PIL image format to a NumPy array and then to OpenCV's BGR color format.
  • We start a for loop over the masks we have.
  • For each mask, we define three NumPy arrays of zeros whose dimensions match those of the current mask.
  • Next, we pick a random color tuple from the COLORS list.
  • We apply this color to the pixels that belong to the object, so the NumPy arrays now contain the color instead of being completely black.
  • We stack red_map, green_map, and blue_map to get the full segmentation map for the current object.
  • Then we blend the image and the segmentation map. Basically, we overlay the segmentation map on the original image with a weight of 0.6. This gives a translucent map over the image, so we can still easily see which object is underneath.
  • Next, we draw the bounding box and put the class name above each object.
  • Finally, we return the resulting image.
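cv2.addWeighted() computes dst = src1 * alpha + src2 * beta + gamma and saturates the result to the valid uint8 range. The NumPy sketch below reproduces that formula on a tiny image to show why alpha = 1 and beta = 0.6 leave unmasked pixels unchanged while tinting masked ones. It is an illustration rather than a replacement for the OpenCV call; the helper name `blend` is hypothetical, and OpenCV's exact rounding may differ slightly in edge cases.

```python
import numpy as np


def blend(image, overlay, alpha=1.0, beta=0.6, gamma=0.0):
    """NumPy version of dst = image * alpha + overlay * beta + gamma, clipped to uint8."""
    mixed = image.astype(np.float64) * alpha + overlay.astype(np.float64) * beta + gamma
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


image = np.full((2, 2, 3), 100, dtype=np.uint8)   # uniform gray 2x2 image
mask = np.array([[True, False], [False, False]])  # a single masked pixel
segmentation_map = np.zeros_like(image)
segmentation_map[mask] = (0, 0, 255)              # red in OpenCV's BGR order

result = blend(image, segmentation_map)
print(result[0, 0], result[1, 1])  # [100 100 253] [100 100 100]
```

Because the segmentation map is black (zero) outside the mask, the overlay only changes the masked pixels, which is why a single full-size map per object works.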

The two functions above are the most important parts of this tutorial. If you have followed along this far, the rest of the article is easy to follow.

Applying Mask R-CNN to images

Now let's write the code to apply the Mask R-CNN model to images of our choice. This part is pretty easy since we have written most of our logic in the utils.py script.

All of this code goes into the mask_rcnn_images.py file.

Let's start with the imports we need.

import torch
import torchvision
import cv2
import argparse
import torchvision.transforms as transforms

from PIL import Image
from utils import draw_segmentation_map, get_outputs

We will provide the path to the input image using command line arguments. So now let's define our argument parser.

parser = argparse.ArgumentParser()
parser.add_argument('-i', '--input', required=True,
                    help='path to the input data')
parser.add_argument('-t', '--threshold', default=0.965, type=float,
                    help='score threshold for discarding detections')
args = vars(parser.parse_args())

We also define an optional threshold argument in the code block above. By default, we discard all detections with a score lower than 0.965. You can increase or decrease the value if you want. Note, however, that setting the value too high can cause objects to go undetected, while setting it too low can lead to many false positives.
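To check how the threshold argument behaves without running the model, you can pass an explicit argument list to parse_args(), as in this sketch (the argument values are hypothetical). On the command line, the custom case corresponds to something like `python mask_rcnn_images.py --input ../input/image1.jpg --threshold 0.8`.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-i', '--input', required=True, help='path to the input data')
parser.add_argument('-t', '--threshold', default=0.965, type=float,
                    help='score threshold for discarding detections')

# parse_args() accepts an explicit list, which is handy for trying options out
default_args = vars(parser.parse_args(['--input', '../input/image1.jpg']))
custom_args = vars(parser.parse_args(['-i', '../input/image1.jpg', '-t', '0.8']))
print(default_args['threshold'], custom_args['threshold'])  # 0.965 0.8
```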

Prepare the model and define the transformation

The next step is to prepare our Mask R-CNN model.

# initialize the model
# (newer torchvision releases deprecate pretrained=True in favor of weights='DEFAULT')
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True, progress=True,
                                                           num_classes=91)
# set the computation device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# load the model onto the computation device and set it to eval mode
model.to(device).eval()

First, we initialize the model. Note that we pass pretrained=True so that the COCO-trained weights are downloaded and loaded. Then we load the model onto the computation device and put it in eval() mode. A GPU is not strictly necessary since we are only working with images, but inference will be faster if you have one.

The following block of code defines the transformations that we apply to the images.

# transform to convert the image to a tensor
transform = transforms.Compose([
    transforms.ToTensor()
])

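We simply convert the images to tensors: transforms.ToTensor() turns a PIL image into a [C, H, W] float tensor with values in [0, 1], which is the range the model expects. We don't need to apply any further transforms before sending the images to the Mask R-CNN model.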

Read image and apply instance segmentation

We provide the path to the image as a command line argument, so let's read the image path from there. The following code block reads the image and applies instance segmentation to it using the Mask R-CNN model.

image_path = args['input']
image = Image.open(image_path).convert('RGB')
# keep a copy of the original image for OpenCV functions and applying masks
orig_image = image.copy()
# transform the image
image = transform(image)
# add a batch dimension
image = image.unsqueeze(0).to(device)
masks, boxes, labels = get_outputs(image, model, args['threshold'])
result = draw_segmentation_map(orig_image, masks, boxes, labels)
# visualize the image
cv2.imshow('Segmented image', result)
cv2.waitKey(0)
# set the save path
save_path = f"../outputs/{args['input'].split('/')[-1].split('.')[0]}.jpg"
cv2.imwrite(save_path, result)
  • We read the image path from the arguments and open the image with PIL, converting it to RGB. We also keep a copy of the original image.
  • Then we apply the transforms and add an extra batch dimension.
  • Next, we call the get_outputs() function from the utils script. Here we feed the transformed image to the Mask R-CNN model, and it returns the masks, boxes, and labels.
  • Then we call the draw_segmentation_map() function, which overlays the segmentation mask of each object on the original image.
  • We then display the resulting image on the screen.
  • Finally, we create a save_path from the original input file name and save the resulting image to disk.
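The save path above is built by splitting the input path on '/', which assumes forward-slash separators and also cuts file names that contain extra dots. A possible alternative, sketched below with pathlib, derives the file name stem instead. The function name `make_save_path` is hypothetical; the outputs folder and the .jpg extension mirror the tutorial's choices.

```python
from pathlib import Path


def make_save_path(input_path, output_dir="../outputs"):
    """Build <output_dir>/<input file name without extension>.jpg."""
    stem = Path(input_path).stem  # 'image1' for '../input/image1.jpg'
    return str(Path(output_dir) / f"{stem}.jpg")


print(make_save_path("../input/image1.jpg"))  # ../outputs/image1.jpg on Linux and macOS
```

For a file such as photo.v2.png, this keeps the name photo.v2, whereas split('.')[0] would shorten it to photo.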

This is all the code we need to apply the Mask R-CNN instance segmentation model to images. We are now ready to run our code and see the results.


Run the file

Let's see how well the Mask R-CNN model can detect and segment objects in images. If you are using the downloaded images, be sure to unzip the file and extract its contents into the input folder. You can also use your own images.

Let's start with the first image in the input folder. Open your terminal/command prompt and cd into the src directory of the project. Then enter the following command.

python mask_rcnn_images.py --input ../input/image1.jpg

The resulting segmented image is shown below.

[Figure 3: Mask R-CNN instance segmentation result on image1.jpg]

The model seems to work very well. Along with all the people in the image, it is also able to detect and segment the laptop and the potted plant. However, the Mask R-CNN model does not fully segment the hand of the woman in the middle. All the other detections and segmentations look pretty good.

Now let's try something that doesn't contain people.

python mask_rcnn_images.py --input ../input/image2.jpg
[Figure 4: Mask R-CNN instance segmentation result on image2.jpg]

In Figure 4, we can see that the Mask R-CNN model detects and segments the elephants very well. It is even able to detect and segment the partially visible elephant on the left.

So far, everything works well. Now let's look at a case where the Mask R-CNN model fails to some extent by trying it on the third image.

python mask_rcnn_images.py --input ../input/image3.jpg
[Figure 5: Mask R-CNN instance segmentation result on image3.jpg]

Figure 5 shows some of the main shortcomings of the Mask R-CNN model: it struggles when it has to segment a group of people standing close together. Interestingly, the detections all look correct, but the model cannot segment the boy next to the soldier, the boy on the far right, or the soldier's leg. So the model has difficulty segmenting objects that are too close together.

You can also try more images if you want and share your results in the comments section.

Summary and conclusion

In this article, you learned about instance segmentation in deep learning. You also gained hands-on experience applying instance segmentation to images using the PyTorch Mask R-CNN model. I hope that you learned something new from this tutorial.

If you have any questions, thoughts, or suggestions, please leave them in the comments section. I will surely address them.

You can contact me through the Contact section. You can also find me on LinkedIn and Gore.



Article information

Author: Madonna Wisozk

Last Updated: 20/08/2023

