In this tutorial, we will use OpenCV for deep learning based human pose estimation. We will explain in detail how to use a pre-trained Caffe model that won the COCO Keypoint Challenge in 2016 in your own application, and what goes on under the hood.
This post has been tested on OpenCV 4.2.
1. Pose Estimation (a.k.a Keypoint Detection)
Pose estimation is a general problem in computer vision where we detect the position and orientation of an object. This usually means detecting keypoint locations that describe the object.
For example, in the problem of face pose estimation (a.k.a facial landmark detection), we detect landmarks on a human face. We have covered this topic in detail before (Facial Landmark Detection using OpenCV and Facial Landmark Detection using Dlib).
A related problem is Head Pose Estimation, where we use the facial landmarks to obtain the 3D orientation of a human head with respect to the camera.
In this article, we will focus on human pose estimation, where it is required to detect and localize the major parts of the body (e.g. shoulders, ankles, knees, wrists etc.).
Do you remember the scene where Tony Stark dons the Iron Man suit using gestures? If such a suit is ever built, it would require human pose estimation!
For the purpose of this article, though, we will tone down our ambition a little and solve the simpler problem of detecting keypoints on the body. A typical output of a pose detector looks as shown below:
1.1 Keypoint Detection Datasets
Until recently, there was little progress in pose estimation because of the lack of high-quality datasets. Such is the enthusiasm in AI these days that people believe every problem is just a good dataset away from being demolished. Some challenging datasets have been released in the last few years which have made it easier to attack the problem with all of one's intellectual might.
Some of these datasets are:
- COCO Keypoints challenge
- MPII Human Pose Dataset
If we missed an important dataset, please mention it in the comments and we will be happy to include it in this list!
2. Multi-Person Pose Estimation Model
The model used in this tutorial is based on a paper titled Multi-Person Pose Estimation by the Perceptual Computing Lab at Carnegie Mellon University. The authors of the paper train a very deep neural network for this task. Let us briefly go over the architecture before we explain how to use the pre-trained model.
2.1 Architecture Overview
The model takes as input a color image of size w x h and produces, as output, the 2D locations of keypoints for each person in the image. The detection takes place in three stages:
- Stage 0: The first 10 layers of VGGNet are used to create feature maps for the input image.
- Stage 1: A 2-branch multi-stage CNN is used, where the first branch predicts a set of 2D confidence maps (S) of body part locations (e.g. elbow, knee etc.). Shown below is the confidence map for the Left Shoulder keypoint.
The second branch predicts a set of 2D vector fields (L) of part affinities, which encode the degree of association between parts. In the figure below, the part affinity between the Neck and the Left Shoulder is shown.
- Stage 2: The confidence and affinity maps are parsed by greedy inference to produce the 2D keypoints for all people in the image.
This architecture won the COCO Keypoints Challenge in 2016.
2.2 Pre-trained Models for Human Pose Estimation
The authors of the paper have shared two models: one trained on the Multi-Person Dataset (MPII) and the other trained on the COCO dataset. The COCO model produces 18 points, while the MPII model outputs 15 points. The outputs plotted on a person are shown in the image below.
COCO Output Format: Nose - 0, Neck - 1, Right Shoulder - 2, Right Elbow - 3, Right Wrist - 4, Left Shoulder - 5, Left Elbow - 6, Left Wrist - 7, Right Hip - 8, Right Knee - 9, Right Ankle - 10, Left Hip - 11, Left Knee - 12, Left Ankle - 13, Right Eye - 14, Left Eye - 15, Right Ear - 16, Left Ear - 17, Background - 18

MPII Output Format: Head - 0, Neck - 1, Right Shoulder - 2, Right Elbow - 3, Right Wrist - 4, Left Shoulder - 5, Left Elbow - 6, Left Wrist - 7, Right Hip - 8, Right Knee - 9, Right Ankle - 10, Left Hip - 11, Left Knee - 12, Left Ankle - 13, Chest - 14, Background - 15
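The parsing and skeleton-drawing code later in this post refers to the number of keypoints (nPoints) and to pairs of connected keypoints (POSE_PAIRS). Below is a minimal Python sketch of these constants for the 15-point MPII format: the index-to-name mapping is taken directly from the MPII output format above, while the skeleton pairs are an anatomy-based assumption and may differ slightly from the constants shipped with the downloadable code.

Python

# Number of keypoints produced by the MPII model
nPoints = 15

# Index-to-name mapping, taken from the MPII output format listed above
MPII_PARTS = ["Head", "Neck", "Right Shoulder", "Right Elbow", "Right Wrist",
              "Left Shoulder", "Left Elbow", "Left Wrist", "Right Hip",
              "Right Knee", "Right Ankle", "Left Hip", "Left Knee",
              "Left Ankle", "Chest"]

# Pairs of keypoint indices that form the limbs of the skeleton.
# These pairs are an assumption based on anatomy and the index list above.
POSE_PAIRS = [[0, 1], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7],
              [1, 14], [14, 8], [8, 9], [9, 10], [14, 11], [11, 12], [12, 13]]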
You can download the model weight files using the scripts provided at this location.
3. Code for Human Pose Estimation in OpenCV
In this section, we will see how to load the trained models in OpenCV and check the outputs. We will discuss code for only single person pose estimation to keep things simple. As we saw in the previous section, the output consists of confidence maps and affinity maps. These outputs can be used to find the pose for every person in a frame if multiple people are present. We will cover the multi-person case in a future post.
First, download the code and model files below. There are separate files for image and video inputs. Please go through the README file if you face any difficulty in running the code.
3.1 Step 1: Download Model Weights
Use the getModels.sh file provided with the code to download all the model weights to the respective folders. Note that the configuration proto files are already present in the folders.
From the command line, execute the following from the downloaded folder:
sudo chmod a+x getModels.sh
./getModels.sh
Check the folders to ensure that the model binaries (.caffemodel files) have been downloaded. If you are not able to run the above script, you can download the model by clicking here for the MPII model and here for the COCO model.
3.2 Step 2: Load Network
We are using models trained on the Caffe deep learning framework. Caffe models have two files:
- A .prototxt file which specifies the architecture of the neural network: how the different layers are arranged etc.
- A .caffemodel file which stores the weights of the trained model
We will use these two files to load the network into memory.
C++
// Specify the paths for the 2 files
string protoFile = "pose/mpi/pose_deploy_linevec_faster_4_stages.prototxt";
string weightsFile = "pose/mpi/pose_iter_160000.caffemodel";

// Read the network into memory
Net net = readNetFromCaffe(protoFile, weightsFile);
Python
import cv2

# Specify the paths for the 2 files
protoFile = "pose/mpi/pose_deploy_linevec_faster_4_stages.prototxt"
weightsFile = "pose/mpi/pose_iter_160000.caffemodel"

# Read the network into memory
net = cv2.dnn.readNetFromCaffe(protoFile, weightsFile)
3.3 Step 3: Read Image and Prepare Input to the Network
The input frame that we read using OpenCV should be converted to an input blob (the format Caffe expects) so that it can be fed to the network. This is done using the blobFromImage function, which converts the image from the OpenCV format to a Caffe blob. The parameters to provide to blobFromImage are the scale factor that normalizes the pixel values to (0,1), the dimensions of the input image, and the mean value to be subtracted, which is (0, 0, 0). There is no need to swap the R and B channels, since both OpenCV and Caffe use the BGR format.
C++
// Read image
Mat frame = imread("single.jpg");

// Specify the input image dimensions
int inWidth = 368;
int inHeight = 368;

// Prepare the frame to be fed to the network
Mat inpBlob = blobFromImage(frame, 1.0 / 255, Size(inWidth, inHeight), Scalar(0, 0, 0), false, false);

// Set the prepared object as the input blob of the network
net.setInput(inpBlob);
Python
# Read image
frame = cv2.imread("single.jpg")

# Specify the input image dimensions
inWidth = 368
inHeight = 368

# Prepare the frame to be fed to the network
inpBlob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (inWidth, inHeight), (0, 0, 0), swapRB=False, crop=False)

# Set the prepared object as the input blob of the network
net.setInput(inpBlob)
3.4 Step 4: Make Predictions and Parse Keypoints
Once the image is passed to the model, the predictions can be made using a single line of code. The forward method of the DNN class in OpenCV makes a forward pass through the network, which is just another way of saying it is making a prediction.
C++
Mat output = net.forward();
Python
output = net.forward()
The output is a 4D matrix:
- The first dimension is the image ID (in case you pass more than one image to the network).
- The second dimension indicates the index of a keypoint. The model produces confidence maps and part affinity maps, which are all concatenated. For the COCO model this gives 57 maps (18 keypoint confidence maps + 1 background + 19*2 part affinity maps); similarly, the MPII model produces 44 maps. We will be using only the first few maps, which correspond to the keypoints (see the shape-inspection sketch after this list).
- The third dimension is the height of the output map.
- The fourth dimension is the width of the output map.
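As a quick sanity check, it helps to print the shape of the output blob before parsing it. Below is a minimal sketch, assuming the MPII model and the 368 x 368 input prepared in Step 3; the network downsamples the input, so the exact map size shown is an illustrative assumption:

Python

# Inspect the raw network output. The printed values assume the MPII model
# with a 368 x 368 input; the exact spatial size depends on the input dimensions.
print(output.shape)   # e.g. (1, 44, 46, 46): 1 image, 44 maps of 46 x 46 each

# Confidence map of keypoint i of the first image, as a 2D array
i = 0
probMap = output[0, i, :, :]
print(probMap.shape)  # e.g. (46, 46)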
We check whether each keypoint is present in the image or not. We get the location of a keypoint by finding the maxima of the confidence map of that keypoint. We also use a threshold to reduce false detections.
Once the keypoints are detected, we just plot them on the image.
C++
int H = output.size[2];
int W = output.size[3];

int frameWidth = frame.cols;
int frameHeight = frame.rows;
Mat frameCopy = frame.clone();
float thresh = 0.1;    // confidence threshold to reduce false detections

// Find the position of the body parts
vector<Point> points(nPoints);
for (int n = 0; n < nPoints; n++)
{
    // Probability map of the corresponding body part
    Mat probMap(H, W, CV_32F, output.ptr(0, n));

    Point2f p(-1, -1);
    Point maxLoc;
    double prob;
    minMaxLoc(probMap, 0, &prob, 0, &maxLoc);
    if (prob > thresh)
    {
        p = maxLoc;
        // Scale the point to fit on the original image
        p.x *= (float)frameWidth / W;
        p.y *= (float)frameHeight / H;

        circle(frameCopy, Point((int)p.x, (int)p.y), 8, Scalar(0, 255, 255), -1);
        putText(frameCopy, format("%d", n), Point((int)p.x, (int)p.y), FONT_HERSHEY_COMPLEX, 1, Scalar(0, 0, 255), 2);
    }
    points[n] = p;
}
Python
H = output.shape[2]
W = output.shape[3]

frameWidth = frame.shape[1]
frameHeight = frame.shape[0]
frameCopy = frame.copy()
threshold = 0.1    # confidence threshold to reduce false detections

# Empty list to store the detected keypoints
points = []
for i in range(nPoints):
    # Confidence map of the corresponding body part
    probMap = output[0, i, :, :]

    # Find the global maxima of the probMap
    minVal, prob, minLoc, point = cv2.minMaxLoc(probMap)

    # Scale the point to fit on the original image
    x = (frameWidth * point[0]) / W
    y = (frameHeight * point[1]) / H

    if prob > threshold:
        cv2.circle(frameCopy, (int(x), int(y)), 15, (0, 255, 255), thickness=-1, lineType=cv2.FILLED)
        cv2.putText(frameCopy, "{}".format(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 1.4, (0, 0, 255), 3, lineType=cv2.LINE_AA)

        # Add the point to the list if the probability is greater than the threshold
        points.append((int(x), int(y)))
    else:
        points.append(None)
3.5 Step 5: Draw Skeleton
Since we know the indices of the keypoints beforehand, we can draw the skeleton once we have the keypoints by simply joining the pairs. This is done using the code given below.
C++
for (int n = 0; n <npairs; n ++) {// ParkUp 2 connected points connected point2fpart = points [pose_pairs [n] [0]]; Point2f parb = points [pose_pairs [1]]];if (part.x <= 0 || part.y <= 0 || partb.x <= 0 || partb.y <= 0) Continued; line (Table, Partab, Climb (0.25.255), 8);(Table, output, 8, climbing (0.0.255), -1); circle (table, part, 8, climbing (0.0.255), -1);};
Python
for pair in POSE_PAIRS:
    partA = pair[0]
    partB = pair[1]

    if points[partA] and points[partB]:
        cv2.line(frameCopy, points[partA], points[partB], (0, 255, 0), 3)
Check out the video demo using the video version of the code. We found that the COCO model is 1.5 times slower than the MPI model. This is expected, since we are using a 4-stage version of the model.
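To reproduce this comparison on your own machine, you can time the forward pass directly. Below is a minimal Python sketch, assuming net and inpBlob were set up as in Steps 2 and 3; the warm-up pass and the iteration count are our choices for illustration, not something prescribed by the original code:

Python

import time

net.setInput(inpBlob)
net.forward()              # warm-up pass so one-time initialization is not timed

nRuns = 10                 # number of timed forward passes (arbitrary choice)
start = time.time()
for _ in range(nRuns):
    net.forward()
print("Average forward pass: {:.3f} s".format((time.time() - start) / nRuns))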
If you have ideas of some interesting applications using these techniques, do mention them in the comments!
Subscribe & Download Code
If you liked this article and would like to download the code (C++ and Python) and the example images used in this post, please click here. Alternatively, sign up to receive a free Computer Vision Resource Guide. In our newsletter, we share OpenCV tutorials and examples written in C++/Python, and computer vision and machine learning algorithms and news.
Download Example Code
Additional References and Reading
Original video link on YouTube used in the sample video
OpenPose
Pose Detection Caffe Model
Realtime Multi-Person Pose Estimation
OpenCV DNN Module
Download Caffe models in OpenCV