

Particle Filtering for Face Tracking

by Kyle Brocklehurst

CSE 598 Object Tracking - Fall 2010 - Prof. Robert Collins

Goal: Detect and track multiple faces in video in real time.


Face Detection: I use the cascaded Haar-feature face detector from OpenCV 2.1 to identify rectangular regions in the image that contain faces:

[Figure: example frame (Gump) and the detected face region (GumpFace)]

I then classify pixels in the image as either skin or not-skin. My criteria for a skin pixel are:

in HSV space, H>245 or H<25 (this means the hue is basically reddish)

in YCbCr space, 140<Cb<195 and 140<Cr<165 (a region of orange to red to pink in the red-difference and blue-difference channels)

If the rectangular box for a face contains less than 30% skin pixels, the face is eliminated as a false positive.
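The skin test and the 30% rule above could be sketched roughly as follows. This is an illustration, not the project code: the `PixelHC` struct and the function names are hypothetical, hue is assumed to be scaled to 0-255, and a pixel is assumed to count as skin only when both the HSV and the YCbCr conditions hold.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-pixel record; all channels assumed to lie in 0-255.
struct PixelHC {
    int h;   // hue from HSV
    int cb;  // blue-difference chroma from YCbCr
    int cr;  // red-difference chroma from YCbCr
};

// Skin rule from the text; combining the two tests with AND is an assumption.
bool isSkin(const PixelHC& p) {
    const bool reddishHue = (p.h > 245) || (p.h < 25);
    const bool chromaOk = (p.cb > 140 && p.cb < 195) && (p.cr > 140 && p.cr < 165);
    return reddishHue && chromaOk;
}

// Fraction of the pixels inside a candidate face box that pass the skin rule.
double skinFraction(const std::vector<PixelHC>& boxPixels) {
    if (boxPixels.empty()) return 0.0;
    std::size_t skin = 0;
    for (const PixelHC& p : boxPixels) {
        if (isSkin(p)) ++skin;
    }
    return static_cast<double>(skin) / static_cast<double>(boxPixels.size());
}

// A detection is kept unless its box contains less than 30% skin.
bool keepDetection(const std::vector<PixelHC>& boxPixels) {
    return skinFraction(boxPixels) >= 0.30;
}
```

In a real pipeline the pixel values would come from color-converted copies of the frame; here they are passed in directly to keep the sketch free of library dependencies.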

[Figure: example frame (Gump) and its skin classification (GumpSkin)]

In my results, I perform face detection in the first frame and whenever the F key is pressed. I also tested running face detection periodically at a rate much lower than the frame rate, and found that it can run as often as once every 100 frames with minimal lag. For my results, however, I was more interested in tracking faces that had already been detected than in repeated detection.


Sampling: In particle filtering, samples are drawn according to a motion model. My motion model predicts the location of the upper-left corner of the box containing the face using constant velocity plus Gaussian noise with a standard deviation of 0.2 times the dimensions of the box (so a box that is wider than it is tall has a greater standard deviation in X than in Y). This image shows sampled positions for two faces. The samples are drawn so that the likelihood of drawing a sample at any given location is proportional to the Gaussian weight at that location:
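A minimal sketch of this sampling step, assuming the velocity is measured in pixels per frame and the box has positive width and height. The `Box`, `Velocity`, and `Point` types and the `sampleCorners` function are hypothetical names chosen for illustration, not taken from the project code.

```cpp
#include <cstddef>
#include <random>
#include <vector>

struct Box { double x, y, w, h; };    // upper-left corner plus size
struct Velocity { double vx, vy; };   // assumed units: pixels per frame
struct Point { double x, y; };        // candidate upper-left corner

// Draw n candidate corners: constant-velocity prediction plus Gaussian noise
// whose standard deviation is sigmaScale times the box dimensions.
std::vector<Point> sampleCorners(const Box& face, const Velocity& v, std::size_t n,
                                 std::mt19937& rng, double sigmaScale = 0.2) {
    std::normal_distribution<double> noiseX(0.0, sigmaScale * face.w);
    std::normal_distribution<double> noiseY(0.0, sigmaScale * face.h);

    // Predicted corner under the constant-velocity model.
    const double predictedX = face.x + v.vx;
    const double predictedY = face.y + v.vy;

    std::vector<Point> samples;
    samples.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        samples.push_back({predictedX + noiseX(rng), predictedY + noiseY(rng)});
    }
    return samples;
}
```

Because the noise is scaled separately by width and height, a 100x50 box spreads its samples twice as widely in X as in Y.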


Calculating Weight / Likelihood: When a face is first detected, I calculate an RGB histogram to represent that face. In each channel, the histogram has 64 bins, each bin spanning 4 values of the scale from 0 to 255.
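The histogram layout described above might look like the following sketch. The `Rgb` and `RgbHistogram` types and the `buildHistogram` function are hypothetical, and the pixels of the face region are assumed to have been copied into a flat list beforehand.

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

constexpr int kBins = 64;               // bins per channel, as in the text
constexpr int kBinWidth = 256 / kBins;  // each bin spans 4 intensity values

// One 64-bin histogram per color channel, initialized to zero.
struct RgbHistogram {
    std::array<double, kBins> r{};
    std::array<double, kBins> g{};
    std::array<double, kBins> b{};
};

// Count every pixel of a region into the per-channel histograms.
RgbHistogram buildHistogram(const std::vector<Rgb>& pixels) {
    RgbHistogram hist;
    for (const Rgb& p : pixels) {
        hist.r[p.r / kBinWidth] += 1.0;
        hist.g[p.g / kBinWidth] += 1.0;
        hist.b[p.b / kBinWidth] += 1.0;
    }
    return hist;
}
```

Integer division by 4 maps values 0-3 to bin 0, 4-7 to bin 1, and so on up to 252-255 in bin 63.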

When tracking, after sampling according to my motion model in X,Y space, I extract an image region at each sample point with the same width and height as the face I am tracking (the sample point is the location of the region's upper-left corner). I then calculate an RGB histogram for each sample in the same way and correlate each channel's histogram with the corresponding histogram built for the face when it was first detected. Let these correlations be called rCorr, gCorr, and bCorr. Then:

similarity = (rCorr*0.4 + gCorr*0.3 + bCorr*0.3); // a weighted sum placing slightly more importance on red (since faces are mostly red)

weight = pow(2.718281828, -16.0 * (1-similarity)); // the weight of a sample is given according to a curve that makes weight drop off steeply as similarity decreases
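The two formulas above can be wrapped into a small sketch. The text does not say exactly how the histograms are correlated, so `correlate` below assumes the Pearson correlation between bin counts; both function names are hypothetical. `std::exp` is used in place of `pow(2.718281828, ...)`, which computes the same curve.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two equally sized histograms (an assumed
// choice of correlation measure). Returns 0 for degenerate input.
double correlate(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.empty() || a.size() != b.size()) return 0.0;
    const double n = static_cast<double>(a.size());
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= n;
    meanB /= n;

    double num = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - meanA;
        const double db = b[i] - meanB;
        num += da * db;
        varA += da * da;
        varB += db * db;
    }
    const double denom = std::sqrt(varA * varB);
    return denom > 0.0 ? num / denom : 0.0;
}

// Weighted channel similarity and exponential weight, as given in the text.
double sampleWeight(double rCorr, double gCorr, double bCorr) {
    const double similarity = rCorr * 0.4 + gCorr * 0.3 + bCorr * 0.3;
    return std::exp(-16.0 * (1.0 - similarity));
}
```

With this curve, a similarity of 0.9 gives a weight of about 0.20, while a similarity of 0.5 gives about 0.0003, so poorly matching samples contribute almost nothing.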

For some of my results, I simply performed sampling once and moved the updated face to the sample with the highest weight, which worked OK but tended to jitter and wiggle a lot.

Sequential Importance Resampling: It is important in particle filtering to resample according to the weights that you found at each point in the last round of sampling.

For example, imagine you are trying to find something green and you have 5 samples:


Those containing a green dot would probably have very high weight and those with other colors would have a low weight. To find higher concentrations of green, we want to resample around the better previous samples. We do this by drawing new samples according to a cumulative distribution of weights on the previous samples:


The figure above can help visualize the resampling process. The height of the bar for a given sample indicates the likelihood of a resample being performed there. Many resamples are taken, and drawing them according to this distribution clusters the resamples around the better samples from the previous round. In my results, each resample starts from the previous sample selected by this draw and then adds Gaussian noise with a standard deviation of 0.1 times the box dimensions. This is lower than the standard deviation used in the first sampling on a new frame because the resamples can be clustered more tightly to refine the estimate.
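One way to sketch this resampling step is shown below, assuming non-negative weights. The `Particle` type and the `resample` function are hypothetical names; the new particles are returned with a placeholder weight of zero because they still need to be scored against the face histogram.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

struct Particle { double x, y, weight; };

// Draw `count` new particles: pick a parent through the cumulative
// distribution of the previous weights, then jitter it with Gaussian noise
// of standard deviation sigmaScale times the box dimensions.
std::vector<Particle> resample(const std::vector<Particle>& prev, std::size_t count,
                               double boxW, double boxH, std::mt19937& rng,
                               double sigmaScale = 0.1) {
    std::vector<Particle> out;
    if (prev.empty()) return out;

    // Running sum of weights: cdf[i] is the total weight of samples 0..i.
    std::vector<double> cdf(prev.size());
    double total = 0.0;
    for (std::size_t i = 0; i < prev.size(); ++i) {
        total += prev[i].weight;
        cdf[i] = total;
    }
    if (total <= 0.0) return out;  // nothing sensible to resample from

    std::uniform_real_distribution<double> pick(0.0, total);
    std::normal_distribution<double> noiseX(0.0, sigmaScale * boxW);
    std::normal_distribution<double> noiseY(0.0, sigmaScale * boxH);

    out.reserve(count);
    for (std::size_t k = 0; k < count; ++k) {
        // The first entry whose cumulative weight exceeds u is the parent,
        // so heavier samples are chosen proportionally more often.
        const double u = pick(rng);
        const auto it = std::upper_bound(cdf.begin(), cdf.end(), u);
        const std::size_t idx =
            std::min(static_cast<std::size_t>(it - cdf.begin()), prev.size() - 1);
        out.push_back({prev[idx].x + noiseX(rng), prev[idx].y + noiseY(rng), 0.0});
    }
    return out;
}
```

Calling such a function repeatedly, rescoring the returned particles between calls, would correspond to running several rounds of resampling on one frame.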

Resampling can be performed many times, each time working on the weights and samples from the last round of resampling. I find that 3 rounds of resampling achieve very good results and eliminate much of the wiggling seen when resampling was not performed.

Dimensionality: Particle filtering is meant to find optimal values in a high-dimensional space. For most of my results, I sample in X,Y space, but I also tried sampling in X,Y,W,H space (allowing the width and height of the box to change). This occasionally worked well on a video, but it often proved unstable and shrank the face box too far, finding tiny rectangular regions that happen to match the RGB histogram of the detected face very well. This could be mitigated by using a better similarity score than histogram correlation, such as normalized cross-correlation (NCC) of scaled images. However, this would take longer to compute and may reduce the frame rate at which the tracking can run.

Results / Future Work: The results for tracking are very good. The tracker reliably detects frontal faces and maintains tracking in spite of head rotation, fast motion, some lighting changes, and brief occlusion. There are two things I would like to improve: (1) the sampling dimensionality (see the paragraph above), using NCC of regions rectified to the same shape as a more reliable similarity measure than RGB histogram correlation; and (2) the handling of faces that come too close together, where I would like to sample their next positions jointly and penalize joint samples that place the faces too close to each other, thus avoiding cases where one face can "steal" the tracking from another (see the video).

Particle Filtering for Face Tracking from Kyle Brocklehurst on Vimeo.

Code: This project was done in C++ using OpenCV 2.1 libraries. You will need to install OpenCV 2.1 to compile and run this code. [link]
