


BAYESIAN METHODS FOR FACE RECOGNITION FROM VIDEO

Rama Chellappa, Shaohua Zhou*
Center for Automation Research, EE Department, University of Maryland, College Park, MD 20742

Baoxin Li
Sharp Laboratories of America, 5750 NW Pacific Rim Blvd., Camas, WA 98607

ABSTRACT

Face recognition (FR) from video requires solving two tasks simultaneously: recognition and tracking. To accommodate the video, a time series state space model is introduced in a Bayesian approach. Given this model, the goal reduces to estimating the posterior distribution of the state vector given the observations up to the present. The Sequential Importance Sampling (SIS) technique is invoked to generate a numerical solution to this model. However, the ultimate goal is to estimate the posterior distribution of the identity of humans for recognition purposes. Presented here are two methods to approximate this distribution under different experimental scenarios.

1. INTRODUCTION

Bayesian analysis of video has gained significant attention in the computer vision community since the seminal work of Isard and Blake [1]. To solve the problem of visual tracking, they introduced a time series state space model parameterized by a tracking state vector (e.g., affine parameters) and developed the CONDENSATION algorithm, which provides a numerical approximation to the posterior distribution of the state vector and propagates it over time according to the state equation. This approach has been extended to many areas [2, 3], including face recognition [4, 5, 6].

Refer to [7, 8] for surveys and to [9] for experiments on face recognition. The experiments reported in [9] evaluate still-to-still scenarios, where both the gallery and the probe consist of still facial images. Well-known still-to-still FR approaches include Principal Component Analysis (PCA) [10], Linear Discriminant Analysis (LDA) [11, 12], and Elastic Graph Matching (EGM) [13]. Typically, recognition is performed on an abstract representation of an image after suitable geometric and photometric normalizations.

Following [9], we define the gallery and the probe as follows: the gallery consists of still facial templates, and the probe consists of video sequences containing the facial region. There are many instances where such still-to-video algorithms are useful. Denote the gallery set as $\mathcal{I} = \{I_1, I_2, \ldots, I_N\}$, indexed by the identity variable $n$, which lies in the finite sample space $\mathcal{N} = \{1, 2, \ldots, N\}$. We also adopt the time series state space model to characterize the evolving dynamics and/or identity in the probe video. Let $x_t$ be the state vector and $z_t$ the observation at time $t$. Given this model, the goal reduces to computing the posterior distribution of the state vector given the observations up to time $t$, denoted by $p(x_t \mid z_{0:t})$, with $z_{0:t} = \{z_0, z_1, \ldots, z_t\}$. The SIS technique can be invoked to generate a numerical solution. Ultimately, we need to estimate the posterior distribution of the identity, $p(n_t \mid z_{0:t})$, abbreviated $p(n_t)$, where $n_t$ is the human identity variable at time $t$.

Presented here are two methods for approximating $p(n_t)$ under different experimental scenarios but the same still-to-video setup. Method I [5] parameterizes the model with only an affine tracking state, denoted by $\theta_t$, and approximates and propagates $p(\theta_t \mid z_{0:t})$ using the SIS algorithm. The distribution $p(n_t)$ is then estimated by integrating $p(\theta_t \mid z_{0:t})$ over a proper affine region around the posterior mean $\hat{\theta}_t$. Method II [6] parameterizes the model with both the affine tracking state $\theta_t$ and the identity variable $n_t$, and approximates and propagates the joint distribution $p(\theta_t, n_t \mid z_{0:t})$ using the SIS algorithm. The distribution $p(n_t)$ then comes as a free estimate, namely the true marginal of $p(\theta_t, n_t \mid z_{0:t})$.

Section 2 introduces a general time series state space model and briefly reviews the SIS algorithm that approximates its solution. Sections 3 and 4 respectively describe the experimental scenarios of the two aforementioned methods and present the methods with their results. Section 5 concludes the paper.

2. SIS ALGORITHM

A general time series state space model consists of the following three components:

1. State equation governing the state evolution:

$$x_t = f_t(x_{t-1}, u_t), \tag{1}$$

where $u_t$ is the state noise and $f_t(\cdot)$ the state evolving function. Denote the state transition probability as $p_t(x_t \mid x_{t-1})$.

2. Observation equation depicting the observational behavior:

$$z_t = g_t(x_t, v_t), \tag{2}$$

where $v_t$ is the observation noise and $g_t(\cdot)$ the observation function. Denote the likelihood as $p_t(z_t \mid x_t)$.

3. Prior probability $p_0(x_0)$ and statistical independence:

$$u_t \perp u_s, \;\; v_t \perp v_s \;\; (t \neq s); \qquad u_t \perp v_s \;\; \forall\, t, s. \tag{3}$$
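When $f_t$ and $g_t$ in (1) and (2) are linear and the noises are Gaussian, the filtering posterior stays Gaussian, and a Kalman filter propagates its mean and variance exactly. The sketch below is a minimal scalar illustration under assumed choices that do not come from the paper: a random-walk state $x_t = x_{t-1} + u_t$ with $u_t \sim N(0, q)$, a direct observation $z_t = x_t + v_t$ with $v_t \sim N(0, r)$, and a Gaussian prior $p_0(x_0) = N(m_0, p_0)$. The function name `kalman_filter_1d` is hypothetical.

```python
import random


def kalman_filter_1d(observations, q, r, m0=0.0, p0=1.0):
    """Filtering means/variances of p(x_t | z_{0:t}) for the assumed model
    x_t = x_{t-1} + u_t, u_t ~ N(0, q);  z_t = x_t + v_t, v_t ~ N(0, r);  x_0 ~ N(m0, p0)."""
    mean, var = m0, p0
    means, variances = [], []
    for t, z in enumerate(observations):
        if t > 0:
            # Predict: push the Gaussian through the state equation (variance grows by q).
            var += q
        # Update: fuse the prediction with the Gaussian likelihood N(z_t; x_t, r).
        gain = var / (var + r)
        mean += gain * (z - mean)
        var *= 1.0 - gain
        means.append(mean)
        variances.append(var)
    return means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    x, zs = 0.0, []
    for t in range(50):
        if t > 0:
            x += rng.gauss(0.0, 0.1 ** 0.5)        # state noise, variance q = 0.1
        zs.append(x + rng.gauss(0.0, 0.5 ** 0.5))  # observation noise, variance r = 0.5
    means, variances = kalman_filter_1d(zs, q=0.1, r=0.5)
    print(f"estimate {means[-1]:.3f}, true {x:.3f}, posterior variance {variances[-1]:.3f}")
```

Only two numbers per time step are carried forward; the SIS approach reviewed in this section replaces them with a weighted sample set when the model is nonlinear or non-Gaussian.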

* Partially supported by the DARPA Grant N00014-00-1-0908.

Using this model, we attempt to compute the filtering posterior probability $p(x_t \mid z_{0:t})$. If the model is linear with Gaussian noise, it is analytically solvable by a Kalman filter, which essentially propagates the mean and variance of a Gaussian distribution over time. For nonlinear and non-Gaussian cases, the extended Kalman filter (EKF) has been proposed to arrive at an approximate analytic solution by linearizing the model. Recently, the SIS technique, a special case of the Monte Carlo method [1, 14, 15, 16], has been used to provide a numerical solution and to propagate an arbitrary distribution over time.

The essence of the Monte Carlo method is to represent an arbitrary probability distribution $\pi(x)$ closely by a set of discrete samples. Ideally, one would draw i.i.d. samples $\{x^{(m)}\}_{m=1}^{M}$ from $\pi(x)$. However, this is often difficult, especially for non-trivial distributions. Instead, a set of samples $\{x^{(m)}\}_{m=1}^{M}$ is drawn from an importance function $q(x)$ that is easy to sample from, and a weight

$$w^{(m)} = \frac{\pi(x^{(m)})}{q(x^{(m)})} \tag{4}$$

is assigned to each sample. This technique is called Importance Sampling (IS). It can be shown [16] that the importance sample set $S = \{(x^{(m)}, w^{(m)})\}_{m=1}^{M}$ is properly weighted with respect to the target distribution $\pi(x)$. To accommodate a video, importance sampling is used in a sequential fashion, which leads to SIS. SIS propagates $S_{t-1}$ to $S_t$ according to the sequential importance function $q_t(x_t \mid x_{t-1})$, and calculates the weight using

$$w_t^{(m)} = w_{t-1}^{(m)} \, \frac{p_t(z_t \mid x_t^{(m)}) \, p_t(x_t^{(m)} \mid x_{t-1}^{(m)})}{q_t(x_t^{(m)} \mid x_{t-1}^{(m)})}. \tag{5}$$

For a complete description of the SIS method, refer to [14, 16].

3. METHOD I

This method [5] has been tested on a face video database. In building the database, each person was asked to sit on a chair at a fixed distance from the camera, so that the scale was approximately the same for all persons, and to move his/her head and make any desired facial expression, simulating an automatic teller machine or access control scenario. Fig. 1 shows some sample frames from a probe video and some templates in the gallery.

Fig. 1. I. Sample frames from the video (top) and templates (bottom). The templates will be referred to as face 1, face 2, ..., counting from the left; face 3 in the bottom row is the true hypothesis.

The state vector $x_t$ is taken to be the affine tracking parameter $\theta_t$, which obeys a first-order Gauss-Markov model; i.e., the transition probability $p(\theta_t \mid \theta_{t-1})$, assumed time-invariant, is Gaussian. In addition, a local deformation is introduced to account for residual motion due to inaccuracies in affine modeling and to other factors such as facial expressions. The observation $z_t$ is taken to be the Gabor-filtered jets [13] defined on a sparse grid, shown in Fig. 2. Note that it is the grid in the template image that undergoes the affine motion and local deformation. The local deformation is implemented by performing a local search around each grid point for its best match when updating the likelihood measurement. The likelihood is assumed to be time-invariant and is modeled as a truncated Gaussian:

$$p(z_t \mid \theta_t) \propto \begin{cases} \exp\!\left(-\dfrac{e^2}{2\sigma^2}\right), & e \le \lambda, \\[4pt] \exp\!\left(-\dfrac{\lambda^2}{2\sigma^2}\right), & e > \lambda, \end{cases} \tag{6}$$

where $\lambda$ is a threshold and $\sigma$ a constant. The error $e$ (Eq. (7)) is computed by matching the template jets against the current frame, where $N_j$ is the number of jets, $J_j$ is the jet for the $j$-th grid point in the template, and $J'_j$ is its counterpart in the current frame. It needs to be emphasized that $J'_j$ is found by first applying to the grid the affine motion with $\theta_t$, followed by a local search.

Using SIS, the tracking problem can be solved numerically by approximating $p(\theta_t \mid z_{0:t})$. For pure tracking, we let the template be the facial part in the first frame. Fig. 2 shows some pure tracking results. For both tracking and recognition, we use the templates in the gallery set. In order to evaluate $p(n_t = n)$ for template $n$ in the gallery, we first invoke SIS with template $n$ to obtain $p(\theta_t \mid z_{0:t})$, and then compute

$$p(n_t = n) = \int_{\Omega_t} p(\theta_t \mid z_{0:t}) \, d\theta_t, \tag{8}$$

where $\Omega_t$ is a proper region around the posterior mean $\hat{\theta}_t$. The complete algorithm is summarized below.

Algorithm I
Initialization: Rectify the template grid onto the first frame using EGM. Draw $M$ random samples from $p_0(\theta_0)$.
Tracking and Recognition:
Tracking: at time $t$, invoke the SIS algorithm to obtain an updated set of samples for $p(\theta_t \mid z_{0:t})$. To compute the likelihood of each sample, a local search around each node is performed to account for the deformation before computing the matching error.
Recognition and Mean Shape Evaluation: at any time $t$, the tracked set of jets is given by the posterior mean $\hat{\theta}_t$ plus a local search; a final matching score is calculated using this mean shape, and the posterior probability is computed in an interval around $\hat{\theta}_t$.

Fig. 2 shows pure tracking results with the tracked grid points superimposed on the image. Even under the difficult situations shown in Fig. 2, tracking is successfully maintained. Fig. 3 shows the posterior probability $p(n_t)$ and the matching scores computed using the mean shape.
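Algorithm I rests on the IS and SIS steps of Eqs. (4) and (5). The sketch below illustrates them on a toy scalar model; it is not the authors' implementation, and the random-walk transition, Gaussian likelihood, noise levels, and all function names are assumptions for illustration. At $t = 0$ the samples are drawn from the prior, so by Eq. (4) their weights are proportional to the likelihood. Afterwards Eq. (5) is applied with the transition itself as the importance function $q_t$ (the "bootstrap" choice), in which case the transition and proposal terms cancel. The multinomial resampling step, triggered when the effective sample size drops, is standard practice discussed in [14, 16] but is not spelled out in the text above.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def sis_step(particles, weights, z, rng, propose, proposal_pdf, transition_pdf, likelihood):
    """One SIS update: draw x_t ~ q_t(. | x_{t-1}) and apply Eq. (5),
    w_t = w_{t-1} * p(z_t | x_t) * p(x_t | x_{t-1}) / q_t(x_t | x_{t-1})."""
    new_particles, new_weights = [], []
    for x_prev, w_prev in zip(particles, weights):
        x = propose(x_prev, rng)
        w = w_prev * likelihood(z, x) * transition_pdf(x, x_prev) / proposal_pdf(x, x_prev)
        new_particles.append(x)
        new_weights.append(w)
    return new_particles, normalize(new_weights)


def effective_sample_size(weights):
    return 1.0 / sum(w * w for w in weights)


def run_sis(observations, num_particles=500, seed=0, q_std=0.3, r_std=0.7):
    """Posterior means E[x_t | z_{0:t}] for an assumed scalar random-walk model."""
    rng = random.Random(seed)

    def propose(x_prev, rng):  # bootstrap choice: q_t = p(x_t | x_{t-1})
        return x_prev + rng.gauss(0.0, q_std)

    def transition_pdf(x, x_prev):
        return gaussian_pdf(x, x_prev, q_std)

    def likelihood(z, x):
        return gaussian_pdf(z, x, r_std)

    # t = 0: sample the prior p_0 = N(0, 1); Eq. (4) then gives weights proportional to the likelihood.
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    weights = normalize([likelihood(observations[0], x) for x in particles])
    means = [sum(w * x for w, x in zip(weights, particles))]
    for z in observations[1:]:
        particles, weights = sis_step(particles, weights, z, rng,
                                      propose, transition_pdf, transition_pdf, likelihood)
        means.append(sum(w * x for w, x in zip(weights, particles)))
        if effective_sample_size(weights) < num_particles / 2:
            # Resample to counter weight degeneracy, then reset to uniform weights.
            particles = rng.choices(particles, weights=weights, k=num_particles)
            weights = [1.0 / num_particles] * num_particles
    return means


if __name__ == "__main__":
    print(run_sis([2.0] * 30)[-1])
```

Keeping `proposal_pdf` and `transition_pdf` as separate arguments leaves room for a proposal that also looks at $z_t$, in which case the ratio in Eq. (5) no longer cancels.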

Fig. 2. I. Tracking results. Note the rotation in depth in the upper row and the large in-plane rotation in the lower row.
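The per-sample likelihood evaluation in Algorithm I can be sketched as follows. This is a hedged illustration rather than the authors' code: because the exact form of the jet error in Eq. (7) is not reproduced here, the sketch assumes a normalized dot product between jets (a common similarity in elastic graph matching [13]) and defines the error as the average, over grid points, of one minus the best similarity found by a local search over candidate jets near each warped grid point. The truncated Gaussian follows Eq. (6), assuming the likelihood stays at its threshold value for errors beyond $\lambda$. Function names and the candidate-jet input format are hypothetical.

```python
import math


def jet_similarity(j1, j2):
    """Normalized dot product of two jets (vectors of Gabor responses); an assumed similarity."""
    dot = sum(a * b for a, b in zip(j1, j2))
    norm = math.sqrt(sum(a * a for a in j1) * sum(b * b for b in j2))
    return dot / norm if norm > 0.0 else 0.0


def matching_error(template_jets, candidate_jets_per_point):
    """Average (1 - similarity) over the N_j grid points. For each point, the local search
    keeps the best-matching candidate jet taken from around the affinely warped location."""
    total = 0.0
    for j_template, candidates in zip(template_jets, candidate_jets_per_point):
        best = max(jet_similarity(j_template, c) for c in candidates)
        total += 1.0 - best
    return total / len(template_jets)


def truncated_gaussian_likelihood(e, sigma, lam):
    """Unnormalized truncated Gaussian: exp(-e^2 / (2 sigma^2)) for e <= lam,
    held at exp(-lam^2 / (2 sigma^2)) beyond the threshold lam."""
    e_clipped = min(e, lam)
    return math.exp(-e_clipped ** 2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    template = [[1.0, 0.5, 0.2], [0.3, 0.9, 0.4]]                        # hypothetical jets
    candidates = [[[0.9, 0.6, 0.2], [0.1, 0.1, 0.9]], [[0.3, 0.8, 0.5]]]  # local-search candidates
    e = matching_error(template, candidates)
    print(e, truncated_gaussian_likelihood(e, sigma=0.1, lam=0.3))
```

The truncation keeps a badly matching sample from receiving a vanishing weight, so a single occluded or deformed frame cannot wipe out an otherwise good hypothesis.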

[Fig. 3 panels: Posterior Probability vs Frame Index (left and right); Matching Score vs Frame Index (left and right)]

Fig. 3. I. Posterior probabilities and matching scores. Solid lines are from the true hypothesis (face 3 in Fig. 1). Dashed lines are from the wrong hypotheses face 1 (left) and face 5 (right).
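The recognition step of Algorithm I, Eq. (8), can be approximated directly from the weighted SIS samples obtained with template $n$: the probability is the total normalized weight of the samples that fall inside the region around the posterior mean. The sketch assumes an axis-aligned box with user-chosen half-widths as that region (the text only calls it a proper region), and the function names are hypothetical. It computes the score separately for each template, without normalizing across the gallery.

```python
def posterior_mean(particles, weights):
    """Weighted mean of parameter vectors (e.g., affine parameters theta_t)."""
    total = sum(weights)
    dim = len(particles[0])
    return [sum(w * p[k] for p, w in zip(particles, weights)) / total for k in range(dim)]


def region_posterior(particles, weights, half_widths):
    """Monte Carlo estimate of Eq. (8): posterior mass inside an assumed box around the mean."""
    mean = posterior_mean(particles, weights)
    total = sum(weights)
    inside = sum(
        w for p, w in zip(particles, weights)
        if all(abs(p[k] - mean[k]) <= half_widths[k] for k in range(len(mean)))
    )
    return inside / total


if __name__ == "__main__":
    # Hypothetical 1-D samples: a concentrated posterior (matching template) vs. a diffuse one.
    concentrated = [(0.0,), (0.05,), (-0.05,), (0.02,)]
    diffuse = [(float(i),) for i in range(10)]
    print(region_posterior(concentrated, [1.0] * 4, [0.1]))
    print(region_posterior(diffuse, [1.0] * 10, [0.6]))
```

A template that explains the video well tends to produce a posterior concentrated near its mean, so more of the mass falls inside the region than for a poorly matching template.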

Fig. 5. II. Example frames in one probe video. The image size is 720x480, while the actual face size ranges from approximately 20x20 in the first frame to 60x60 in the last frame.

4. METHOD II

This method [6] has been tested on a database collected by researchers from the National Institute of Standards and Technology and the University of South Florida as part of the HumanID project. It contains subjects walking towards a camera, in order to simulate typical scenarios in visual surveillance. There are 30 subjects, each having one face template in the gallery and one video in the probe. The complete face gallery is shown in Fig. 4, and Fig. 5 gives some example frames in one probe video. Note that the gallery was captured under different lighting conditions from the probe, and that the face in the probe is of low resolution and small size, and undergoes considerable scale change.

The time series state space model is now parameterized by both the affine tracking parameters and the identity variable, which respectively characterize the dynamics and the identity of the human, i.e., $x_t = (\theta_t, n_t)$. We assume that $\theta_t$ and $n_t$ evolve independently, so that $p(x_t \mid x_{t-1}) = p(\theta_t \mid \theta_{t-1}) \, p(n_t \mid n_{t-1})$; that $p(\theta_t \mid \theta_{t-1})$ is a time-invariant Gaussian; and that the identity is temporally invariant, i.e., $p(n_t \mid n_{t-1}) = \delta(n_t - n_{t-1})$, where $\delta(\cdot)$ is an indicator function. The observation $z_t$ is taken to be an image reconstructed from the top 300 principal components, or eigenfaces (see Fig. 4 for the top 10 eigenfaces), and it is modeled as a transformed, noise-corrupted version of some template in the gallery, i.e., $z_t = T_{\theta_t}\{I_{n_t}\} + v_t$, where $T_{\theta}\{\cdot\}$ is a time-invariant image transformation function. The likelihood is assumed to be a time-invariant truncated Laplacian:

$$p(z_t \mid n_t, \theta_t) \propto \begin{cases} \exp\!\left(-\dfrac{e_t}{\sigma}\right), & e_t \le \lambda, \\[4pt] \exp\!\left(-\dfrac{\lambda}{\sigma}\right), & e_t > \lambda, \end{cases} \qquad e_t = \big\| z_t - T_{\theta_t}\{I_{n_t}\} \big\|, \tag{9}$$

where $\lambda$ is a threshold and $\sigma$ a constant. By employing the SIS technique, the joint distribution of the state vector and the identity variable, $p(\theta_t, n_t \mid z_{0:t})$, is estimated at the current time and then propagated to the next, governed by the evolution equations for the state vector and the identity variable. The posterior distribution of the identity variable, $p(n_t)$, is then just a free estimate, i.e., the marginal of $p(\theta_t, n_t \mid z_{0:t})$.

Algorithm II is summarized below. We have worked with two versions of Algorithm II: Algorithm IIa is a brute-force implementation, and Algorithm IIb is a more efficient implementation. Details are in [6].

Algorithm II
Initialization: Draw $M$ random samples jointly from $p_0(\theta_0)$ and the uniform prior $p_0(n_0)$.
Tracking and Recognition:
Tracking: at time $t$, invoke the SIS algorithm to obtain an updated set of samples for $p(\theta_t, n_t \mid z_{0:t})$.
Recognition: at any time $t$, marginalizing $p(\theta_t, n_t \mid z_{0:t})$ over $\theta_t$ gives $p(n_t)$. The conditional entropy $H(n_t \mid z_{0:t})$ and the MMSE estimate of $\theta_t$ are computed accordingly.

Fig. 6 plots the posterior probability $p(n_t)$ against time instance for the probe video shown in Fig. 5. From Fig. 6, we can easily observe that the posterior probability of the correct identity increases as time proceeds and eventually approaches 1, while those of all other identities finally go to 0. Refer to [6] for a justification of this convergence and for a more detailed discussion of the evolution of $p(n_t)$.

Fig. 4. II. The face gallery (upper) with image size 48x42. The top 10 eigenfaces (lower).
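Algorithm II can be sketched on a toy problem to show how one set of joint samples $(\theta_t, n_t)$ yields $p(n_t)$ by marginalization, together with the conditional entropy of Eq. (10) and the MMSE estimate of $\theta_t$. Everything concrete here is an assumption for illustration, not the setup of [6]: $\theta_t$ is a single scale factor (echoing the scale parameter plotted in Fig. 7) following a Gaussian random walk, the transformation is a hypothetical pure scaling of a template vector, the gallery is a list of short vectors, the likelihood uses a truncated Laplacian of the reconstruction error in the spirit of Eq. (9), and resampling at every step makes the weights proportional to the likelihood alone (a bootstrap choice of $q_t$). The identity attached to each sample is never changed, mirroring the assumption that identity does not change over time.

```python
import math
import random


def truncated_laplacian(e, sigma, lam):
    """Unnormalized truncated Laplacian: exp(-e / sigma) up to the threshold lam, constant beyond."""
    return math.exp(-min(e, lam) / sigma)


def transform(theta, template):
    """Hypothetical stand-in for the image transformation: scale every pixel by theta."""
    return [theta * v for v in template]


def algorithm_ii(observations, gallery, num_samples=600, seed=1,
                 theta_std=0.02, sigma=0.5, lam=3.0):
    """Return (p(n_t | z_{0:t}), H(n_t | z_{0:t}) in bits, MMSE theta_t) for every frame."""
    rng = random.Random(seed)
    n_ids = len(gallery)
    # Initialization: theta_0 from an assumed N(1, 0.1^2), n_0 from the uniform prior.
    samples = [(rng.gauss(1.0, 0.1), rng.randrange(n_ids)) for _ in range(num_samples)]
    history = []
    for t, z in enumerate(observations):
        if t > 0:
            # Tracking: Gaussian random walk on theta; the identity of each sample is kept.
            samples = [(theta + rng.gauss(0.0, theta_std), n) for theta, n in samples]
        weights = [truncated_laplacian(math.dist(z, transform(theta, gallery[n])), sigma, lam)
                   for theta, n in samples]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Recognition: marginalize the joint samples over theta to get p(n_t | z_{0:t}).
        p_n = [0.0] * n_ids
        for (_, n), w in zip(samples, weights):
            p_n[n] += w
        entropy = sum(-p * math.log2(p) for p in p_n if p > 0.0)  # Eq. (10)
        theta_mmse = sum(w * theta for (theta, _), w in zip(samples, weights))
        history.append((p_n, entropy, theta_mmse))
        # Resample so that the next weights are proportional to the likelihood alone.
        samples = rng.choices(samples, weights=weights, k=num_samples)
    return history


if __name__ == "__main__":
    gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical templates
    rng = random.Random(5)
    observations = []
    for t in range(20):
        scale = 1.0 + 0.015 * t  # subject approaching the camera
        observations.append([scale * v + rng.gauss(0.0, 0.02) for v in gallery[1]])
    for t, (p_n, h, theta) in enumerate(algorithm_ii(observations, gallery)):
        print(t, [round(p, 3) for p in p_n], round(h, 3), round(theta, 3))
```

Because each sample carries its identity unchanged, samples attached to wrong identities are gradually eliminated by resampling; this is one way to see why $p(n_t)$ concentrates on a single identity and the entropy falls over time.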

To view the problem from a different angle, we use the notion of entropy [17], which essentially measures the average uncertainty about a random variable. It is well known that, among all distributions taking values in $\mathcal{N}$, the uniform distribution yields the maximum entropy, $\log_2 N$, and a degenerate distribution yields the minimum, 0, i.e., $0 \le H(n_t) \le \log_2 N$. In the context of this problem, the conditional entropy $H(n_t \mid z_{0:t})$ captures the evolving uncertainty of the identity variable given the observations $z_{0:t}$. However, computing $H(n_t \mid z_{0:t})$ requires knowledge of $p(z_{0:t})$; we simply assume that this distribution is degenerate at the actual observations, since we observe only this particular sequence. Under this assumption,

$$H(n_t \mid z_{0:t}) = -\sum_{n_t \in \mathcal{N}} p(n_t \mid z_{0:t}) \log_2 p(n_t \mid z_{0:t}). \tag{10}$$

Fig. 7 presents the conditional entropy $H(n_t \mid z_{0:t})$ and the MMSE estimate of the scale parameter against $t$, both obtained using Algorithm IIa. Fig. 7 shows the decreasing conditional entropy and the increasing scale parameter, which matches the scenario of a subject walking towards a camera. In Fig. 5, the tracked face is superimposed on the image using a black bounding box.

[Fig. 6 panels: posterior probability vs time instance (left and right)]

Fig. 6. II. Posterior probability $p(n_t)$ against time instance, obtained by Algorithm IIa (left) and Algorithm IIb (right).

[Fig. 7 panels: conditional entropy vs time instance (left); MMSE estimate of scale parameter vs time instance (right)]

Fig. 7. II. Conditional entropy $H(n_t \mid z_{0:t})$ (left) and MMSE estimate of the scale parameter (right) against time instance. Both are obtained using Algorithm IIb.

5. CONCLUSION

We have presented Bayesian methods for face recognition from a probe video, compared against a gallery of still templates. In both methods, a time series state space model is needed to accommodate the video, and SIS algorithms provide numerical solutions to the model. However, the posterior probability of the identity given the observations up to the present, $p(n_t)$, is estimated using different strategies. In addition, the still templates in the gallery can be generalized to videos [18].

6. REFERENCES

[1] M. Isard and A. Blake, “Contour tracking by stochastic propagation of conditional density,” Proc. of ECCV, 1996.
[2] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” IEE Proceedings on Radar and Signal Processing, vol. 140, pp. 107–113, 1993.
[3] G. Qian and R. Chellappa, “Structure from motion using sequential Monte Carlo methods,” Proc. of ICCV, pp. 614–621, 2001.
[4] B. Li and R. Chellappa, “Simultaneous tracking and verification via sequential posterior estimation,” Proc. of CVPR, pp. 110–117, 2000.
[5] B. Li and R. Chellappa, “Face verification through tracking facial features,” submitted to JOSA, 2001.
[6] S. Zhou and R. Chellappa, “Probabilistic face recognition from video,” submitted to ECCV 02.
[7] R. Chellappa, C. L. Wilson, and S. Sirohey, “Human and machine recognition of faces: A survey,” Proc. of IEEE, vol. 83, pp. 705–740, 1995.
[8] W. Y. Zhao, R. Chellappa, A. Rosenfeld, and P. J. Phillips, “Face recognition: A literature survey,” UMD CAR-TR-948, 2000.
[9] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, “The FERET evaluation methodology for face-recognition algorithms,” IEEE Trans. PAMI, vol. 22, no. 10, pp. 1090–1104, 2000.
[10] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Neuroscience, vol. 3, pp. 72–86, 1991.
[11] K. Etemad and R. Chellappa, “Discriminant analysis for recognition of human face images,” Journal of the Optical Society of America A, pp. 1724–1733, 1997.
[12] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection,” IEEE Trans. PAMI, vol. 19, 1997.
[13] M. Lades, J. C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R. P. Wurtz, and W. Konen, “Distortion invariant object recognition in the dynamic link architecture,” IEEE Trans. Computers, vol. 42, pp. 300–311, 1993.
[14] A. Doucet, S. J. Godsill, and C. Andrieu, “On sequential Monte Carlo sampling methods for Bayesian filtering,” Statistics and Computing, vol. 10, no. 3, pp. 197–209, 2000.
[15] G. Kitagawa, “Monte Carlo filter and smoother for non-Gaussian nonlinear state space models,” J. Computational and Graphical Statistics, vol. 5, pp. 1–25, 1996.
[16] J. S. Liu and R. Chen, “Sequential Monte Carlo for dynamic systems,” Journal of the American Statistical Association, vol. 93, pp. 1031–1041, 1998.
[17] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, 1991.
[18] V. Krueger and S. Zhou, “Exemplar-based face recognition from video,” submitted to ECCV 02.

