
3D Face Data for Normalization

Several methods have been proposed to normalize a face's pose using a number of anchor points. Akamatsu et al. use an affine transformation which maps the triangle formed by three vertices (corresponding to the eyes and the mouth) into a standard view [1]. This normalization technique treats the face and the rest of the image as a thin sheet which can be scaled, rotated and sheared. It can account for translation, scaling and in-plane rotation of the face, since these are purely 2D effects that do not depend on the 3D structure of the object. These in-plane rotations and translations are depicted in Figure 4.1. However, a rigid object has two more degrees of rotational freedom which change its 2D projection in a non-homogeneous way, as shown in Figure 4.2. These cannot be compensated for by a mere affine transformation, since they induce non-homogeneous warping and occlusion in the image, and hence require a more sophisticated approach.
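As an illustration of the three-point normalization described above, the following is a minimal sketch of how an affine map can be recovered from three anchor correspondences. It is not taken from [1]; the function names (`affine_from_anchors`, `apply_affine`) and the pixel coordinates are hypothetical, and the 3x3 systems are solved with Cramer's rule purely to keep the example dependency-free.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m @ x = rhs by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("anchor points are collinear; affine map is undefined")
    solution = []
    for col in range(3):
        # Replace one column of m with the right-hand side.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution


def affine_from_anchors(src, dst):
    """Return rows (a, b, c) and (d, e, f) such that
    x' = a*x + b*y + c and y' = d*x + e*y + f send each src point to its dst point."""
    m = [[x, y, 1.0] for x, y in src]
    row_x = solve3(m, [x for x, _ in dst])
    row_y = solve3(m, [y for _, y in dst])
    return row_x, row_y


def apply_affine(rows, point):
    """Apply the affine map returned by affine_from_anchors to one 2D point."""
    (a, b, c), (d, e, f) = rows
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical anchor positions (left eye, right eye, mouth) in pixels.
detected = [(120.0, 95.0), (180.0, 110.0), (140.0, 170.0)]
standard = [(40.0, 50.0), (88.0, 50.0), (64.0, 100.0)]
rows = affine_from_anchors(detected, standard)
```

Three non-collinear correspondences determine the six affine parameters exactly, which is why the thin-sheet model has no freedom left to absorb depth rotations.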

**Figure 4.1:** In-plane rotation, scaling and translation.

**Figure 4.2:** Out-of-plane or depth rotations.

An alternative model for the face is an ellipsoid or another simple geometric structure such as a cylinder, as in the figure below [5]. Unlike the "thin sheet" model, which cannot account for yaw or pitch, the ellipsoid can roughly mimic the out-of-plane rotations the face can undergo. This is due to the curvature of the ellipsoid, which produces non-homogeneous warping in the 2D projection. Unfortunately, a simple ellipsoid cannot capture all the nuances of the face and fully normalize its 2D projection. For example, the nose can cause occlusion by rotating in front of the cheek. In addition, the human head is not quite ellipsoidal and is difficult to approximate with standard 3D geometric models.
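To make the non-homogeneous warping concrete, here is a minimal sketch for the cylinder case. It assumes an orthographic camera, a cylinder of radius `radius` with a vertical axis, and a yaw angle in radians; the function name `project_after_yaw` is hypothetical and the setup is a simplification, not the model of [5].

```python
import math


def project_after_yaw(x, radius, yaw):
    """Image x-coordinate of a cylinder surface point after a yaw rotation.

    x is the point's image coordinate in the frontal view (|x| <= radius), so the
    point sits at angle theta = asin(x / radius) around the cylinder axis.
    Returns None when the rotated point faces away from the camera
    (self-occlusion), which a 2D affine warp cannot represent.
    """
    theta = math.asin(x / radius)
    rotated = theta + yaw
    if abs(rotated) > math.pi / 2:
        return None
    return radius * math.sin(rotated)
```

For a yaw of 30 degrees on a unit cylinder, the point at x = 0 moves to about 0.5 while the point at x = 0.5 moves to about 0.866. The two points shift by different amounts, which is the position-dependent warping that a single affine map of the image cannot reproduce.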

**Figure:** ... and (c) but it is strangely warped by the geometry of the cylinder.

Clearly, the most accurate 3D model of a face would be the true 3D range data of the individual obtained from laser range-finder scanning. This cumbersome process is not only time-consuming and non-automated; it also requires sophisticated equipment which is not readily available. Sample data obtained from such a device are shown in the figure below as radial range and radial intensity images. The images are in a cylindrical coordinate system and the axes are labelled accordingly.

**Figure:** ... intensity images. (a) Radial range data. (b) Radial intensity data.

From the radial range data, we compute a polygonal mesh by converting the cylindrical coordinates into Cartesian form. The Cartesian 3D data can then be rendered and displayed, as shown in panel (a) of the figure below. Subsequently, we can "colorize" the 3D model with the radial intensity data and obtain a texture-mapped 3D model of the individual, as shown in panel (b). This 3D model can then be used to synthesize any view of the individual by treating the head as a rigid object and rotating and translating it with 6 degrees of freedom (see Figures 4.1 and 4.2).
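The cylindrical-to-Cartesian conversion can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the procedure actually used here: the range image is assumed to cover a full 360 degrees with uniform angular sampling, rows are assumed to be equally spaced heights, and the axis convention (y vertical, z toward the camera at angle zero) and the name `range_image_to_mesh` are hypothetical.

```python
import math


def range_image_to_mesh(radial_range, height_step=1.0):
    """Convert a cylindrical range image into mesh vertices and triangles.

    radial_range[row][col] is the radius sampled at height row * height_step
    and at angle 2*pi*col / n_cols about the vertical (y) axis.
    """
    n_rows, n_cols = len(radial_range), len(radial_range[0])

    # One Cartesian vertex per range sample.
    vertices = []
    for row in range(n_rows):
        y = row * height_step
        for col in range(n_cols):
            theta = 2.0 * math.pi * col / n_cols
            r = radial_range[row][col]
            vertices.append((r * math.sin(theta), y, r * math.cos(theta)))

    # Two triangles per grid cell, indexing into the vertex list.
    triangles = []
    for row in range(n_rows - 1):
        for col in range(n_cols):
            nxt = (col + 1) % n_cols  # wrap around the cylinder seam
            a = row * n_cols + col
            b = row * n_cols + nxt
            c = (row + 1) * n_cols + col
            d = (row + 1) * n_cols + nxt
            triangles.append((a, b, d))
            triangles.append((a, d, c))
    return vertices, triangles
```

If the radial intensity image is sampled on the same (height, angle) grid, vertex `row * n_cols + col` can take its color from the intensity pixel at `[row][col]`, which gives a simple form of the texture mapping described above.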

**Figure:** ... intensity data. (a) Shaded 3D model. (b) Texture-mapped 3D model.

Unfortunately, we do not and cannot have a 3D model for each individual that we will photograph for our recognition system. Thus, we shall attempt to use another individual's 3D model to normalize the photograph, under the assumption that the 3D structure of most faces is roughly constant. We can therefore take one 3D model of a face and texture-map new photographed faces onto it. However, some individuals will have thinner or wider faces, and the model will not fit them as well as it fits the original texture. We suggest deforming the model along its vertical axis, stretching or squashing it to fit the new face, as shown in the figure below. Ideally, we would like to deform the model arbitrarily with various small stretchings and warpings so that it can be locally adapted to each new individual. However, such a process is quite computationally expensive. Nevertheless, the single vertical stretch of the model, together with its six degrees of freedom, gives us quite a good approximation of the faces we will encounter and is far more accurate than the planar or ellipsoidal models used in previous experiments.
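The seven parameters involved (one vertical stretch plus six rigid degrees of freedom) can be sketched as a per-vertex transform. This is a minimal illustration, assuming y is the model's vertical axis, z is the viewing axis, angles are in radians, and the rotations are applied in the order yaw, pitch, roll; the function name `pose_vertex` and this parameterization are assumptions, not the formulation used in the text.

```python
import math


def pose_vertex(vertex, stretch, yaw, pitch, roll, translation):
    """Stretch a model vertex vertically, then apply a rigid 6-DOF pose."""
    x, y, z = vertex

    # Deform along the model's own vertical axis (stretch > 1 elongates the face).
    y *= stretch

    # Yaw: rotation about the vertical (y) axis.
    x, z = (x * math.cos(yaw) + z * math.sin(yaw),
            -x * math.sin(yaw) + z * math.cos(yaw))
    # Pitch: rotation about the horizontal (x) axis.
    y, z = (y * math.cos(pitch) - z * math.sin(pitch),
            y * math.sin(pitch) + z * math.cos(pitch))
    # Roll: in-plane rotation about the viewing (z) axis.
    x, y = (x * math.cos(roll) - y * math.sin(roll),
            x * math.sin(roll) + y * math.cos(roll))

    tx, ty, tz = translation
    return (x + tx, y + ty, z + tz)
```

Applying `pose_vertex` to every vertex of the mesh yields the stretched and posed model. The stretch is applied before the rotations so that it always acts along the model's vertical axis rather than the image's.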

**Figure:** ... along its vertical axis. (a) Stretched model. (b) Squashed model.


Tony Jebara
2000-06-23