2D systems cannot cope well with the disappearance of feature points caused by spatial distortion from projection and motion, and by varying illumination. 3D model-based methods often achieve better tracking results.
Basu et al. [10] employ a rigid 3D ellipsoidal model (figure 2.1(a)), fitted to the head, to regularize the optical flow. They compare their system with a planar-based one and demonstrate that it performs well. DeCarlo et al. [23] developed a deformable face model (figure 2.1(b)) integrated with optical flow, so that both the motion and the shape of the model can be estimated. Their results show that tracking large rotations is possible, and they solve problems with self-occlusion. However, occlusion by other objects is not tested; this could influence the estimation of optical flow, since it is computed using sampled points.
Black et al. [13] use a planar model to interpret optical flow with affine transformations. Despite the stable methodology, the amount of motion that could be handled was limited, both because of the planar model and because the tracker had no concept of self-occlusion at the sides of the head.
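To make the affine interpretation concrete, the following is a minimal sketch (my own illustration under stated assumptions, not Black et al.'s implementation): the flow at each pixel is modelled as u = a0 + a1·x + a2·y and v = a3 + a4·x + a5·y, and the six parameters are fitted to sampled flow vectors by least squares.

```python
import numpy as np

def fit_affine_flow(points, flows):
    """Fit a six-parameter affine motion model to sampled optical flow.

    points -- (n, 2) array of pixel coordinates (x, y)
    flows  -- (n, 2) array of measured flow vectors (u, v)

    Model: u = a0 + a1*x + a2*y,  v = a3 + a4*x + a5*y.
    Returns (a0, ..., a5) minimizing the squared flow error.
    """
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([np.ones_like(x), x, y])   # shared design matrix
    a_u, *_ = np.linalg.lstsq(A, flows[:, 0], rcond=None)
    a_v, *_ = np.linalg.lstsq(A, flows[:, 1], rcond=None)
    return np.concatenate([a_u, a_v])
```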
Cascia et al. [16, 18] developed a fast 3D head tracker that models the head as a texture-mapped cylinder and formulates the head-tracking problem as image registration in the texture map (sketched after this paragraph). Their work builds on and extends the work of Schodl and Cascia [54, 17]. In later work [19], Cascia et al. demonstrate that the tracker also works for facial expression analysis. Although they solve the varying-illumination problem by using a set of trained templates, their system is not able to track large rotations, since a single static template (the pose of the head in the first frame) is used. This system was also implemented by Brown [15] and slightly improved by adding additional templates. However, it was only tested on a few videos, and more analysis is needed to determine whether Brown's system performs much better than the original. Aggarwal et al. [1] claim that their system improves accuracy and robustness by making fewer assumptions than Cascia et al. [16, 18]; for example, they remove unnecessary camera-calibration parameters. Unfortunately, they do not give much information about their experiments, so the accuracy of their system cannot be verified.
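The registration formulation can be sketched as follows (a deliberately simplified, brute-force illustration of the idea, not Cascia et al.'s least-squares implementation; `warp_to_texture_map` is a hypothetical helper that renders the frame into the cylinder's texture map for a given pose):

```python
import numpy as np

def register_pose(frame, template, pose, candidates, warp_to_texture_map):
    """Pick the pose update that best registers the frame against the
    stored template in texture-map space (a brute-force stand-in for
    the least-squares registration described above).

    frame               -- current image
    template            -- texture-map template from the first frame
    pose                -- current 6-DOF head pose estimate (array)
    candidates          -- iterable of small pose perturbations to try
    warp_to_texture_map -- hypothetical: warps the frame onto the
                           cylinder's texture map under a given pose
    """
    best_pose, best_cost = pose, np.inf
    for delta in candidates:
        trial = pose + delta
        texture = warp_to_texture_map(frame, trial)
        cost = np.sum((texture - template) ** 2)   # SSD registration error
        if cost < best_cost:
            best_pose, best_cost = trial, cost
    return best_pose
```

Because the template is the first frame only, any pose whose appearance differs strongly from that frame (for example after a large rotation) registers poorly, which matches the limitation noted above.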
The cylindrical model is also used by Xiao et al. [63], shown in figure 2.1(e). They transform the 2D image points representing the face area to a 3D cylinder using perspective projection. Through optical flow they estimate the new position of the head and rotate or translate the 3D cylinder accordingly. Perspective projection is then used again to transform the points back from 3D to 2D. Because of this approach, Xiao et al. do not texture-map the cylinder. Also, in comparison with Cascia et al. [16, 18], the template is updated each frame, so larger rotations can be tracked. To compensate for the accumulation of error caused by optical flow, a technique called re-registration is implemented. Another technique, iteratively re-weighted least squares (sketched below), is implemented to deal with non-rigid head motion and the problem of occlusion. Because of the good results, other authors [34, 4] have applied this method as well and have even suggested improvements for the remaining problem of varying illumination.
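A generic iteratively re-weighted least-squares step looks roughly like this (a standard IRLS sketch, not Xiao et al.'s code): residuals from occluded or non-rigidly moving pixels receive low weights, so they contribute little to the motion estimate.

```python
import numpy as np

def irls(A, b, iterations=10, eps=1e-8):
    """Iteratively re-weighted least squares for a linear motion model
    A @ x ~ b, down-weighting outlier residuals (e.g. pixels affected
    by occlusion or non-rigid motion).
    """
    x, *_ = np.linalg.lstsq(A, b, rcond=None)     # ordinary LS start
    for _ in range(iterations):
        r = b - A @ x                             # per-measurement residuals
        w = 1.0 / (np.abs(r) + eps)               # robust inverse-residual weights
        sw = np.sqrt(w)
        x, *_ = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)
    return x
```

The inverse-residual weights approximate an L1 penalty; Xiao et al. use a comparable robust weighting so that occluded pixels do not corrupt the rigid-motion estimate.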
Zhang et al. [65] focus on partial occlusion and better pose estimation. They argue that 3D face models such as ellipsoidal and cylindrical shapes have been applied successfully before, but that these cannot approximate the face shape well. Therefore they use an extended superquadric (ESQ) face-shaped model, shown in figure 2.1(d), to reduce shape ambiguity during tracking. While this ESQ model produces more accurate pose estimates than rougher 3D models, it suffers from a time-consuming model-acquisition process and as a result cannot work in real time. The same holds for the system developed by Paterson et al. [48], who use a generic deformable 3D head model built from head models obtained with a 3D scanner. Because of the complexity of the model, this system also does not run in real time.
The method of Malciu et al. [40] relies on 3D-2D matching between 3D object features of a head model (best described as an ellipsoidal or ad hoc Fourier-synthesized surface, shown in figure 2.1(g)) and 2D features in the image that are estimated throughout the sequence. They demonstrate that this method is stable under large head motions, occlusions, various head postures and lighting variations. However, the model is still very time consuming, and the system works only with pre-recorded videos.
A real-time head-tracking algorithm based on a triangular mesh model of the head is realized by Zivkovic and Van der Heijden [66]. Their method is insensitive to drift and works in various realistic conditions with cheap, low-end equipment. However, it relies heavily on the initial head-detection image and, as a result, is only capable of tracking small movements.
Ahlberg et al. [2] first find the face using a colour-based algorithm, which gives a rough estimate of the size and position of the face. The colour-based algorithm only works properly if the camera is calibrated, but it has the advantage of being fast and simple. After the initial detection, the 3D CANDIDE model, shown in figure 2.1(f), is adapted according to a training set of images and then shaped according to 12 parameters (6 Action Units controlling lips and eyebrows, 3D rotation, 2D translation and scale). To continuously track the head pose in subsequent frames they implement a least-squares method: they texture-map the image onto the CANDIDE model and reshape it to a normalized form. This form is approximated by eigentextures that are learned in advance, and the normalized form and its eigentexture approximation are subtracted from each other to compute the error image. The main drawback of this method is that the implementation is actually based on a 2D image-based method rather than a 3D model, which makes it impossible to estimate large rotations. In later work, Ahlberg and Dornaika [24, 25] and Ahlberg and Forchheimer [3] improve the system by decoupling the head motion estimation from the facial animation motion. A technique called Random Sample Consensus (RANSAC) works directly on the 3D model and detects the features that correspond to the rigid motion only; it is not influenced by facial features that are affected by non-rigid motion.
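The error-image computation can be illustrated with a short sketch (a minimal PCA-style residual under my own assumptions, not Ahlberg et al.'s actual code; the function and variable names are hypothetical):

```python
import numpy as np

def error_image(texture, mean_texture, eigentextures):
    """Residual between a normalized face texture and its eigentexture
    approximation (a sketch of the scheme described above).

    texture       -- normalized face texture, flattened to shape (d,)
    mean_texture  -- mean of the training textures, shape (d,)
    eigentextures -- k leading eigentextures as rows, shape (k, d)
    """
    centered = texture - mean_texture
    coeffs = eigentextures @ centered                  # project onto the basis
    approximation = mean_texture + eigentextures.T @ coeffs
    return texture - approximation                     # the error image
```

A large residual signals that the current pose or shape estimate renders a texture the learned basis cannot explain, which drives the least-squares update.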
Rusakoff et al. [53] created a head tracker that works with stereo data, which has the advantage of more accurate foreground segmentation. Once the head is found, its 3D coordinates can easily be determined (if the focal length of the camera is known). Their system is able to track the head even under very fast movements. Morency et al. [41, 43, 42] also developed a stereo rigid-motion tracking technique for interactive environments with uncontrolled lighting. Multiple stereo cameras are needed in the system of Kawanaka et al. [33], who track the human head in 3D voxel space using particle filtering. Unfortunately, stereo data is not always available, so these systems can only be used in closed domains.
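Particle filtering in this setting can be illustrated generically (a standard sequential importance resampling step under my own assumptions, not Kawanaka et al.'s implementation; the `likelihood` observation model is a hypothetical stand-in for their voxel-based measurement):

```python
import numpy as np

def particle_filter_step(particles, weights, motion_noise, likelihood):
    """One sequential importance resampling (SIR) step for tracking a
    head position in 3D space.

    particles    -- (n, 3) array of candidate head positions
    weights      -- (n,) array of particle weights
    motion_noise -- std. dev. of the random-walk motion model
    likelihood   -- hypothetical: scores a candidate position against
                    the current observation (e.g. voxel occupancy)
    """
    n = len(particles)
    # 1. Predict: diffuse particles with a random-walk motion model.
    particles = particles + np.random.normal(0.0, motion_noise, particles.shape)
    # 2. Update: re-weight particles by the observation likelihood.
    weights = weights * np.array([likelihood(p) for p in particles])
    weights /= weights.sum()
    # 3. Resample: draw particles in proportion to their weights.
    idx = np.random.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)
```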
To resolve the problem that some 3D model-based methods need a lot of time to acquire the model, Ohayon et al. [47] use 3D feature points to represent the head geometry. Because the shape of the model is not fixed a priori, this method can also be used for non-human heads, such as animal heads. This method does not suffer from self-occlusion problems, correspondence mismatch or feature