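Traditionally, these transformations have been modeled as affine,
that is, of the form
\[ \mathbf{x}' = A\mathbf{x} + \mathbf{b} \]
where $\mathbf{x} = [x, y]^T$, $A \in \mathbb{R}^{2 \times 2}$ and $\mathbf{b} \in \mathbb{R}^{2 \times 1}$.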
The affine model describes rotation about the optical axis, zoom,
translation and shear. Cameras, however, do not produce a shear type
of motion. Unfortunately, the affine model cannot describe pan and
tilt, the camera motions that produce ``keystoning'' and ``chirping''
in real-world images. Describing actual camera motion with the affine
model therefore fits the motion poorly and leaves the estimate
susceptible to noise.
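The projective model, however, can describe all the possible camera
motions exactly. It consists of 8 parameters and is of the following form
\[ \mathbf{x}' = \frac{A\mathbf{x} + \mathbf{b}}{\mathbf{c}^T\mathbf{x} + 1} \]
where $A \in \mathbb{R}^{2 \times 2}$, $\mathbf{b} \in \mathbb{R}^{2 \times 1}$ and $\mathbf{c} \in \mathbb{R}^{2 \times 1}$.
This is the model we studied in assignment 1. As stated in [1]: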
Because the parameters of the projective coordinate transformation had traditionally been thought to be too difficult to solve, most researchers have used the simpler affine model or other approximations to the projective model. ... we propose and demonstrate the featureless estimation of the parameters of the ``exact'' projective model ...

Thus, the projective model properly expresses the pan and tilt motions that result in ``keystoning'' and ``chirping'' of the image. To properly express the difference between two images, we therefore need to solve for the 8 parameters. This is not as simple as setting up eight equations and solving for eight unknowns.
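
To make the difference between the two models concrete, the following minimal sketch applies each to the corners of a unit square. It assumes the usual parameterization x' = Ax + b for the affine model and x' = (Ax + b) / (c^T x + 1) for the projective model. The names `A`, `b`, `c`, the helper functions and the numeric values are illustrative assumptions, not taken from [1]. It only shows the forward mapping and does not attempt to estimate the 8 parameters.

```python
def apply_affine(A, b, p):
    """Map point p = (x, y) with the affine model x' = A x + b."""
    x, y = p
    return (A[0][0] * x + A[0][1] * y + b[0],
            A[1][0] * x + A[1][1] * y + b[1])


def apply_projective(A, b, c, p):
    """Map point p with the projective model x' = (A x + b) / (c^T x + 1)."""
    x, y = p
    w = c[0] * x + c[1] * y + 1.0
    if w == 0:
        raise ZeroDivisionError("point maps to infinity")
    ax, ay = apply_affine(A, b, p)
    return (ax / w, ay / w)


def edges_parallel(p0, p1, q0, q1, tol=1e-9):
    """True if segment p0->p1 is parallel to q0->q1 (zero 2-D cross product)."""
    ux, uy = p1[0] - p0[0], p1[1] - p0[1]
    vx, vy = q1[0] - q0[0], q1[1] - q0[1]
    return abs(ux * vy - uy * vx) < tol


# Corners of a unit square: bottom-left, bottom-right, top-right, top-left.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Illustrative parameter values (assumptions, chosen only for this demo).
A = [[1.0, 0.0], [0.0, 1.0]]  # no rotation, zoom or shear
b = [0.5, 0.0]                # horizontal translation
c = [0.0, 0.5]                # nonzero c introduces tilt-like keystoning

affine_sq = [apply_affine(A, b, p) for p in square]
proj_sq = [apply_projective(A, b, c, p) for p in square]

# Compare the left edge (corner 0 -> 3) with the right edge (corner 1 -> 2).
affine_parallel = edges_parallel(affine_sq[0], affine_sq[3],
                                 affine_sq[1], affine_sq[2])
proj_parallel = edges_parallel(proj_sq[0], proj_sq[3],
                               proj_sq[1], proj_sq[2])

print("affine keeps left/right edges parallel:", affine_parallel)
print("projective keeps left/right edges parallel:", proj_parallel)
```

Under the affine map the left and right edges of the square stay parallel, whereas with a nonzero `c` the projective map makes them converge, which is the ``keystoning'' effect the affine model cannot express.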