Getting the gist of a paper and actually understanding and
implementing it are two entirely different things. I started by
trying to understand the projective coordinate transform, writing
some simple MATLAB code to demonstrate what a projective transform
does in
. For example, this
allowed me to understand what is meant by ``chirping'' a
wave.
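
As a rough illustration of the idea (not the original MATLAB code), the sketch below assumes a one-dimensional signal and the common three-parameter form x' = (a*x + b) / (c*x + 1) of a projective coordinate transform. The function names and parameter values are hypothetical. Feeding the transformed coordinate into a cosine produces a wave whose local frequency varies with position, which is the sense in which the wave is ``chirped''.

```python
import math

def projective_1d(x, a, b, c):
    """Map x to x' = (a*x + b) / (c*x + 1) (assumed 1-D projective form)."""
    return (a * x + b) / (c * x + 1.0)

def pchirp_1d(x, freq, a, b, c):
    """Sample a cosine of frequency `freq` at the projectively warped coordinate."""
    return math.cos(2.0 * math.pi * freq * projective_1d(x, a, b, c))

def local_frequency(x, freq, a, b, c):
    """Instantaneous frequency: freq * dx'/dx = freq * (a - b*c) / (c*x + 1)**2."""
    return freq * (a - b * c) / (c * x + 1.0) ** 2

if __name__ == "__main__":
    # With c != 0 the frequency falls off with x; with c == 0 it stays constant.
    for x in (0.0, 0.25, 0.5):
        print(x, pchirp_1d(x, 10.0, 1.0, 0.0, 0.5),
              local_frequency(x, 10.0, 1.0, 0.0, 0.5))
```

With c set to zero the transform reduces to an affine map and the wave keeps a constant frequency, so the chirp comes entirely from the denominator.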
Using the existing MATLAB code developed by Steve Mann, I attempted
to understand the various aspects of actually calculating the 8 projective
parameters (P) in
. I had great difficulty because, unbeknownst to me, this was all
based on Lie algebra (which is not to say that I understand what Lie
algebra is). The equations, techniques and assumptions hold only when
the changes are relatively small.
I then progressed to using
images and projectively chirping
(pchirp) them. Calculating P was done by first estimating P using
an approximation, then using this as an initial guess for the rest
of the algorithm. This was done repeatedly in order to home in on
the value of P. The iterative algorithm was also applied to multiple
downsampled images, so that each resolution produced an increasingly
accurate approximation of P for the next higher resolution image.
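
The coarse-to-fine structure described above can be sketched roughly as follows. This is only a skeleton under stated assumptions, not Steve Mann's code: `initial_guess` and `refine` are hypothetical callables standing in for the approximate estimator and one pass of the iterative algorithm, images are plain lists of rows, and P is assumed to be expressed in resolution-independent (normalized) coordinates so that it can be carried from one level to the next without rescaling.

```python
def downsample(image):
    """Halve the resolution by averaging 2x2 blocks (odd trailing row/column dropped)."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1] +
              image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)]
            for r in range(rows)]

def build_pyramid(image, levels):
    """Return `levels` versions of the image, coarsest first."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]

def coarse_to_fine(img_a, img_b, initial_guess, refine, levels=3, iterations=5):
    """Estimate P on the coarsest level, then refine it at each finer level.

    initial_guess(a, b) -> P and refine(a, b, P) -> P are hypothetical
    placeholders supplied by the caller.
    """
    p = None
    for a, b in zip(build_pyramid(img_a, levels), build_pyramid(img_b, levels)):
        if p is None:
            p = initial_guess(a, b)   # approximate estimate, coarsest level only
        for _ in range(iterations):
            p = refine(a, b, p)       # repeated passes home in on P
    return p
```

Starting at low resolution keeps the apparent changes between the two images small at the point where the estimate is least reliable, which matters because the underlying equations hold only for relatively small changes.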
So, if everything is working, why work on it? Unfortunately, the optical flow field is calculated over the entire image. When something violates the static scene assumption or parallax is present (e.g. when a sign dominates your field of view), the resulting P is rather poor (a consequence of the hyperbolic regression, which is just a least squares fit to a hyperbola). In practice, video orbits is rather tolerant of objects that violate the static scene assumption, but when a street sign takes up a large percentage of the image, the estimate may be thrown off.
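
To make the sensitivity concrete, here is a small one-dimensional sketch. It is an assumption-laden toy, not the video orbits implementation: it fits x' = (a*x + b) / (c*x + 1) by ordinary least squares after multiplying through by the denominator, which is a simplified stand-in for the hyperbolic regression mentioned above. The parameter values, the 40% "sign" region, and all function names are made up for illustration.

```python
def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_hyperbola(xs, xps):
    """Least-squares fit of x' = (a*x + b) / (c*x + 1).

    Multiplying through gives a*x + b - c*x*x' = x', linear in (a, b, c).
    """
    rows = [[x, 1.0, -x * xp] for x, xp in zip(xs, xps)]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * xp for r, xp in zip(rows, xps)) for i in range(3)]
    return solve_linear(AtA, Atb)

def project(x, a, b, c):
    return (a * x + b) / (c * x + 1.0)

def param_error(p, true_p):
    """Largest absolute deviation of any fitted parameter from the truth."""
    return max(abs(u - v) for u, v in zip(p, true_p))

true_p = (1.1, 0.05, 0.2)                  # hypothetical ground truth
xs = [i / 50.0 for i in range(51)]
clean = [project(x, *true_p) for x in xs]
# A "street sign" covering about 40% of the samples moves by its own shift.
contaminated = [x + 0.3 if 0.3 <= x < 0.7 else xp for x, xp in zip(xs, clean)]

err_clean = param_error(fit_hyperbola(xs, clean), true_p)
err_bad = param_error(fit_hyperbola(xs, contaminated), true_p)

if __name__ == "__main__":
    print("error with a static scene:", err_clean)
    print("error with the sign present:", err_bad)
```

Because every sample carries equal weight in the fit, the samples belonging to the sign pull the estimate away from the true parameters; nothing in plain least squares can tell them apart from the background.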