Getting the gist of a paper and actually understanding and implementing it are two entirely different things. I started with the projective coordinate transform, writing some simple Matlab code to demonstrate what the projective transform in  would do. For example, this allowed me to understand what is meant by ``chirping'' a wave.
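To make the chirping idea concrete, here is a small sketch (in Python rather than Matlab, purely for illustration; the parameter values a, b, c are arbitrary choices of mine, not from the paper). A 1-D projective coordinate transform x' = (ax + b)/(cx + 1) applied inside a sinusoid makes the instantaneous frequency vary along x, which is exactly the ``chirp'':

```python
import numpy as np

def projective_1d(x, a, b, c):
    # 1-D projective coordinate transform: x' = (a*x + b) / (c*x + 1)
    return (a * x + b) / (c * x + 1.0)

x = np.linspace(0.0, 1.0, 1000)
plain = np.sin(2 * np.pi * 10 * x)                                # constant frequency
chirped = np.sin(2 * np.pi * 10 * projective_1d(x, 1.0, 0.0, 0.9))

# The instantaneous frequency now varies with x, so the wave "chirps":
# with c > 0 the oscillations are packed tighter near x = 0 and
# stretched out near x = 1, which zero-crossing counts make visible.
crossings_lo = int(np.sum(np.diff(np.sign(chirped[:500])) != 0))
crossings_hi = int(np.sum(np.diff(np.sign(chirped[500:])) != 0))
```

Plotting `plain` against `chirped` shows the effect even more directly: the chirped wave looks like the original sinusoid viewed through a perspective distortion.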
Using the existing Matlab code developed by Steve Mann, I attempted to understand the various aspects of actually calculating the eight projective parameters (P) in . Unbeknownst to me at the outset, this was all based on Lie algebra (which is not the same as saying I understand Lie algebra), and that caused me great difficulty: the equations, techniques and assumptions only hold when the changes between frames are relatively small.
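The warp itself is easy to state even if estimating it is not. A sketch of the eight-parameter projective coordinate transform (the variable names and parameter ordering here are my own, not Mann's notation): four linear terms, two translations, and the two ``chirp'' terms in the denominator. Near the identity the denominator is close to 1, which is why the linearized estimation equations only hold for small changes.

```python
import numpy as np

def projective_warp(xy, p):
    # Eight projective parameters p = (a11, a12, a21, a22, b1, b2, c1, c2):
    #   x' = (a11*x + a12*y + b1) / (c1*x + c2*y + 1)
    #   y' = (a21*x + a22*y + b2) / (c1*x + c2*y + 1)
    # The c terms are what produce the projective "chirp"; when they are
    # small the warp is approximately affine, which the small-motion
    # assumption relies on.
    a11, a12, a21, a22, b1, b2, c1, c2 = p
    x, y = xy
    d = c1 * x + c2 * y + 1.0
    return np.array([(a11 * x + a12 * y + b1) / d,
                     (a21 * x + a22 * y + b2) / d])

# the identity warp leaves every point fixed
identity = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0)
fixed = projective_warp((3.0, -2.0), identity)
```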
From there I progressed to projectively chirping (pchirp) actual images. P was calculated by first computing a rough estimate using an approximation, then using that estimate as the initial guess for the rest of the algorithm. This process was repeated in order to home in on the value of P. The iterative algorithm was also applied to a series of downsampled images, so that each resolution produced an increasingly accurate approximation of P to seed the next higher-resolution image.
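The coarse-to-fine control flow is the part worth seeing in code. Below is a toy 1-D version of my own devising (estimating a pure translation between two signals, not the full eight-parameter P, and using brute-force matching in place of the flow-based approximation), but the structure is the same: estimate at the coarsest level, then repeatedly refine the scaled-up guess at each higher resolution.

```python
import numpy as np

def downsample(sig):
    # halve the resolution by averaging adjacent sample pairs
    return 0.5 * (sig[0::2] + sig[1::2])

def estimate_shift(a, b, max_shift):
    # single-level estimate: the integer shift s minimizing ||roll(a,s) - b||^2
    shifts = list(range(-max_shift, max_shift + 1))
    errs = [np.sum((np.roll(a, s) - b) ** 2) for s in shifts]
    return shifts[int(np.argmin(errs))]

def coarse_to_fine_shift(a, b, levels=4, max_shift=6):
    # build pyramids of downsampled signals (coarsest last)
    pyramid = [(a, b)]
    for _ in range(levels - 1):
        pa, pb = pyramid[-1]
        pyramid.append((downsample(pa), downsample(pb)))
    shift = 0
    for pa, pb in reversed(pyramid):
        shift *= 2  # a coarse-level shift doubles at the next resolution
        # refine: estimate only the residual left after the current guess
        shift += estimate_shift(np.roll(pa, shift), pb, max_shift)
    return shift

sig = np.exp(-0.5 * ((np.arange(256) - 80.0) / 10.0) ** 2)  # smooth test "image"
recovered = coarse_to_fine_shift(sig, np.roll(sig, 37))
```

The point of the pyramid is that a shift of 37 samples is far outside the small-change search window at full resolution, but at the coarsest level it is only about 5 samples, well within reach; each finer level then only has to correct a small residual.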
So, if everything is working, why work on it? Because the optical flow field is calculated over the entire image. When something violates the static-scene assumption, or parallax is present (e.g. when a sign dominates the field of view), the resulting P is rather poor, since the hyperbolic regression is just a least-squares fit to a hyperbola. In practice Video Orbits is fairly tolerant of objects that violate the static-scene assumption, but when a street sign takes up a large percentage of the image the fit can be thrown off.
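The failure mode is generic to any unweighted least-squares fit, not specific to the hyperbolic model: a coherent block of points that obey a different motion drags the whole estimate. A toy linear fit (hypothetical numbers, ordinary least squares standing in for the hyperbolic regression) shows the effect:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(100)  # true slope 2.0

A = np.column_stack([x, np.ones_like(x)])
slope_clean = np.linalg.lstsq(A, y, rcond=None)[0][0]

# corrupt 30% of the points coherently, as if a large foreground object
# (the street sign) moved inconsistently with the background
y_bad = y.copy()
y_bad[:30] += 15.0
slope_bad = np.linalg.lstsq(A, y_bad, rcond=None)[0][0]
```

With a few scattered outliers the fit barely moves, which matches Video Orbits' tolerance of small offending objects; it is the coherent 30% block that wrecks the slope, just as a sign filling much of the frame wrecks P.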