16-721 Project

Ethan Tira-Thompson

May 8, 2005

Code and original data available here
I'm afraid that the directory is very unorganized,
but all the matlab implementations are in there....


  • Geometric Calibration

    • Already presented in class, but I'll summarize...

    • Took 12 sample images (well, 12 good ones, threw away a bunch more)

    • Created a corner detector, with sub-pixel correction:
        +    => 
      Originally used both inner and outer points, but it's hard to get good accuracy with an outside corner, so it's better to only use the inner ones.  See how the outer points in this image twist around the square instead of staying on the corner:

      This problem only really occurs when the checkerboard is slanted (which is most of the time...)

    • Pick outer corners by quadrant from center of mass (assumes there aren't any false positives!)
      • use Manhattan distance to select farthest point in each quadrant -- euclidean distance can give bad results with highly projective images

    • Determine matrix transformation (T) for the trapazoid. (aka keystone correction?)  I solved this myself, because I'm a glutton for a good math puzzle.  There's probably better ways, but this one's mine:
      • Translate to put upper left corner at the origin
      • 3 remaining points, upper-right (ur), lower-left (ll), lower-right (lr)
      • Rotation and skew given by: [ ur ll ]
      • Add two terms a and b, which will provide trapezoidal component: [ ur · a    ul · b    0 ; (1-a)    (1-b)   1 ]
        • a is obtained by finding the intersection of line through ur in direction of ll with line through ll in direction of actual corner point.  Similarly, b can be found by the same process, swapping ur and ll.
        • you could also do an algebraic solution, but it's not intuitive.

    • Use inv(T) to project the detected corners back into a regular plane, then can match features between images

      If the sub-pixel registration isn't used earlier, these arrows don't give nearly such a nice distribution, indicating more noise in the data.  (so it is good for something!)  Note the outer corners give ragged residuals

    • Once features are matched up, compute homography and solve for camera parameters.
      • However, I stopped at this point and used this package on the caltech site.

      • Parameters obtained:
        %-- Focal length:
        fc = [ 198.807236147739843 ; 200.332749599841890 ];
         
        %-- Principal point:
        cc = [ 102.688782354632409 ; 85.039877985626319 ];
         
        %-- Skew coefficient:
        alpha_c = 0.000000000000000;
         
        %-- Distortion coefficients:
        kc = [ -0.147005257242916 ; 0.384849817477919 ; -0.003477766214471 ; ...
               0.000128730250105 ; 0.000000000000000 ];

         
        %-- Focal length uncertainty:
        fc_error = [ 2.454127338847488 ; 2.212276450344577 ];
         
        %-- Principal point uncertainty:
        cc_error = [ 2.556387295752588 ; 1.943240801392194 ];
         
        %-- Skew coefficient uncertainty:
        alpha_c_error = 0.000000000000000;
         
        %-- Distortion coefficients uncertainty:
        kc_error = [ 0.040374388404014 ; 0.204861426844300 ; 0.002956511056168 ; ...
                     0.003983820154585 ; 0.000000000000000 ];

         
        %-- Image size:
        nx = 208;
        ny = 160;

      • Now need to write code on the Aibo to apply the calibration parameters when needed: (this isn't thoroughly tested yet though)
        //!provides a ray from camera through pixel in image; where possible, use \
        computePixel for better accuracy (i.e. try to always move from world to \
        camera instead of the other way around)
        /*! We can't undo some terms of the distortion model -- this is an estimate.
        * @param[in] x x position in range [-1,1]
        * @param[in] y y position in range [-1,1]
        * @param[out] r_x x value of the ray
        * @param[out] r_y y value of the ray
        * @param[out] r_z z value of the ray (always 1) */

        void computeRay(float x, float y, float& r_x, float& r_y, float& r_z) {
        x=(x+1)*calibration_res_x/2;
        y=(y+1)*calibration_res_y/2;
        float yd=(y-principle_point_y)/focal_len_y;
        float xd=(x-principle_point_x)/focal_len_x-skew*yd;
        float r2=xd*xd+yd*yd; //estimate of original
        float radial=(1 + kc1_r2*r2 + kc2_r4*r2*r2 + kc5_r6*r2*r2*r2);
        r_x=(xd - 2*kc3_tan1*x*y - kc4_tan2*(r2+2*x*x))/radial; //estimating tangential
        r_y=(yd - kc3_tan1*(r2+2*y*y) - 2*kc4_tan2*x*y)/radial; //estimating tangential
        r_z=1;
        }

        //!provides a pixel hit in image by a ray going through the camera frame
        /*! Hopefully we'll eventually upgrade this to account for lens distortion
        * @param[in] r_x x value of the ray
        * @param[in] r_y y value of the ray
        * @param[in] r_z z value of the ray
        * @param[out] x x position in range [-1,1]
        * @param[out] y y position in range [-1,1] */

        void computePixel(float r_x, float r_y, float r_z, float& x, float& y) {
        if(r_z==0) {
        x=y=0;
        return;
        }
        r_x/=r_z;
        r_y/=r_z;
        float r2 = r_x*r_x + r_y*r_y; //'r2' for 'radius squared', not 'ray number 2'
        float radial=(1 + kc1_r2*r2 + kc2_r4*r2*r2 + kc5_r6*r2*r2*r2);
        float xd = radial*r_x + 2*kc3_tan1*r_x*r_y + kc4_tan2*(r2+2*r_x*r_x);
        float yd = radial*r_y + kc3_tan1*(r2+2*r_y*r_y) + 2*kc4_tan2*r_x*r_y;
        x=focal_len_x*(xd+skew*yd)+principle_point_x;
        y=focal_len_y*yd+principle_point_y;
        x=2*x/calibration_res_x-1;
        y=2*y/calibration_res_y-1;
        }

  • Radiometric Calibration

    • I chose to implement:  C. Manders, C. Aimone, and Steve Mann, "Camera Response Function Recovery from Different Illuminations of Identical Subject Matter"(pdf). (to appear in the IEEE International Conference on Image Processing 2004.)
      • Take images of a scene under different combinations of lighting conditions
        • e.g. light A, light B, and light A+B
      • Create large, sparse, matrix relating different levels of light in A, with the same pixels in B and A+B
        • matrix 'M' is 255 (or whatever your intensity resolution is) wide, and the number of pixel samples high
        • each row of M has 1's in the columns corresponding to the value of the pixel in A and B, and -1 in the column corresponding to the value in C
        • remove columns for pixel values which weren't found in the image (more on that later)
        • remove rows for pixel values which were 0 or over-saturated
        • use SVD to find the least singular value
      • Neat: doesn't require any information about the camera, invariant to most lens effects except the function we want to measure.
    • Take image in dark room to measure camera noise, shown enhanced here:

      Surprising results here: the darkest the camera gets is '16'.  The camera doesn't ever seem to actually get to 0.
      The corners are brighter (up to 25 in the upper right hand corner), even though there is no light in the room, and usually the corners are darker than the rest of the image.  To handle this, instead of subtracting the base value (16) from images later, I will subtract this bitmap itself to better account for the corners.
      Update: I have since learned that many YUV implementations offset the Y channel by 16 to avoid sub-black pixels.

    • First attempt was to take a series of images of a plain wall.  I thought it would help to avoid specular reflections (saturation)
      + =>
      This turned out to not work so well -- it's important to cover the whole range of brightness in each individual image, so that correspondences across the entire light range can be determined.

    • Second attempt, normal scene, with a good mix of light and dark with each light.
      + =>
      I actually went slightly beyond the methodolgy used in the paper, and added a third light as well for even more accurate results, yielding 7 image pairs:
      A+B=AB, A+C=AC, B+C=BC, A+BC=ABC, B+AC=ABC, C+AB=ABC, A+B+C=ABC

    • The response function I obtained:

      The red line is the measured response curve from the algorithm.  The blue line is a polynomial fit of that data, I chose a 3rd degree polynomial as the basis, yielding a response function of: 2.18042e-08 x3 + -3.49511e-06 x2 + 0.000549223 x
      Things to note:
      • The camera saturates early - maximum value in the above images is 223.  After subtracting the background, this gives us a working range of 0-207.
        Update: Learning more about the YUV representation, the CCIR 601 and CCIR 656 standards clamp the range of the Y channel to [16,235], so perhaps this isn't so odd.
      • However, since the last two points are poorly fit, I might consider 205 the beginning of saturation.  Or perhaps the function simply skyrockets towards the end -- when it is pointed directly into a blindlingly bright light, it tops out around 230 (before background subtraction).


    • The paper goes on to suggest a slightly less intuitive form of analysizing the data, which provides better robustness to noise.  However, since I had already averaged an image sequence for each sample, and used additional image sets, I was pretty happy with the smoothness of my results and decided to move on.

    • All of these images were taken using the Y channel of the system's native YUV color space.  I had hoped to also attempt a calibration of the U and V channels, however the mechanics of how these channels work prevented a straightforward run, so after playing with it a bit I noticed the time slipping away and decided skip this aspect and move on...

    • Results!
      Original
      Calibrated


      Perhaps not as nice to view, but much nicer to do mathematical operations.

  • Verification

    • The camera offers 3 exposure settings (1/50sec, 1/100sec, 1/200sec), and 3 gain settings (-6dB, 0dB, +6dB).  Normally we run with the longest exposure and the highest gain, which are the settings we used in calculating the response function.
      1/50 sec, +6dB
      1/100sec, 0dB
      1/200sec, -6dB




    • For each of the following figures, we consider original pixel values less than 21 to be undersaturated and values greater than 216 to be oversaturated.  In my implementation, I do these camparisons after subtracting the base image, so I actually use values of 5 and 200 for the min and max.

    • First, let's verify that the exposures given in the Aibo's documentation are indeed accurate.  If we divide the calibrated pixel values in one image by the other, they should be differ by a factor of 2 for each change in exposure:
      Medium exposure (1/100 sec) divided by long exposure (1/50 sec) (y-axis), vs. long exposure (x-axis); high gain

      polynomial fit response function
      red line: y = 0.5  (ideal)
      cyan line: y = 6.87e-4 x + 0.4003

      lookup-table response function
      red line: y = 0.5  (ideal)
      cyan line: y = 1.96e-4 x + 0.4677
      This shows the results of dividing the pixels of a medium exposure image (1/100sec) by a slow exposure image (1/50sec).  Sure enough, the results are almost 1/2.  The red line shows the ideal fit, the cyan line shows a least-squares linear fit of the actual data.  Given the relatively poor performance of the polynomial response function we calculated in the previous section (left figure), I will stick with using the direct response function lookup (right figure), which gives more accurate results and is probably a faster implementation to boot.
      The data seems to ride a little low overall -- this may be due to quantization error?

    • We'll do the same for the gain settings.  Note that the values given by the camera documentation indicate amplifications in decibels, ±6dB.  This would correspond to a change in power by a factor of 4 (10 log (A/B) = 6   =>  A/B = 10^.6 = 3.98).  However, we appear to get a factor of two:
      No gain (0dB) divided by high gain (+6dB) (y-axis), vs. high gain (x-axis); long exposure

      red line: y = 0.5  (ideal)
      cyan line: y = 8.10e-5 x + 0.4754
      Perhaps the writers were in reference to the voltage gain of the CCD chip, calculated as 20 log (A/B), and hence the gain of 6dB is actually only scales the values obtained by 3dB.  *shrug*
      If this is indeed the case, then a medium exposure, high gain image should be equivalent to a long exposure, high gain image:
      Long exposure, no gain divided by medium exposure, high gain (y-axis) vs.  medium exposure, high gain (x-axis)

      red line: y = 1  (ideal)
      cyan line: y = -1.94e-4 x + 1.010
      In other words, the camera settings let us trade motion blur for image noise while keeping the image's light levels invariant.

    • Just for fun, let's check to see what kind of results would be obtained if we didn't know the camera's response function.  I will repeat the calculations of the no-gain vs. high-gain comparison, this time without applying the response function:
      No gain (0dB) divided by high gain (+6dB) (y-axis), vs. high gain (x-axis); long exposure
      Subtracted base image, but without response function

      red line: y = 0.5  (ideal)
      cyan line: y = 2.01e-3 x + 0.2934
      Neither base image nor response function (raw pixels)

      red line: y = 0.5  (ideal)
      cyan line: y = -1.13e-3 x + 0.7287
      For once bad results are good -- so the response function is good for something ;)

  • High Dynamic Range Imaging

    • Given a scene with both very bright regions and very dark regions:
         from which we use the 'Y' channel =>

    • Let's implement HDRI:
      • Take images at various exposure/gain settings to capture the entire range of intensity values, some of which are shown here:
        1/50 sec, +6dB
        1/100sec, 0dB
        1/200sec, -6dB




      • For each image:
        • Pass image through radiometric calibration (the geometric calibration was for a super-resolution section which was pruned in the interest of time)
        • Mutliply image by the light scale -- high gain, long exposure is '1', and each decrease in gain or exposure means a given pixel value represents twice the light input, as established above.
        • For pixel values which aren't over- or under-saturated, we add one to that pixel's counter and add the pixel's value to the HDRI pixel's value.  (invalid pixels are ignored)
      • Finally, for each pixel in the HDRI image, divide by that pixel's count (so we wind up with the average value across all images which had a valid sample for that pixel)

    • Results!
      • Input images:
        The red regions in each image are considered over- or under-saturated
        long exposure, high gain

        long exposure, no gain

        long exposure, negative gain

        medium exposure, negative gain

        short exposure, negative gain

        total pixel counts

        red indicates regions without
        any valid samples

      • Outputs:
        Original Image
        HDRI Image
        8-bit Comparison

        long exposure, high gain

        HDRI, intensity range: [0,1/16]

        The high range image scaled to
        show low-intensity detail

        medium exposure, no gain

        HDRI, intensity range: [0,1/4]

        Upper-left is a low-intensity image
        scaled up, lower right is
        high-intensity scaled down

        short exposure, negative gain

        HDRI, intensity range: [0,1]

        The low range image scaled to
        show high range detail
        As these figures show, the single HDRI image contains the full range of intensity information, whereas the 8-bit images are missing large regions of the image because it was over or under exposed.

        Only the HDRI stores enough information to pick out both the RI logo taped under the desk, as well as the letters "HDRI" written in yellow on the canvas:
           

        As opposed to viewing clipped subsets of the intensity range, we can also use an extreme gamma correction (3.67) to stretch out the dark and compress the light, so that you can see both of these features at the same time (albeit, not very well -- we need high dynamic range computer displays to really display this correctly):


        Reapplying the U and V channels combined from the long exposure images, we can get a hint of the colorization of such an image.  The spotlight is yellowish in color, the RI logo under the table is in orange marker, the "HDRI" text was with yellow marker, the upper left of the paper was a green splotch, the chair (which is cut off on the left of the image) is indeed red.

        However actual high dynamic range treatment of the color channels is left as an exercise to the reader ;)


        For reference, original low and high intensity sample images (converted to RGB):