-
Geometric Calibration
- Already presented in class, but I'll summarize...
- Took 12 sample images (well, 12 good ones, threw away a
bunch more)
- Created a corner detector, with sub-pixel correction:
Originally used both inner and outer points, but it's hard to get good
accuracy with an outside corner, so it's better to only use the inner
ones. See how the outer points in this image twist around the
square instead of staying on the corner:
This problem only really occurs when the checkerboard is slanted (which
is most of the time...)
- Pick outer corners by quadrant from center of mass
(assumes there aren't any false positives!)
- use Manhattan distance to select farthest point in each
quadrant -- euclidean distance can give bad results with highly
projective images
- Determine matrix transformation (T) for the trapazoid.
(aka keystone correction?) I solved this myself, because I'm a
glutton for a good math puzzle. There's probably better ways, but
this one's mine:
- Translate to put upper left corner at the origin
- 3 remaining points, upper-right (ur), lower-left (ll), lower-right (lr)
- Rotation and skew given by: [ ur ll ]
- Add two terms a
and b, which will provide
trapezoidal component: [ ur
· a ul · b 0 ; (1-a) (1-b) 1 ]
- a is
obtained by finding the intersection of line through ur in direction of ll with line through ll in direction of actual corner
point. Similarly, b can
be found by the same process, swapping ur and ll.
- you could also do an algebraic solution, but it's not
intuitive.
- Use inv(T) to project the detected corners back into a
regular plane, then can match features between images
If the sub-pixel registration isn't used earlier, these arrows don't
give nearly such a nice distribution, indicating more noise in the
data. (so it is good for something!) Note the outer corners
give ragged residuals
- Once features are matched up, compute homography and
solve for camera parameters.
- However, I stopped at this point and used this package
on the caltech
site.
- Parameters obtained:
%-- Focal length:
fc = [
198.807236147739843 ; 200.332749599841890 ];
%-- Principal point:
cc = [
102.688782354632409 ; 85.039877985626319 ];
%-- Skew
coefficient:
alpha_c =
0.000000000000000;
%-- Distortion
coefficients:
kc = [
-0.147005257242916 ; 0.384849817477919 ; -0.003477766214471 ; ...
0.000128730250105 ;
0.000000000000000 ];
%-- Focal length
uncertainty:
fc_error = [
2.454127338847488 ; 2.212276450344577 ];
%-- Principal point
uncertainty:
cc_error = [
2.556387295752588 ; 1.943240801392194 ];
%-- Skew
coefficient uncertainty:
alpha_c_error =
0.000000000000000;
%-- Distortion
coefficients uncertainty:
kc_error = [
0.040374388404014 ; 0.204861426844300 ; 0.002956511056168 ; ...
0.003983820154585 ; 0.000000000000000 ];
%-- Image size:
nx = 208;
ny = 160;
- Now need to write code on the Aibo to apply the
calibration parameters when needed: (this isn't thoroughly tested yet
though)
//!provides a ray from camera through pixel in image; where possible, use \ computePixel for better accuracy (i.e. try to always move from world to \ camera instead of the other way around) /*! We can't undo some terms of the distortion model -- this is an estimate. * @param[in] x x position in range [-1,1] * @param[in] y y position in range [-1,1] * @param[out] r_x x value of the ray * @param[out] r_y y value of the ray * @param[out] r_z z value of the ray (always 1) */ void computeRay(float x, float y, float& r_x, float& r_y, float& r_z) { x=(x+1)*calibration_res_x/2; y=(y+1)*calibration_res_y/2; float yd=(y-principle_point_y)/focal_len_y; float xd=(x-principle_point_x)/focal_len_x-skew*yd; float r2=xd*xd+yd*yd; //estimate of original float radial=(1 + kc1_r2*r2 + kc2_r4*r2*r2 + kc5_r6*r2*r2*r2); r_x=(xd - 2*kc3_tan1*x*y - kc4_tan2*(r2+2*x*x))/radial; //estimating tangential r_y=(yd - kc3_tan1*(r2+2*y*y) - 2*kc4_tan2*x*y)/radial; //estimating tangential r_z=1; } //!provides a pixel hit in image by a ray going through the camera frame /*! Hopefully we'll eventually upgrade this to account for lens distortion * @param[in] r_x x value of the ray * @param[in] r_y y value of the ray * @param[in] r_z z value of the ray * @param[out] x x position in range [-1,1] * @param[out] y y position in range [-1,1] */ void computePixel(float r_x, float r_y, float r_z, float& x, float& y) { if(r_z==0) { x=y=0; return; } r_x/=r_z; r_y/=r_z; float r2 = r_x*r_x + r_y*r_y; //'r2' for 'radius squared', not 'ray number 2' float radial=(1 + kc1_r2*r2 + kc2_r4*r2*r2 + kc5_r6*r2*r2*r2); float xd = radial*r_x + 2*kc3_tan1*r_x*r_y + kc4_tan2*(r2+2*r_x*r_x); float yd = radial*r_y + kc3_tan1*(r2+2*r_y*r_y) + 2*kc4_tan2*r_x*r_y; x=focal_len_x*(xd+skew*yd)+principle_point_x; y=focal_len_y*yd+principle_point_y; x=2*x/calibration_res_x-1; y=2*y/calibration_res_y-1; }
-
Radiometric Calibration
- I chose to implement: C. Manders, C. Aimone, and
Steve Mann, "Camera Response Function Recovery from Different
Illuminations of Identical Subject Matter"(pdf). (to
appear in the IEEE International Conference on Image Processing 2004.)
- Take images of a scene under different combinations of
lighting conditions
- e.g. light A, light B, and light A+B
- Create large, sparse, matrix relating different levels
of light in A, with the same pixels in B and A+B
- matrix 'M' is 255 (or whatever your intensity
resolution is) wide, and the number of pixel samples high
- each row of M has 1's in the columns corresponding to
the value of the pixel in A and B, and -1 in the column corresponding
to the value in C
- remove columns for pixel values which weren't found
in the image (more on that later)
- remove rows for pixel values which were 0 or
over-saturated
- use SVD to find the least singular value
- Neat: doesn't require any information about the camera,
invariant to most lens effects except the function we want to
measure.
- Take image in dark room to measure camera noise, shown
enhanced here:
Surprising results here: the
darkest the camera gets is '16'. The camera doesn't ever seem to
actually get to 0.
The corners are brighter (up to 25 in the upper right hand corner),
even though there is no light in the room, and usually the corners are
darker than the rest of the image. To handle this, instead of
subtracting the base value (16) from images later, I will subtract this
bitmap itself to better account for the corners.
Update: I have
since learned that many YUV implementations offset the Y channel by 16
to avoid sub-black pixels.
- First attempt was to take a series of images of a plain
wall. I thought it would help to avoid specular reflections
(saturation)
This turned out to not work so well -- it's important to cover the
whole range of brightness in each individual image, so that
correspondences across the entire light range can be determined.
- Second attempt, normal scene, with a good mix of light
and dark with each light.
I actually went slightly beyond the methodolgy used in the paper, and
added a third light as well for even more accurate results, yielding 7
image pairs:
A+B=AB, A+C=AC, B+C=BC, A+BC=ABC, B+AC=ABC, C+AB=ABC, A+B+C=ABC
- The response function I obtained:
The red line is the measured
response curve from the algorithm. The blue line is a polynomial
fit of that data, I chose a 3rd degree polynomial as the basis,
yielding a
response function of: 2.18042e-08 x3
+ -3.49511e-06 x2
+ 0.000549223 x
Things to note:
- The camera saturates early - maximum value in the above
images is 223. After subtracting the background, this gives us a
working range of 0-207.
Update: Learning
more about the YUV representation, the CCIR 601 and CCIR 656 standards
clamp the range of the Y channel to [16,235], so perhaps this isn't so
odd.
- However, since the last two points are poorly fit, I
might consider 205 the beginning of saturation. Or perhaps the
function simply skyrockets towards the end -- when it is pointed
directly into a blindlingly bright light, it tops out around 230
(before background subtraction).
- The paper goes on to suggest a slightly less intuitive
form of analysizing the data, which provides better robustness to
noise. However, since I had already averaged an image sequence
for each sample, and used additional image sets, I was pretty happy
with the smoothness of my results and decided to move on.
- All of these images were taken using the Y channel of the
system's native YUV color space. I had hoped to also attempt a
calibration of the U and V channels, however the mechanics of how these
channels work prevented a straightforward run, so after playing with it
a bit I noticed the time slipping away and decided skip this aspect and
move on...
- Results!
Original
|
Calibrated
|

|

|
Perhaps not as nice to view, but much nicer to do mathematical
operations.
-
Verification
- The camera offers 3 exposure settings (1/50sec, 1/100sec,
1/200sec), and 3 gain settings (-6dB, 0dB, +6dB). Normally we run
with the longest exposure and the highest gain, which are the settings
we used in calculating the response function.
1/50
sec, +6dB
|
1/100sec,
0dB
|
1/200sec,
-6dB
|

|

|

|
- For each of the following figures, we consider original
pixel values less than 21 to be undersaturated and values greater than
216 to be oversaturated. In my implementation, I do these
camparisons after subtracting the base image, so I actually use values
of 5 and 200 for the min and max.
- First, let's verify that the exposures given in the
Aibo's documentation are indeed accurate. If we divide the
calibrated pixel values in one image by the other, they should be
differ by a factor of 2 for each change in exposure:
Medium exposure (1/100 sec) divided by
long exposure (1/50 sec) (y-axis), vs. long exposure (x-axis); high gain
|

polynomial fit response function
red line: y = 0.5 (ideal)
cyan line: y = 6.87e-4 x + 0.4003
|

lookup-table response function
red line: y = 0.5 (ideal)
cyan line: y = 1.96e-4 x + 0.4677
|
This shows the results of
dividing the pixels of a medium exposure image (1/100sec) by a slow
exposure image (1/50sec). Sure enough, the results are almost
1/2. The red line shows the ideal fit, the cyan line shows a
least-squares linear fit of the actual data. Given the relatively
poor performance of the polynomial response function we calculated in
the previous
section (left figure), I will stick with using the direct response
function lookup (right figure),
which gives more accurate results and is probably a faster
implementation to boot.
The data seems to ride a little low overall -- this may be due to
quantization error?
-
We'll do the same for the
gain settings. Note that the values given by the camera
documentation indicate amplifications in decibels, ±6dB.
This would correspond to a change in power by a factor of 4 (10 log
(A/B) = 6 => A/B = 10^.6 = 3.98). However,
we appear to get a factor of two:
No gain (0dB)
divided by high gain (+6dB) (y-axis), vs. high gain (x-axis); long
exposure
|

red line: y = 0.5 (ideal)
cyan line: y = 8.10e-5 x + 0.4754
|
Perhaps the writers were in reference to the voltage gain of the CCD
chip, calculated as 20 log (A/B), and hence the gain of 6dB is actually
only scales the values obtained by 3dB. *shrug*
If this is indeed the case, then a medium exposure, high gain image
should be equivalent to a long exposure, high gain image:
Long exposure, no
gain divided by medium exposure, high gain (y-axis) vs. medium
exposure, high gain (x-axis)
|

red line: y = 1 (ideal)
cyan line: y = -1.94e-4 x + 1.010
|
In other words, the camera settings let us trade motion blur for image
noise while keeping the image's light levels invariant.
- Just for fun, let's check to see what kind of results
would be obtained if we didn't know the camera's response
function. I will repeat the calculations of the no-gain vs.
high-gain comparison, this time without applying the response function:
| No gain (0dB) divided by high gain (+6dB)
(y-axis), vs. high gain (x-axis); long exposure |
Subtracted
base image, but without response function

red line: y = 0.5 (ideal)
cyan line: y = 2.01e-3 x + 0.2934
|
Neither
base image nor response function (raw pixels)

red line: y = 0.5 (ideal)
cyan line: y = -1.13e-3 x + 0.7287
|
For once bad results are good -- so the response function is good for
something ;)
-
High Dynamic Range Imaging
- Given a scene with both very bright regions and very
dark regions:
 from which we use the 'Y' channel =>
- Let's implement HDRI:
- Take images at various exposure/gain settings to
capture the entire range of intensity values, some of which are shown
here:
1/50
sec, +6dB
|
1/100sec,
0dB
|
1/200sec,
-6dB
|

|

|

|
- For each image:
- Pass image through radiometric calibration (the
geometric calibration was for a super-resolution section which was
pruned in the interest of time)
- Mutliply image by the light scale -- high gain, long
exposure is '1', and each decrease in gain or exposure means a given
pixel value represents twice the light input, as established above.
- For pixel values which aren't over- or
under-saturated,
we add one to that pixel's counter and add the pixel's value to the
HDRI
pixel's value. (invalid pixels are ignored)
- Finally, for each pixel in the HDRI image, divide by
that pixel's
count (so we wind up with the average value across all images which had
a
valid sample for that pixel)
- Results!
- Input images:
The red regions in each image are considered over- or under-saturated
long exposure, high gain

|
long exposure, no gain

|
long exposure, negative gain

|
medium exposure, negative gain

|
short exposure, negative gain

|
total pixel counts

red indicates regions without
any valid samples
|
- Outputs:
Original
Image
|
HDRI
Image
|
8-bit
Comparison
|

long exposure, high gain
|

HDRI, intensity range: [0,1/16]
|

The high range image scaled to
show low-intensity detail
|

medium exposure, no gain
|

HDRI, intensity range: [0,1/4]
|

Upper-left is a low-intensity image
scaled up, lower right is
high-intensity scaled down
|

short exposure, negative gain
|

HDRI, intensity range: [0,1]
|

The low range image scaled to
show high range detail
|
As these figures show, the single HDRI image contains the full range of
intensity information, whereas the 8-bit images are missing large
regions of the image because it was over or under exposed.
Only
the HDRI stores enough information to pick out both the RI logo taped
under
the desk, as well as the letters "HDRI" written in yellow on the canvas:
As opposed to viewing clipped subsets of the intensity range, we can
also use an extreme gamma correction (3.67) to stretch out the dark and
compress the light, so that you can see both of these features at the
same time (albeit, not very well -- we need high dynamic range computer
displays to really display this correctly):
Reapplying the U and V channels combined from the long exposure
images, we can get a hint of the colorization of such an image.
The spotlight is yellowish in color, the RI logo under the table is in
orange marker, the "HDRI" text was with yellow marker, the upper left
of the paper was a green splotch, the chair (which is cut off on the
left of the image) is indeed red.
However actual high dynamic range treatment of the color channels is
left as an exercise to the reader ;)
|