The structure of a dynamic scene can only be recovered with a real-time range sensor. Depth from defocus offers a direct solution for fast, dense range estimation: it is computationally efficient because it circumvents the correspondence problem faced by stereo and by feature tracking in structure from motion. However, accurate depth estimation requires theoretical and practical solutions to a variety of problems, including the recovery of textureless surfaces, precise blur estimation, and magnification variations caused by defocus. In the first part of this project, both textured and textureless surfaces are recovered using an illumination pattern that is projected via the same optical path used to acquire images. The illumination pattern is optimized to maximize the accuracy and spatial resolution of the computed depth. The relative blurring between two images is computed using a narrow-band linear operator designed by taking into account all the optical, sensing, and computational elements of the depth from defocus system. Defocus-invariant magnification is achieved by the use of an additional aperture in the imaging optics. A prototype focus range sensor has been developed that produces up to 512×480 depth images at 30 Hz with an accuracy better than 0.3%. Several experiments have been conducted to verify the performance of the sensor.
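The core relative-blur computation can be illustrated with a toy sketch. The actual system uses a narrow-band operator matched to the projected illumination pattern; here a plain discrete Laplacian stands in for it, wrap-around border handling is a simplification, and the mapping from the blur cue to metric depth (which needs lens calibration) is omitted:

```python
import numpy as np

def laplacian_energy(img):
    """Local high-frequency energy via a discrete 5-point Laplacian.
    np.roll wraps at the borders, which is acceptable for this sketch."""
    img = img.astype(float)
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)
    return lap ** 2

def relative_blur_cue(i_near, i_far, eps=1e-8):
    """Normalized difference of operator energy in two images focused at
    different depths: near +1 where the scene is sharper in i_near,
    near -1 where it is sharper in i_far."""
    e_near, e_far = laplacian_energy(i_near), laplacian_energy(i_far)
    return (e_near - e_far) / (e_near + e_far + eps)
```

Because the cue is a normalized ratio of operator outputs at the same pixel, it cancels the local texture amplitude, which is what makes the blur estimate usable on arbitrary (here, projected) patterns.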
As part of this project, we have also explored constant-magnification imaging optics. Magnification variations due to changes in focus setting pose a problem for depth from defocus. The magnification of a conventional lens can be made invariant to defocus simply by adding an aperture at an analytically derived location; the resulting optical configuration is called "telecentric." We have shown that most commercially available lenses can be converted into telecentric ones, and we have conducted a detailed analysis of the photometric and geometric properties of telecentric lenses.
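The effect of the extra aperture can be checked with a paraxial thin-lens ray trace. The sketch below is illustrative only: the focal length, object distance, and object height are made-up numbers, and the aperture is placed at the front focal point, one standard image-side-telecentric location:

```python
def trace(y, u, elements):
    """Paraxial ray trace of a ray with height y and slope u.
    elements: ('space', d) propagates over distance d,
              ('lens', f)  applies a thin lens of focal length f."""
    for kind, val in elements:
        if kind == 'space':
            y += u * val
        else:
            u -= y / val
    return y, u

F, S, H = 50.0, 200.0, 10.0  # focal length, object distance, object height (illustrative)

def image_height_conventional(d):
    """Chief ray through the lens centre (aperture at the lens),
    intercepted by a sensor placed a distance d behind the lens."""
    y, _ = trace(0.0, -H / S, [('lens', F), ('space', d)])
    return y

def image_height_telecentric(d):
    """Chief ray through an aperture at the front focal point: after the
    lens it travels parallel to the axis, so its height (and hence the
    magnification) no longer depends on the sensor position d."""
    y, _ = trace(0.0, -H / (S - F), [('space', F), ('lens', F), ('space', d)])
    return y
```

With the aperture at the lens, the chief-ray height at the sensor is proportional to d, so moving the sensor (defocusing) rescales the image; with the aperture at the front focal point it is constant, which is the telecentric property the paragraph describes.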
We have also addressed the problem of computing depth from defocus without the use of active illumination. We have developed a class of broadband operators that, used together, provide invariance to scene texture and produce accurate, dense depth maps. Because the operators are broadband, a small number of them suffices for depth estimation in scenes with complex textural properties. In addition, a depth confidence measure is derived that can be computed from the outputs of the operators; this measure permits further refinement of the computed depth maps. Experiments have been conducted on both synthetic and real scenes to evaluate the performance of the proposed operators. The depth detection gain error is less than 1%, irrespective of the texture frequency.
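One simple way a per-pixel confidence measure can drive depth-map refinement is confidence-weighted neighbourhood averaging. The sketch below is a generic stand-in for such a refinement step, not the operator-derived measure itself; the thresholding rule and iteration count are arbitrary choices:

```python
import numpy as np

def refine_depth(depth, conf, iters=10):
    """Replace low-confidence depth estimates with a confidence-weighted
    average over the 4-neighbourhood; high-confidence pixels are kept
    fixed. np.roll wraps at the borders (fine for this sketch)."""
    keep = conf > conf.mean()          # crude high-confidence mask
    d = depth.copy()
    for _ in range(iters):
        num = conf * d
        den = conf.copy()
        for axis in (0, 1):
            for shift in (1, -1):
                num = num + np.roll(conf * d, shift, axis)
                den = den + np.roll(conf, shift, axis)
        d = np.where(keep, depth, num / np.maximum(den, 1e-8))
    return d
```

Iterating lets reliable estimates propagate into low-confidence regions while leaving the trusted pixels untouched.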