Beginner’s Guide to Camera Calibration
Starting with the Pinhole Model
Camera calibration herein refers to “geometric” camera calibration, which estimates the parameters of an image or video camera consisting of a lens system and a sensor. The camera parameters describe how points in world coordinates are mapped through the camera system onto the image. One may use these parameters to correct image distortion, measure the size of an object in world units, or calculate the location of an object relative to the camera. These tasks are widely used in applications, particularly in robotics for navigation and 3-D scene reconstruction.

A “camera model” is a set of assumptions that describes how an object in the real world is projected onto the image plane. The “pinhole model” treats the lens as a simple aperture: it assumes that rays pass through the aperture and form an inverted real image on the image plane. The pinhole model neglects the refraction caused by the material and the geometry of the lens, so it is applicable only when the refraction is negligible.
Camera Parameters
In the pinhole model, the camera parameters are represented by a 3-by-4 matrix called the “camera matrix,” which maps the 3-D world onto the image plane:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = P \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad P = K \,[\, R \mid t \,], $$

where $(X, Y, Z)$ is a point in world coordinates, $(u, v)$ is its pixel position on the image plane, and $s$ is an arbitrary scale factor.
The camera matrix is composed of extrinsic and intrinsic parameters. The extrinsic parameters (the rotation $R$ and translation $t$) represent the location of the camera in the 3-D scene, and the intrinsic parameters (collected in $K$) describe the optical center and the focal length of the camera.


The extrinsic parameters transform world points from the 3-D scene into camera coordinates, and the camera coordinates are then mapped onto the image plane according to the intrinsic parameters. To summarize: the extrinsic parameters describe the relative position between the camera and the 3-D world, while the intrinsic parameters define the optical characteristics of the camera itself.
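To make the decomposition concrete, here is a minimal sketch in Python with NumPy; all numbers (focal length, optical center, pose) are made-up values for illustration:

```python
import numpy as np

# Hypothetical intrinsic matrix K: focal lengths (fx, fy) and
# optical center (cx, cy), all in pixels.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical extrinsic parameters: no rotation, world origin 2 m
# in front of the camera.
R = np.eye(3)
t = np.array([[0.0], [0.0], [2.0]])

# The 3-by-4 camera matrix P = K [R | t].
P = K @ np.hstack([R, t])

# Project a world point (homogeneous coordinates) onto the image plane.
X = np.array([0.1, 0.2, 0.0, 1.0])   # (X, Y, Z, 1) in meters
u, v, s = P @ X
print(u / s, v / s)                   # pixel coordinates: 360.0 320.0
```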
Distortion

The pinhole model does not account for lens distortion; however, using the pinhole model without taking distortion into consideration is impractical. To describe the projection from the world onto the image plane well, a camera model includes both radial and tangential distortion.

Radial distortion occurs due to the curvature of the lens, and it is more severe near the edge of the lens than at the optical center. This can be explained by Snell’s law: light rays bend more at the edge of the lens, since the normal of the incident surface there deviates more from the optical axis. The relationship between a radially distorted point $(x_d, y_d)$ and the undistorted point $(x, y)$ can be represented as:

$$ x_d = x \,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad y_d = y \,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), $$

where $r^2 = x^2 + y^2$.
Tangential distortion results from the lens and the sensor not being parallel, which produces a wedge-shaped image. This misalignment can likewise be decomposed into a translation and a rotation. The relationship between a tangentially distorted point $(x_d, y_d)$ and the undistorted point $(x, y)$ is:

$$ x_d = x + \bigl[\, 2 p_1 x y + p_2 (r^2 + 2 x^2) \,\bigr], \qquad y_d = y + \bigl[\, p_1 (r^2 + 2 y^2) + 2 p_2 x y \,\bigr]. $$
In short, these five coefficients $(k_1, k_2, p_1, p_2, k_3)$ are known as the distortion coefficients. Typically, $(k_1, k_2, p_1, p_2)$ is sufficient to describe the distortion, and one may include $k_3$ for severe distortion.
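The two formulas above translate directly into code. Below is a small Python sketch that applies the combined radial and tangential model to a normalized image point; the coefficient values are made up for illustration:

```python
def distort(x, y, k1, k2, p1, p2, k3=0.0):
    """Apply radial and tangential distortion to a normalized point (x, y)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

# Made-up coefficients: mild barrel distortion plus a slight sensor tilt.
print(distort(0.5, 0.3, k1=-0.2, k2=0.05, p1=0.001, p2=0.002))
```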
The goal of the calibration process is to find the extrinsic parameters, the intrinsic parameters, and the distortion coefficients. To be more concise, when we write “Intrinsic Parameters” with capitalized first letters hereinafter, it includes both the intrinsic parameters and the distortion coefficients. Once these are known, one is able to calculate world coordinates from pixel coordinates. Quite intuitively, the closer the calibrated parameters are to the real camera setting, the more accurate the obtained world coordinates.
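In practice, calibration is usually performed by photographing a planar checkerboard from several poses and letting a solver fit all the parameters at once. Here is a minimal sketch using OpenCV; the 9-by-6 pattern, the 25 mm square size, and the image file names are assumptions for illustration:

```python
import glob
import cv2
import numpy as np

PATTERN = (9, 6)    # inner corners per row and column (assumed board)
SQUARE = 0.025      # square size in meters (assumed)

# World coordinates of the board corners; the board defines the plane Z = 0.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_points, img_points = [], []
for path in glob.glob("calib_*.png"):   # hypothetical file names
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the reprojection error, the intrinsic matrix K, the distortion
# coefficients (k1, k2, p1, p2, k3), and per-view extrinsics (rvecs, tvecs).
err, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("mean reprojection error:", err)
```

The recovered K and distortion coefficients can then be passed to cv2.undistort to remove lens distortion from new images.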
One More Step for Stereo Cameras

One may wonder: since world coordinates can be calculated from an image once a single camera is calibrated, why do we need a stereo camera? Let’s reconsider how a camera works: it projects the “3-D” world onto a “2-D” image plane, which means a dimension reduction occurs and the information in that dimension is lost during projection. The lost information is “depth.” Without depth information, there is an intrinsic ambiguity in reconstructing a 3-D scene from 2-D images.
A simple method to recover depth from a single 2-D image is to take a picture of a reference object with known size and distance from the camera, and to proportionally infer the depth of other objects mapped onto the image plane. However, the accuracy and stability of the depth information acquired by this method are not sufficient for many applications.
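For concreteness, this proportional inference is just similar triangles in the pinhole model: a reference object of known physical height $H$ that spans $h$ pixels under a focal length of $f$ pixels lies at depth

$$ Z = \frac{f \, H}{h}, $$

and an unknown object can then be scaled relative to the reference. Since $Z$ is inversely proportional to $h$, any error of a few pixels in measuring $h$ propagates directly into the estimated depth.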
Epipolar Geometry and Triangulation Method

Epipolar geometry is a well-known approach to recovering depth information in computer vision. Its principle is to compare the image coordinates of the same object as seen by cameras at different locations.
Applying the knowledge introduced in the previous section: since the extrinsic parameters describe the rotation and translation of a camera with respect to the world coordinates, the relative position between the stereo cameras can be obtained from the extrinsic parameters of the two cameras. Taking one step further, the image coordinates on the stereo cameras can be interpreted through their individual Intrinsic Parameters. We are not going to dive into the mathematics of epipolar geometry herein, but one thing should be kept in mind: epipolar geometry is applicable only with known Intrinsic Parameters and extrinsic parameters of the stereo cameras, meaning camera calibration is essential to recovering depth information with this method.

The triangulation method is an intuitive approach to obtaining depth information, so it provides efficient communication between the end application and the stereo camera designer. Luckily, triangulation is a special case of epipolar geometry when certain constraints are satisfied (the resulting depth formula is shown right after this list). These constraints are:
1. The focal lengths of the stereo cameras are the same.
2. The image planes of the cameras are parallel to each other and to the baseline, where the baseline is defined by the horizontal positions of the stereo cameras.
3. The vertical positions of the optical centers of the stereo cameras are identical.
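When all three constraints hold, a world point that appears at horizontal coordinate $x_L$ in the left image and $x_R$ in the right image has disparity $d = x_L - x_R$, and similar triangles give its depth directly:

$$ Z = \frac{f \, B}{d}, $$

where $f$ is the common focal length (in pixels) and $B$ is the baseline length.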
As stereo camera designers usually use the same lenses for both cameras, constraint (1) can hold if the stereo camera goes through the calibration stated above; however, constraints (2) and (3) are not covered by camera calibration. Extra image processing called “image rectification” is therefore required before applying the triangulation method, to guarantee that constraints (2) and (3) hold for images from the stereo cameras.
Originally, the term “camera calibration” referred to obtaining the intrinsic and extrinsic parameters of a camera. However, with the emergence of stereo cameras and the popularity of the triangulation method, image rectification has almost become a “must” before a stereo camera is shipped. Therefore, for stereo camera manufacturers, the term “camera calibration” very often includes image rectification.
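As a sketch of what rectification involves in practice, the helper below uses OpenCV. It assumes the per-camera Intrinsic Parameters (K1/dist1, K2/dist2) and the stereo extrinsics (R, T, e.g. from cv2.stereoCalibrate) are already available; the names are illustrative:

```python
import cv2

def rectify_pair(raw_left, raw_right, K1, dist1, K2, dist2, R, T, image_size):
    """Warp a raw stereo pair so that constraints (2) and (3) hold."""
    # Rectification rotations (R1, R2) and new projection matrices (P1, P2).
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K1, dist1, K2, dist2, image_size, R, T)

    # Per-camera lookup maps that simultaneously undistort each image and
    # rotate it so the two image planes become parallel and epipolar lines
    # become horizontal scanlines.
    m1x, m1y = cv2.initUndistortRectifyMap(K1, dist1, R1, P1, image_size, cv2.CV_32FC1)
    m2x, m2y = cv2.initUndistortRectifyMap(K2, dist2, R2, P2, image_size, cv2.CV_32FC1)

    left = cv2.remap(raw_left, m1x, m1y, cv2.INTER_LINEAR)
    right = cv2.remap(raw_right, m2x, m2y, cv2.INTER_LINEAR)
    return left, right
```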
Moreover, the epipolar geometry and triangulation method described herein do not account for image quality. In fact, several image processing steps have to be applied to the images in order to successfully utilize either epipolar geometry or the triangulation method. There is plenty of literature on image processing to improve depth information, and it is beyond the scope of this document.
How eCapture Depth Cameras Work
Each eCapture depth camera is calibrated before shipment, and the camera parameters are stored in registers after calibration. A depth engine implemented in the IC of each eCapture camera then reads these parameters to perform all the necessary image processing and generate depth information with respect to world coordinates. For more detailed information about the output data, please refer to the product briefs.