OpenCV  5.0.0-pre
Open Source Computer Vision
Multi-view Camera Calibration Tutorial

Prev Tutorial: Interactive camera calibration application
Next Tutorial: USAC: Improvement of Random Sample Consensus in OpenCV

Original author Maksym Ivashechkin
Compatibility OpenCV >= 5.0

Structure:

This tutorial consists of the following sections:

  • Introduction
  • Briefly
  • How to run
  • Python example
  • Python visualization
  • Details Of The Algorithm
  • Method Input
  • Method Output
  • Pseudocode
  • Python sample API
  • C++ sample API

Introduction

Multiview calibration is a very important task in computer vision. It is widely used in 3D reconstruction, structure from motion, autonomous driving, etc. The calibration procedure is often the first step for any vision task, and it must be done to obtain the intrinsic and extrinsic parameters of the cameras. The accuracy of the camera calibration parameters directly influences all further computations and results; hence, estimating precise intrinsics and extrinsics is crucial.

The calibration algorithms require a set of images for each camera, where a calibration pattern (e.g., checkerboard, ArUco, etc.) is visible and detected in the images. Additionally, to get results with a real scale, the 3D distance between two neighboring points of the calibration pattern grid should be measured. For extrinsic calibration, images must share the calibration pattern observed from different views, i.e., the cameras' fields of view must overlap. Moreover, images that share the pattern grid have to be taken at the same moment in time; in other words, the cameras must be synchronized. Otherwise, the extrinsic calibration will fail.

The intrinsic calibration incorporates the estimation of the focal lengths, skew, and principal point of the camera; these parameters are combined in the upper-triangular \(3 \times 3\) intrinsic matrix. Additionally, intrinsic calibration includes finding the distortion parameters of the camera. The extrinsic parameters represent a relative rotation and translation between two cameras. Therefore, for \(N\) cameras, a sufficient number of correctly selected pairs of estimated relative rotations and translations is \(N-1\), while the extrinsic parameters for all \(\binom{N}{2} = N(N-1)/2\) possible pairs can be derived from those that are estimated. More details about intrinsic calibration can be found in the tutorial Create calibration pattern, and its implementation cv::calibrateCamera.
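For example, here is a minimal NumPy sketch (not part of the sample; it assumes each pose maps camera-0 coordinates into camera i, i.e., \(x_i = R_i x_0 + t_i\)) of deriving the extrinsics of an arbitrary camera pair from the \(N-1\) estimated ones:

import numpy as np

def relative_pose(R_i, t_i, R_j, t_j):
    # x_i = R_i x_0 + t_i and x_j = R_j x_0 + t_j imply
    # x_j = (R_j R_i^T) x_i + (t_j - R_j R_i^T t_i)
    R_ij = R_j @ R_i.T
    t_ij = t_j - R_ij @ t_i
    return R_ij, t_ij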

After intrinsic and extrinsic calibration, the projection matrices of the cameras are obtained by combining the intrinsic matrices with the rotation matrices and translation vectors. The projection matrices enable triangulation (3D reconstruction), rectification, finding epipolar geometry, etc.
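A short sketch of this step (the variable names are illustrative; cv::triangulatePoints expects 2xN point arrays and returns 4xN homogeneous coordinates):

import numpy as np
import cv2 as cv

def projection_matrix(K, R, t):
    # P = K [R | t] projects homogeneous 3D points into the image
    return K @ np.hstack([R, t.reshape(3, 1)])

# P1, P2: 3x4 projection matrices; pts1, pts2: Nx2 corresponding pixels
# pts4d = cv.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous
# pts3d = (pts4d[:3] / pts4d[3]).T                      # Nx3 Euclidean points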

The following sections describe the individual algorithmic steps of the overall multi-camera calibration pipeline:

Briefly

The algorithm consists of three major steps:

  1. Calibrate the intrinsic parameters (intrinsic matrix and distortion coefficients) for each camera independently.
  2. Calibrate pairs of cameras (using stereo calibration) using the intrinsic parameters from step 1.
  3. Run a global optimization over all cameras simultaneously to refine the extrinsic parameters.

How to run:

Assume we have N camera views; for each i-th view there are M images containing pattern points (e.g., a checkerboard).

Python example

There are two options to run the sample code in Python (opencv/samples/python/multiview_calibration.py): either with raw images or with provided points. The first option is to prepare N files, where each file contains one image path per line (the images of the camera corresponding to that file). For example, the file for camera i should look like this (file_i.txt):

/path/to/image_1_of_camera_i
...
/path/to/image_M_of_camera_i

Then the sample program can be run via the command line as follows:

$ python3 multiview_calibration.py --pattern_size W,H --pattern_type TYPE --fisheye IS_FISHEYE_1,...,IS_FISHEYE_N \
--pattern_distance DIST --filenames /path/to/file_1.txt,...,/path/to/file_N.txt

Replace W and H with the size of the pattern points, TYPE with the name of the calibration grid type (supported patterns: checkerboard, circles, acircles); IS_FISHEYE corresponds to the camera type (1 - fisheye, 0 - pinhole), and DIST is the pattern distance (i.e., the distance between two cells of the checkerboard). The sample script automatically detects image points according to the specified pattern type. By default, detection is done in parallel, but this option can be turned off.
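For instance, a hypothetical two-camera setup (one pinhole and one fisheye camera) observing a 9x6 checkerboard with 0.08 m between cells could be calibrated with:

$ python3 multiview_calibration.py --pattern_size 9,6 --pattern_type checkerboard --fisheye 0,1 \
--pattern_distance 0.08 --filenames /path/to/file_1.txt,/path/to/file_2.txt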

Additional (optional) flags to Python sample that could be used are as follows:

  • --winsize - pass values H,W to define the window size for corner detection (default is 5,5).
  • --debug_corners - pass True or False. If True, the program shows several random images with detected corners for the user to verify the detection manually (default is False).
  • --points_json_file - name of the JSON file where image and pattern points are saved after detection. Later, this file can be used to run the sample code. The default value is '' (nothing is saved).
  • --find_intrinsics_in_python - pass 0 or 1. If 1, the Python sample calibrates the intrinsic parameters itself and reports the reprojection errors; the multiview calibration is then done only for the extrinsic parameters. This flag aims to separate the calibration steps and make it easier to debug what goes wrong.
  • --path_to_save - path to save the results in a pickle file.
  • --path_to_visualize - path to the results pickle file needed to run the visualization.
  • --visualize - visualization flag (True or False); if True, only the visualization is run, but path_to_visualize must be provided.
  • --resize_image_detection - True / False; if True, images are resized to speed up corner detection.

Alternatively, the Python sample can be run from a JSON file that should contain image points, pattern points, and a boolean indicator of whether a camera is fisheye. An example JSON file is in opencv_extra/testdata/python/multiview_calibration_data.json (currently under pull request 1001 in opencv_extra). Its format should be a dictionary with the following items:

  • object_points - list of lists of pattern (object) points (size NUM_POINTS x 3).
  • image_points - list of lists of lists of lists of image points (size NUM_CAMERAS x NUM_FRAMES x NUM_POINTS x 2).
  • image_sizes - list of tuples (width x height) of image size.
  • is_fisheye - list of boolean values (true - fisheye camera, false - otherwise).

Optionally:

  • Ks and distortions - intrinsic parameters. If they are provided in the JSON file, the proposed method does not estimate the intrinsic parameters. Ks (intrinsic matrices) is a list of lists of lists (NUM_CAMERAS x 3 x 3); distortions is a list of lists (NUM_CAMERAS x NUM_VALUES) of distortion parameters.
  • images_names - list of lists (NUM_CAMERAS x NUM_FRAMES x string) of image filenames for visualization of points after calibration.

The sample is then run as follows:

$ python3 multiview_calibration.py --json_file /path/to/json
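A minimal sketch of producing such a JSON file (the keys follow the format above; the values are tiny placeholders, 2 cameras, 1 frame, 4 points, to be replaced with real data):

import json

data = {
    "object_points": [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]],
    "image_points": [[[[10, 10], [20, 10], [10, 20], [20, 20]]],
                     [[[12, 11], [22, 11], [12, 21], [22, 21]]]],
    "image_sizes": [[1920, 1080], [1920, 1080]],
    "is_fisheye": [False, False],
    # optionally: "Ks", "distortions", "images_names"
}
with open("multiview_calibration_data.json", "w") as f:
    json.dump(data, f)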

The description of the flags can be found directly by running the sample script with the help option:

python3 multiview_calibration.py --help

The expected output in a Linux terminal for the multiview_calibration_images data (from opencv_extra/testdata/python/, generated in Blender) should be the following:

The expected output for the real-life calibration images in opencv_extra/testdata/python/real_multiview_calibration_images is the following:

Python visualization

Apart from the estimated extrinsics and intrinsics, the Python sample provides a comprehensive visualization. First, the sample shows the positions of the cameras, the checkerboard (of a random frame), and pairs of cameras connected by black lines that explicitly demonstrate the tuples used in the initial stage of stereo calibration. If images are not available, a simple plot with arrows (from a given point to the back-projected one) visualizing the errors is shown. The color of the arrows highlights the error values. Additionally, the title reports the mean error on this frame and its accuracy relative to the other frames used in calibration. The following test instances were synthetically generated (see opencv/apps/python-calibration-generator/calibration_generator.py):

This instance has large Gaussian noise added to the points.

Another example, with a more complex tree structure, is shown here; it demonstrates a weak connection between two groups of cameras.

If paths to images are provided, the output is an image with plotted arrows:
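A minimal sketch of such a camera-position plot (assuming matplotlib, and extrinsics \(R_i, t_i\) that map camera-0 coordinates into camera i, so the camera center is \(C_i = -R_i^T t_i\)):

import numpy as np
import matplotlib.pyplot as plt

def plot_camera_centers(Rs, Ts):
    # Rs: 3x3 rotations, Ts: translations, both relative to camera 0
    ax = plt.figure().add_subplot(projection='3d')
    for i, (R, t) in enumerate(zip(Rs, Ts)):
        C = -R.T @ np.asarray(t).reshape(3)  # camera center in camera-0 frame
        ax.scatter(*C)
        ax.text(*C, f'cam {i}')
    plt.show()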

Details Of The Algorithm

  1. If the intrinsics are not provided, the calibration procedure starts with intrinsic calibration independently for each camera using the OpenCV function cv::calibrateCamera.
  • a. If the input is a combination of fisheye and pinhole cameras, then the fisheye images are calibrated with the default OpenCV calibration function. The reason is that stereo calibration in OpenCV does not support a mix of fisheye and pinhole cameras. The following flags are used in this scenario:
  • * i. cv::CALIB_RATIONAL_MODEL - extends the default (5-coefficient) distortion model and returns more parameters.
  • * ii. cv::CALIB_ZERO_TANGENT_DIST - zeroes out the tangential distortion coefficients, since the fisheye model does not have them.
  • * iii. cv::CALIB_FIX_K5, cv::CALIB_FIX_K6 - zero out the fifth and sixth parameters, so in total 4 parameters are returned.
  • b. The output of intrinsic calibration also includes rotation and translation vectors (the transform of pattern points to the camera frame), and errors per frame.
  • * i. For each frame, the index of the camera with the lowest error among all cameras is saved.
  2. Otherwise, if the intrinsics are known, the proposed algorithm runs perspective-n-point estimation (cv::solvePnP) to estimate the rotation and translation vectors, and the reprojection error for each frame.
  3. Assume that cameras can be represented as nodes of a connected graph. An edge between two cameras is created if there is any image overlap over all frames. If the graph does not connect all cameras (i.e., there exists a camera that has no overlap with the other cameras), then calibration is not possible. Otherwise, the next step consists of finding the maximum spanning tree (MST) of this graph. The MST captures the best pairwise camera connections. The weight of edges across all frames is a weighted combination of multiple factors:
  • a. The main contribution is the number of pattern points detected in both images (cameras).
  • b. The ratio of the area of the convex hull of the projected points in the image to the image resolution.
  • c. The angle between the cameras' optical axes (found from the rotation vectors).
  • d. The angle between the camera's optical axis and the pattern's normal vector (found from 3 non-collinear pattern points).
  4. The initial estimate of the cameras' extrinsics is found by pairwise stereo calibration (see cv::stereoCalibrate). Without loss of generality, the 0-th camera's rotation is fixed to identity and its translation to the zero vector, and the 0-th node becomes the root of the MST. The order of stereo calibration is selected by traversing the MST in breadth-first search order, starting from the root. The total number of pairs (also the number of edges of the tree) is NUM_CAMERAS - 1, which is a property of a tree graph.
  5. Given the initial estimate of the extrinsics, the aim is to polish the results using global optimization (via the Levenberg-Marquardt method, see the cv::LevMarq class).
  • a. To reduce the total number of parameters, all rotation and translation vectors estimated in the first step from the intrinsic calibration with the lowest error are transformed to be relative to the root camera.
  • b. The total number of parameters is (NUM_CAMERAS - 1) x (3 + 3) + NUM_FRAMES x (3 + 3), where 3 stands for a rotation vector and 3 for a translation vector. The first part of the parameters is the extrinsics, and the second part is the rotation and translation vectors per frame.
  • c. A robust function is additionally applied to mitigate the impact of outlier points during the optimization. The function has the shape of the derivative of a Gaussian: \(\rho(x) = x \cdot e^{-x/s}\) (efficiently implemented by an approximation of the exp), where \(x\) is a squared pixel error and \(s\) is a manually pre-defined scale. This function was chosen because it increases on the interval from 0 to some pixel error \(y\) and decreases thereafter. The idea is that the function slightly decreases errors until the error reaches \(y\), and if the error is too high (more than \(y\)), its robust value tends to 0. The value of the scale factor was found by exhaustive evaluation that forces the robust function to increase almost linearly until the robust value of an error is 10 px and to decrease afterwards (see the graph of the function below). The value itself is equal to 30, but it can be modified in the OpenCV source code.
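Since the graph itself is not reproduced here, a small NumPy/matplotlib sketch of the robust function \(\rho(x) = x e^{-x/s}\) with \(s = 30\) can serve in its place:

import numpy as np
import matplotlib.pyplot as plt

s = 30.0                         # scale used in the implementation
err = np.linspace(0, 20, 400)    # pixel error
x = err ** 2                     # the function takes the squared pixel error
rho = x * np.exp(-x / s)         # grows for small errors, then decays towards 0

plt.plot(err, rho)
plt.xlabel('pixel error [px]')
plt.ylabel('robust value')
plt.show()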

Method Input

The high-level input of the proposed method is as follows:

  • Pattern (object) points: (NUM_FRAMES x) NUM_PATTERN_POINTS x 3. The points may contain a copy of the pattern points for each frame.
  • Image points: NUM_CAMERAS x NUM_FRAMES x NUM_PATTERN_POINTS x 2.
  • Image sizes: NUM_CAMERAS x 2 (width and height).
  • Detection mask: a matrix of size NUM_CAMERAS x NUM_FRAMES that indicates whether pattern points are detected for a specific camera and frame index.
  • Ks (optional) - intrinsic matrices per camera.
  • Distortions (optional) - distortion coefficients per camera.
  • use_intrinsics_guess - indicates whether intrinsics are provided.
  • flagsForIntrinsics - flags for intrinsic estimation.
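A shapes-only sketch of assembling these inputs (the sizes are made up for illustration):

import numpy as np

NUM_CAMERAS, NUM_FRAMES, NUM_PATTERN_POINTS = 4, 10, 54

# pattern points, replicated per frame: NUM_FRAMES x NUM_PATTERN_POINTS x 3
pattern_points = np.zeros((NUM_FRAMES, NUM_PATTERN_POINTS, 3), np.float32)
# image points: NUM_CAMERAS x NUM_FRAMES x NUM_PATTERN_POINTS x 2
image_points = np.zeros((NUM_CAMERAS, NUM_FRAMES, NUM_PATTERN_POINTS, 2), np.float32)
# image sizes (width, height) per camera
image_sizes = [(1920, 1080)] * NUM_CAMERAS
# detection mask: 1 where the pattern was detected in (camera, frame)
detection_mask = np.zeros((NUM_CAMERAS, NUM_FRAMES), np.uint8)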

Method Output

The high-level output of the proposed method is the following:

  • Boolean indicator of success.
  • Rotation and translation vectors of the extrinsic parameters with respect to camera 0 (i.e., relative poses). The number of estimated vectors is NUM_CAMERAS-1, since for the first camera the rotation and translation vectors are zero.
  • Intrinsic matrix for each camera.
  • Distortion coefficients for each camera.
  • Rotation and translation vectors of each frame's pattern with respect to camera 0. The combination of a rotation and translation transforms the pattern points into the camera coordinate space, and hence, together with the intrinsic parameters, projects 3D points onto the image.
  • Matrix of reprojection errors of size NUM_CAMERAS x NUM_FRAMES.
  • Output pairs used for the initial estimation of the extrinsics; the number of pairs is NUM_CAMERAS-1.
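As a hedged usage sketch (the variable names are illustrative, not part of the method's API): to reproject the pattern of frame f into camera i, the frame pose (given in camera-0 coordinates) can be composed with the camera's extrinsics via cv::composeRT and passed to cv::projectPoints:

import cv2 as cv
import numpy as np

def reproject_frame(pattern_points_f, rvec0_f, tvec0_f, rvec_i, tvec_i, K_i, dist_i):
    # compose: pattern -> camera 0 (frame pose), then camera 0 -> camera i
    rvec, tvec = cv.composeRT(rvec0_f, tvec0_f, rvec_i, tvec_i)[:2]
    proj, _ = cv.projectPoints(pattern_points_f, rvec, tvec, K_i, dist_i)
    return proj.reshape(-1, 2)  # compare against detected image points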

Pseudocode

The idea of the method is demonstrated in high-level pseudocode below, while the full C++ implementation of the proposed approach is in the opencv/modules/calib/src/multiview_calibration.cpp file.

def multiviewCalibration(pattern_points, image_points, detection_mask):
    for cam_i = 1,…,NUMBER_CAMERAS:
        if CALIBRATE_INTRINSICS:
            K_i, distortion_i, rvecs_i, tvecs_i = calibrateCamera(pattern_points, image_points[cam_i])
        else:
            rvecs_i, tvecs_i = solvePnP(pattern_points, image_points[cam_i], K_i, distortion_i)
        # Select best rvecs, tvecs based on reprojection errors. Process data:
        pattern_img_area[cam_i][frame] = area(convexHull(image_points[cam_i][frame]))
        angle_to_board[cam_i][frame] = arccos(pattern_normal_frame * optical_axis_cam_i)
        angle_cam_to_cam[cam_i][cam_j] = arccos(optical_axis_cam_i * optical_axis_cam_j)
    graph = maximumSpanningTree(detection_mask, pattern_img_area, angle_to_board, angle_cam_to_cam)
    camera_pairs = breadth_first_search(graph, root_camera=0)
    for pair in camera_pairs:
        # find relative rotation, translation from camera i to j
        R_ij, t_ij = stereoCalibrate(pattern_points, image_points[i], image_points[j])
    R*, t* = optimizeLevenbergMarquardt(R, t, pattern_points, image_points, K, distortion)

Python sample API

To run the calibration procedure in Python, follow these steps (see the sample code in samples/python/multiview_calibration.py):

  1. Prepare the data:
if pattern_type.lower() == 'checkerboard' or pattern_type.lower() == 'charuco':
    pattern = chessboard_points(grid_size, dist_m)
elif pattern_type.lower() == 'circles':
    pattern = circles_grid_points(grid_size, dist_m)
elif pattern_type.lower() == 'acircles':
    pattern = asym_circles_grid_points(grid_size, dist_m)
else:
    raise NotImplementedError("Pattern type is not implemented!")
if pattern_type.lower() == 'charuco':
    assert (board_dict_path is not None) and os.path.exists(board_dict_path)
    board_dict = json.load(open(board_dict_path, 'r'))

The detection mask matrix is later built by checking the size of the image points after detection.

  2. Detect pattern points on the images:
if pattern_type.lower() == 'checkerboard':
    ret, corners = cv.findChessboardCorners(
        cv.cvtColor(img_detection, cv.COLOR_BGR2GRAY), grid_size, None
    )
    if ret:
        if scale < 1.0:
            corners /= scale
        corners2 = cv.cornerSubPix(cv.cvtColor(img, cv.COLOR_BGR2GRAY),
                                   corners, winsize, (-1,-1), criteria)
elif pattern_type.lower() == 'circles':
    ret, corners = cv.findCirclesGrid(
        img_detection, patternSize=grid_size, flags=cv.CALIB_CB_SYMMETRIC_GRID
    )
    if ret:
        corners2 = corners / scale
elif pattern_type.lower() == 'acircles':
    ret, corners = cv.findCirclesGrid(
        img_detection, patternSize=grid_size, flags=cv.CALIB_CB_ASYMMETRIC_GRID
    )
    if ret:
        corners2 = corners / scale
elif pattern_type.lower() == 'charuco':
    dictionary = cv.aruco.getPredefinedDictionary(board_dict["dictionary"])
    board = cv.aruco.CharucoBoard(
        size=(grid_size[0] + 1, grid_size[1] + 1),
        squareLength=board_dict["square_size"],
        markerLength=board_dict["marker_size"],
        dictionary=dictionary
    )
    # The found best practice is to refine detected ArUco markers with contours,
    # then refine to sub-pixel accuracy with the board functions
    detector_params = cv.aruco.DetectorParameters()
    charuco_params = cv.aruco.CharucoParameters()
    charuco_params.tryRefineMarkers = True
    detector_params.cornerRefinementMethod = cv.aruco.CORNER_REFINE_CONTOUR
    refine_params = cv.aruco.RefineParameters()
    detector = cv.aruco.CharucoDetector(board, charuco_params, detector_params, refine_params)
    charucoCorners, charucoIds, _, _ = detector.detectBoard(img_detection)
    corners = np.ones([grid_size[0] * grid_size[1], 1, 2]) * -1
    ret = (charucoIds is not None) and charucoIds.flatten().size > 3
    if ret:
        corners[charucoIds.flatten()] = cv.cornerSubPix(cv.cvtColor(img, cv.COLOR_BGR2GRAY),
                                                        charucoCorners / scale, winsize, (-1,-1), criteria)
    corners2 = corners
else:
    raise ValueError("Calibration pattern is not supported!")
  3. Build the detection mask matrix:
for i in range(len(image_points)):
    for j in range(len(image_points[0])):
        detection_mask[i,j] = int(len(image_points[i][j]) != 0)
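Equivalently, as a one-expression sketch (not part of the sample):

import numpy as np

detection_mask = np.array([[len(pts) != 0 for pts in cam_points]
                           for cam_points in image_points], dtype=np.uint8)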
  4. Finally, the calibration function is run as follows:
rmse, Rs, Ts, Ks, distortions, rvecs0, tvecs0, errors_per_frame, output_pairs = \
    cv.calibrateMultiview(
        objPoints=pattern_points_all,
        imagePoints=image_points,
        imageSize=image_sizes,
        detectionMask=detection_mask,
        Rs=None,
        Ts=None,
        Ks=Ks,
        distortions=distortions,
        isFisheye=np.array(is_fisheye, dtype=np.uint8),
        useIntrinsicsGuess=USE_INTRINSICS_GUESS,
        flagsForIntrinsics=np.array([pinhole_flag if not is_fisheye[x] else fisheye_flag for x in range(num_cameras)], dtype=int),
    )

C++ sample API

To run the calibration procedure in C++, follow these steps (see the sample code in opencv/samples/cpp/multiview_calibration_sample.cpp):

  1. Prepare the data similarly to the Python sample, i.e., pattern size and scale, fisheye camera mask, and files containing image filenames, and pass them to the function:
static void detectPointsAndCalibrate (cv::Size pattern_size, float pattern_scale, const std::string &pattern_type,
const std::vector<bool> &is_fisheye, const std::vector<std::string> &filenames)
  2. Initialize the data:
std::vector<cv::Point3f> board(pattern_size.area());
const int num_cameras = (int)is_fisheye.size();
std::vector<std::vector<cv::Mat>> image_points_all;
std::vector<cv::Size> image_sizes;
std::vector<cv::Mat> Ks, distortions, Ts, Rs;
cv::Mat rvecs0, tvecs0, errors_mat, output_pairs;
if (pattern_type == "checkerboard") {
    for (int i = 0; i < pattern_size.height; i++) {
        for (int j = 0; j < pattern_size.width; j++) {
            board[i*pattern_size.width+j] = cv::Point3f((float)j, (float)i, 0.f) * pattern_scale;
        }
    }
} else if (pattern_type == "circles") {
    for (int i = 0; i < pattern_size.height; i++) {
        for (int j = 0; j < pattern_size.width; j++) {
            board[i*pattern_size.width+j] = cv::Point3f((float)j, (float)i, 0.f) * pattern_scale;
        }
    }
} else if (pattern_type == "acircles") {
    for (int i = 0; i < pattern_size.height; i++) {
        for (int j = 0; j < pattern_size.width; j++) {
            if (i % 2 == 1) {
                board[i*pattern_size.width+j] = cv::Point3f((j + .5f)*pattern_scale, (i/2 + .5f) * pattern_scale, 0.f);
            } else {
                board[i*pattern_size.width+j] = cv::Point3f(j*pattern_scale, (i/2)*pattern_scale, 0);
            }
        }
    }
} else {
    CV_Error(cv::Error::StsNotImplemented, "pattern_type is not implemented!");
}
  3. Detect pattern points on the images:
int num_frames = -1;
for (const auto &filename : filenames) {
    std::fstream file(filename);
    CV_Assert(file.is_open());
    std::string img_file;
    std::vector<cv::Mat> image_points_cameras;
    bool save_img_size = true;
    while (std::getline(file, img_file)) {
        if (img_file.empty())
            break;
        std::cout << img_file << "\n";
        cv::Mat img = cv::imread(img_file), corners;
        if (save_img_size) {
            image_sizes.emplace_back(cv::Size(img.cols, img.rows));
            save_img_size = false;
        }
        bool success = false;
        if (pattern_type == "checkerboard") {
            success = cv::findChessboardCorners(img, pattern_size, corners);
        } else if (pattern_type == "circles") {
            success = cv::findCirclesGrid(img, pattern_size, corners, cv::CALIB_CB_SYMMETRIC_GRID);
        } else if (pattern_type == "acircles") {
            success = cv::findCirclesGrid(img, pattern_size, corners, cv::CALIB_CB_ASYMMETRIC_GRID);
        }
        cv::Mat corners2;
        corners.convertTo(corners2, CV_32FC2);
        if (success && corners.rows == pattern_size.area())
            image_points_cameras.emplace_back(corners2);
        else
            image_points_cameras.emplace_back(cv::Mat());
    }
    if (num_frames == -1)
        num_frames = (int)image_points_cameras.size();
    else
        CV_Assert(num_frames == (int)image_points_cameras.size());
    image_points_all.emplace_back(image_points_cameras);
}
  4. Build the detection mask matrix:
cv::Mat visibility(num_cameras, num_frames, CV_8UC1);
for (int i = 0; i < num_cameras; i++) {
    for (int j = 0; j < num_frames; j++) {
        visibility.at<unsigned char>(i,j) = image_points_all[i][j].empty() ? 0 : 1;
    }
}
  5. Run the calibration:
const double rmse = calibrateMultiview(objPoints, image_points_all, image_sizes, visibility,
                                       Rs, Ts, Ks, distortions, cv::noArray(), cv::noArray(),
                                       is_fisheye, errors_mat, output_pairs);