OpenCV 5.0.0-pre
Open Source Computer Vision
Multi-view Camera Calibration Tutorial

Prev Tutorial: Interactive camera calibration application
Next Tutorial: USAC: Improvement of Random Sample Consensus in OpenCV

Original author Maksym Ivashechkin, Linfei Pan
Compatibility OpenCV >= 5.0

Structure

This tutorial consists of the following sections:

  • Introduction
  • Briefly
  • How to run
  • Python example
  • Details Of The Algorithm
  • Method Input
  • Method Output
  • Pseudocode
  • Python sample API
  • C++ sample API
  • Practical Debugging Techniques

Introduction

Multi-view calibration is an important task in computer vision. It is widely used in 3D reconstruction, structure from motion, autonomous driving, etc. Calibration is often the first step of any vision pipeline, as it provides the intrinsic and extrinsic parameters of the cameras. The accuracy of the camera calibration parameters directly influences all further computations and results, hence estimating precise intrinsics and extrinsics is crucial.

The calibration algorithms require a set of images for each camera in which a calibration pattern (e.g., checkerboard, ChArUco, etc.) is visible and detected. Additionally, to obtain results at real scale, the 3D distance between two neighboring points of the calibration pattern grid should be measured. For extrinsic calibration, images from different views must share the calibration pattern. An example setup can be found in the following figure.

Moreover, images that share the pattern grid have to be taken at the same moment, or in other words, cameras must be synchronized. Otherwise, the extrinsics calibration will fail. Note that if each pattern point can be uniquely determined (for example, if a ChArUco target is used, see cv::aruco::CharucoBoard), it is also possible to calibrate based only on partial observation. This is recommended as the overlapping field of view between camera pairs is usually limited in multiview-camera calibration, and it is generally difficult for them to observe the complete pattern at the same time.

The intrinsics calibration incorporates the estimation of focal lengths, skew, and the principal point of the camera; these parameters are combined in the intrinsic upper triangular matrix of size 3x3. Additionally, intrinsic calibration includes finding the distortion parameters of the camera.

The extrinsic parameters represent the relative rotation and translation between two cameras. For each frame, suppose the absolute pose of camera \(i\) is \(R_i, t_i\), and the relative pose between camera \(i\) and camera \(j\) is \(R_{ij}, t_{ij}\). If \(R_1, t_1\) and \(R_{1i}, t_{1i}\) for every \(i\not=1\) are known, then the pose of camera \(i\) can be calculated by

\[ R_i = R_{1i} R_1\]

\[ t_i = R_{1i} t_1 + t_{1i}\]

Since the relative pose between two cameras can be calculated by

\[ R_{ij} = R_j R_i^\top \]

\[ t_{ij} = t_j - R_{ij} t_i \]

This implies that any other relative pose of the form \(R_{ij}, t_{ij}\) with \(i\not=1\) is redundant. Therefore, for \(N\) cameras, a sufficient number of correctly selected pairs of estimated relative rotations and translations is \(N-1\), while the extrinsic parameters for all \(N (N-1) / 2\) possible pairs can be derived from those that are estimated. More details about intrinsic calibration can be found in the tutorial Create calibration pattern and in the implementation cv::calibrateCamera.
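For illustration, the pose composition above can be checked numerically with a few lines of NumPy and OpenCV. This is only a sketch with made-up rotation vectors and translations; it is not part of the tutorial samples:

import cv2 as cv
import numpy as np

# Hypothetical absolute pose of camera 1 and relative pose 1 -> i
R1, _ = cv.Rodrigues(np.array([[0.1], [-0.2], [0.05]]))
t1 = np.array([[0.3], [0.0], [1.0]])
R1i, _ = cv.Rodrigues(np.array([[0.0], [0.4], [0.1]]))
t1i = np.array([[-0.2], [0.1], [0.0]])

# Absolute pose of camera i from the formulas above
Ri = R1i @ R1
ti = R1i @ t1 + t1i

# Recover the relative pose from the absolute poses; it must match R1i, t1i
Rij = Ri @ R1.T          # R_{1i} = R_i R_1^T
tij = ti - Rij @ t1      # t_{1i} = t_i - R_{1i} t_1
assert np.allclose(Rij, R1i) and np.allclose(tij, t1i)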

After intrinsic and extrinsic calibration, the projection matrix of each camera is found by combining its intrinsic matrix, rotation matrix, and translation. The projection matrices enable triangulation (3D reconstruction), rectification, finding epipolar geometry, etc.
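As an illustrative sketch (not part of the tutorial samples), a projection matrix can be assembled as \(P = K [R | t]\) from hypothetical calibration results and used, for example, with cv::triangulatePoints:

import cv2 as cv
import numpy as np

def projection_matrix(K, R, t):
    # P = K [R | t], with t as a 3x1 column vector
    return K @ np.hstack([R, t.reshape(3, 1)])

# Hypothetical calibration output for two cameras
K0 = K1 = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
R0, t0 = np.eye(3), np.zeros(3)                        # camera 0 is the reference
R1, _ = cv.Rodrigues(np.array([[0.], [0.2], [0.]]))    # relative pose of camera 1
t1 = np.array([-0.5, 0., 0.])

P0 = projection_matrix(K0, R0, t0)
P1 = projection_matrix(K1, R1, t1)

# Matching (undistorted) image points, shape 2xN, triangulated to 3D
pts0 = np.array([[320., 400.], [240., 250.]])
pts1 = np.array([[350., 430.], [240., 250.]])
X_h = cv.triangulatePoints(P0, P1, pts0, pts1)         # homogeneous 4xN result
X = (X_h[:3] / X_h[3]).T                               # Nx3 points in camera-0 frame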

The following sections describe the individual algorithmic steps of the overall multi-camera calibration pipeline:

Briefly

The algorithm consists of three major steps that could be enumerated as follows:

  1. Calibrate intrinsics parameters (intrinsic matrix and distortion coefficients) for each camera independently.
  2. Calibrate pairwise cameras (using camera pair registration) using intrinsics parameters from step 1.
  3. Do global optimization using all cameras simultaneously to refine extrinsic parameters.

How to run

Assume we have N camera views; for each i-th view, there are M images containing pattern points (e.g., a checkerboard).

Python example

There are two options to run the sample code in Python (opencv/samples/python/multiview_calibration.py): either with raw images or with provided points. The first option is to prepare N files, one per camera, where each line contains the path to an image of that camera. Leave a line empty if there is no corresponding image for that camera in a certain frame. For example, a file for camera i should look like this (file_i.txt):

/path/to/image_1_of_camera_i
/path/to/image_3_of_camera_i
...
/path/to/image_M_of_camera_i

The image paths should be relative to the location of file_i.txt. The sample program can then be run from the command line as follows:

$ python3 multiview_calibration.py --pattern_size W,H --pattern_type TYPE --is_fisheye IS_FISHEYE_1,...,IS_FISHEYE_N \
--pattern_distance DIST --filenames /path/to/file_1.txt,...,/path/to/file_N.txt

Replace W and H with the pattern grid size (number of points per row and column), TYPE with the type of the calibration grid (supported patterns: checkerboard, circles, acircles), IS_FISHEYE with the camera type (1 - fisheye, 0 - pinhole), and DIST with the pattern distance (i.e., the distance between two neighboring cells of a checkerboard). The sample script automatically detects image points according to the specified pattern type. By default, detection is done in parallel, but this option can be turned off.

Additional (optional) flags to the Python sample that could be used are as follows:

  • --winsize - pass values H,W to define the window size for corner detection (default is 5,5).
  • --debug_corners - pass True or False. If True, the program shows several random images with detected corners so that the user can manually verify the detection (default is False).
  • --points_json_file - pass the name of a JSON file where image and pattern points are saved after detection. Later, this file can be used to run the sample. The default value is '' (nothing is saved).
  • --find_intrinsics_in_python - pass 0 or 1. If 1, the Python sample calibrates the intrinsic parameters itself and reports reprojection errors; the multiview calibration is then done only for the extrinsic parameters. This flag aims to separate the calibration process and make it easier to debug what goes wrong.
  • --path_to_save - path to save results in a pickle file.
  • --path_to_visualize - path to the results pickle file needed to run visualization.
  • --visualize - visualization flag (True or False). If True, only visualization is run, and path_to_visualize must be provided.
  • --resize_image_detection - True / False. If True, images are resized to speed up corner detection.
  • --gt_file - path to the file containing the ground truth. An example can be found in opencv_extra/testdata/python/hololens_multiview_calibration_images/HololensCapture4/gt.txt (currently in pull request 1089). It is in the format
    K_0 (3 x 3)
    distortion_0 (1 row)
    R_0 (3 x 3)
    t_0 (3 x 1)
    ...
    K_n (3 x 3)
    distortion_n (1 row)
    R_n (3 x 3)
    t_n (3 x 1)
    # (Optional, pose for each frame)
    R_f0 (3 x 3)
    t_f0 (3 x 1)
    ...
    R_fm (3 x 3)
    t_fm (3 x 1)

Alternatively, the Python sample can be run from a JSON file that should contain image points, pattern points, and a boolean indicator of whether a camera is fisheye. An example JSON file is in opencv_extra/testdata/python/multiview_calibration_data.json (currently in pull request 1001). Its format should be a dictionary with the following items (a constructed example is shown after the list):

  • object_points - list of lists of pattern (object) points (size NUM_POINTS x 3).
  • image_points - list of lists of lists of lists of image points (size NUM_CAMERAS x NUM_FRAMES x NUM_POINTS x 2). Note that the size is fixed; to represent an incomplete observation, set the corresponding image points to an invalid value (for example, (-1, -1)).
  • image_sizes - list of tuples (width x height) of image size.
  • is_fisheye - list of boolean values (true - fisheye camera, false - otherwise).

The following items are optional:
  • Ks and distortions - intrinsics parameters. If they are provided in JSON file then the proposed method does not estimate intrinsics parameters. Ks (intrinsic matrices) is a list of lists of lists (NUM_CAMERAS x 3 x 3), distortions is a list of lists (NUM_CAMERAS x NUM_VALUES) of distortion parameters.
  • images_names - list of lists (NUM_CAMERAS x NUM_FRAMES x string) of image filenames for visualization of points after calibration.
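For illustration, a minimal JSON file with the structure above might be constructed as follows. This is only a sketch with made-up values for two cameras, one frame, and a 2x2 pattern; the real test file in opencv_extra is more complete:

import json

# Hypothetical two-camera, one-frame example (values are made up)
data = {
    "object_points": [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.1, 0.1, 0.0]],
    "image_points": [  # NUM_CAMERAS x NUM_FRAMES x NUM_POINTS x 2
        [[[100.0, 100.0], [200.0, 100.0], [100.0, 200.0], [200.0, 200.0]]],
        [[[-1.0, -1.0], [250.0, 120.0], [150.0, 220.0], [250.0, 220.0]]],  # first point not observed
    ],
    "image_sizes": [[640, 480], [640, 480]],
    "is_fisheye": [False, False],
}
with open("multiview_calibration_data.json", "w") as f:
    json.dump(data, f)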
The sample can then be run with the JSON file:

$ python3 multiview_calibration.py --json_file /path/to/json

The description of flags can be found directly by running the sample script with the help option:

python3 multiview_calibration.py --help

The expected output in the Linux terminal for multiview_calibration_images data (from opencv_extra/testdata/python/ generated in Blender) should be the following:

The expected output for real-life calibration images in opencv_extra/testdata/python/real_multiview_calibration_images is the following:

The expected output for real-life calibration images in opencv_extra/testdata/python/hololens_multiview_calibration_images is the following:

The command used:

python3 multiview_calibration.py --filenames ../../results/hololens/HololensCapture1/output/cam_0.txt,../../results/hololens/HololensCapture1/output/cam_1.txt,../../results/hololens/HololensCapture1/output/cam_2.txt,../../results/hololens/HololensCapture1/output/cam_3.txt --pattern_size 6,10 --pattern_type charuco --fisheye 0,0,0,0 --pattern_distance 0.108 --board_dict_path ../../results/hololens/charuco_dict.json --gt_file ../../results/hololens/HololensCapture1/output/gt.txt

Details Of The Algorithm

  1. Intrinsics estimation, and rotation and translation initialization
    1. If the intrinsics are not provided, the calibration procedure starts intrinsics calibration independently for each camera using the OpenCV function cv::calibrateCamera.
      1. Different calibration flags are used for the pinhole and fisheye camera models (recommended flags are discussed in the Practical Debugging Techniques section below).
      2. To avoid the degenerate configuration in which all image points are collinear, a degeneracy check is performed: frames with fewer than 4 observed points or with less than 0.5% image coverage are marked as invalid.
      3. Output of intrinsic calibration also includes rotation, translation vectors (transform of pattern points to camera frame), and errors per frame. For each frame, the index of the camera with the lowest error among all cameras is saved.
    2. Otherwise, if intrinsics are known, then the proposed algorithm runs perspective-n-point estimation (cv::solvePnP, cv::fisheye::solvePnP) to estimate rotation and translation vectors, and reprojection error for each frame.
  2. Initialization of relative camera pose.
    1. If the initial relative poses are not assumed known (CALIB_USE_EXTRINSIC_GUESS flag not set), then the relative camera extrinsics are found by traversing a spanning tree and estimating pairwise relative camera pose.
      1. Maximum spanning tree establishment. Assume that cameras can be represented as nodes of a connected graph. An edge between two cameras is created if there is any concurrent observation over all frames. If the graph does not connect all cameras (i.e., there exists a camera that has no overlap with the other cameras), then calibration is not possible. Otherwise, the next step consists of finding the maximum spanning tree (MST) of this graph. The MST captures the best pairwise camera connections. The weight of an edge, accumulated over all frames, is a weighted combination of multiple factors:
        • (Major) The number of pattern points detected in both images (cameras)
        • Ratio of area of convex hull of projected points in the image to the image resolution.
        • Angle between cameras' optical axes (found from rotation vectors).
        • Angle between the camera's optical axis and the pattern's normal vector (found from 3 non-collinear pattern points).
      2. Pairwise pose estimation. The initial estimate of the cameras' extrinsics is found by pairwise camera registration (see cv::registerCameras). Without loss of generality, the 0-th camera's rotation is fixed to identity and its translation to the zero vector, and the 0-th node becomes the root of the MST. The order of pairwise registration is determined by traversing the MST in breadth-first order, starting from the root. The total number of pairs (also the number of edges of the tree) is NUM_CAMERAS - 1, which is a property of a tree graph.
    2. Otherwise, if prior knowledge of the camera poses is provided, this step can be skipped.
  3. Global optimization. Given the initial estimate of extrinsics, the aim is to refine the results using global optimization (via the Levenberg-Marquardt method, see the cv::LevMarq class).
    • To reduce the total number of iterations, for each frame the rotation and translation vectors estimated in the first step (from the camera with the lowest reprojection error) are transformed to be relative to the root camera.
    • The total number of parameters is (NUM_CAMERAS - 1) x (3 + 3) + NUM_FRAMES x (3 + 3), where 3 stands for a rotation vector and 3 for a translation vector. The first part of the parameters covers the extrinsics, and the second part the rotation and translation vectors per frame. This can be seen from the illustration plot in the introduction: for each frame, with the relative poses between cameras fixed, only one camera pose is needed to compute the poses of all cameras.
    • A robust function is additionally applied to mitigate the impact of outlier points during the optimization. The function has the shape of the derivative of a Gaussian, i.e., \(x \cdot \exp(-x/s)\) (efficiently implemented by approximating the exponential), where \(x\) is the squared pixel error and \(s\) is a manually pre-defined scale. This function increases on the interval from 0 to some pixel error \(y\) and decreases thereafter, so errors are only slightly down-weighted up to \(y\), while errors much larger than \(y\) have their robust value driven towards 0. The value of the scale factor was found by exhaustive evaluation; it makes the robust function increase almost linearly until the robust value of an error reaches 10 px and decrease afterward (see the plot of the function below and the sketch after this list). The value itself is equal to 30 but can be modified in the OpenCV source code.
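For intuition, the robust weighting can be plotted with a few lines of Python. This is only an illustrative sketch of the formula above, using the scale value 30 mentioned in the text; it is not part of the OpenCV sources:

import numpy as np
import matplotlib.pyplot as plt

s = 30.0                               # scale mentioned in the text
x = np.linspace(0.0, 150.0, 500)       # squared pixel error
robust = x * np.exp(-x / s)            # robust value of the error

plt.plot(x, robust)
plt.xlabel("squared pixel error x")
plt.ylabel("robust value x * exp(-x / s)")
plt.show()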

Method Input

The high-level input of the proposed method is as follows:

  • Pattern (object) points: (NUM_FRAMES x) NUM_PATTERN_POINTS x 3. The pattern points may be repeated per frame.
  • Image points: NUM_CAMERAS x NUM_FRAMES x NUM_PATTERN_POINTS x 2.
  • Image sizes: NUM_CAMERAS x 2 (width and height).
  • Detection mask: matrix of size NUM_CAMERAS x NUM_FRAMES that indicates whether pattern points are detected for specific camera and frame index.
  • Rs, Ts (optional): (relative) rotations and translations with respect to camera 0. The number of estimated vectors is NUM_CAMERAS-1, since the rotation and translation of the first camera are fixed to zero.
  • Ks (optional): intrinsic matrices per camera.
  • Distortions (optional).
  • use_intrinsics_guess: indicates whether intrinsics are provided.
  • flags_intrinsics: flags for intrinsics estimation.
  • use_extrinsic_guess: indicates whether extrinsics are provided.

Method Output

The high-level output of the proposed method is the following (a short sketch of how it can be inspected is given after the list):

  • Rs, Ts: (relative) rotation and translation vectors of the extrinsic parameters with respect to camera 0. The number of estimated vectors is NUM_CAMERAS-1, since the rotation and translation of the first camera are fixed to zero.
  • Intrinsic matrix for each camera.
  • Distortion coefficients for each camera.
  • Rotation and translation vectors of each frame's pattern with respect to camera 0. The combination of rotation and translation transforms the pattern points into the coordinate space of camera 0 and hence, together with the intrinsic parameters, projects the 3D points to the image.
  • Matrix of reprojection errors of size NUM_CAMERAS x NUM_FRAMES.
  • Output pairs used for the initial estimation of extrinsics; the number of pairs is NUM_CAMERAS-1.
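As a hedged example of how this output might be inspected (assuming the errors matrix and the detection mask are laid out as NUM_CAMERAS x NUM_FRAMES, as in the Python sample shown later), the per-camera mean reprojection error can be summarized as follows:

import numpy as np

def per_camera_error(errors_per_frame, detection_mask):
    # errors_per_frame: NUM_CAMERAS x NUM_FRAMES reprojection errors
    # detection_mask:   NUM_CAMERAS x NUM_FRAMES visibility (0 / 1)
    errors = np.asarray(errors_per_frame, dtype=float)
    mask = np.asarray(detection_mask, dtype=bool)
    means = []
    for cam in range(errors.shape[0]):
        valid = errors[cam][mask[cam]]
        means.append(valid.mean() if valid.size else np.nan)
    return means

# e.g. print(per_camera_error(errors_per_frame, detection_mask))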

Pseudocode

The idea of the method can be demonstrated in high-level pseudocode, whereas the full C++ implementation of the proposed approach can be found in the opencv/modules/calib/src/multiview_calibration.cpp file.

def multiviewCalibration(pattern_points, image_points, detection_mask):
    for cam_i = 1,…,NUMBER_CAMERAS:
        if CALIBRATE_INTRINSICS:
            K_i, distortion_i, rvecs_i, tvecs_i = calibrateCamera(pattern_points, image_points[cam_i])
        else:
            rvecs_i, tvecs_i = solvePnP(pattern_points, image_points[cam_i], K_i, distortion_i)
        # Select best rvecs, tvecs based on reprojection errors. Process data:
    if CALIBRATE_EXTRINSICS:
        pattern_img_area[cam_i][frame] = area(convexHull(image_points[cam_i][frame]))
        angle_to_board[cam_i][frame] = arccos(pattern_normal_frame * optical_axis_cam_i)
        angle_cam_to_cam[cam_i][cam_j] = arccos(optical_axis_cam_i * optical_axis_cam_j)
        graph = maximumSpanningTree(detection_mask, pattern_img_area, angle_to_board, angle_cam_to_cam)
        camera_pairs = breadth_first_search(graph, root_camera=0)
        for pair in camera_pairs:
            # Find relative rotation and translation from camera i to camera j
            R_ij, t_ij = registerCameras(pattern_points_i, pattern_points_j, image_points[i], image_points[j])
    R*, t* = optimizeLevenbergMarquardt(R, t, pattern_points, image_points, K, distortion)

Python sample API

To run the calibration procedure in Python, follow these steps (see the sample code in samples/python/multiview_calibration.py):

  1. Prepare data:

    if pattern_type.lower() == 'checkerboard' or pattern_type.lower() == 'charuco':
        pattern = chessboard_points(grid_size, dist_m)
    elif pattern_type.lower() == 'circles':
        pattern = circles_grid_points(grid_size, dist_m)
    elif pattern_type.lower() == 'acircles':
        pattern = asym_circles_grid_points(grid_size, dist_m)
    else:
        raise NotImplementedError("Pattern type is not implemented!")
    if pattern_type.lower() == 'charuco':
        assert (board_dict_path is not None) and os.path.exists(board_dict_path)
        board_dict = json.load(open(board_dict_path, 'r'))
    else:
        board_dict = None

    The detection mask matrix is later built by checking the size of the image points after detection (see step 3).

  2. Detect pattern points on images:

    if pattern_type.lower() == 'checkerboard':
        ret, corners = cv.findChessboardCorners(
            cv.cvtColor(img_detection, cv.COLOR_BGR2GRAY), grid_size, None
        )
        if ret:
            if scale < 1.0:
                corners /= scale
            corners2 = cv.cornerSubPix(cv.cvtColor(img, cv.COLOR_BGR2GRAY),
                                       corners, winsize, (-1,-1), criteria)
    elif pattern_type.lower() == 'circles':
        ret, corners = cv.findCirclesGrid(
            img_detection, patternSize=grid_size, flags=cv.CALIB_CB_SYMMETRIC_GRID
        )
        if ret:
            corners2 = corners / scale
    elif pattern_type.lower() == 'acircles':
        ret, corners = cv.findCirclesGrid(
            img_detection, patternSize=grid_size, flags=cv.CALIB_CB_ASYMMETRIC_GRID
        )
        if ret:
            corners2 = corners / scale
    elif pattern_type.lower() == 'charuco':
        dictionary = cv.aruco.getPredefinedDictionary(board_dict["dictionary"])
        board = cv.aruco.CharucoBoard(
            size=(grid_size[0] + 1, grid_size[1] + 1),
            squareLength=board_dict["square_size"],
            markerLength=board_dict["marker_size"],
            dictionary=dictionary
        )
        # The found best practice is to refine detected ArUco markers with contours,
        # then refine to subpixel accuracy with the board functions
        detector_params = cv.aruco.DetectorParameters()
        charuco_params = cv.aruco.CharucoParameters()
        charuco_params.tryRefineMarkers = True
        detector_params.cornerRefinementMethod = cv.aruco.CORNER_REFINE_CONTOUR
        refine_params = cv.aruco.RefineParameters()
        detector = cv.aruco.CharucoDetector(board, charuco_params, detector_params, refine_params)
        charucoCorners, charucoIds, _, _ = detector.detectBoard(img_detection)
        corners = np.ones([grid_size[0] * grid_size[1], 1, 2]) * -1
        ret = (charucoIds is not None) and charucoIds.flatten().size > 3
        if ret:
            corners[charucoIds.flatten()] = cv.cornerSubPix(cv.cvtColor(img, cv.COLOR_BGR2GRAY),
                                                            charucoCorners / scale, winsize, (-1,-1), criteria)
        corners2 = corners
    else:
        raise ValueError("Calibration pattern is not supported!")
  3. Build detection mask matrix:

    for i in range(len(image_points)):
        for j in range(len(image_points[0])):
            detection_mask[i,j] = int(len(image_points[i][j]) != 0)
  4. Finally, the calibration function is run as follows:

    rmse, Rs, Ts, Ks, distortions, rvecs0, tvecs0, errors_per_frame, output_pairs = \
        cv.calibrateMultiviewExtended(
            objPoints=pattern_points_all,
            imagePoints=image_points,
            imageSize=image_sizes,
            detectionMask=detection_mask,
            models=np.array(models, dtype=np.uint8),
            Rs=None,
            Ts=None,
            Ks=Ks,
            distortions=distortions,
            flagsForIntrinsics=np.array(
                [pinhole_flag if models[x] == cv.CALIB_MODEL_PINHOLE else fisheye_flag for x in range(num_cameras)],
                dtype=int),
            flags=cv.CALIB_USE_INTRINSIC_GUESS if USE_INTRINSICS_GUESS else 0
        )

C++ sample API

To run the calibration procedure in C++, follow these steps (see the sample code in opencv/samples/cpp/multiview_calibration_sample.cpp):

  1. Prepare data similarly to the Python sample, i.e., pattern size and scale, fisheye camera mask, files containing image filenames, and pass them to the function:

    static void detectPointsAndCalibrate (cv::Size pattern_size, float pattern_distance, const std::string &pattern_type,
                                          const std::vector<uchar> &models, const std::vector<std::string> &filenames,
                                          const cv::String* dict_path=nullptr)
  2. Initialize data:

    std::vector<cv::Point3f> board (pattern_size.area());
    const int num_cameras = (int)models.size();
    std::vector<std::vector<cv::Mat>> image_points_all;
    std::vector<cv::Size> image_sizes;
    std::vector<cv::Mat> Ks, distortions, Ts, Rs;
    if (pattern_type == "checkerboard" || pattern_type == "charuco") {
        for (int i = 0; i < pattern_size.height; i++) {
            for (int j = 0; j < pattern_size.width; j++) {
                board[i*pattern_size.width+j] = cv::Point3f((float)j, (float)i, 0.f) * pattern_distance;
            }
        }
    } else if (pattern_type == "circles") {
        for (int i = 0; i < pattern_size.height; i++) {
            for (int j = 0; j < pattern_size.width; j++) {
                board[i*pattern_size.width+j] = cv::Point3f((float)j, (float)i, 0.f) * pattern_distance;
            }
        }
    } else if (pattern_type == "acircles") {
        for (int i = 0; i < pattern_size.height; i++) {
            for (int j = 0; j < pattern_size.width; j++) {
                if (i % 2 == 1) {
                    board[i*pattern_size.width+j] = cv::Point3f((j + .5f)*pattern_distance, (i/2 + .5f) * pattern_distance, 0.f);
                } else {
                    board[i*pattern_size.width+j] = cv::Point3f(j*pattern_distance, (i/2)*pattern_distance, 0);
                }
            }
        }
    } else {
        CV_Error(cv::Error::StsNotImplemented, "pattern_type is not implemented!");
    }
  3. Set up ChArUco detector: optional, only needed if the pattern type is ChArUco

    if (pattern_type == "charuco") {
    CV_Assert(dict_path != nullptr);
    CV_Assert(fs.isOpened());
    int dict_int;
    double square_size, marker_size;
    fs["dictionary"] >> dict_int;
    fs["square_size"] >> square_size;
    fs["marker_size"] >> marker_size;
    auto dictionary = cv::aruco::getPredefinedDictionary(dict_int);
    // For charuco board, the size is defined to be the number of box (not inner corner)
    auto charuco_board = cv::aruco::CharucoBoard(
    cv::Size(pattern_size.width+1, pattern_size.height+1),
    static_cast<float>(square_size), static_cast<float>(marker_size), dictionary);
    // It is suggested to use refinement in detecting charuco board
    auto detector_params = cv::aruco::DetectorParameters();
    auto charuco_params = cv::aruco::CharucoParameters();
    charuco_params.tryRefineMarkers = true;
    detector_params.cornerRefinementMethod = cv::aruco::CORNER_REFINE_CONTOUR;
    detector = cv::makePtr<cv::aruco::CharucoDetector>(charuco_board, charuco_params, detector_params);
    }
  4. Detect pattern points on images:

    int num_frames = -1;
    for (const auto &filename : filenames) {
        std::fstream file(filename);
        CV_Assert(file.is_open());
        std::string img_file;
        std::vector<cv::Mat> image_points_cameras;
        bool save_img_size = true;
        while (std::getline(file, img_file)) {
            if (img_file.empty()) {
                image_points_cameras.emplace_back(cv::Mat());
                continue;
            }
            cv::Mat img = cv::imread(img_file), corners;
            if (save_img_size) {
                image_sizes.emplace_back(cv::Size(img.cols, img.rows));
                save_img_size = false;
            }
            bool success = false;
            if (pattern_type == "checkerboard") {
                success = cv::findChessboardCorners(img, pattern_size, corners);
            }
            else if (pattern_type == "circles") {
                success = cv::findCirclesGrid(img, pattern_size, corners, cv::CALIB_CB_SYMMETRIC_GRID);
            }
            else if (pattern_type == "acircles") {
                success = cv::findCirclesGrid(img, pattern_size, corners, cv::CALIB_CB_ASYMMETRIC_GRID);
            }
            else if (pattern_type == "charuco") {
                std::vector<int> ids; cv::Mat corners_sub;
                detector->detectBoard(img, corners_sub, ids);
                corners.create(static_cast<int>(board.size()), 2, CV_32F);
                if (ids.size() < 4)
                    success = false;
                else {
                    success = true;
                    int head = 0;
                    for (int i = 0; i < static_cast<int>(board.size()); i++) {
                        if (head < static_cast<int>(ids.size()) && ids[head] == i) {
                            corners.at<float>(i, 0) = corners_sub.at<float>(head, 0);
                            corners.at<float>(i, 1) = corners_sub.at<float>(head, 1);
                            head++;
                        } else {
                            // points outside of the frame border are dropped by calibrateMultiview
                            corners.at<float>(i, 0) = -1.f;
                            corners.at<float>(i, 1) = -1.f;
                        }
                    }
                }
            }
            cv::Mat corners2;
            corners.convertTo(corners2, CV_32FC2);
            if (success && corners.rows == pattern_size.area())
                image_points_cameras.emplace_back(corners2);
            else
                image_points_cameras.emplace_back(cv::Mat());
        }
        if (num_frames == -1)
            num_frames = (int)image_points_cameras.size();
        else
            CV_Assert(num_frames == (int)image_points_cameras.size());
        image_points_all.emplace_back(image_points_cameras);
    }
  5. Build detection mask matrix:

    cv::Mat visibility(num_cameras, num_frames, CV_8UC1);
    for (int i = 0; i < num_cameras; i++) {
        for (int j = 0; j < num_frames; j++) {
            visibility.at<unsigned char>(i,j) = image_points_all[i][j].empty() ? 0 : 1;
        }
    }
  6. Run calibration:

    const double rmse = calibrateMultiview(objPoints, image_points_all, image_sizes, visibility,
                                           models, Ks, distortions, Rs, Ts);

Practical Debugging Techniques

  1. Intrinsics calibration
    1. Choose the most suitable flags to perform calibration. For example, when the distortion of the pinhole camera model is not evident, it may not be necessary to use cv::CALIB_RATIONAL_MODEL. For the fisheye camera model, it is recommended to use cv::CALIB_RECOMPUTE_EXTRINSIC and cv::CALIB_FIX_SKEW.
    2. Camera intrinsics can be better estimated when the points are well scattered across the image. The following code can be used to plot a heat map of the observed points:

      import numpy as np
      import matplotlib.pyplot as plt

      def plotDetection(image_sizes, image_points):
          num_cameras = len(image_sizes)
          num_frames = len(image_points[0])
          for c in range(num_cameras):
              w, h = image_sizes[c]
              # bin the image into 10x10 pixel cells
              w = int(w / 10) + 1
              h = int(h / 10) + 1
              counts = np.zeros([h, w], dtype=np.int32)
              for f in range(num_frames):
                  if len(image_points[c][f]):
                      pos = np.floor(image_points[c][f] / 10).astype(np.int32)
                      counts[pos[:,1], pos[:,0]] += 1
              vmax = np.max(counts)
              plt.figure()
              plt.imshow(counts, cmap='hot', interpolation='nearest', vmax=vmax)
              # Adding colorbar for reference
              plt.colorbar()
              plt.axis("off")
              savefile = "counts" + str(c) + ".png"
              print("Saving: " + savefile)
              plt.savefig(savefile, dpi=300, bbox_inches='tight')
              plt.close()

      The left example is not well scattered, while the right example shows a better-scattered pattern.

    3. Plot the reprojection errors to ensure the result is reasonable.
    4. If ground-truth camera intrinsics are available, the error of the estimated intrinsics can be visualized:

      x = np.linspace(0, width - 1, width)
      y = np.linspace(0, height - 1, height)
      # Generate the grid using np.meshgrid
      X, Y = np.meshgrid(x, y)
      points = np.concatenate([X[:,:,None], Y[:,:,None]], axis=2).reshape([-1, 1, 2])
      # Undistort the image points with the estimated distortions
      if models[cam] == cv.CALIB_MODEL_FISHEYE:
          points_undist = cv.fisheye.undistortPoints(points, Ks[cam], distortions[cam])
      else:
          points_undist = cv.undistortPoints(points, Ks[cam], distortions[cam])
      pt_norm = np.concatenate([points_undist, np.ones([points_undist.shape[0], 1, 1])], axis=2)
      # Distort the image points with the ground truth distortions
      if models[cam] == cv.CALIB_MODEL_FISHEYE:
          projected = cv.fisheye.projectPoints(pt_norm, np.zeros([3, 1]), np.zeros([3, 1]), Ks_gt[cam], distortions_gt[cam])[0]
      else:
          projected = cv.projectPoints(pt_norm, np.zeros([3, 1]), np.zeros([3, 1]), Ks_gt[cam], distortions_gt[cam])[0]
      errs_pt = np.linalg.norm(projected - points, axis=2)
      errs_pt = errs_pt.reshape([height, width])
      vmax = np.percentile(errs_pt, 95)
      plt.figure()
      plt.imshow(errs_pt, cmap='hot', interpolation='nearest', vmax=vmax)
      # Adding colorbar for reference
      plt.colorbar()
      savefile = "errors" + str(cam) + ".png"
      print("Saving: " + savefile)
      plt.savefig(savefile, dpi=300, bbox_inches='tight')

      The resulting visualization would look similar to the following:

  2. Multiview calibration
    1. Use plotCamerasPosition in samples/python/multiview_calibration.py to plot the graph established for multi-view calibration. The plot shows the positions of the cameras, the checkerboard (of a random frame), and pairs of cameras connected by black lines, explicitly demonstrating the pairs that were used in the initial stage of stereo calibration. The dashed gray lines show the non-spanning-tree edges that are also used in the optimization. The width of these lines indicates the number of co-visible frames, i.e., the strength of the connection. Ideally, the edges in the graph should be dense and thick. For the right tree, the connection to camera four is rather limited and could be strengthened.
    2. A visualization method for showing the reprojection error with arrows (from a given point to the back-projected one) is provided (see plotProjection in samples/python/multiview_calibration.py). The color of the arrows highlights the error values. Additionally, the title reports the mean error on this frame and its accuracy relative to the other frames used in calibration.