OpenCV
4.5.3
Open Source Computer Vision

This article describes an ICP algorithm used in depth fusion pipelines such as KinectFusion.
The goal of ICP is to align two point clouds: the old one (the existing points and normals of the 3D model) and the new one (new points and normals that we want to integrate into the existing model). ICP returns the rotation + translation transform between these two point clouds.
The Iterative Closest Point (ICP) algorithm minimizes an objective function which is the sum of squared Point to Plane Distances (PPD) between the corresponding points in the two point clouds:

\[
E = \sum_i ppd(T(p_i), q_i, n_i)^2
\]

Specifically, for each pair of corresponding points P and Q, the PPD is the distance from the point P to the plane determined by the point Q and the normal N located at the point Q. Two points P and Q are considered corresponding if, given the current camera pose, they are projected into the same pixel.
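As a sketch of that correspondence rule (plain NumPy; the pinhole intrinsics fx, fy, cx, cy and the two points are made-up toy values, not data from the pipeline), two points correspond when they project into the same pixel:

```python
import numpy as np

def project(p, fx, fy, cx, cy):
    # Pinhole projection of a 3D point to integer pixel coordinates.
    u = int(round(fx * p[0] / p[2] + cx))
    v = int(round(fy * p[1] / p[2] + cy))
    return u, v

# Hypothetical intrinsics and two nearby points falling into the same pixel.
fx = fy = 525.0
cx, cy = 319.5, 239.5
p = np.array([0.102, -0.05, 1.0])
q = np.array([0.1022, -0.0501, 1.001])
print(project(p, fx, fy, cx, cy) == project(q, fx, fy, cx, cy))  # True
```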
where:
- p_i – the i-th point in the new point cloud
- q_i – the i-th point in the old point cloud
- n_i – the normal at the point q_i in the old point cloud

Therefore, ppd(...) can be expressed as the dot product of (the difference between p and q) and (n):

\[
ppd(p, q, n) = (p - q) \cdot n
\]
T(p) is a rigid transform of the point p:

\[
T(p) = R p + t
\]

where R is the rotation and t is the translation.

T is the transform we search for by ICP; its purpose is to bring each point p closer to the corresponding point q in terms of point to plane distance.
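A minimal NumPy sketch of these two definitions (the numbers are toy values chosen for illustration):

```python
import numpy as np

def ppd(p, q, n):
    # Signed point-to-plane distance: project the difference onto the normal.
    return np.dot(p - q, n)

def T(R, t, p):
    # Rigid transform T(p) = R p + t.
    return R @ p + t

# Toy example: the plane z = 0 with normal n, and a point 0.1 above it.
q = np.array([0.0, 0.0, 0.0])
n = np.array([0.0, 0.0, 1.0])
p = np.array([0.3, 0.2, 0.1])
print(ppd(p, q, n))  # 0.1
# Translating p down by 0.1 puts it onto the plane:
print(ppd(T(np.eye(3), np.array([0.0, 0.0, -0.1]), p), q, n))  # 0.0
```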
We use the Gauss-Newton method for the function minimization. In the Gauss-Newton method we take sequential steps, changing R and t in the direction in which the function E decreases, i.e. against its gradient. To do that, let's approximate E in an infinitesimal neighborhood of the current transform.
Here's the function we're going to minimize by changing R and t:

\[
E(R, t) = \sum_i \big( (R p_i + t - q_i) \cdot n_i \big)^2
\]
While the point to plane distance is linear in both R and t, the rotation space is not linear by itself. You can see this in how R is generated from its rotation angles:

\[
R = R_z(\gamma)\, R_y(\beta)\, R_x(\alpha) =
\begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix}
\]
But since we have infinitesimal rotations, R can be approximated in the following form:

\[
R \approx I + A
\]

where I is the identity matrix and A is a member of so(3), the Lie algebra of the three-dimensional special orthogonal group (a skew-symmetric matrix). By taking all \(\sin t\) and \(\cos t\) terms to their limits as \(t \to 0\) we get the following representation:

\[
A = \begin{pmatrix} 0 & -\gamma & \beta \\ \gamma & 0 & -\alpha \\ -\beta & \alpha & 0 \end{pmatrix}
\]
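A quick NumPy check (a sketch, not the library's code) that for small angles the exact Euler-angle rotation is close to I + A:

```python
import numpy as np

def skew(w):
    # Matrix A such that A @ p == np.cross(w, p) for w = (alpha, beta, gamma).
    ax, ay, az = w
    return np.array([[0.0, -az,  ay],
                     [ az, 0.0, -ax],
                     [-ay,  ax, 0.0]])

def euler_rotation(alpha, beta, gamma):
    # Exact rotation R = Rz(gamma) @ Ry(beta) @ Rx(alpha).
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

# For small angles the exact rotation and I + A agree up to second-order terms:
angles = np.array([1e-3, -2e-3, 1.5e-3])
R = euler_rotation(*angles)
err = np.abs(R - (np.eye(3) + skew(angles))).max()
print(err < 1e-5)  # True: the residual is on the order of |angles|^2
```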
Substituting the approximation of R back into the E expression, we get:

\[
E_{approx} = \sum_i \big( ((I + A) p_i + t - q_i) \cdot n_i \big)^2
\]

Let's introduce a function f which approximates the transform shift, where the vector x collects the six shift parameters \(\alpha, \beta, \gamma, t_x, t_y, t_z\):

\[
f(x, p) = A p + t
\]

so that

\[
E_{approx} = \sum_i \big( (p_i + f(x, p_i) - q_i) \cdot n_i \big)^2
\]
E_approx is minimal when its differential (i.e. the derivative with respect to the argument increment) is zero, so let's find that differential:

\[
dE_{approx} = \sum_i 2\, \big( (p_i + f(x, p_i) - q_i) \cdot n_i \big)\, d\,ppd_i = 0
\]

Let's differentiate ppd:

\[
d\,ppd_i = d\big( (p_i + f(x, p_i) - q_i) \cdot n_i \big) = df(x, p_i) \cdot n_i
\]

Here's what we get for each variable x_j from the vector x:

\[
\frac{\partial E_{approx}}{\partial x_j}
= \sum_i 2\, \big( (p_i + f(x, p_i) - q_i) \cdot n_i \big)
\Big( \frac{\partial f(x, p_i)}{\partial x_j} \cdot n_i \Big) = 0
\]
Let a new variable:

\[
x = (\alpha, \beta, \gamma, t_x, t_y, t_z)^T
\]

f(x, p) can be represented as a matrix-vector multiplication. To prove that, we have to remember that A p is a cross product:

\[
A p = (\alpha, \beta, \gamma)^T \times p = -\, p \times (\alpha, \beta, \gamma)^T
\]

Therefore:

\[
f(x, p) = A p + t = G(p)\, x, \qquad
G(p) = \big[\, -[p]_{\times} \;\big|\; I_{3 \times 3} \,\big] =
\begin{pmatrix}
0 & p_z & -p_y & 1 & 0 & 0 \\
-p_z & 0 & p_x & 0 & 1 & 0 \\
p_y & -p_x & 0 & 0 & 0 & 1
\end{pmatrix}
\]

G(p) is introduced for simplification.
Since

\[
df(x, p) = G(p)\, dx
\]

we get

\[
\frac{\partial f(x, p)}{\partial x_j} \cdot n = \big( n^T G(p) \big)_j .
\]
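To check the G(p) representation numerically, here is a NumPy sketch (the values of x and p are arbitrary toy numbers):

```python
import numpy as np

def G(p):
    # 3x6 matrix such that f(x, p) = G(p) @ x for x = (alpha, beta, gamma, tx, ty, tz).
    # A @ p = w x p = -p x w = -skew(p) @ w, so the left 3x3 block is -skew(p).
    px, py, pz = p
    minus_skew_p = np.array([[0.0,  pz, -py],
                             [-pz, 0.0,  px],
                             [ py, -px, 0.0]])
    return np.hstack([minus_skew_p, np.eye(3)])

# Compare against the direct form A @ p + t = w x p + t:
x = np.array([0.1, -0.2, 0.3, 1.0, 2.0, 3.0])
w, t = x[:3], x[3:]
p = np.array([0.5, -1.0, 2.0])
print(np.allclose(G(p) @ x, np.cross(w, p) + t))  # True
```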
Let a new value:

\[
C_i = G(p_i)^T n_i
\]

Let's make a replacement in the zero-differential condition; the sum turns into a 6-by-6 system of linear equations:

\[
\Big( \sum_i C_i\, C_i^T \Big)\, x = \sum_i C_i \big( (q_i - p_i) \cdot n_i \big)
\]

By solving this equation we get the rigid transform shift x for each Gauss-Newton iteration.
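One such Gauss-Newton step can be sketched in NumPy as follows (solve_shift is a hypothetical name; the correspondences are synthetic and generated exactly by the linearized model q = p + w x p + t, so the solver recovers the shift up to numerical precision):

```python
import numpy as np

def solve_shift(ps, qs, ns):
    # Accumulate and solve  (sum_i C_i C_i^T) x = sum_i C_i ((q_i - p_i) . n_i),
    # where C_i = G(p_i)^T n_i = (p_i x n_i, n_i) is a 6-vector.
    AtA = np.zeros((6, 6))
    Atb = np.zeros(6)
    for p, q, n in zip(ps, qs, ns):
        C = np.concatenate([np.cross(p, n), n])
        AtA += np.outer(C, C)
        Atb += C * np.dot(q - p, n)
    return np.linalg.solve(AtA, Atb)

# Synthetic check with a small known shift (w_true, t_true):
rng = np.random.default_rng(0)
w_true = np.array([1e-3, -2e-3, 1.5e-3])
t_true = np.array([0.01, 0.02, -0.005])
ps = rng.uniform(-1, 1, size=(20, 3))
ns = rng.normal(size=(20, 3))
ns /= np.linalg.norm(ns, axis=1, keepdims=True)
qs = ps + np.cross(w_true, ps) + t_true   # linearized model, exact by construction
x = solve_shift(ps, qs, ns)
print(np.allclose(x, np.concatenate([w_true, t_true])))  # True
```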
We generate a rotation and translation matrix from the shift and then multiply the current pose matrix by the one we've got.

While the translational part of the shift contributes to the resulting matrix as-is, the rotational part is generated a bit more trickily: the rotation shift is converted from so(3) to SO(3) by exponentiation. In fact, the 3-by-1 rotation shift vector represents the rotation axis multiplied by the rotation angle. We use the Rodrigues transform to get a rotation matrix from that. For more details, see the wiki page.
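A NumPy sketch of that conversion (in real code one would call OpenCV's cv::Rodrigues / cv2.Rodrigues rather than hand-rolling it):

```python
import numpy as np

def rodrigues(rvec):
    # so(3) -> SO(3): rvec is the rotation axis multiplied by the rotation angle.
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    kx, ky, kz = rvec / theta
    K = np.array([[0.0, -kz,  ky],
                  [ kz, 0.0, -kx],
                  [-ky,  kx, 0.0]])
    # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# A quarter turn around the z axis maps the x axis onto the y axis:
R = rodrigues(np.array([0.0, 0.0, np.pi / 2]))
print(np.allclose(R @ np.array([1.0, 0.0, 0.0]), [0.0, 1.0, 0.0]))  # True
```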