OpenCV  4.0.0-rc
Open Source Computer Vision
3D object recognition and pose estimation API

Classes

class  cv::cnn_3dobj::descriptorExtractor
 Caffe-based 3D image descriptor. A class to extract features from an image. The descriptors obtained this way can be used for classification and pose estimation [218].
 
class  cv::cnn_3dobj::icoSphere
 Icosahedron-based camera view data generator. The class creates spherical camera views of a 3D object meshed from .ply files [89].
 

Detailed Description

CNN-based learning algorithms perform well on classification problems, so richly labeled data becomes more useful in the training stage. 3D object classification and pose estimation is a joint task that aims to separate different poses from each other in descriptor form.

In the training stage, we prepare 2D training images generated by our module, together with their class and pose labels. We fully exploit the information in these labels by using a joint triplet and pair-wise loss function in CNN training.

Both the class label and the pose label are taken into account in the triplet loss. The loss score is smaller when features from the same class and the same pose are more similar, while similar features from different classes or different poses lead to a much larger loss score.

This loss is also combined with a pair-wise component, which ensures that the loss is never zero and places a restriction on the model scale.
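The triplet term and the pair-wise component described above can be sketched in plain C++. This is a minimal illustration only: the ratio-based hinge form, the `margin` and `pairWeight` parameters, and all function names are assumptions made for the example, and the actual loss layer in the triplet version of Caffe may be defined differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two descriptors of equal length.
double descriptorDistance(const std::vector<double>& a, const std::vector<double>& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Triplet term (assumed form). The anchor and the positive share class and pose;
// the negative differs in class or in pose. The term drops to zero once the
// negative is sufficiently farther from the anchor than the positive is.
double tripletTerm(const std::vector<double>& anchor,
                   const std::vector<double>& positive,
                   const std::vector<double>& negative,
                   double margin)
{
    const double dPos = descriptorDistance(anchor, positive);
    const double dNeg = descriptorDistance(anchor, negative);
    return std::max(0.0, 1.0 - dNeg / (dPos + margin));
}

// Pair-wise term (assumed form): squared distance between two descriptors
// that should be similar. It stays positive unless the descriptors coincide.
double pairTerm(const std::vector<double>& anchor, const std::vector<double>& positive)
{
    const double d = descriptorDistance(anchor, positive);
    return d * d;
}

// Joint loss for one triplet; pairWeight is a hypothetical balancing factor.
double jointLoss(const std::vector<double>& anchor,
                 const std::vector<double>& positive,
                 const std::vector<double>& negative,
                 double margin,
                 double pairWeight)
{
    return tripletTerm(anchor, positive, negative, margin)
         + pairWeight * pairTerm(anchor, positive);
}
```

Calling `jointLoss(anchor, positive, negative, 1.0, 1.0)` with a negative that lies close to the anchor gives a larger value than with a distant negative, which matches the behaviour described for the triplet loss.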

The training and feature extraction process is a rough implementation, using OpenCV and Caffe, of an idea by Paul Wohlhart. The principal purpose of this API is to construct a well-labeled database from .ply models for CNN training with triplet loss, and to extract features with the trained model for prediction or other pattern recognition purposes. The algorithms are organized into two main classes:

icoSphere: methods belonging to this class generate 2D images from a 3D model, together with their class labels and their pose labels derived from the camera view.

descriptorExtractor: methods belonging to this class extract descriptors from 2D images; these descriptors are discriminative for category prediction and pose estimation.
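To illustrate the idea behind icosahedron-based view sampling, the following sketch places camera positions on the 12 vertices of an icosahedron centred on the object and computes the direction in which each camera looks. It does not use or reproduce the icoSphere class; `Vec3`, `icosahedronViewpoints` and `viewDirection` are hypothetical names introduced for this example, and the object is assumed to sit at the origin.

```cpp
#include <cmath>
#include <vector>

// Simple 3D vector type used only in this sketch.
struct Vec3
{
    double x, y, z;
};

// Returns the 12 vertices of a regular icosahedron scaled so that every vertex
// lies on a sphere of the given radius around the origin.
std::vector<Vec3> icosahedronViewpoints(double radius)
{
    const double phi = (1.0 + std::sqrt(5.0)) / 2.0;            // golden ratio
    const double scale = radius / std::sqrt(1.0 + phi * phi);   // normalises vertex length
    const double signs[2] = {-1.0, 1.0};

    std::vector<Vec3> views;
    views.reserve(12);
    for (double a : signs)
    {
        for (double b : signs)
        {
            // Cyclic permutations of (0, +-1, +-phi).
            views.push_back({0.0, a * scale, b * phi * scale});
            views.push_back({a * scale, b * phi * scale, 0.0});
            views.push_back({b * phi * scale, 0.0, a * scale});
        }
    }
    return views;
}

// Unit vector pointing from a camera position toward the object at the origin.
Vec3 viewDirection(const Vec3& camera)
{
    const double n = std::sqrt(camera.x * camera.x + camera.y * camera.y + camera.z * camera.z);
    return {-camera.x / n, -camera.y / n, -camera.z / n};
}
```

Each returned position can serve as a pose label for the image rendered from that viewpoint. A denser set of views would typically be obtained by subdividing the icosahedron faces and projecting the new vertices back onto the sphere.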

Note
This API needs the triplet version of Caffe, which was designed for this module: https://github.com/Wangyida/caffe/tree/cnn_triplet.