OpenCV
3.0.0
Open Source Computer Vision
OCRTesseract class provides an interface with the tesseract-ocr API (v3.02.02) in C++.
#include "ocr.hpp"
Public Member Functions
virtual void run (Mat &image, std::string &output_text, std::vector< Rect > *component_rects=NULL, std::vector< std::string > *component_texts=NULL, std::vector< float > *component_confidences=NULL, int component_level=0)
Recognize text using the tesseract-ocr API.
Public Member Functions inherited from cv::text::BaseOCR
virtual ~BaseOCR ()
Static Public Member Functions
static Ptr< OCRTesseract > create (const char *datapath=NULL, const char *language=NULL, const char *char_whitelist=NULL, int oem=3, int psmode=3)
Creates an instance of the OCRTesseract class. Initializes Tesseract.
OCRTesseract class provides an interface with the tesseract-ocr API (v3.02.02) in C++.
Note that this class is compiled only when tesseract-ocr is correctly installed.
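static Ptr< OCRTesseract > create (const char *datapath=NULL, const char *language=NULL, const char *char_whitelist=NULL, int oem=3, int psmode=3)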
Creates an instance of the OCRTesseract class. Initializes Tesseract.
datapath | Name of the parent directory of tessdata, ending with "/", or NULL to use the system's default directory. |
language | An ISO 639-3 language code, or NULL to default to "eng". |
char_whitelist | The list of characters used for recognition. NULL defaults to "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ". |
oem | The OCR Engine Mode (OEM) that tesseract-ocr should use. By default tesseract::OEM_DEFAULT is used. See the tesseract-ocr API documentation for other possible values. |
psmode | The Page Segmentation Mode (PSM) that tesseract-ocr should use. By default tesseract::PSM_AUTO (fully automatic layout analysis) is used. See the tesseract-ocr API documentation for other possible values. |
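The datapath and NULL conventions described above are easy to get wrong when the values come from configuration strings. The following is a minimal sketch, not part of OpenCV: `normalizeDatapath` and `orNull` are hypothetical helper names, and the sketch assumes the caller holds its settings in `std::string` objects where an empty string means "use the default".

```cpp
#include <string>

// Hypothetical helper (not an OpenCV function): returns `dir` with exactly
// one trailing '/', the form the datapath parameter is documented to expect.
// An empty string is returned unchanged so it can later be mapped to NULL.
std::string normalizeDatapath(std::string dir) {
    if (dir.empty()) return dir;
    // Drop any run of trailing slashes, but keep a lone "/" intact.
    while (dir.size() > 1 && dir.back() == '/') dir.pop_back();
    if (dir.back() != '/') dir.push_back('/');
    return dir;
}

// Hypothetical helper: maps an empty string to a null pointer, which
// create() documents as "use the default" for datapath, language and
// char_whitelist. The returned pointer is only valid while `s` is alive.
const char* orNull(const std::string& s) {
    return s.empty() ? nullptr : s.c_str();
}
```

Under these assumptions a call could look like `OCRTesseract::create(orNull(path), orNull(lang), NULL, 3, 3)`, where `path` is the result of `normalizeDatapath` and 3 is the documented default for both oem and psmode. The strings must outlive the call, since `orNull` returns a pointer into them.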
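virtual void run (Mat &image, std::string &output_text, std::vector< Rect > *component_rects=NULL, std::vector< std::string > *component_texts=NULL, std::vector< float > *component_confidences=NULL, int component_level=0)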
Recognize text using the tesseract-ocr API.
Takes an image as input and returns the recognized text in the output_text parameter. Optionally, it also provides the Rects of the individual text elements found (e.g. words), and the list of those text elements with their confidence values.
image | Input image, of type CV_8UC1 or CV_8UC3. |
output_text | Recognized text returned by tesseract-ocr. |
component_rects | If provided, the method outputs a list of Rects for the individual text elements found (e.g. words or text lines). |
component_texts | If provided, the method outputs a list of text strings for the individual text elements found (e.g. words or text lines). |
component_confidences | If provided, the method outputs a list of confidence values for the recognition of the individual text elements found (e.g. words or text lines). |
component_level | OCR_LEVEL_WORD (the default) or OCR_LEVEL_TEXT_LINE. |
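A common follow-up to run() is to discard low-confidence elements. The sketch below shows one way to do that, assuming the three optional output vectors are parallel (index i in each refers to the same text element), which the description above suggests but does not state outright. `Box`, `Component` and `filterByConfidence` are hypothetical names; `Box` stands in for cv::Rect so the example compiles without OpenCV. The scale of the confidence values is not specified here, so the threshold is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for cv::Rect (assumption: only these four fields are needed).
struct Box { int x, y, width, height; };

// One recognized text element with its bounding box and confidence.
struct Component {
    Box rect;
    std::string text;
    float confidence;
};

// Zips the three parallel vectors and keeps only the elements whose
// confidence is at least minConfidence. Order is preserved.
std::vector<Component> filterByConfidence(
        const std::vector<Box>& rects,
        const std::vector<std::string>& texts,
        const std::vector<float>& confidences,
        float minConfidence) {
    // The sketch relies on the vectors being the same length.
    assert(rects.size() == texts.size() &&
           texts.size() == confidences.size());
    std::vector<Component> kept;
    for (std::size_t i = 0; i < texts.size(); ++i) {
        if (confidences[i] >= minConfidence) {
            kept.push_back(Component{rects[i], texts[i], confidences[i]});
        }
    }
    return kept;
}
```

With real OpenCV types, the same loop would apply to the vectors passed to run() through component_rects, component_texts and component_confidences, with cv::Rect in place of `Box`.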
Implements cv::text::BaseOCR.