Class OCRTesseract


  • public class OCRTesseract
    extends BaseOCR
    The OCRTesseract class provides an interface to the tesseract-ocr API (v3.02.02) in C++. It is compiled only when tesseract-ocr is correctly installed. Note:
    • (C++) An example of OCRTesseract recognition combined with scene text detection can be found in the end_to_end_recognition demo: <https://github.com/opencv/opencv_contrib/blob/master/modules/text/samples/end_to_end_recognition.cpp>
    • (C++) Another example of OCRTesseract recognition combined with scene text detection can be found in the webcam_demo: <https://github.com/opencv/opencv_contrib/blob/master/modules/text/samples/webcam_demo.cpp>
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      protected OCRTesseract​(long addr)  
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      static OCRTesseract __fromPtr__​(long addr)  
      static OCRTesseract create()
      Creates an instance of the OCRTesseract class.
      static OCRTesseract create​(java.lang.String datapath)
      Creates an instance of the OCRTesseract class.
      static OCRTesseract create​(java.lang.String datapath, java.lang.String language)
      Creates an instance of the OCRTesseract class.
      static OCRTesseract create​(java.lang.String datapath, java.lang.String language, java.lang.String char_whitelist)
      Creates an instance of the OCRTesseract class.
      static OCRTesseract create​(java.lang.String datapath, java.lang.String language, java.lang.String char_whitelist, int oem)
      Creates an instance of the OCRTesseract class.
      static OCRTesseract create​(java.lang.String datapath, java.lang.String language, java.lang.String char_whitelist, int oem, int psmode)
      Creates an instance of the OCRTesseract class.
      protected void finalize()  
      java.lang.String run​(Mat image, int min_confidence)
      Recognize text using the tesseract-ocr API.
      java.lang.String run​(Mat image, int min_confidence, int component_level)
      Recognize text using the tesseract-ocr API.
      java.lang.String run​(Mat image, Mat mask, int min_confidence)  
      java.lang.String run​(Mat image, Mat mask, int min_confidence, int component_level)  
      void setWhiteList​(java.lang.String char_whitelist)  
      • Methods inherited from class java.lang.Object

        clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • OCRTesseract

        protected OCRTesseract​(long addr)
    • Method Detail

      • __fromPtr__

        public static OCRTesseract __fromPtr__​(long addr)
      • run

        public java.lang.String run​(Mat image,
                                    int min_confidence,
                                    int component_level)
        Recognize text using the tesseract-ocr API. Takes an image as input and returns the recognized text. In the underlying C++ API the text is written to the output_text parameter, and the Rects of the individual text elements found (e.g. words) can optionally be returned together with those text elements and their confidence values; this Java overload returns only the recognized text.
        Parameters:
        image - Input image, CV_8UC1 or CV_8UC3.
        component_level - OCR_LEVEL_WORD (by default), or OCR_LEVEL_TEXTLINE.
        min_confidence - automatically generated
        Returns:
        automatically generated
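        The generated text above does not describe min_confidence. The following is a minimal sketch of one plausible reading, under the assumption (not stated on this page) that recognized components whose confidence does not exceed min_confidence are left out of the returned string. The class name, method name, and sample values are hypothetical, and the code does not call OpenCV; it only illustrates confidence thresholding over words or text lines.

```java
// Hypothetical illustration of confidence thresholding; not part of OpenCV.
final class ConfidenceFilterSketch {

    /**
     * Concatenates the texts whose confidence is strictly greater than
     * minConfidence. texts[i] and confidences[i] describe the same component
     * (a word or a text line, depending on component_level).
     */
    static String keepConfident(String[] texts, float[] confidences, int minConfidence) {
        if (texts.length != confidences.length) {
            throw new IllegalArgumentException("texts and confidences must have the same length");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < texts.length; i++) {
            if (confidences[i] > minConfidence) {
                out.append(texts[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample components and confidences, for illustration only.
        String[] texts = {"hello ", "wrld ", "ok"};
        float[] confidences = {90f, 30f, 60f};
        System.out.println(keepConfident(texts, confidences, 50));
    }
}
```

        Under this assumption, a min_confidence of 0 would keep every component with a positive confidence, and higher values trade recall for fewer misrecognized words.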
      • run

        public java.lang.String run​(Mat image,
                                    int min_confidence)
        Recognize text using the tesseract-ocr API. Takes an image as input and returns the recognized text. In the underlying C++ API the text is written to the output_text parameter, and the Rects of the individual text elements found (e.g. words) can optionally be returned together with those text elements and their confidence values; this Java overload returns only the recognized text.
        Parameters:
        image - Input image, CV_8UC1 or CV_8UC3.
        min_confidence - automatically generated
        Returns:
        automatically generated
      • run

        public java.lang.String run​(Mat image,
                                    Mat mask,
                                    int min_confidence,
                                    int component_level)
      • run

        public java.lang.String run​(Mat image,
                                    Mat mask,
                                    int min_confidence)
      • setWhiteList

        public void setWhiteList​(java.lang.String char_whitelist)
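        The char_whitelist argument is a plain string of allowed characters. As a small sketch, the helper below builds such strings from character ranges, including the former default alphanumeric whitelist quoted in the create notes. The class and method names are hypothetical and the code does not call OpenCV; the resulting string would be passed to setWhiteList or to the char_whitelist parameter of create.

```java
// Hypothetical helper for building char_whitelist strings; not part of OpenCV.
final class WhitelistSketch {

    /** Appends every character from first to last (inclusive) to sb. */
    static void appendRange(StringBuilder sb, char first, char last) {
        for (char ch = first; ch <= last; ch++) {
            sb.append(ch);
        }
    }

    /** Digits, then lowercase letters, then uppercase letters (ASCII only). */
    static String alphanumeric() {
        StringBuilder sb = new StringBuilder();
        appendRange(sb, '0', '9');
        appendRange(sb, 'a', 'z');
        appendRange(sb, 'A', 'Z');
        return sb.toString();
    }

    /** Digits only, e.g. for recognizing numeric fields. */
    static String digitsOnly() {
        StringBuilder sb = new StringBuilder();
        appendRange(sb, '0', '9');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(alphanumeric());
        System.out.println(digitsOnly());
        // With an existing OCRTesseract instance named ocr, the string could
        // then be applied as: ocr.setWhiteList(WhitelistSketch.digitsOnly());
    }
}
```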
      • create

        public static OCRTesseract create​(java.lang.String datapath,
                                          java.lang.String language,
                                          java.lang.String char_whitelist,
                                          int oem,
                                          int psmode)
        Creates an instance of the OCRTesseract class. Initializes Tesseract.
        Parameters:
        datapath - the name of the parent directory of tessdata, ending with "/", or NULL to use the system's default directory.
        language - an ISO 639-3 language code; NULL defaults to "eng".
        char_whitelist - specifies the list of characters used for recognition. NULL defaults to "" (all characters will be used for recognition).
        oem - tesseract-ocr offers different OCR Engine Modes (OEM); by default tesseract::OEM_DEFAULT is used. See the tesseract-ocr API documentation for other possible values.
        psmode - tesseract-ocr offers different Page Segmentation Modes (PSM); by default tesseract::PSM_AUTO (fully automatic layout analysis) is used. See the tesseract-ocr API documentation for other possible values.
        Note: The char_whitelist default changed after OpenCV 4.7.0/3.19.0 from "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" to "".
        Returns:
        automatically generated
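        Because datapath is expected to end with "/", a caller may want to normalize a directory name before passing it to create. The following is a minimal sketch of that step only; the class name, method name, and example path are hypothetical, and the code does not call OpenCV.

```java
// Hypothetical helper for preparing the datapath argument; not part of OpenCV.
final class DatapathSketch {

    /**
     * Returns the given tessdata parent directory with exactly one "/"
     * guaranteed at the end. Rejects null and empty input; to use the
     * system's default directory, the shorter create overloads can be used.
     */
    static String withTrailingSlash(String tessdataParent) {
        if (tessdataParent == null || tessdataParent.isEmpty()) {
            throw new IllegalArgumentException("tessdataParent must be a non-empty path");
        }
        return tessdataParent.endsWith("/") ? tessdataParent : tessdataParent + "/";
    }

    public static void main(String[] args) {
        // Example path is an assumption; the real location depends on the installation.
        String datapath = withTrailingSlash("/usr/share/tesseract-ocr");
        System.out.println(datapath);
        // The result could then be passed on, for example:
        // OCRTesseract ocr = OCRTesseract.create(datapath, "eng");
    }
}
```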
      • create

        public static OCRTesseract create​(java.lang.String datapath,
                                          java.lang.String language,
                                          java.lang.String char_whitelist,
                                          int oem)
        Creates an instance of the OCRTesseract class. Initializes Tesseract.
        Parameters:
        datapath - the name of the parent directory of tessdata, ending with "/", or NULL to use the system's default directory.
        language - an ISO 639-3 language code; NULL defaults to "eng".
        char_whitelist - specifies the list of characters used for recognition. NULL defaults to "" (all characters will be used for recognition).
        oem - tesseract-ocr offers different OCR Engine Modes (OEM); by default tesseract::OEM_DEFAULT is used. See the tesseract-ocr API documentation for other possible values.
        The page segmentation mode is not specified in this overload; tesseract::PSM_AUTO (fully automatic layout analysis) is used. See the tesseract-ocr API documentation for other possible values.
        Note: The char_whitelist default changed after OpenCV 4.7.0/3.19.0 from "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" to "".
        Returns:
        automatically generated
      • create

        public static OCRTesseract create​(java.lang.String datapath,
                                          java.lang.String language,
                                          java.lang.String char_whitelist)
        Creates an instance of the OCRTesseract class. Initializes Tesseract.
        Parameters:
        datapath - the name of the parent directory of tessdata, ending with "/", or NULL to use the system's default directory.
        language - an ISO 639-3 language code; NULL defaults to "eng".
        char_whitelist - specifies the list of characters used for recognition. NULL defaults to "" (all characters will be used for recognition).
        The engine mode and page segmentation mode are not specified in this overload; tesseract::OEM_DEFAULT and tesseract::PSM_AUTO (fully automatic layout analysis) are used. See the tesseract-ocr API documentation for other possible values.
        Note: The char_whitelist default changed after OpenCV 4.7.0/3.19.0 from "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" to "".
        Returns:
        automatically generated
      • create

        public static OCRTesseract create​(java.lang.String datapath,
                                          java.lang.String language)
        Creates an instance of the OCRTesseract class. Initializes Tesseract.
        Parameters:
        datapath - the name of the parent directory of tessdata, ending with "/", or NULL to use the system's default directory.
        language - an ISO 639-3 language code; NULL defaults to "eng".
        The remaining settings are not specified in this overload: the character whitelist is "" (all characters will be used for recognition), and tesseract::OEM_DEFAULT and tesseract::PSM_AUTO (fully automatic layout analysis) are used. See the tesseract-ocr API documentation for other possible values.
        Note: The char_whitelist default changed after OpenCV 4.7.0/3.19.0 from "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" to "".
        Returns:
        automatically generated
      • create

        public static OCRTesseract create​(java.lang.String datapath)
        Creates an instance of the OCRTesseract class. Initializes Tesseract.
        Parameters:
        datapath - the name of the parent directory of tessdata, ending with "/", or NULL to use the system's default directory.
        The remaining settings are not specified in this overload: the language is "eng", the character whitelist is "" (all characters will be used for recognition), and tesseract::OEM_DEFAULT and tesseract::PSM_AUTO (fully automatic layout analysis) are used. See the tesseract-ocr API documentation for other possible values.
        Note: The char_whitelist default changed after OpenCV 4.7.0/3.19.0 from "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" to "".
        Returns:
        automatically generated
      • create

        public static OCRTesseract create()
        Creates an instance of the OCRTesseract class. Initializes Tesseract with the default settings: the system's default tessdata directory, the language "eng", the character whitelist "" (all characters will be used for recognition), tesseract::OEM_DEFAULT, and tesseract::PSM_AUTO (fully automatic layout analysis). See the tesseract-ocr API documentation for other possible values.
        Note: The char_whitelist default changed after OpenCV 4.7.0/3.19.0 from "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" to "".
        Returns:
        automatically generated
      • finalize

        protected void finalize()
                         throws java.lang.Throwable
        Overrides:
        finalize in class BaseOCR
        Throws:
        java.lang.Throwable