Difference between revisions of "Upgrade Tesseract OCR"

From LogicalDOC Community Wiki

Revision as of 08:36, 27 April 2020

Starting from LogicalDOC 8.3.4, tests were carried out on a new version of the integrated Tesseract OCR, specifically Tesseract 4.1. This version recognizes text more accurately and is also faster, so a very simple change brings two important benefits: faster OCR that uses fewer system resources and, above all, better character recognition.

Note: the Tesseract 4.1 build that we propose to install is fully compatible with LogicalDOC starting from LD 6.8.4. Starting from LogicalDOC 8.4.1 it is the version distributed by default, so if you installed your system at version 8.4.1 or 8.4.2 you do not need to upgrade.

Windows systems

The change is very simple: it is a plain folder replacement.

  1. Rename the tesseract folder in your installation to tesseractOLD
    e.g. C:\LogicalDOC\tesseract becomes C:\LogicalDOC\tesseractOLD
  2. Download the file from the following address
  3. Extract the contents of the archive into the folder C:\LogicalDOC
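
The rename and extract steps above could be scripted roughly as follows. This is only a minimal sketch: the function name replace_tesseract and the archive_path argument are hypothetical, and it assumes the downloaded file is a ZIP archive, which this page does not state.

```python
import zipfile
from pathlib import Path


def replace_tesseract(install_dir, archive_path):
    """Rename <install_dir>/tesseract to tesseractOLD, then unpack the archive.

    Hypothetical helper; assumes the downloaded archive is a ZIP file.
    """
    install_dir = Path(install_dir)
    current = install_dir / "tesseract"
    backup = install_dir / "tesseractOLD"

    # Refuse to overwrite a backup left by an earlier run.
    if backup.exists():
        raise FileExistsError(f"{backup} already exists")

    # Step 1: keep the old folder under a new name.
    if current.exists():
        current.rename(backup)

    # Step 3: extract the downloaded archive into the installation folder.
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(install_dir)

    return backup
```

It might be called as replace_tesseract(r"C:\LogicalDOC", r"C:\Downloads\tesseract.zip"), where the second path is only a placeholder for wherever the download was saved.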


Linux systems

  1. Remove tesseract if it was previously installed
  2. Check whether tesseract 4.1 is available for your distribution and install it
    Note: recent versions of Ubuntu already provide it, and for many versions of Debian and CentOS ready-made packages can be used.
    For more information: https://github.com/tesseract-ocr/tesseract/wiki
  3. Check the Tesseract OCR configuration in LogicalDOC and verify that it points to the path of the tesseract command
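
To find the path to enter in the last step and to confirm which version is installed, a small check along these lines may help. It is a sketch under assumptions: the helper names find_tesseract and parse_tesseract_version are hypothetical, the tesseract command is assumed to be on the PATH, and the version banner is assumed to begin with "tesseract" followed by the version number.

```python
import re
import shutil
import subprocess


def parse_tesseract_version(output):
    """Return (major, minor) parsed from `tesseract --version` output, or None."""
    # Assumption: the banner looks like "tesseract 4.1.1" or "tesseract v4.1.0...".
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_tesseract():
    """Return (path, version) for the tesseract command on PATH, or (None, None)."""
    path = shutil.which("tesseract")
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some releases print the banner on stderr, so read both streams.
    return path, parse_tesseract_version(result.stdout + result.stderr)
```

Calling find_tesseract() returns the full path of the command, which can be compared with the path configured in LogicalDOC, together with the detected version as a (major, minor) pair.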