Category: AutoOCR

AutoOCR-CS-CL – Command line application for AutoOCR via Web-Service

AutoOCR-CS-CL is a free command line add-on application for the AutoOCR server. It converts image PDF, TIF/TIFF, JPG/JPEG, PNG, BMP and GIF files into searchable PDF or PDF/A files. It communicates with the AutoOCR server over HTTP/HTTPS via the AutoOCR SOAP web service interface. The AutoOCR server can run on the local machine, in the same network or be reached via an Internet connection.

Features AutoOCR-CS-CL:

  • Free command line application for AutoOCR that generates searchable PDF(/A) files via OCR from image PDF, TIF/TIFF, JPG/JPEG, PNG, BMP and GIF.
  • Processing takes place via the SOAP web service on a (remote) AutoOCR server over HTTP/HTTPS.
  • Processes individual files, entire folders / folder structures, and lists of files / folders read from TXT files.
  • Processing parameters are selected by specifying an OCR profile stored on the AutoOCR server.
  • The number of parallel uploads / downloads to the AutoOCR server is configurable for optimal throughput.

   

Download – AutoOCR-CS-CL – Command line application for AutoOCR via Web-Service >>>
Download – Readme / Help – AutoOCR-CS-CL  >>>

OmniPage OCR engine for AutoOCR & AutoOCRLight from version 2.0.7

Benefits of OmniPage OCR:

  • Recognition accuracy at the highest level, even for difficult documents
  • Fastest OCR processing – much faster and more powerful than any engine we have tested and integrated so far. Creating a searchable PDF can take as little as 1-2 seconds per page.
  • Affordable – a 25,000-page license costs less than the previous 10,000-page Abbyy license.
  • Easier activation of the (demo) license – the OmniPage OCR engine can be activated together with the basic application via our license server, including a 30-day demo version.

Please note that, for licensing reasons, the OmniPage OCR engine can only be installed on client operating systems (Windows 7/10), not on Windows Server 2008, 2012, 2016 or 2019; the setup can only be run on Windows 7/10. In terms of performance and stability, this is not a disadvantage: OCR is performance-intensive and should run on dedicated hardware (e.g. an Intel NUC) with as many CPU cores as possible and an SSD for optimal throughput.

From version 2.0.7, the OmniPage OCR engine can be activated in AutoOCR and AutoOCRLight as an option in addition to iOCR (Tesseract OCR). It is included in the AutoOCR setup; for AutoOCRLight, the OmniPage OCR engine can be downloaded and installed separately.

Download – OmniPage OCR Engine as Option for AutoOCRLight (ca. 235MB) >>>

Download – AutoOCRLight – Low Cost OCR Server (ca. 410MB) >>>

Download – AutoOCR – OCR Server incl. OmniPage OCR (ca. 640MB) >>>

AutoOCR / AutoOCR light 2.0 – iOCR innovations

New in AutoOCR / AutoOCR light version 2.0:

  • The iOCR Standard OCR component of AutoOCR / AutoOCR light is now based on the new Tesseract OCR Version 4.0.
  • Multiple languages can be selected for OCR recognition.
  • New option to choose between OCR accuracy and speed.

  • Configurable parameter for splitting documents into smaller individual documents, so that documents with a large number of pages can be processed with limited memory.

  • Conditional OCR processing by file format, page count, page format (width / height in mm or pixels), resolution, file size and color depth, configurable for both folder monitoring and web service processing. For each file format and criterion, you can control whether a file is OCR-processed, only converted to an image PDF, or blocked from OCR processing. This prevents the OCR processing from being tied up by "meaningless" jobs, for example when large JPEG photos or handwritten documents, for which text recognition makes no sense, end up in the OCR process.

  

Further information on the AutoOCR / AutoOCR light – iOCR extensions can be found here >>>

Download – AutoOCR – OCR Server (ca. 410MB) >>>

Download – AutoOCRLight – Low Cost OCR Server (ca. 410MB) >>>
Updating AutoOCR light from version 1.x to 2.x requires a new license.

Folder Monitoring – “File System Event” / “Blockwise Processing”

Applications that monitor folders, such as AutoOCR / AutoOCRLight, provide options that determine how files in the monitored folders are detected for processing and when their processing starts.

File-System Event:

An operating system function is used to detect changed and new files in a folder / folder structure and to start processing them immediately. This option should only be used for local folders / folder structures, not for network shares.

Blockwise Processing:

The folder is read "block by block": each block contains up to the configured maximum number of files, which are read in and processed. After one block has been processed, the next block starts, and so on until all files have been processed. If no further files are found, the folder is polled for new files every 10 msec. Blockwise processing should be used for monitoring folders on network drives.
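
The blockwise loop can be sketched in Python as follows, assuming that files are taken in name order and that the processing step moves or deletes each file once it is done (otherwise the next read would pick it up again). The 10 msec poll interval comes from the description above; `run_blockwise` and its `max_idle_polls` stop condition are hypothetical, the latter added only so the sketch can terminate.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional


def read_block(folder: Path, max_files: int) -> List[Path]:
    """Return at most max_files files from the folder, in name order."""
    files = sorted(p for p in folder.iterdir() if p.is_file())
    return files[:max_files]


def run_blockwise(
    folder: Path,
    max_files: int,
    process: Callable[[Path], None],
    poll_interval_s: float = 0.01,       # 10 msec
    max_idle_polls: Optional[int] = None,
) -> List[int]:
    """Process the folder block by block and return the size of each block.

    process() must move or delete each file, otherwise it is read again.
    max_idle_polls (hypothetical) stops after that many empty polls;
    None keeps monitoring forever.
    """
    block_sizes: List[int] = []
    idle_polls = 0
    while True:
        block = read_block(folder, max_files)
        if block:
            idle_polls = 0
            for path in block:
                process(path)
            block_sizes.append(len(block))   # next block starts afterwards
        else:
            idle_polls += 1
            if max_idle_polls is not None and idle_polls >= max_idle_polls:
                return block_sizes
            time.sleep(poll_interval_s)      # no files found: poll again
```

Reading a bounded block per pass keeps each directory listing on a network share small, which is the reason this mode suits network drives better than file system events.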

Start of processing / delayed start:

Processing of a newly detected file usually starts immediately, but there can be reasons to delay it. For each monitored folder, a setting delays the start of processing by x seconds: the configured number of seconds elapses first, and then processing starts – see also >>> – After the delay, each file to be processed is checked to confirm that it has been released and is not read-only. Here again, AutoOCR waits a maximum of 10 seconds for the file to be released.
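
A minimal sketch of the delayed start followed by the release check, assuming that "released" means the file can be opened for reading and writing and that "not read-only" is tested with `os.access`. The function names and the 0.5-second retry interval are hypothetical; only the per-folder delay and the 10-second maximum wait come from the text above. An exclusive lock typically makes the open fail on Windows, whereas on other systems a file that is still being written may open without error.

```python
import os
import time
from pathlib import Path
from typing import Callable


def is_ready(path: Path) -> bool:
    """True if the file exists, is not read-only and can be opened for writing."""
    if not path.is_file() or not os.access(path, os.W_OK):
        return False
    try:
        # Opening in r+b mode does not modify the file; it can fail while
        # another process holds an exclusive lock (typical on Windows).
        with open(path, "r+b"):
            return True
    except OSError:
        return False


def wait_for_start(
    path: Path,
    delay_s: float,
    release_timeout_s: float = 10.0,
    retry_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait delay_s, then up to release_timeout_s for the file to be released."""
    sleep(delay_s)                              # configured start delay
    deadline = time.monotonic() + release_timeout_s
    while True:
        if is_ready(path):
            return True
        if time.monotonic() >= deadline:
            return False                        # still locked or read-only
        sleep(retry_s)
```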

AutoOCR Version 1.17.2 – New function – delete empty pages

AutoOCR version 1.17.2 adds an option to delete empty pages before OCR processing. A page is detected as "blank" using a configurable threshold, with a default value of 1%: a page is considered empty if less than 1% of its pixels are "not white". Adjust this value to the scans being processed if necessary. Dirt or specks picked up during scanning can give a blank page "more" non-white pixels, so that it is no longer detected as empty. Conversely, if the threshold is set too high, pages with little content may also be recognized as empty and deleted.
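
The threshold rule above can be expressed as a short Python sketch. It assumes the page is available as a flat list of 8-bit grayscale values and that any pixel below pure white (255) counts as "not white"; the functions `is_blank` and `drop_blank_pages` and the `white_level` parameter are hypothetical and do not describe AutoOCR's internal image handling.

```python
from typing import List, Sequence


def is_blank(pixels: Sequence[int], threshold_pct: float = 1.0,
             white_level: int = 255) -> bool:
    """A page is blank if less than threshold_pct % of its pixels are not white.

    pixels: 8-bit grayscale values (0 = black, 255 = white).
    """
    if not pixels:
        return True  # hypothetical choice: a page without pixels counts as blank
    non_white = sum(1 for p in pixels if p < white_level)
    return non_white / len(pixels) * 100 < threshold_pct


def drop_blank_pages(pages: List[Sequence[int]],
                     threshold_pct: float = 1.0) -> List[Sequence[int]]:
    """Keep only the pages that are not detected as blank."""
    return [page for page in pages if not is_blank(page, threshold_pct)]
```

With 1,000 pixels, 5 dark specks (0.5%) leave the page blank at the default 1%, while 20 dark pixels (2%) keep it; raising the threshold to 3% would delete that page as well, which illustrates the trade-off described above.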

Download – AutoOCR – OCR Server (ca. 140MB) >>>

AutoOCR Version 1.17.2 – New iOCR Option – Do not embed fonts, or embed them only partially

Previously, the fonts used were embedded in full in the generated PDFs. As a result, even single-page input files always produced fairly large PDF output files. Because the PDFs generated by AutoOCR display the page image in the foreground and do not need fonts for display, we have changed this: by default, iOCR no longer embeds fonts in the PDF. Optionally, only the used subset of a font can be embedded. Without embedded fonts, significantly smaller PDF files are created, especially for documents that consist of only one or a few pages.

Download – AutoOCR – OCR Server (ca. 140MB) >>>

AutoOCR Version 1.16.1 – New Option – Delayed Start of the Processing

Version 1.16.1 adds an option to delay the start of processing for each monitored folder. This option is needed especially for multifunction devices that scan or copy a PDF or image file directly into a folder monitored by AutoOCR.

Certain multifunction devices create a 0-byte file at the very start of a scan and then either 'fill' it step by step with data, or collect the scans locally on the device and copy the finished file into the destination folder at the end. Depending on the data volume, the number of pages and the speed of the data connection, this can take anywhere from a few seconds to ten minutes or more.

If the start delay is not activated (parameter = 0), AutoOCR starts processing as soon as a file is created, as before. If the file is not yet complete or not ready for processing, every access or processing attempt by AutoOCR, repeated at short intervals, results in an error message in the log or in an error email. It can also cause internal crashes of the OCR processing, which in turn trigger error messages and processing retries and move the input file to the error folder.

This parameter configures, per folder, by how many seconds (0 to 999) the start of processing is delayed. Choose a value that fits the devices and transfer times involved.
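
As a minimal sketch under stated assumptions, the per-folder delay can be modelled as a validated setting that is added to the time at which a file was detected. `FolderConfig`, `earliest_start` and `due_files` are hypothetical names; only the 0 to 999 second range comes from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_DELAY_S, MAX_DELAY_S = 0, 999  # permitted range per monitored folder


@dataclass(frozen=True)
class FolderConfig:
    path: str
    start_delay_s: int = 0  # 0 = start processing immediately

    def __post_init__(self) -> None:
        if not MIN_DELAY_S <= self.start_delay_s <= MAX_DELAY_S:
            raise ValueError(
                f"start delay must be {MIN_DELAY_S}..{MAX_DELAY_S} s, "
                f"got {self.start_delay_s}")


def earliest_start(detected_at: float, cfg: FolderConfig) -> float:
    """Time (in seconds) at which processing of a detected file may begin."""
    return detected_at + cfg.start_delay_s


def due_files(pending: Dict[str, float], cfg: FolderConfig,
              now: float) -> List[str]:
    """Names of pending files whose start delay has elapsed."""
    return sorted(name for name, detected in pending.items()
                  if earliest_start(detected, cfg) <= now)
```

With a delay of 120 seconds, a scan detected at t = 1000 s becomes due at t = 1120 s, while one detected at t = 1100 s waits until t = 1220 s.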

Download – AutoOCR – OCR Server (ca. 140MB) >>>

iOCR / vsOCR Setup divided into standard and additional languages

The iOCR / vsOCR setup, which contains the language and dictionary files of our standard OCR engine, is more than 270MB in size. To make the downloads and setups smaller, we decided to split iOCR / vsOCR into a "base" setup and an "additional" setup. The base setup, which is available through our applications, e.g. AutoOCR, FileConverterPro or PDFmdx, now contains only a selection of major European languages and has been reduced to 127MB.

All available languages can still be installed at any time: the additional "exotic" languages are installed via a separate setup.

iOCR basic languages:

Danish, German, English, Finnish, French, Italian, Catalan, Modern Greek, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Swedish, Slovak, Slovenian, Spanish, Czech, Turkish, Ukrainian, Hungarian

iOCR extended languages:

Afrikaans, Albanian, Arabic, Azerbaijani, Indonesian (Bahasa Indonesia), Bengali, Bulgarian, Cherokee, Chinese – Traditional, Chinese – Simplified, Estonian, Franconian, Galician, Hebrew, Hindi, Icelandic, Japanese, Korean, Croatian, Latvian, Lithuanian, Macedonian, Malay, Serbian, Swahili, Tagalog, Tamil, Telugu, Thai, Vietnamese, Belarusian

Download – iOCR (vsOCR) Setup – Base Languages (ca. 127MB) >>>

Download – iOCR (vsOCR) Setup – Additional Languages (ca. 200MB) >>>

 

AutoOCR – Installation requirements from version 1.15.3

From version 1.15.3, the AutoOCR setup checks a revised set of installation prerequisites; any prerequisite that is already fulfilled is skipped and its installation step is not executed.

The following components are checked and installed if necessary:

If these components are already installed, they are not downloaded again and only AutoOCR is installed. If some or all components are missing or not in the required version, the AutoOCR setup tries to download them from our FTP server and install them. To install without an Internet connection, download and install the setups of these components beforehand.

The AutoOCR settings and license are retained when uninstalling or updating to a new version.

AutoOCR can be operated with one or more different OCR engines. iOCR (vsOCR) is the standard OCR engine.

Download – AutoOCR – OCR Server (ca. 10MB) >>>

As an option, the Abbyy OCR engine can be used with AutoOCR, either in addition to iOCR or on its own. This requires downloading and installing a separate Abbyy setup. For Abbyy OCR Engine version 10, demo licenses valid for 30 days or 500 pages are available from us on request.

If only the Abbyy OCR engine is to be used, the download and installation of iOCR can be skipped during setup.

Setup option – Download and install iOCR

Download – Abbyy FineReader 10.x Rel 4 OCR Engine Setup (ca. 460MB) >>>