AutoOCRLight – PDF News – PDF/A, Archivierung, OCR, DMS, Dokumentenmanagment, Scan to PDF, ECM, PDF Convert, Free PDF printerdriver, freier PDF Druckertreiber, SDK, API, PDF softwaredevelopment

Category: AutoOCRLight

AutoOCR & AutoOCR light Version 2.0.41

2021-11-22

New features in AutoOCR & AutoOCR light 2.0.41:

Target file name / folder via variables: Some variables can now be used for the output file name and for the output folder. The configuration takes place in the field for output file names. By using “\”, a folder structure can be specified in the field, which will be created under the selected output start folder.
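The effect of such a pattern can be illustrated with a minimal sketch. The placeholder names `%YEAR%`, `%MONTH%` and `%NAME%` are hypothetical and chosen only for illustration; the variables actually supported by AutoOCR are not listed here and may look different.

```python
from datetime import date
from pathlib import PureWindowsPath


def expand_output_path(start_folder: str, pattern: str,
                       source_name: str, today: date) -> PureWindowsPath:
    """Expand placeholders in an output pattern below the output start folder."""
    # Hypothetical placeholders; the real AutoOCR variable names may differ.
    values = {
        "%YEAR%": f"{today.year:04d}",
        "%MONTH%": f"{today.month:02d}",
        "%NAME%": PureWindowsPath(source_name).stem,
    }
    for key, value in values.items():
        pattern = pattern.replace(key, value)
    # Backslashes in the pattern become subfolders below the start folder.
    return PureWindowsPath(start_folder) / pattern


result = expand_output_path(r"D:\out", r"%YEAR%\%MONTH%\%NAME%_ocr.pdf",
                            "invoice.pdf", date(2021, 11, 22))
print(result)
```

Every backslash in the pattern adds one folder level under the output start folder, so a single field describes both the folder structure and the file name.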

Excluding PDFs that already contain text: The “intelligent” OCR processing can recognize whether a PDF requires OCR or not, but until now all PDFs from the monitored input folders were always processed and written to the target folder. It was not previously possible to output only those PDFs that really required OCR processing. With this new option, only PDFs that have actually been OCR processed are written to the output folder. All other PDFs are, depending on the configuration, moved out of the input folder, e.g. directly to the archive folder, and therefore do not end up in the output folder.

Detect and correct defective text in PDF: Sometimes a PDF contains text, but the text is “defective”. The cause is incorrect creation of the PDF: texts / fonts are encoded incorrectly or incompletely. The problem often occurs when an existing PDF is printed again from a viewer application through a PDF printer driver in order to generate a new PDF from it.

In this case you can select and copy the text in the PDF, but the extracted text is unusable and contains only special characters and hieroglyphs. Such PDFs cannot be processed further in a meaningful way: no information can be extracted from the PDF, the PDF cannot be searched, and the document cannot be found using full-text search or search engines. The defect is not visible from the outside; the PDF can be opened, viewed and printed without any error messages.

The only way to repair such a PDF and encode the text correctly is through OCR. The PDF, or only the affected page, is “rendered” and the text is regenerated by OCR processing.

AutoOCR Version 2.0.41 offers this capability for both the iOCR and the OmniPage OCR engine. For each page of the PDF, it determines whether the page contains “defective” text. If such a page is detected, the text is regenerated using the OCR function; pages with correct text are not subjected to any further OCR processing.
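How a page-by-page check of this kind could work in principle is sketched below. This is an illustrative heuristic only, not the detection method used by AutoOCR: it assumes the text of each page has already been extracted by some PDF library, and it only catches text that decodes to control, private-use, unassigned or replacement characters. The function names and the threshold are assumptions.

```python
import unicodedata


def defective_ratio(text: str) -> float:
    """Share of non-whitespace characters that look like encoding debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        # No text at all: nothing defective to detect on this page.
        return 0.0
    bad = sum(
        1 for c in chars
        if c == "\ufffd" or unicodedata.category(c) in {"Cc", "Co", "Cn"}
    )
    return bad / len(chars)


def pages_with_defective_text(page_texts, threshold=0.3):
    """Return the 0-based indices of pages whose text exceeds the threshold."""
    return [i for i, text in enumerate(page_texts)
            if defective_ratio(text) > threshold]


pages = ["Invoice No. 4711", "\ue001\ue002\ufffd\x01\x02", ""]
print(pages_with_defective_text(pages))
```

Only the flagged pages would then be rendered and sent through OCR, while pages with usable text are left as they are. Defective text that maps to valid but wrong letters would need a different check, for example a dictionary lookup.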

Download – AutoOCR – OCR Server incl. OmniPage OCR (ca. 640MB) >>>
Download – AutoOCR light – Low Cost OCR Server (ca. 410MB) >>>
Download – iOCR (vsOCR) Setup – additional languages (ca. 1200MB) >>>

AutoOCR & AutoOCR light Version 2.0.36

2021-03-30

New features in AutoOCR & AutoOCR light 2.0.36:

iOCR – keep images unchanged: If this new default option is active, the PDF is only rendered internally for the OCR process. The original PDF and the images it contains are transferred 1:1 to the PDF to be generated. The OCR process only inserts the recognized text. The images remain unchanged in terms of their resolution, color depth and compression.
This is important because many MFP scanners are already able to generate highly optimized and very compact color PDF scans via MRC (Mixed Raster Content). The scanner divides color documents into different layers. Each image layer is created with its own resolution and with the best possible compression for its color depth. If such MRC PDF files are rendered again, the MRC data structure is lost. The result file would be larger and lose quality.

Installs and uses the iOCR / vsOCR – Version 1.1.6 with the basic set of the most common languages in Europe.

All languages supported by iOCR can be installed via an additional iOCR / vsOCR setup (1.2GB in size).

Update of the AutoOCR basic component – iOCR, ImageProcessing, PDFCompressor, PDF2PDFA to the current status.

New presettings for JPEG2000 compression – for PDF rendering / ImageProcessing / PDFCompressor – in order to generate the smallest possible PDF files without great loss of display quality.

- High / Medium / Low – Compression
- Color – 1:60 / 1:30 / 1:15
- Greyscales – 1:30 / 1:15 / 1:13

The higher the value, the higher the compression: smaller files are created, but with decreasing image quality. “Medium” compression is preset as the default value.

AutoOCR Version 2.0.30 – JPEG2000 compression creates compact PDF files

2020-10-22

Color scans usually produce quite large files. A color scan at 300 dpi with JPEG compression requires approx. 300 kB of storage space per page. In order to create the smallest possible PDF output files, the JPEG2000 compression for AutoOCR / iOCR has been improved and an additional parameter has been added. JPEG2000 compression considerably reduces the size of the color images contained in the PDF, which makes the searchable PDF files considerably smaller. The JPEG2000 compression has no influence on the OCR recognition rate.

With JPEG2000, both “lossless” and “lossy” compression are available. Normally you should use the “lossy” JPEG2000 compression to create small files. It has an additional parameter (ratio: 1 to 999) that controls the compression rate and thus the file size and visual quality.

To see how these parameters affect the PDF file size, a test was run with different settings for the JPEG / JPEG2000 compression (see the following table). The initial file was a 7-page scan at 300 dpi, 24-bit color, with JPEG compression and a size of 2082 kB.

This shows that with JPEG2000, depending on the parameters, you can achieve a file size reduction of between 30 and 80%.

JPEG2000 / lossy / 75-100 = high quality / larger files – 32-49% reduction
JPEG2000 / lossy / 125-150 = medium quality / medium file size – 59-65% reduction
JPEG2000 / lossy / 200-250 = low quality / small files – 74-79% reduction
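As a worked example, the reduction ranges above can be applied to the 2082 kB initial file to estimate the resulting file sizes. The sketch below is plain arithmetic on the published figures; the dictionary keys and function names are only illustrative, and actual results depend on the scanned content.

```python
INITIAL_KB = 2082  # size of the 7-page test scan

# Reduction ranges (min, max) per JPEG2000 ratio setting, as listed above.
REDUCTION_RANGES = {
    "75-100": (0.32, 0.49),
    "125-150": (0.59, 0.65),
    "200-250": (0.74, 0.79),
}


def resulting_size_kb(initial_kb: float, reduction: float) -> float:
    """File size that remains after removing the given share of the size."""
    return initial_kb * (1 - reduction)


def size_range_kb(setting: str, initial_kb: float = INITIAL_KB):
    """Return (largest, smallest) expected size in kB, rounded to whole kB."""
    low, high = REDUCTION_RANGES[setting]
    return (round(resulting_size_kb(initial_kb, low)),
            round(resulting_size_kb(initial_kb, high)))


for setting in REDUCTION_RANGES:
    print(setting, size_range_kb(setting))
```

For the test file this works out to roughly 1.0 to 1.4 MB at the high-quality settings and well under 0.6 MB at the low-quality settings.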

Download – AutoOCR – OCR Server incl. OmniPage OCR (ca. 640MB) >>>

AutoOCR – General overview – Product video

2020-01-29

New video for AutoOCR – general overview of functions and processing:

German, English

Configure folders (inbox, output, archive, errors)
Select engine
PDF/A function
Service account (network resources)
Processing options (actions, folder monitoring)
Web service option
Log

AutoOCR / AutoOCR light Version 2.0.15

2019-12-19

New features in AutoOCR Version 2.0.15:

New functions / tabs for “Image processing”, “PDF info fields”, “PDF / A” and “PDF Compressor”

Image Processing: The image processing functions have been significantly expanded and are available outside of the OCR engines. This allows the scans to be optimized and improved before OCR recognition in order to increase recognition accuracy and improve image quality. Image processing is also part of AutoOCR light.

Image processing functions:

- Several functions can be carried out one after the other in a predetermined order.
- The selected functions, their parameters and processing sequence are managed via profiles.
- Profile functions: New, Copy, Delete, Rename, Export to file, Import from file.
- Option to process only PDF pages that contain image information (scans), or all PDF pages.
- Load a master page and test the image processing commands with a preview of the source and result file.

Individual functions of image processing:

- Detect and remove blank pages
- Automatically rotate pages
- Align (deskew) pages
- Invert images (black to white)
- Remove black border
- Trim the edge
- Remove specks (despeckle)
- Remove punch holes
- Remove lines
- Convert color / grayscale to black and white

PDFCompressor integrated: This enables the PDF files generated by the OCR process to be optimized and compressed to a minimum. The input for OCR processing should always be the best possible scan with a correspondingly high quality and resolution (300 dpi for black-and-white and 200-300 dpi for color). This is good for OCR recognition, but it creates large result files. To make the final PDF files as small as possible, PDFCompressor processing can be added after the OCR step in order to reduce the resolution of the images, e.g. to 150 dpi. 150 dpi offers sufficient readability but would be too low for OCR recognition, so running OCR first and downsampling afterwards combines good recognition with the smallest possible output files. The PDFCompressor is available as an option for AutoOCR.
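Why lowering the resolution after OCR saves so much space can be shown with simple arithmetic: the number of pixels in an image scales with the square of the resolution. The sketch below only computes pixel counts before compression; the A4 page size and the function names are assumptions for illustration and say nothing about how PDFCompressor works internally.

```python
def pixel_fraction(source_dpi: int, target_dpi: int) -> float:
    """Fraction of the original pixel count that remains after downsampling."""
    return (target_dpi / source_dpi) ** 2


def page_pixels(width_in: float, height_in: float, dpi: int) -> int:
    """Number of pixels of a page of the given size in inches at a resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumed example: an A4 page is about 8.27 x 11.69 inches.
at_300 = page_pixels(8.27, 11.69, 300)
at_150 = page_pixels(8.27, 11.69, 150)
print(at_300, at_150, pixel_fraction(300, 150))
```

Going from 300 dpi to 150 dpi leaves only about a quarter of the pixels, which is why the downsampling step belongs after text recognition and not before it.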

PDF information fields: The PDF information fields are now also available independently of the PDF/A function in a separate tab in all AutoOCR variants.

PDF2PDFA converter component integrated: This means that all functions of the PDF2PDFA converter component are available in AutoOCR.

Archive folder configuration: New variables for date and time are available for the archive folder configuration.

Download – AutoOCRLight – Low Cost OCR Server (ca. 410MB) >>>
Download – AutoOCR – OCR Server incl. OmniPage OCR (ca. 640MB) >>>

OmniPage OCR engine for AutoOCR & AutoOCRLight from 2.0.7

2019-06-25

Benefits of OmniPage OCR:

Recognition accuracy at the highest level, even for difficult documents
Fastest OCR processing – much faster and more powerful than anything we’ve tested and implemented so far. 1-2 seconds to create a searchable PDF per page are possible.
Affordable – 25,000 pages license with lower licensing costs than the previous 10,000-page Abbyy license
Easier activation of the (demo) license – The OmniPage OCR engine can be activated via our license server including a 30-day demo version together with the basic application.

Please note that, for licensing reasons, the OmniPage OCR engine can only be installed on client operating systems (Windows 7/10) and not on Windows Server 2008, 2012, 2016 or 2019; the setup only runs on Windows 7/10. In terms of performance and stability, this is not a disadvantage. OCR processes are performance-intensive and should run on dedicated hardware (e.g. an Intel NUC) with as many CPU cores as possible and an SSD for optimal throughput.

From version 2.0.7, the OmniPage OCR engine can be activated for AutoOCR or AutoOCRLight as an option in addition to iOCR (Tesseract OCR) and is included in the AutoOCR setup. For AutoOCRLight, the OmniPage OCR engine can be downloaded and installed separately.

Download – OmniPage OCR Engine as Option for AutoOCRLight (ca. 235MB) >>>

Download – AutoOCRLight – Low Cost OCR Server (ca. 410MB) >>>

Download – AutoOCR – OCR Server incl. OmniPage OCR (ca. 640MB) >>>

AutoOCR / AutoOCR light 2.0 – iOCR innovations

2019-04-15

New features in AutoOCR / AutoOCR light Version 2.0:

The iOCR Standard OCR component of AutoOCR / AutoOCR light is now based on the new Tesseract OCR Version 4.0.
Multiple languages can be selected for OCR recognition.
Additional option to select OCR accuracy / speed.

Configurable parameter for splitting documents into smaller individual documents, so that documents with a large number of pages can be handled with small / limited memory resources.
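The idea behind such a split parameter can be sketched as follows. This is a minimal illustration that assumes the parameter is a maximum number of pages per partial document; the function name and the way AutoOCR actually splits documents are assumptions.

```python
def split_page_ranges(page_count: int, max_pages: int):
    """Split pages 1..page_count into inclusive (first, last) ranges."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return [
        (start, min(start + max_pages - 1, page_count))
        for start in range(1, page_count + 1, max_pages)
    ]


print(split_page_ranges(250, 100))
```

Each range can then be processed as its own document, so memory use is bounded by the chunk size instead of the total page count.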

Conditional OCR processing – via file format, number of pages, page format (width / height in mm or pixels), resolution, file size and color depth – configurable for folder monitoring and web service processing. For each file format and criterion, you can specify whether the file is OCR processed, converted to an image PDF, or excluded from OCR processing. This prevents the OCR pipeline from being blocked by “meaningless” jobs, for example when large JPEG photos end up in the OCR process, where text recognition makes no sense.
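A rule set of this kind can be pictured as a small decision function. The following sketch is purely illustrative: the field names, the thresholds and the three action names are assumptions and do not reflect the actual AutoOCR configuration format.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    file_format: str      # e.g. "pdf", "tiff", "jpeg"
    page_count: int
    width_px: int
    height_px: int
    file_size_mb: float


def decide_action(info: FileInfo) -> str:
    """Return "ocr", "image_pdf" or "skip" for an incoming file."""
    # Hypothetical rule: very large JPEGs are assumed to be photos,
    # so they are only wrapped into an image PDF without OCR.
    if info.file_format == "jpeg" and info.width_px * info.height_px > 12_000_000:
        return "image_pdf"
    # Hypothetical rule: oversized jobs are kept out of the OCR process.
    if info.page_count > 500 or info.file_size_mb > 200:
        return "skip"
    return "ocr"


photo = FileInfo("jpeg", 1, 6000, 4000, 8.5)
scan = FileInfo("pdf", 12, 2480, 3508, 3.2)
print(decide_action(photo), decide_action(scan))
```

Evaluating cheap criteria such as format and size before starting recognition keeps unsuitable files from occupying the OCR engine.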

Further information on the AutoOCR / AutoOCR light – iOCR extensions can be found here >>>

Download – AutoOCR – OCR Server (ca. 410MB) >>>

Download – AutoOCRLight – Low Cost OCR Server (ca. 410MB) >>>
Updating AutoOCR light from version 1.x to 2.x requires a new license.

Folder Monitoring – “File System Event” / “Blockwise Processing”

2019-04-11

Applications that monitor folders, such as AutoOCR / AutoOCRLight, offer options that determine how files in the folders are detected for processing and when their processing starts.

File-System Event:

An operating system function is used to detect changes to files as well as new files in a folder / folder structure and to start processing immediately. This option should only be used for local folder / folder structures, but not for processing network shares.

Blockwise Processing:

The folder is read in “block by block”. That is, files are always read in and processed in blocks of at most the configured maximum number of files. After a “block” has been processed, the next “block” starts, and so on until all files have been processed. If no further files are found, the folder is polled for new files every 10 msec. “Blockwise processing” should be used for folder monitoring of network drives.
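The blockwise approach can be sketched in a few lines. This is a simplified illustration, not the AutoOCR implementation: the function names are assumptions, and the handler is expected to move or delete each file it has processed so that the next read returns new files.

```python
from pathlib import Path
from typing import Callable, List


def read_block(folder: Path, block_size: int) -> List[Path]:
    """Read at most block_size files from the folder, in name order."""
    files = sorted(p for p in folder.iterdir() if p.is_file())
    return files[:block_size]


def drain_folder(folder: Path, block_size: int,
                 handle: Callable[[Path], None]) -> int:
    """Process the folder block by block and return the number of blocks."""
    blocks = 0
    while True:
        block = read_block(folder, block_size)
        if not block:
            return blocks
        for path in block:
            # The handler must move or delete the file; otherwise the
            # same files would be read again in the next block.
            handle(path)
        blocks += 1
```

A monitoring service would call `drain_folder` repeatedly and sleep for the polling interval whenever it returns without having found files. Because the folder is listed explicitly on every pass, this also works where file system change notifications are not reliable, such as on network drives.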

Start of processing / delayed Start:

The processing of a newly detected file usually starts immediately, but there may also be reasons to delay it. For each watched folder, there is a setting that delays the start of processing by x seconds. The application first waits for the configured number of seconds and then starts processing – see also >>> – After the delay, the files to be processed are checked to see whether they have been released and are not read-only. Here the application again waits up to 10 seconds for the file to be released.
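The two waiting phases (fixed delay, then a bounded wait for the file to become available) might look like the following sketch. It is an assumption-laden illustration, not AutoOCR code: the function names are made up, and the availability check simply tries to open the file for reading and writing, which fails for read-only or missing files and, on Windows, for files locked exclusively by another process.

```python
import time
from pathlib import Path
from typing import Callable


def wait_until_free(path: Path, timeout_s: float = 10.0,
                    poll_s: float = 0.25) -> bool:
    """Return True once the file can be opened read/write, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # "r+b" neither creates nor truncates the file.
            with open(path, "r+b"):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)


def start_after_delay(path: Path, delay_s: float,
                      process: Callable[[Path], None],
                      timeout_s: float = 10.0) -> bool:
    """Wait delay_s seconds, then process the file if it becomes available."""
    time.sleep(delay_s)
    if wait_until_free(path, timeout_s):
        process(path)
        return True
    return False
```

The delay gives a scanner or copy job time to finish writing, and the bounded wait keeps one locked file from stalling the watched folder indefinitely.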

AutoOCRLight Version 1.17.2 available

2018-05-29

Since AutoOCR and AutoOCRLight share the same foundation, AutoOCRLight version 1.17.2 is now available together with AutoOCR version 1.17.2. The “Light” version differs from the full AutoOCR version in the following limitations:

Only one folder can be monitored
Only iOCR / vsOCR can be used, not the Abbyy OCR
No PDF/A output is possible
The Light version has no web service (REST / SOAP) interface

All other functions are the same as the standard AutoOCR Server

Download – AutoOCRLight – Low Cost OCR Server >>>

iOCR / vsOCR Setup divided into standard and additional languages

2017-03-16

The iOCR / vsOCR setup containing the language and dictionary files of our standard OCR engine is more than 270MB in size. In order to make the downloads and the setups smaller, we decided to split the iOCR / vsOCR setup into a “basic” and an “additional” setup. The basic setup, which is available through our applications, e.g. AutoOCR, FileConverterPro or PDFmdx, now contains only a selection of major European languages and has been reduced to 127MB.

If all available languages are to be installed, this is possible at any time. The additionally available “exotic languages” can be installed via a separate setup.

iOCR basic languages:

Danish, German, English, Finnish, French, Italian, Catalan, Modern Greek, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Swedish, Slovakian, Slovenian, Spanish, Czech, Turkish, Ukrainian, Hungarian

iOCR extended languages:

Afrikaans, Albanian, Arabic, Azerbaijani, Bahasa Indonesian, Bengali, Bulgarian, Cherokee, Chinese – Traditional, Chinese – Simplified, Estonian, Franconian, Gallic, Hebrew, Hindi, Icelandic, Japanese, Korean, Croatian, Latvian, Lithuanian, Macedonian, Malay, Serbian, Swahili, Tagalog, Tamil, Telugu, Thai, Vietnamese, Belarusian

Download – iOCR (vsOCR) Setup – basic languages (ca. 127MB) >>>

Download – iOCR (vsOCR) Setup – additional languages (ca. 200MB) >>>