AutoOCR – Page 4 – PDF News – PDF/A, archiving, OCR, DMS, document management, scan to PDF, ECM, PDF convert, free PDF printer driver, SDK, API, PDF software development

Category: AutoOCR

ifresco Tools – RepoWorker scripts – convert Alfresco documents to searchable PDF or PDF/A automatically

2014-06-25

The module ifresco Tools offers the following functions for the Alfresco ECM / DMS:

ifresco-RepoWorker – enables time-controlled execution of repository JavaScript on a definable number of documents.
ifresco-ScriptAction – enables the definition of Share actions that execute repository JavaScript on documents.

RepoWorker – scripts integrate AutoOCR and FileConverterPro:

With the RepoWorker we created a script-based extension for the ifresco Transformer. It converts all existing and/or newly added documents of specific content or MIME types on an Alfresco server to searchable PDF or PDF/A documents. Users do not have to take any action: the conversion runs automatically on the server, independent of how the documents are added to the ECM / DMS.

Functions:

time-controlled execution of JavaScript on a definable number of documents
existing documents of a specific content and MIME type are converted to searchable PDF or PDF/A and replace the source documents
processed documents are marked with the “Transform” aspect to prevent repeated processing
one-time execution of scripts or repeated execution at definable intervals, e.g. every 5 min
scripts can be adjusted easily and quickly to new conditions and requirements
easy installation and configuration
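
The selection and marking steps listed above can be sketched roughly as follows. This is a minimal illustration in plain JavaScript, not the actual RepoWorker script: the node objects, the aspect name in `TRANSFORM_ASPECT`, and the `convertToPdf` callback are assumptions made for the example. In a real repository script the nodes would come from Alfresco and the conversion would be done by AutoOCR or FileConverterPro.

```javascript
// Hypothetical aspect name; the identifier used by the real scripts may differ.
const TRANSFORM_ASPECT = "Transform";

// Pick the documents that still need conversion:
// matching MIME type, not yet marked, at most `limit` per run.
function selectCandidates(nodes, mimeTypes, limit) {
  return nodes
    .filter((n) => mimeTypes.includes(n.mimetype))
    .filter((n) => !n.aspects.includes(TRANSFORM_ASPECT))
    .slice(0, limit);
}

// Process one batch: convert each candidate, then mark it
// so that the next scheduled run skips it.
function runBatch(nodes, mimeTypes, limit, convertToPdf) {
  const batch = selectCandidates(nodes, mimeTypes, limit);
  for (const node of batch) {
    convertToPdf(node); // stand-in for the server-side conversion
    node.aspects.push(TRANSFORM_ASPECT);
  }
  return batch.length;
}

// Usage with an in-memory stand-in for the repository
const repo = [
  { name: "scan1.tif", mimetype: "image/tiff", aspects: [] },
  { name: "memo.pdf", mimetype: "application/pdf", aspects: ["Transform"] },
  { name: "notes.txt", mimetype: "text/plain", aspects: [] },
];
const converted = [];
const wanted = ["image/tiff", "application/pdf"];
const firstRun = runBatch(repo, wanted, 10, (n) => converted.push(n.name));
const secondRun = runBatch(repo, wanted, 10, (n) => converted.push(n.name));
console.log(firstRun, secondRun, converted);
```

Because every processed document is marked, a second run over the same documents finds nothing left to do, which is what makes repeated execution at short intervals safe.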

Description – RepoWorker scripts for AutoOCR / FileConverterPro >>>

GitHub – RepoWorker scripts for AutoOCR / FileConverterPro >>>

Requirements:

Alfresco 4.x
AutoOCR or FileConverterPro
ifresco Transformer (AMP)
ifresco Tools (AMP)

A demo installation can also be found on our ifresco / Alfresco test server (admin / admin)

FileConverterPro & AutoOCR – test website available

2014-06-11

To let you test the functions of FileConverterPro and AutoOCR and run your own conversions without installing the software, we have set up a server with FileConverterPro and AutoOCR that is accessible via the internet free of charge.

Under MS Windows, the applications DropConvert (for FileConverterPro) and/or DropOCR (for AutoOCR) can be installed to process documents and run tests with these applications.

The services can also be used without installing client software, from any platform with only a browser. For this purpose we have set up dedicated test websites where documents can be uploaded and converted to PDF or PDF/A and/or run through a PDF OCR conversion.

FileConverterPro – test website:

URL: http://autoocr.may.co.at:3000/fcpro

Supported input-document formats:

DOC, DOCX, DOCM, RTF, TXT, ODT
XLS, XLSX, XLSM
PPT, PPTX, PPS, PPSX
FDF, XFDF (Adobe forms)
XML
PNG, BMP, TIF, TIFF, JPG, JPEG, GIF
ZIP, RAR, 7Z
MSG, EML
PDF
HTM, HTML, MHTML
PMTX (PDFMerge)
DWG, DXF, DWF
Abbyy: PDF, TIF, TIFF, PNG, JPG, JPEG, BMP, GIF, PCX, DCX, JP2, JPC, DJV, DJVU, WDP
iOCR: PDF, TIFF, JPEG, PNG

Processing profiles:

In all profiles, placeholder pages are inserted when conversion errors occur and for file formats that cannot be converted.

Default – direct conversion without MS-Office 2010, no OCR processing
Direct + iOCR German – direct conversion without MS-Office 2010, iOCR German
Direct – no OCR – PDFA – direct conversion without MS-Office 2010, PDF/A, no OCR processing
Direct – no OCR – with draft stamp and overlay – direct conversion without MS-Office 2010, stamps at the top left with filename / date / time, watermark (stamp) “Draft”, sample stationery as underlay, no OCR processing
MS-Office + Abbyy + PDFA – conversion of the Office documents via MS-Office 2010, PDF/A-1b output, Abbyy OCR – German & English
MS-Office + Abbyy – conversion of the Office documents via MS-Office 2010, Abbyy OCR – German & English
MS-Office – no OCR – PDFA – conversion of the Office documents via MS-Office 2010, PDF/A-1b output, no OCR processing

AutoOCR – test website:

URL: http://autoocr.may.co.at:3000/autoocr

Supported input-document formats:

Abbyy: PDF, TIF, TIFF, PNG, JPG, JPEG, BMP, GIF, PCX, DCX, JP2, JPC, DJV, DJVU, WDP
iOCR: PDF, TIFF, JPEG, PNG

Processing profiles:

Abbyy PDFA – German & English – PDF/A output, languages – English & German
AbbyyFR10 – english & german – no PDF/A, languages – English & German
iOCR – English – PDFA – PDF/A output, language – English
iOCR – English – no PDF/A, language – English
iOCR – German – no PDF/A, language – German

On the test sites you can switch directly between the FileConverterPro and the AutoOCR test site.

Node.js as base for the test websites:

To implement the test websites for FileConverterPro and AutoOCR we used current web development tools. Everything was programmed in JavaScript only, on the client as well as on the server side.

The following components are used:

Node.js – JavaScript for the server – http://nodejs.org/
Node.js FileConverterPro / AutoOCR Library – https://github.com/XKEYGmbH/node-fcpro
Bootstrap – http://getbootstrap.com/
AngularJS – https://angularjs.org/

DropOCR – version 1.2.5 available

2014-06-10

New in DropOCR version 1.2.5:

Direct selection of the AutoOCR processing profile through the context menu of the tray icon application
Function “Cancel all jobs”, which immediately cancels currently running transfers and processes
The “AutoStart” option is now activated by default
The maximum number of pages is now preset to 1000
The connection data of the AutoOCR test server are preconfigured during installation

Download – DropOCR >>>

DropOCR – version 1.2.1 available

2014-04-03

New in DropOCR version 1.2.1:

User interface switchable between German and English
HTTP and HTTPS support
Logging of the conversion processes, deleting of the log file
AutoStart function to start the application when the PC is started
Double-click on the Drop Zone opens the destination folder
AutoOCR test server preconfigured

Our AutoOCR test server is reachable via the following URL and may be used for testing purposes:

https://autoocr.may.co.at:8001/AutoOCRService2/
User: admin
Password: autoocr

Download – DropOCR >>>

ifresco AutoOCR – Version 1.20 – supports the recent Alfresco versions

2013-12-10

The ifresco AutoOCR Transformer for Alfresco is now available in version 1.20 and supports the recent Alfresco versions.

Adjusted AMP installation files are available for the following Alfresco versions:

Alfresco EE – Enterprise Edition: 4.0.1, 4.0.2, 4.0d, 4.1.1, 4.1.2, 4.1.4, 4.1.6
Alfresco CE – Community Edition: 4.2b, 4.2c, 4.2e

AutoOCR version 1.10.11 – run subsequent processing through DLL

2013-10-28

Version 1.10.9 introduced a function to run a subsequent action after the OCR and the creation of the destination file. This action is written as a C# or VB.NET script and can be used for monitored folders as well as for processing via web services.

AutoOCR version 1.10.11 extends this further: external DLLs can now also be used to run subsequent functions.

A checkbox switches between source code (script) and DLL processing, and the DLL is chosen from a selection list.

For this there is a new interface IAction2, which inherits from IAction. To make a DLL available for selection, it has to be copied into the AutoOCR installation folder. All DLLs named according to the pattern %NAME%.AutoOCRPlugin.dll are referenced. Please keep in mind that when AutoOCR is installed as a Windows service, message boxes and other user interactions are not possible and therefore cannot be used.
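
The naming convention can be illustrated with a small filter. This is only a sketch of the pattern described above, written in JavaScript for illustration; it is not AutoOCR's own discovery code. The function name `listPluginDlls` and the case-insensitive comparison are assumptions.

```javascript
// Suffix taken from the naming pattern %NAME%.AutoOCRPlugin.dll
const PLUGIN_SUFFIX = ".AutoOCRPlugin.dll";

// Return the file names that match the plugin pattern.
// Assumption: matching is case-insensitive, as is usual for Windows file names.
function listPluginDlls(fileNames) {
  const suffix = PLUGIN_SUFFIX.toLowerCase();
  return fileNames.filter((name) => {
    const lower = name.toLowerCase();
    // Require a non-empty %NAME% part in front of the suffix.
    return lower.endsWith(suffix) && lower.length > suffix.length;
  });
}

// Usage: hypothetical contents of the installation folder
const folder = [
  "Archive.AutoOCRPlugin.dll",
  "AutoOCR.exe",
  "helper.dll",
  ".AutoOCRPlugin.dll",
];
const plugins = listPluginDlls(folder);
console.log(plugins);
```

A DLL that does not follow the pattern, such as `helper.dll` in this example, would not appear in the selection list.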

For the additional tab to show up and be configurable, AutoOCR has to be started with the command-line parameter /ShowAction.

Download – sample project – DLL action – C# / .NET >>>
Download – AutoOCR – OCR Server incl. iOCR engine (ca. 150MB) >>>

Demo licenses for 30 days or 500 pages are available for the Abbyy OCR engine version 10; they can be requested from us.

Download – Abbyy FineReader 10.x Rel 4 OCR Engine Setup (ca. 460MB) >>>
Request demo license key for FineReader OCR engine

ifresco AutoOCR – Version 1.18 available

2013-09-13

Version 1.18 of ifresco AutoOCR, the OCR server integration for Alfresco, brings new functions and extensions:

Implementation of the new paging API for the jobs list of the AutoOCR server: page browsing (back/forth), deleting all jobs, deleting jobs older than x days, sorting jobs, selecting jobs by date.
Freely configurable run-time transformers. File-based as well as pipe-I/O-based command-line tools can be used to configure additional transformers.
Like the command-line-based run-time transformers, transformers can also be implemented through JavaScript.
An AutoOCR content model extension for the OCR status (aspect) is installed so that the OCR status of a file can be stored and queried as metadata.
The optional ifresco Tools AMP allows background OCR processing at defined intervals, for the initial processing of existing document collections or for the ongoing processing of newly added documents. Both the detection of the documents to be processed and the processing itself are done by JavaScripts that are executed on the server in timed batches in the background. Additional Alfresco Share document actions can also be configured and executed through JavaScript, e.g. to convert the selected PDF and image documents to searchable PDF(/A) files through the AutoOCR server and automatically replace the input files with them. With the ifresco Tools, AutoOCR functions are thus available through JavaScript independently of the configured Alfresco transformer, for mass batch processing as well as for interactive processing of single documents.

AMPs of version 1.18 are available for the following Alfresco versions: 4.0.1 EE, 4.0.2 EE, 4.0d CE, 4.1.1 EE, 4.1.2 EE, 4.1.3 EE, 4.1.4 EE, 4.2b CE, 4.2c CE
AMPs of the ifresco Tools 1.1 are available for: 4.2c CE, 4.2d CE

Download – ifresco AutoOCR – Runtime Transformer description >>>
Download – ifresco AutoOCR – Transformer through JavaScript description >>>
Download – ifresco AutoOCR – Example JavaScript Transformer >>>

ifresco AutoOCR – JavaScript Binding for Alfresco

2013-09-13

With the installation of the AMPs, Alfresco and AutoOCR are integrated through a REST web service interface. Server-side JavaScript offers an easy, flexible and quickly implemented way to extend and adjust Alfresco functions.

JavaScripts can be started on a schedule as batch processes, e.g. to process a larger number of documents in the background. They can also be called from a client such as Alfresco Share, to be used as document actions for single or multiple documents.

The JavaScript Binding of the AutoOCR functions allows direct access to the AutoOCR service from Alfresco scripts. In repository JavaScripts (WebScript controller scripts, scripted actions) all functions of the AutoOCR API can be called. This API is completely independent of the integration of the AutoOCR service as an Alfresco transformer. It makes it possible to use OCR functions from JavaScripts that are stored in Alfresco and executed directly on the server.

Download – Documentation JavaScript Binding for Alfresco >>>
Download – extensive demo script >>>

New Web-Site – www.OCRServer.at – online

2013-09-05

We have summarized all of our OCR products on our newly created website.

You can get more information about the following products there:

AutoOCR
AutoOCR light
DropOCR
FineOCR
ifresco Transformer
FileConverter (pro)
ifresco Profiler + Plugins

Intelligent PDF OCR processing via AutoOCR for Abbyy and iOCR

2013-07-11

PDF documents can be generated in different ways. A PDF can combine various contents and sources in one document. Pages can be constructed from “normal” PDF content consisting of text, images, and vector graphics, and typically already have textual content that can be used for full-text indexing and search. However, a PDF document can also contain scanned pages in black and white or color. Such pages or documents must undergo OCR to insert the textual information for indexing and searching.

So there are PDF documents that should not undergo any OCR processing at all, and others in which individual pages or all pages have to be processed because they were generated by a scan process.

Normally, all these types of PDF documents occur in business processes, and the user cannot tell whether or not a document needs OCR: viewed from the outside in Adobe Reader or on a printout, the difference is not immediately recognizable.

If every PDF document / page were processed in the same way, regardless of how it is structured and whether OCR processing makes sense or not, there would be some disadvantages:

Each PDF page is “rasterized” again regardless of its structure and content, i.e. converted into an image and then processed by OCR. This is like printing the document, scanning it again and then subjecting it to OCR processing. The result is an image of a “normal” PDF page with underlying text recognized by the OCR engine.

the quality is not the same as before
the documents become bigger
special PDF properties are lost (bookmarks, links, etc.)
processing time and resources are consumed
OCR page licenses are consumed unnecessarily

PDF OCR processing should therefore be “intelligent”, so that neither the process nor the user has to make the difficult decision whether a PDF document must undergo OCR processing or not. This is even more difficult when a single PDF document consists of a mix of normal and scanned parts.

That’s why we’ve integrated intelligent OCR processing into AutoOCR, which works in the same way with both the Abbyy and the iOCR OCR engine. It can be controlled per input folder, or for the web service interface via the OCR profile, and is available for both PDF > PDF and PDF > TXT processing.

Highlights – Intelligent PDF OCR processing:

works for both PDF > PDF and PDF > TXT processing
for the Abbyy OCR and the iOCR engine
for folder processing as well as for web service processing
the PDF document as well as every single page is analyzed, and only those pages that do not contain any text are processed by OCR – these are usually scanned pages that have not yet been processed by OCR
existing normal PDF documents and pages are taken over unchanged and not processed
documents and pages that have already been processed by OCR are not processed again
in the case of PDF > TXT processing, the text is extracted from the normal PDF pages and OCR is only performed on pages without text
PDF functions and bookmarks are retained and included in the target document
saves processing time and Abbyy OCR page licenses
the files are not enlarged
the quality of the PDF pages is preserved
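
The per-page decision described in the list above can be sketched as follows. This is a simplified model for illustration, not AutoOCR's implementation: the page objects with a `text` property, the rule "a page with any non-blank text is kept", and the `runOcr` callback are assumptions. How AutoOCR actually detects text on a page is not described here.

```javascript
// Simplified page model: `text` holds the text the page already contains.
function hasText(page) {
  return page.text.trim().length > 0;
}

// PDF > PDF: keep pages that already carry text, send only the rest to OCR.
function planOcr(pages) {
  return pages.map((page, i) => ({
    page: i + 1,
    action: hasText(page) ? "keep" : "ocr",
  }));
}

// PDF > TXT: use the existing text where present, OCR only pages without text.
function extractText(pages, runOcr) {
  return pages
    .map((page) => (hasText(page) ? page.text : runOcr(page)))
    .join("\n");
}

// Usage: one normal page and two scanned pages without text
const samplePages = [{ text: "Invoice 42" }, { text: "" }, { text: "   " }];
let ocrCalls = 0;
const fakeOcr = () => {
  ocrCalls += 1; // each call would consume an OCR page license
  return "[ocr text]";
};
const plan = planOcr(samplePages);
const fullText = extractText(samplePages, fakeOcr);
console.log(plan.map((p) => p.action).join(","), ocrCalls);
```

In this model only two of the three pages reach the OCR engine, which is where the savings in processing time and page licenses come from.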

In addition to AutoOCR, the “intelligent PDF OCR processing” is available in all of our other software products that support OCR processing, e.g. ifresco Profiler, FileConverter, DropOCR, PDFMerge, etc.