iOCR / vsOCR Setup divided into standard and additional languages

The iOCR / vsOCR setup, which contains the language and dictionary files of our standard OCR engine, is more than 270MB in size. To make the downloads and the setups smaller, we have split the iOCR / vsOCR setup into a “base” setup and an “additional” setup. The base setup, which is available through our applications, e.g. AutoOCR, FileConverterPro or PDFmdx, now contains only a selection of major European languages and has been reduced to 127MB.

All available languages can still be installed at any time. The additional “exotic languages” are installed via a separate setup.

iOCR base languages:

Danish, German, English, Finnish, French, Italian, Catalan, Modern Greek, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Swedish, Slovak, Slovenian, Spanish, Czech, Turkish, Ukrainian, Hungarian

iOCR extended languages:

Afrikaans, Albanian, Arabic, Azerbaijani, Indonesian, Bengali, Bulgarian, Cherokee, Chinese – Traditional, Chinese – Simplified, Estonian, Franconian, Gallic, Hebrew, Hindi, Icelandic, Japanese, Korean, Croatian, Latvian, Lithuanian, Macedonian, Malay, Serbian, Swahili, Tagalog, Tamil, Telugu, Thai, Vietnamese, Belarusian
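
The split can be read as a simple lookup from language to setup. The following Python sketch only illustrates that idea: the two sets are copied from the lists above, while the function name `setup_for` and the return values "base" and "additional" are made up for this example and are not part of any iOCR tool.

```python
# Language lists copied from the two setups described above.
BASE_LANGUAGES = {
    "Danish", "German", "English", "Finnish", "French", "Italian", "Catalan",
    "Modern Greek", "Dutch", "Norwegian", "Polish", "Portuguese", "Romanian",
    "Russian", "Swedish", "Slovak", "Slovenian", "Spanish", "Czech",
    "Turkish", "Ukrainian", "Hungarian",
}

EXTENDED_LANGUAGES = {
    "Afrikaans", "Albanian", "Arabic", "Azerbaijani", "Indonesian", "Bengali",
    "Bulgarian", "Cherokee", "Chinese – Traditional", "Chinese – Simplified",
    "Estonian", "Franconian", "Gallic", "Hebrew", "Hindi", "Icelandic",
    "Japanese", "Korean", "Croatian", "Latvian", "Lithuanian", "Macedonian",
    "Malay", "Serbian", "Swahili", "Tagalog", "Tamil", "Telugu", "Thai",
    "Vietnamese", "Belarusian",
}


def setup_for(language):
    """Return which setup ships the language, or None if it is not listed.

    The labels "base" and "additional" are hypothetical names for this sketch.
    """
    if language in BASE_LANGUAGES:
        return "base"
    if language in EXTENDED_LANGUAGES:
        return "additional"
    return None


if __name__ == "__main__":
    for lang in ("German", "Japanese", "Klingon"):
        print(lang, "->", setup_for(lang))
```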

Download – iOCR (vsOCR) Setup – base languages (approx. 127MB) >>>

Download – iOCR (vsOCR) Setup – additional languages (approx. 200MB) >>>

 

FileConverterPro & AutoOCR – test website available

To let you test the functions of FileConverterPro and AutoOCR and run your own conversions without installing the software, we have set up a server with FileConverterPro and AutoOCR that is accessible via the internet free of charge.

Under MS Windows, the applications DropConvert (for FileConverterPro) and/or DropOCR (for AutoOCR) can be installed to run conversions and tests against this server.

The services can also be used from any platform with only a browser, without installing any client software. For this purpose we have set up dedicated test websites where documents can be uploaded and converted to PDF or PDF/A and/or run through a PDF OCR conversion.

FileConverterPro – test website:

URL: http://autoocr.may.co.at:3000/fcpro

Supported input-document formats:

  • DOC, DOCX, DOCM, RTF, TXT, ODT
  • XLS, XLSX, XLSM
  • PPT, PPTX, PPS, PPSX,
  • FDF, XFDF (Adobe forms),
  • XML
  • PNG, BMP, TIF, TIFF, JPG, JPEG, GIF
  • ZIP, RAR, 7Z,
  • MSG, EML,
  • PDF,
  • HTM, HTML, MHTML,
  • PMTX (PDFMerge)
  • DWG, DXF, DWF
  • Abbyy: PDF, TIF, TIFF, PNG, JPG, JPEG, BMP, GIF, PCX, DCX, JP2, JPC, DJV, DJVU, WDP
  • iOCR:  PDF, TIFF, JPEG, PNG
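
A client could check a file name against this list before uploading it. The Python sketch below is a rough illustration under two assumptions: that the file extension equals the format name listed above, and that the formats in the "Abbyy" line are accepted like the others. The helper name `is_listed_format` is hypothetical; the test site itself decides what it accepts and inserts placeholder pages for anything it cannot convert.

```python
from pathlib import PurePath

# Extensions taken from the format list above (lower-cased).
LISTED_EXTENSIONS = {
    "doc", "docx", "docm", "rtf", "txt", "odt",
    "xls", "xlsx", "xlsm",
    "ppt", "pptx", "pps", "ppsx",
    "fdf", "xfdf",
    "xml",
    "png", "bmp", "tif", "tiff", "jpg", "jpeg", "gif",
    "zip", "rar", "7z",
    "msg", "eml",
    "pdf",
    "htm", "html", "mhtml",
    "pmtx",
    "dwg", "dxf", "dwf",
    # Additional formats from the "Abbyy" line of the list.
    "pcx", "dcx", "jp2", "jpc", "djv", "djvu", "wdp",
}


def is_listed_format(filename):
    """Return True if the file extension appears in the list above."""
    extension = PurePath(filename).suffix.lower().lstrip(".")
    return extension in LISTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("report.DOCX", "scan.djvu", "movie.mp4", "README"):
        print(name, is_listed_format(name))
```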

Processing profiles:

In all profiles, placeholder pages are inserted when conversion errors occur and for file formats that cannot be converted.

  • Default – direct conversion without MS-Office 2010, no OCR processing
  • Direct + iOCR German – direct conversion without MS-Office 2010, iOCR German
  • Direct – no OCR – PDFA – direct conversion without MS-Office 2010, PDF/A, no OCR processing
  • Direct – no OCR – with draft stamp and overlay – direct conversion without MS-Office 2010, stamp at the top left with filename / date / time, watermark (stamp) “Draft”, sample stationery as underlay, no OCR processing
  • MS-Office + Abbyy + PDFA – conversion of the Office documents via MS-Office 2010, PDF/A-1b output, Abbyy OCR – German & English
  • MS-Office + Abbyy – conversion of the Office documents via MS-Office 2010, Abbyy OCR – German & English
  • MS-Office – no OCR – PDFA – conversion of the Office documents via MS-Office 2010, PDF/A-1b output, no OCR processing

 

AutoOCR – test website:

URL: http://autoocr.may.co.at:3000/autoocr

Supported input-document formats:

  • Abbyy: PDF, TIF, TIFF, PNG, JPG, JPEG, BMP, GIF, PCX, DCX, JP2, JPC, DJV, DJVU, WDP
  • iOCR:  PDF, TIFF, JPEG, PNG

Processing profiles:

  • Abbyy PDFA – German & English – PDF/A output, languages – English & German
  • AbbyyFR10 – English & German – no PDF/A, languages – English & German
  • iOCR – English – PDFA – PDF/A output, language – English
  • iOCR – English – no PDF/A, language – English
  • iOCR – German – no PDF/A, language – German
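
The five profiles differ only in engine, languages and PDF/A output, so they can be written down as a small table. The Python sketch below is an illustration, not how AutoOCR stores its profiles: the `OcrProfile` class and `find_profiles` function are hypothetical, and only the profile properties are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrProfile:
    name: str          # profile name as shown on the test site
    engine: str        # "Abbyy" or "iOCR"
    languages: tuple   # recognition languages
    pdfa: bool         # True if the profile produces PDF/A


PROFILES = [
    OcrProfile("Abbyy PDFA – German & English", "Abbyy", ("English", "German"), True),
    OcrProfile("AbbyyFR10 – English & German", "Abbyy", ("English", "German"), False),
    OcrProfile("iOCR – English – PDFA", "iOCR", ("English",), True),
    OcrProfile("iOCR – English", "iOCR", ("English",), False),
    OcrProfile("iOCR – German", "iOCR", ("German",), False),
]


def find_profiles(language, pdfa, engine=None):
    """Return the names of profiles matching language, PDF/A flag and engine."""
    return [
        p.name
        for p in PROFILES
        if language in p.languages
        and p.pdfa == pdfa
        and (engine is None or p.engine == engine)
    ]


if __name__ == "__main__":
    print(find_profiles("German", pdfa=True))
    print(find_profiles("English", pdfa=False))
```

Read this way, the list shows for instance that German PDF/A output on the test site is only offered through the Abbyy profile.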

On the test sites you can switch directly between the FileConverterPro and the AutoOCR test site.

 

Node.js as the basis for the test websites:

To implement the test websites for FileConverterPro and AutoOCR we used current web development tools. All programming was done in JavaScript, on the client side as well as on the server side.

The following components are used:

  1. Node.js – JavaScript for the server – http://nodejs.org/
  2. Node.js FileConverterPro / AutoOCR library – https://github.com/XKEYGmbH/node-fcpro
  3. Bootstrap – http://getbootstrap.com/
  4. AngularJS – https://angularjs.org/

Screenshots: (1) FileConverterPro test site, upload documents and convert them to PDF or PDF/A; (2) AutoOCR test site, upload scans, images and PDFs and convert them to searchable PDF or PDF/A; (3) the added files are shown in the list, and the processing profile can be selected per file; (4) when the conversion is started, the files are uploaded to the test server and converted immediately; (5) after the conversion, the generated PDFs can be retrieved via the download link

DropOCR – version 1.2.5 available

New in DropOCR version 1.2.5:

  • Direct selection of the AutoOCR processing profile through the context menu of the icon tray application
  • Function “Cancel all jobs” – cancels currently running transfers and processes immediately
  • The “AutoStart” option is now activated by default
  • The maximum number of pages is now preset to 1000
  • The connection data of the AutoOCR test server is already preconfigured by the installation

Screenshots: DropOCR context menu of the icon tray application; DropOCR configuration settings 1.2.5

Download – DropOCR >>>

FileConverterPro (FCpro) – PDF(/A) conversion service with SOAP / REST – web-service

FileConverterPro is installed as a Windows service and offers functions for converting the most important document formats to PDF or PDF/A, incl. OCR, through a web-service interface (SOAP or REST).

FCpro uses the same base components as the FileConverterPDFMerge and AutoOCR. Adjustments and extensions are therefore available equally for all these applications.

The web-service interface of FCpro is compatible with the web-service interface of AutoOCR, so client applications can run against either service without adjustment. For example, our Alfresco / ifresco Transformer integration can be operated with AutoOCR, for pure OCR processing, as well as with FCpro, to process all document formats incl. OCR.

As for AutoOCR, a ready-to-use .NET / C# sample application with EXE and source code is also available for the FCpro service. With it the FCpro functions can be tested immediately, or the code can be used as a basis for integration into your own applications.

PDF or PDF/A conversion – all important file formats (MS-Office, image, e-mail, HTML and so on) are converted to PDF or PDF/A automatically. Normally neither MS-Office nor any other component is required: the conversion takes place directly, without additional applications or printer drivers. Optionally, MS-Office 2010/2013 can be used as the converter component if it is available or if a “high fidelity” conversion of Office formats is required. Image and PDF documents can be made searchable via the integrated iOCR. With an additional license, the Abbyy OCR engine can be used as an option.

Supported source file formats:

  • DOC, DOCX, RTF, TXT,
  • XLS, XLSX,
  • PPT, PPTX, PPS, PPSX,
  • FDF, XFDF (Adobe forms),
  • XML
  • PNG, BMP, TIF, TIFF, JPG, JPEG, GIF
  • ZIP, RAR, 7Z,
  • MSG, EML,
  • PDF,
  • HTM, HTML, MHTML,
  • PMTX (PDFMerge)

Functions – general:

  • MS-Windows service application with SOAP / REST web-service interface for document conversion from Office, PDF, image, HTML, ZIP, MSG and e-mail to PDF or PDF/A. The communication is encrypted via HTTPS.
  • Processing profiles – all settings can be preconfigured in profiles and then retrieved and used through them.
  • Direct conversion without the need for the original applications.
  • For the “high fidelity” conversion of MS-Office documents, MS-Office 2010 / 2013 can also be installed and used.
  • Unpacking and conversion of container files (ZIP, 7ZIP, RAR, MSG, EML, PMTX) into a single overall file. The structure is displayed as bookmarks; for documents that cannot be converted, placeholder pages are inserted.
  • Images and scans (TIF, JPEG, PNG, BMP, GIF, PDF) can be converted to searchable PDFs with the integrated iOCR; the Abbyy OCR engine is available as an option.
  • Parallel processing with a configurable number of processes and priorities makes good use of the hardware and keeps processing times short.
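
The mapping from a container's folder structure to bookmarks can be pictured with a few lines of Python. This is only a sketch of the idea using the standard `zipfile` module, not FCpro's implementation: the function `bookmark_tree`, the nested-dict representation and the sample file names are assumptions made for this example, and no conversion or PDF writing takes place.

```python
import io
import zipfile


def bookmark_tree(zip_bytes):
    """Map the folder structure of a ZIP archive to a nested bookmark dict.

    Folders become nested dicts, documents become leaves with value None.
    """
    tree = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith("/"):
                continue  # explicit directory entry; folders are derived from paths
            parts = name.split("/")
            node = tree
            for folder in parts[:-1]:
                node = node.setdefault(folder, {})
            node[parts[-1]] = None  # one document that would be converted
    return tree


def make_sample_zip():
    """Build a small in-memory ZIP with made-up file names for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("contract.docx", b"placeholder")
        archive.writestr("invoices/2014/jan.pdf", b"placeholder")
        archive.writestr("invoices/2014/feb.pdf", b"placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    print(bookmark_tree(make_sample_zip()))
```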

Special functions:

  • With ZIP/RAR/7Z containers, all contained and supported documents are extracted automatically, converted and merged into a single overall PDF document. The folder structure of the container is displayed in the PDF output document as the bookmark structure.
  • MSG / EML – e-mails can contain any attachments, including nested ones. These documents are extracted as well; for formats that cannot be converted, placeholder pages are inserted, and the structure is displayed via PDF bookmarks.
  • PDF/A conversion – FCpro is also a PDF to PDF/A converter. The converted documents can be produced as PDF/A-1b or, with embedded source documents, in the ISO-standardized PDF/A-3b format. This makes the FCpro service well suited for long-term archiving of documents and e-mails.
  • PMTX – is an XML data format from PDFMerge which contains structure and processing information as well as the documents themselves. From it PDFMerge creates a single overall PDF file which consists of the converted and merged single files. The PDFMerge structure is displayed via the PDF bookmarks.
  • FDF, XFDF – PDF form data – can be merged with PDF forms and converted to a “normal” PDF.
  • Stamps, watermarks, stationery – can be configured and applied
  • Intelligent OCR of PDF – PDF documents are analyzed page by page to determine whether OCR is required. Pages which already contain text are not OCR processed again; bookmarks and links are preserved. This saves time and resources and increases the quality.
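
The page-by-page decision behind the intelligent OCR can be sketched as follows. The sketch assumes that the text layer of each page has already been extracted into a string by some other means, and that a page counts as "containing text" once it holds at least `min_chars` non-whitespace-trimmed characters; both the function name and that threshold are hypothetical, and the real criterion used by FCpro is not documented here.

```python
def pages_needing_ocr(page_texts, min_chars=1):
    """Return the 1-based numbers of pages without a usable text layer.

    page_texts: one string per page, holding the text already present in the PDF.
    min_chars:  assumed threshold below which a page is treated as image-only.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


if __name__ == "__main__":
    # Pages 2 and 3 hold no text (scanned images), so only they would be sent to OCR.
    document = ["Cover letter", "", "  \n", "Invoice 42"]
    print(pages_needing_ocr(document))
```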

Functions – PDF-export settings – part of the processing profiles

  • Filling of PDF profile fields with fixed values or variables (origin values, profile name, date, time, PC name, user, file name, application, pages, PDF-level, user variables)
  • Web-optimization (yes / no)
  • Preserve existing bookmarks (yes / no)
  • Settings for opening the PDF
  • Security settings – password-opening, system, restrictions
  • Pagination – position, start, offset, text (current page, pages), font, color, masking underlying area
  • Stationery / PDF watermarks – underlay / overlay, file selection, opacity (%), position
  • Text stamp – one or more stamps, text or variables (like profile fields incl. bookmarks), start, offset, font, style (incl. outline), size, color, opacity (%), angle

FCpro user interface:

Screenshots: config of web-service endpoints; conversion profiles; advanced settings; advanced settings, web-service user config and rights; advanced settings, service account config; advanced settings, MIME types config; icon tray functions

FCpro – conversion profile:

Screenshots of the conversion profile config: office documents; image documents; HTML documents; XML; PDFA and PDFExport settings; FDF XFDF forms; OCR settings

FCpro – conversion profile – OCR:

Screenshots of the conversion profile config: iOCR settings #1; iOCR settings #2, image processing; iOCR settings #3, language selection; iOCR settings #4, language selection; Abbyy OCR settings, predefined profiles; Abbyy OCR settings, general settings; Abbyy OCR settings, recognition, image processing; Abbyy OCR settings, recognition, page analysis; Abbyy OCR settings, recognition, page synthesis; Abbyy OCR settings, PDF export parameter

Available FCpro applications / clients:

The FCpro server provides its functionality to other applications through a SOAP / REST web-service interface. The following applications and integrations are available for FCpro or use its functions:

1.)    FileConverterPro – WCF service sample – this client application is installed along with the FCpro setup. With it, all functions available via the web service can be tried and tested. Besides the EXE, the application is also available as C# source code, so that FCpro functions can be used quickly and easily from your own applications.

2.)    DropConvert – convert documents to PDF(/A) via drag & drop or folder monitoring. DropConvert is a Windows client application which communicates with the FCpro service to convert documents to PDF or PDF/A. Documents are converted when they are dragged onto the “DropZone”, which is always displayed on top, or placed into a monitored folder. The result documents are transferred back to the client and stored in a configurable output folder. The FCpro server is called HTTPS-encrypted, either on the local network or externally through the internet.

3.)    EMail Archiver – The EMail Archiver is an MS Outlook 2010 / 2013 plug-in with which single or multiple selected e-mails, or even whole e-mail folders and subfolders with all contained e-mail messages, can be converted to PDF or PDF/A directly from MS Outlook. The processing and conversion of the e-mails runs via the FCpro server, which is called HTTPS-encrypted, either on the local network or externally through the internet. The resulting PDF(/A) files are stored in the file system under a configured start folder, in a path built from variable information extracted from the e-mail.

PDF/A, and especially PDF/A-3, is particularly well suited for ISO-standardized long-term archiving of e-mails. With PDF/A-3 the original MSG / EML messages are also embedded in the PDF container.

4.)    Alfresco / ifresco – Transformer – the installation of the “ifresco Transformer” AMPs for Alfresco allows PDF(/A) conversion and OCR processing through the FileConverterPro server. If only OCR is required, the AutoOCR server can be used instead. The processing of the supported document formats to PDF, PDF/A and/or with OCR is then available via Java, JavaScript, REST, the “transform” action on folders, and in Alfresco Share as the “transform” document action.

FCpro – versions, licensing, scope

FileConverterPro is available in a basic version as well as in an extended version incl. PDF/A and OCR. With the extended version, the Abbyy OCR engine can optionally be licensed in addition to the iOCR. Abbyy licenses are available either page-based (monthly or overall volume) or processor-based. The FCpro standard license is per server, but there are also “Enterprise” licenses for any number of servers per company and “OEM” licenses for developers who integrate FCpro into their own applications.

The FCpro server also includes the WCF service sample application incl. source code and the MS-Windows application “DropConvert”, which can be installed and used on any workplace without restrictions.

Download – FileConverterPro (FCpro) ~250MB >>>

DropOCR – version 1.2.1 available

New in DropOCR version 1.2.1:

  • User interface switchable between German and English
  • HTTP and HTTPS support
  • Logging of the conversion processes, deleting of the log file
  • AutoStart function to start the application when the PC is started
  • Double-click on the DropZone opens the destination folder
  • AutoOCR test server preconfigured

Our AutoOCR test server is reachable via the following URL and may be used for testing purposes:

  • https://autoocr.may.co.at:8001/AutoOCRService2/
  • User: admin
  • Password: autoocr

Screenshots: DropOCR configuration; DropOCR context menu of the DropZone

Download – DropOCR >>>

AutoOCR version 1.10.11 – run subsequent processing through DLL

Version 1.10.9 introduced a new function to run a subsequent action after the OCR and the creation of the destination file. This is possible for monitored folders as well as for processing via web services, with the action written as a C# or VB.NET script.

With AutoOCR version 1.10.11 this has been extended further: it is now also possible to use external DLLs to run subsequent functions.

A checkbox switches between source code (script) and DLL processing, and the DLL can be chosen from a selection list.

For this there is a new interface, IAction2, which inherits from IAction. For a DLL to be available for selection, it has to be copied into the AutoOCR installation folder. All DLLs whose file names match the pattern %NAME%.AutoOCRPlugin.dll are picked up. Please keep in mind that when AutoOCR is installed as a Windows service, no message boxes or other user interactions are possible, so they must not be used in the action.
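
To check which DLLs in a folder follow the naming pattern, a short script is enough. The Python sketch below is an illustration only and not AutoOCR's own loading logic: the function name is made up, the file names in the demo are invented, and matching the suffix case-insensitively is an assumption based on Windows file-name behaviour.

```python
import tempfile
from pathlib import Path

PLUGIN_SUFFIX = ".AutoOCRPlugin.dll"


def find_plugin_dlls(install_dir):
    """Return the %NAME% parts of all files named %NAME%.AutoOCRPlugin.dll."""
    names = []
    for path in Path(install_dir).iterdir():
        file_name = path.name
        matches = file_name.lower().endswith(PLUGIN_SUFFIX.lower())
        # Require a non-empty %NAME% in front of the suffix.
        if path.is_file() and matches and len(file_name) > len(PLUGIN_SUFFIX):
            names.append(file_name[: -len(PLUGIN_SUFFIX)])
    return sorted(names)


def demo_plugin_scan():
    """Create a temporary folder with made-up files and scan it."""
    with tempfile.TemporaryDirectory() as folder:
        for file_name in ("Notify.AutoOCRPlugin.dll", "Archive.AutoOCRPlugin.dll",
                          "Other.dll", "readme.txt"):
            (Path(folder) / file_name).write_bytes(b"")
        return find_plugin_dlls(folder)


if __name__ == "__main__":
    print(demo_plugin_scan())
```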

For the additional tab to show up and be configurable, AutoOCR has to be started with the command-line parameter /ShowAction.

Screenshots: additional tab in the folder properties for actions via DLL or script; additional tab in the OCR profiles for the web-service interface, actions via DLL or script

Download – sample project – DLL action – C# / .NET >>>
Download – AutoOCR – OCR Server incl. iOCR engine (ca. 150MB) >>>

For the Abbyy OCR engine version 10, demo licenses for 30 days or 500 pages are available on request.

Download – Abbyy FineReader 10.x Rel 4 OCR Engine Setup (ca. 460MB) >>>
Request demo license key for FineReader OCR engine

ifresco AutoOCR – Version 1.18 available

Version 1.18 of ifresco AutoOCR, the OCR server integration for Alfresco, brings new functions and extensions:

  • Implementation of the new paging API for the jobs list of the AutoOCR server – page browsing (back/forth), deleting all jobs, deleting jobs older than x days, sorting jobs, selecting jobs by date.
  • Freely configurable run-time transformers. File-based as well as pipe-I/O-based command-line tools can be used to configure additional transformers.
  • Like the command-line-based run-time transformers, transformers can also be implemented through JavaScript.
  • An AutoOCR Content Model extension for the OCR status (aspect) is installed so that the OCR status of a file can be stored and queried as metadata.
  • The optional ifresco Tools AMP allows background OCR processing at defined intervals, for the initial processing of existing document collections or for the subsequent processing of newly added documents. The detection of the documents to be processed, as well as the processing itself, is done by JavaScripts which are executed on the server, batch-oriented and timed, in the background. Additional Alfresco Share document actions can also be configured and executed through JavaScript, e.g. to convert the selected PDF and image documents to searchable PDF(/A) files through the AutoOCR server and automatically replace the input files with them. With the ifresco Tools, AutoOCR functions are available through JavaScript independently of the configured Alfresco transformer, for mass batch processing as well as for interactive single processing.
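
The job-list operations named in the first item (paging, sorting, deleting jobs older than x days) can be pictured on a plain in-memory list. The Python sketch below is not the AutoOCR paging API: the job fields `id` and `created`, the function names and the sample data are all invented to illustrate what such operations do.

```python
from datetime import date, timedelta

# Fixed "today" so the example is reproducible.
TODAY = date(2014, 3, 8)


def sample_jobs():
    """Seven made-up jobs, one per day from 1 to 7 March 2014."""
    return [{"id": day, "created": date(2014, 3, day)} for day in range(1, 8)]


def get_page(jobs, page, page_size):
    """Return one page (1-based) of the jobs, sorted newest first."""
    ordered = sorted(jobs, key=lambda job: job["created"], reverse=True)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]


def delete_older_than(jobs, days, today):
    """Return the jobs that remain after dropping those older than `days` days."""
    cutoff = today - timedelta(days=days)
    return [job for job in jobs if job["created"] >= cutoff]


if __name__ == "__main__":
    jobs = sample_jobs()
    print([job["id"] for job in get_page(jobs, page=1, page_size=3)])
    print([job["id"] for job in delete_older_than(jobs, days=3, today=TODAY)])
```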

AMPs of version 1.18 are available for the following Alfresco versions: 4.0.1 EE, 4.0.2 EE, 4.0d CE, 4.1.1 EE, 4.1.2 EE, 4.1.3 EE, 4.1.4 EE, 4.2b CE, 4.2c CE
AMPs of the ifresco Tools 1.1 are available for: 4.2c CE, 4.2d CE

Screenshots: ifresco AutoOCR new job functions; ifresco AutoOCR runtime transformer; ifresco AutoOCR transformer configuration; Content Model for ifresco AutoOCR

Download – ifresco AutoOCR – Runtime Transformer description >>>
Download – ifresco AutoOCR – Transformer through JavaScript description >>>
Download – ifresco AutoOCR – Example JavaScript Transformer >>>

ifresco AutoOCR – JavaScript Binding for Alfresco

With the installation of the AMPs, Alfresco and AutoOCR are integrated through a REST web-service interface. Server-side JavaScript offers an easy, flexible and quickly implemented way to extend and adjust Alfresco functions.

JavaScripts can be started on a timer as batch processes, e.g. to process a larger number of documents in the background. They can also be called by the client, e.g. Alfresco Share, to be used as document actions for single or multiple documents.

The JavaScript Binding of the AutoOCR functions allows direct access to the AutoOCR service from Alfresco scripts. In repository JavaScripts (WebScript controller scripts, scripted actions) all functions of the AutoOCR API can be called. This API is completely independent of the integration of the AutoOCR service as an Alfresco transformer. It makes it possible to use OCR functions from JavaScripts which are stored in Alfresco and executed directly on the server.

Download – Documentation JavaScript Binding for Alfresco >>>
Download – extensive demo script >>>

Low Cost OCR Server – AutoOCRLight – OCR processing without limits

Based on AutoOCR, our OCR server that has been tried and tested for many years, we now offer a low-cost variant, “AutoOCRLight”. Compared to the full AutoOCR product it has a lower price but also limited functionality.

Differences between AutoOCRLight and AutoOCR:

  • only one in / out folder can be configured
  • iOCR is the only available OCR engine – the Abbyy OCR engine isn’t supported
  • no PDF/A support – only PDF and TXT output
  • no SOAP / REST web-service interface and therefore no usage of the free AutoOCR add-on applications DropOCR, FineOCR and ifresco Transformer.

Advantages / Highlights AutoOCRLight:

  • Installable as a Windows service or as a normal application on 32-bit and 64-bit operating systems
  • Folder monitoring – newly added files are recognized and processed automatically
  • Processes PDF or image files (TIFF, JPEG) – black & white, grayscale, color
  • iOCR – OCR engine without page limit for generating searchable PDF or TXT
  • Image processing functions for improving the source documents – automatic rotation with page orientation detection, deskew, crop edges, remove impurities, remove perforation, remove lines.
  • Intelligent PDF-OCR processing of mixed documents – each page is checked to see whether OCR processing is necessary.
  • High throughput through parallel processing
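
Folder monitoring of this kind is commonly built as a repeated scan of the input folder. The Python sketch below shows one such scan step under stated assumptions: the function name, the set of watched extensions (`.tif` and `.jpg` are added here alongside TIFF and JPEG) and the in-memory `seen` set are choices made for this example, and AutoOCRLight's actual mechanism may differ.

```python
import tempfile
from pathlib import Path

# Assumed extensions for the PDF, TIFF and JPEG inputs named above.
WATCHED_EXTENSIONS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg"}


def scan_for_new_files(in_folder, seen):
    """Return names of watched files not seen before and remember them."""
    new_files = []
    for path in sorted(Path(in_folder).iterdir()):
        is_watched = path.suffix.lower() in WATCHED_EXTENSIONS
        if path.is_file() and is_watched and path.name not in seen:
            seen.add(path.name)
            new_files.append(path.name)
    return new_files


def demo_folder_scan():
    """Run two scan passes over a temporary folder with made-up files."""
    with tempfile.TemporaryDirectory() as folder:
        seen = set()
        (Path(folder) / "scan1.pdf").write_bytes(b"")
        first_pass = scan_for_new_files(folder, seen)
        (Path(folder) / "scan2.TIF").write_bytes(b"")
        (Path(folder) / "notes.txt").write_bytes(b"")
        second_pass = scan_for_new_files(folder, seen)
        return first_pass, second_pass


if __name__ == "__main__":
    print(demo_folder_scan())
```

In a service, `scan_for_new_files` would be called at a fixed interval and each returned file handed to the OCR step; files already processed are skipped because their names stay in `seen`.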

Screenshots: AutoOCR light user interface; iOCR settings; iOCR image processing; settings; processing options; archive and error folder configuration; e-mail configuration for error notifications; logging

Download – AutoOCRLight – Low Cost OCR Server >>>