Author: Wolfgang May

ifresco AutoOCR – JavaScript Binding for Alfresco

Once the AMPs are installed, Alfresco and AutoOCR are integrated through a REST web-service interface. Server-side JavaScript offers an easy, flexible and quickly implemented way to extend and adjust Alfresco functions.

Scripts can be run on a schedule as batch processes, e.g. to process a larger number of documents in the background. They can also be called from a client such as Alfresco Share and used as document actions for single or multiple documents.

The JavaScript binding of the AutoOCR functions allows direct access to the AutoOCR service from Alfresco scripts. In repository JavaScripts (WebScript controller scripts, scripted actions), all functions of the AutoOCR API can be called. This API is completely independent of the integration of the AutoOCR service as an Alfresco transformer. It makes the OCR functions available to JavaScripts that are stored in Alfresco and executed directly on the server.

Download – Documentation JavaScript Binding for Alfresco >>>
Download – extensive demo script >>>

Low Cost OCR Server – AutoOCRLight – OCR processing without limits

Building on AutoOCR, our OCR server that has been proven and tested for many years, we now offer a low-cost variant, “AutoOCRLight”. Compared to the full AutoOCR product it has a lower price but also limited functionality.

Differences between AutoOCRLight and AutoOCR:

  • only one in / out folder can be configured
  • only one OCR engine, iOCR, is available; the Abbyy OCR engine isn’t supported
  • no PDF/A support, only PDF and TXT output
  • no SOAP / REST web-service interface and therefore no use of the free AutoOCR add-on applications DropOCR, FineOCR and ifresco Transformer.

Advantages / Highlights AutoOCRLight:

  • Installable as Windows Service or as normal application under 32 and 64bit OS
  • Folder monitoring: newly added files are detected and processed automatically
  • Processes PDF or image files (TIFF, JPEG) in black & white, grayscale or color
  • iOCR: OCR engine without page limit for generation of searchable PDF or TXT
  • Image processing functions to improve the source documents: automatic rotation with page-orientation detection, straighten, crop edges, remove specks, remove punch holes, remove lines.
  • Intelligent PDF-OCR processing of mixed documents: each page is checked to see whether OCR processing is necessary.
  • High throughput by parallel processing

Screenshots: 1 AutoOCR light - User interface  2 AutoOCR light - iOCR settings  3 AutoOCR light - iOCR - Image processing  4 AutoOCR light - Settings  5 AutoOCR light - Processing options  6 AutoOCR light - Archive and error folder configuration  7 AutoOCR light - E-mail configuration for error notifications  8 AutoOCR light - Logging

Download – AutoOCRLight – Low Cost OCR Server >>>

PDF2TIFF – color PDF to black & white – TIFF Group 4 – color representation with a grayscale grid

Normally, converting a color document to black and white is not straightforward, because the color information is lost in the conversion. Colors can only be rendered as gray levels, and in a pure black-and-white image gray levels themselves can only be simulated with a corresponding grid of black and white dots. Grids are therefore the only way to represent colors in b&w documents, with one small disadvantage: the files do not compress as well, which makes them larger than ordinary b&w documents with TIFF Group 4 compression.
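How a grid can stand in for colors is easiest to see in a small example. The sketch below uses ordered dithering with a 4x4 Bayer matrix, a well-known technique chosen here purely for illustration; the gridding method that PDF2TIFF actually uses is not described in this post. The function names and the list-of-tuples image representation are assumptions made for the sketch.

```python
# Illustrative sketch only: ordered dithering with a 4x4 Bayer matrix.
# This is NOT the documented PDF2TIFF algorithm, just a common way to
# simulate gray levels with a regular grid of black and white dots.

# Standard 4x4 Bayer threshold matrix (values 0..15).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def rgb_to_gray(r, g, b):
    """Approximate luminance (ITU-R BT.601 weights), range 0..255."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def dither_to_bilevel(pixels):
    """Turn rows of (r, g, b) tuples into rows of 0 (black) / 1 (white)."""
    result = []
    for y, row in enumerate(pixels):
        out_row = []
        for x, (r, g, b) in enumerate(row):
            gray = rgb_to_gray(r, g, b)
            # Map the matrix entry to a threshold between 8 and 248.
            threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) * 16
            out_row.append(1 if gray > threshold else 0)
        result.append(out_row)
    return result


def white_fraction(bilevel):
    """Share of white dots; this is what the eye perceives as brightness."""
    total = sum(len(row) for row in bilevel)
    return sum(sum(row) for row in bilevel) / total
```

A uniform mid-gray area ends up with half of its dots white, while a pure red area, which is darker in luminance, ends up with fewer white dots. The fine alternation of black and white dots is also why such pages compress less well: Group 4 compression works best on long uniform runs of pixels.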

In the course of a customer project, we developed the PDF2TIFF converter to convert color PDFs to monochrome TIFF Group 4 documents at the highest possible quality. Color images and graphics should also remain readable and printable at the highest possible quality.

Functions PDF2TIFF:

  • Service under MS-Windows (32 and 64bit OS) with folder/subfolder monitoring for PDF documents
  • Added PDF documents are gridded and stored in the out folder as b&w TIFF Group 4 single pages.
  • Parallel processing for high throughput and optimal use of resources.
  • Configuration of the output resolution and page format (default: 300 dpi / A4) for the created TIFF files, automatic recognition of the page orientation.
  • Deleting or archiving of the PDFs after successful processing.
  • Service configuration: local system account or a dedicated user
  • Logging of the processing

Screenshots: 1 PDF2TIFF - Folder configuration  2 PDF2TIFF - Processing profiles 1  3 PDF2TIFF - Processing profiles 2  4 PDF2TIFF - Processing profiles 3  7 PDF2TIFF - Logging  6 PDF2TIFF - Processing options

Comparison PDF2TIFF to the normal b&w conversion without gridding:

Download – PDF2TIFF >>>
Download – PDF2TIFF – PDF Source document in color >>>
Download – PDF2TIFF – TIFF results in comparison >>>

ifresco OpenSource Client for Alfresco available as a pre-installed VM appliance

To let you use our ifresco Client for Alfresco with minimal effort and without the otherwise required installation and configuration, a pre-installed VMware appliance is now available.

After installation, only the IP address and port of the Alfresco server have to be entered, and the ifresco Client can be used with the Alfresco server right away. With an optional update agreement you get access to the continuously updated versions directly from our SVN repository. In this variant the Alfresco server runs separately from ifresco. Alternatively, we also offer a combined Alfresco Community Edition + ifresco Client appliance, with both systems already installed, configured and optimized on an Ubuntu 64bit Linux VM server.

Pre-installed ifresco Client VMware appliance

  • Debian LINUX 32bit
  • PHP 5
  • Apache 2.2
  • MySQL 5
  • Installed PHP Extensions – PDO, MB-String, XML, SOAP, Iconv
  • All necessary PHP settings done

Requirements: VMware Workstation, Player, etc. – min. 2GB RAM, 10GB HD

Optional: SVN Update for users with SW-maintenance

Configuration prompt at first start: Alfresco server IP and port. After that, the IP address under which the ifresco Client can be reached in your browser (ideally Google Chrome) is shown.

Price information in our web shop >>>

New Web-Site – www.OCRServer.at – online

We have brought together all of our OCR products on our newly created website.

You can get more information about the following products there:

  • AutoOCR
  • AutoOCR light
  • DropOCR
  • FineOCR
  • ifresco Transformer
  • FileConverter (pro)
  • ifresco Profiler + Plugins

FileConverter – automatically convert documents and e-mails from folders or e-mail boxes to PDF, PDF/A and TIFF

The FileConverter is an application that can be installed as a service under MS-Windows (32 and 64bit). It monitors folders and e-mail boxes and automatically converts the documents they contain to the PDF, PDF/A or TIFF file format. Multiple folders as well as MS-Exchange and POP3 mailboxes can be configured and monitored.

The following input document formats are supported:

  • DOC, DOCX, RTF, TXT
  • XLS, XLSX
  • PPT, PPTX
  • XFDF, FDF
  • PNG, BMP, TIF, TIFF, JPG, JPEG
  • ZIP, RAR, 7Z
  • MSG, EML
  • PDF
  • HTM, HTML, MHTML
  • PMT, PMTX

File format features:

  • For ZIP/RAR/7Z containers, all supported documents they contain are automatically extracted and converted. The folder structure inside the container is recreated in the output directory.
  • PMT and PMTX are PDFMerge XML data formats that contain hierarchical structure information as well as links to the documents or the documents themselves. From these files the FileConverter, like the PDFMerge server, produces a single overall PDF file that is merged from the individual documents after their conversion to PDF. The structure defined in the XML is shown as PDF bookmarks.

Conversion:

  • The PDF/TIFF conversion is performed directly, without using the source application, so no installation of MS-Office or Adobe Acrobat is needed for processing. Optionally, the PDFs can also be exported in the ISO-standardized PDF/A-1b format.
  • The iOCR engine, which creates searchable PDF(/A) files from PDF or image documents, is included in the standard scope. Optionally, Abbyy, the most efficient OCR engine at the moment, can also be installed. During OCR processing, PDF documents are analyzed page by page and only pages that don’t yet contain text information are processed (intelligent OCR processing). This saves resources and increases quality and processing speed.
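The page-by-page decision described above can be sketched in a few lines. This is a minimal illustration, not the FileConverter's actual logic, which is not documented here. It assumes that some PDF library has already extracted the text layer of each page into a list of strings (with None or an empty string for pages without text); the function name and the min_chars parameter are hypothetical.

```python
def pages_needing_ocr(page_texts, min_chars=1):
    """Return the 0-based indices of pages that still need OCR.

    page_texts: one entry per page holding the text already embedded in
    the PDF (assumed to come from a PDF library of your choice).
    A page counts as "needs OCR" when its text layer, ignoring
    whitespace, has fewer than min_chars characters.
    """
    return [
        index
        for index, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]
```

For a mixed document such as `["Invoice 42", "", "  \n", None, "x"]`, only the three middle pages would be sent to the OCR engine; the pages that already carry text are passed through unchanged.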

Functions – general:

  • MS-Windows service application for document conversion of MS-Office, PDF, image, HTML, ZIP, MSG and e-mail to PDF, PDF/A or TIFF
  • Multiple folders as well as MS-Exchange and POP3 e-mail boxes can be monitored and processed in parallel.
  • Direct conversion without the need for additional source applications (MS-Office, Adobe Acrobat) or printer drivers.
  • Flattening of filled PDF forms: PDF forms (XFDF, FDF) can be converted into normal PDF documents. The forms can either be stored permanently or loaded anew each time.
  • Parallel processing with a configurable number of processes makes optimal use of the hardware and ensures fast processing.
  • Logging of all conversion operations, forwarding of failed e-mail conversions or sending of error e-mails via SMTP

In / out folder processing:

  • Processing of files and folders from configured in / out folders, triggered after a time interval or by a “ready” file, incl. subfolder processing (one level)
  • Creation of an index text file listing all files generated during a processing run.
  • After processing: deleting, moving into an archive folder, or renaming of the files or folders (.con / .err)
  • Configuration of the filename extensions that shouldn’t be converted; these are ignored and not processed. E-mails with attachments that have unidentifiable extensions are handled as errors and forwarded to an e-mail address.
  • Single-page output with a configurable number of digits for the page index
  • Configuration of the TIFF conversion: compression / color depth / resolution / JPEG quality
  • Extensive parameters for OCR processing with iOCR or Abbyy; the FileConverter has the same OCR functions as AutoOCR
  • Parameters for the HTML conversion: page size and margins. HTML documents and e-mails are scaled automatically.

Processing of e-mail boxes:

  • Processing of POP3 / MS-Exchange e-mail boxes: forwarding or deleting after successful or failed processing, or moving into an archive / error folder under MS-Exchange. Direct access to MS-Exchange 2007/2010/2013 through the SOAP web-service interface.
  • EML and MSG: body and attachments are converted; the e-mail header information (from, date, to, subject) is generated in the body document
  • Output of an XML file listing the processed e-mails with their metadata and file links; configurable fields: from, to, cc, bcc, received, subject, body, attachments
  • Output per e-mail in separate subfolders or “flat” in the destination folder.

 

Screenshots: 1 FileConverter - General settings - e-mail & folder processing  2 FileConverter - Processing options  3 FileConverter - Service configuration  4 FileConverter - SMTP server configuration  5 FileConverter - Configuration folder processing  6 FileConverter - Configuration e-mail box processing  7 FileConverter - MS-Exchange configuration  8 FileConverter - POP3 configuration  9 FileConverter - TIFF conversion settings  10 FileConverter - OCR settings  11 FileConverter - HTML conversion settings  12 FileConverter - Log

  Download – FileConverter – documents & e-mails to PDF, PDF/A and TIFF >>>

ifresco Profiler – splitting of documents – manual, per page, area-OCR, per barcode

The ifresco Profiler offers easy-to-use functions for quickly splitting document stacks in various ways. The following functions are available:

  • Manual split: the page / thumbnail at which the document should be split is selected; a key combination then splits the document at the current page, names it automatically and selects the new document for further split actions.
  • Split by page numbers: with this function the whole document can be split by a page count into individual documents with the same number of pages.
  • Split with area OCR: an area is selected in the preview and its text is recognized via area OCR; the document is split at this page and the recognized text is used as its name.
  • Split by barcode: 1D barcodes are recognized and can be used to split the documents as well as for the file names. 18 different barcodes are supported; orientation and position on the page don’t matter. Pages with barcodes can be deleted; filtering by strings, lists and valuation is also supported.
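Two of the split modes listed above come down to simple bookkeeping over page numbers, which the following sketch illustrates. It is not the Profiler's implementation. The function names are hypothetical, barcode recognition itself is assumed to have happened already (each page is represented by its recognized barcode value or None), and the handling of a shorter last document and of pages before the first barcode are assumptions made for the example.

```python
def split_by_page_count(total_pages, pages_per_doc):
    """Return (first_page, last_page) ranges, 1-based and inclusive.

    Assumption: if total_pages is not a multiple of pages_per_doc,
    the last document simply gets the remaining pages.
    """
    if pages_per_doc < 1:
        raise ValueError("pages_per_doc must be at least 1")
    return [
        (start, min(start + pages_per_doc - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_doc)
    ]


def split_by_barcode(page_barcodes, drop_barcode_pages=False):
    """Start a new document at every page that carries a barcode.

    page_barcodes: one entry per page, the barcode value or None.
    Returns a list of (name, [page numbers]); the barcode value serves
    as the document name. Pages before the first barcode are collected
    in a document named None (an assumption of this sketch).
    """
    documents = []
    for page_no, code in enumerate(page_barcodes, start=1):
        if code is not None:
            # Barcode page: opens a new document, optionally dropped itself.
            documents.append((code, [] if drop_barcode_pages else [page_no]))
        else:
            if not documents:
                documents.append((None, []))
            documents[-1][1].append(page_no)
    return documents
```

For example, `split_by_page_count(10, 3)` gives the ranges (1, 3), (4, 6), (7, 9) and (10, 10), and a five-page stack with barcodes "A1" on page 1 and "B2" on page 4 becomes two documents named after those barcodes.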

ifresco Profiler – demo plugin – capturing of incoming invoices through Barcode and OCR

A demo plugin for the ifresco Profiler is now available to show how easily and quickly incoming invoices can be captured in the Alfresco ECM/DMS. Invoices captured this way can then, for example, be processed further in Alfresco through the Alfresco-SAP integration from IT-Novum. We recently presented this solution in an Alfresco webinar (next date: 8.10.2013) together with IT-Novum and Alfresco.

Functions of the demo plugin – incoming invoice capture:

  • Capturing of incoming invoices via scan, PDF printer driver, folders, drag & drop (TIFF, PDF)
  • Manual splitting of document-stacks
  • Barcode recognition with splitting of documents as well as barcode filter-function and deleting of barcode pages.
  • Capturing of metadata with a profile mask: Beleg-ID (“Invoice-ID” / =Barcode), Lieferant (“supplier”), Straße (“street”), PLZ (“postal code”), Ort (“city”), Belegnummer (“invoice number”), Belegdatum (“invoice date”), Rechnungsbetrag (“invoice amount”); lookup of supplier number and name in an external XLS table, with selection of the linked information (street, postal code, city)
  • Area OCR to transfer values from the displayed document into a field.
  • Call up and capture tags
  • Batch – background processing for PDF-OCR and Alfresco upload
  • AutoOCR integration to store searchable PDF documents into Alfresco
  • Automatic naming or building of the folder structure from the captured metadata – number, company, invoice type, year, invoice number, invoice date

Screenshot: ifresco demo plugin - incoming invoice capture

The demo plugin should be regarded as an example; both its functions and its data model can be adapted and extended to individual requirements.

Description of the installation, the access data for the demo server and the sequence of working steps >>>

Download – ifresco Profiler base software >>>
Download – ifresco demo plugin >>>
Download – ifresco demo plugin add on >>>

Webshop