Low Cost OCR Server – AutoOCRLight – OCR processing without limits

Based on AutoOCR, our OCR server that has been proven and tested over many years, we now offer a low-cost variant called “AutoOCRLight”. Compared to the full AutoOCR product, it has a lower price but also limited functionality.

Differences between AutoOCRLight and AutoOCR:

  • only one in / out folder can be configured
  • only one OCR engine, iOCR, is available – the Abbyy OCR engine isn’t supported
  • no PDF/A support – only PDF and TXT output
  • no SOAP / REST web service interface, so the free AutoOCR add-on applications DropOCR, FineOCR and ifresco Transformer cannot be used.

Advantages / highlights of AutoOCRLight:

  • Installable as a Windows service or as a normal application on 32- and 64-bit operating systems
  • Folder monitoring – newly added files are detected and processed automatically
  • Processes PDF or image files (TIFF, JPEG) – black & white, grayscale, color
  • iOCR – OCR engine without page limit for generating searchable PDF or TXT
  • Image processing functions to improve the source documents – automatic rotation with page orientation detection, deskewing, cropping of edges, removal of specks, removal of punch holes, removal of lines.
  • Intelligent PDF-OCR processing of mixed documents – each page is checked to see whether OCR processing is necessary.
  • High throughput through parallel processing
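The folder monitoring mentioned above can be pictured as a simple polling step. The following is only a minimal sketch, not AutoOCRLight's actual implementation: the function name `find_new_files`, the `SUPPORTED` set and the use of file names to remember processed files are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical set of input types, taken from the formats listed above.
SUPPORTED = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg"}

def find_new_files(in_folder, seen):
    """Return supported files in in_folder that are not yet in `seen`, and remember them."""
    new_files = []
    for path in sorted(Path(in_folder).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files
```

A service would call this function repeatedly (for example every few seconds) with the same `seen` set and hand each returned file to the OCR step.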

Screenshots: AutoOCR light user interface; iOCR settings; iOCR image processing; settings; processing options; archive and error folder configuration; e-mail configuration for error notifications; logging

Download – AutoOCRLight – Low Cost OCR Server >>>

PDF2TIFF – color PDF to black & white – TIFF Group 4 – color representation with a grayscale grid

Normally, converting a color document to black and white is not straightforward, because the color information is lost in the conversion. Colors can only be represented as shades of gray, and in a black-and-white image shades of gray can in turn only be simulated by a corresponding grid of black and white pixels. Grids are therefore the only way to represent colors in b&w documents, with one small disadvantage: the files cannot be compressed as well, which makes them larger than usual b&w documents with TIFF Gr. 4 compression.
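To illustrate the idea of simulating gray levels with a grid, here is a minimal sketch using ordered dithering with a 4x4 Bayer matrix. This is a common, well-known technique chosen for illustration; the text does not say which gridding method PDF2TIFF actually uses, and the function names are assumptions.

```python
# 4x4 Bayer threshold matrix (values 0..15).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def to_gray(r, g, b):
    """Approximate luminance of an RGB color (ITU-R BT.601 weights), 0..255."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def dither(gray_rows):
    """Map 0..255 gray values to 1 (white) or 0 (black) using the Bayer grid."""
    out = []
    for y, row in enumerate(gray_rows):
        out_row = []
        for x, value in enumerate(row):
            # Thresholds are spread evenly over the gray range (8, 24, ..., 248).
            threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) * 16
            out_row.append(1 if value > threshold else 0)
        out.append(out_row)
    return out
```

A mid-gray area turns into a fine pattern in which about half of the pixels are white. Such patterns break up the long runs of identical pixels that Group 4 compression handles well, which is consistent with the larger file sizes mentioned above.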

In the course of a customer project, we developed the PDF2TIFF converter to convert color PDFs to monochrome TIFF Gr. 4 documents with the highest possible quality. Color images and graphics should also remain readable and printable at the highest possible quality.

Functions of PDF2TIFF:

  • Service under MS Windows (32- and 64-bit OS) with folder/subfolder monitoring for PDF documents
  • Added PDF documents are gridded and stored in the output folder as b&w TIFF Gr. 4 single pages.
  • Parallel processing for high throughput and optimal use of resources.
  • Configuration of the output resolution and page format (default: 300 dpi / A4) for the created TIFF files; automatic detection of the page orientation.
  • Deleting or archiving of the PDFs after successful processing.
  • Service configuration – local system account or a dedicated user
  • Logging of the processing

Screenshots: PDF2TIFF folder configuration; processing profiles 1 to 3; processing options; logging

Comparison of PDF2TIFF with normal b&w conversion without gridding:


Download – PDF2TIFF >>>
Download – PDF2TIFF – PDF Source document in color >>>
Download – PDF2TIFF – TIFF results in comparison >>>

ifresco open-source client for Alfresco available as a preinstalled VM appliance

To use our ifresco client for Alfresco with minimal effort and without the otherwise required installation and configuration, a preinstalled VMware appliance is now available.

After installation, only the IP address and port of the Alfresco server have to be entered, and the ifresco client can be used together with the Alfresco server right away. With an optional update agreement you get access to the continuously updated versions directly from our SVN repository. With this variant, the Alfresco server runs separately from ifresco. Alternatively, we also offer a combined Alfresco Community Edition + ifresco client appliance, with both systems already installed, configured and optimized on an Ubuntu 64-bit Linux VM server.

Preinstalled ifresco client VMware appliance

  • Debian LINUX 32bit
  • PHP 5
  • Apache 2.2
  • MySQL 5
  • Installed PHP Extensions – PDO, MB-String, XML, SOAP, Iconv
  • All necessary PHP settings done

Requirements: VMware Workstation, Player, etc. – min. 2 GB RAM, 10 GB HD

Optional: SVN Update for users with SW-maintenance

Configuration prompt at first start: Alfresco server IP and port. After that, the IP address under which the ifresco client can be reached in your browser (ideally Google Chrome) is displayed.

Price information in our web shop >>>

New Web-Site – www.OCRServer.at – online

We have brought together all of our OCR products on our newly created website.

You can get more information about the following products there:

  • AutoOCR
  • AutoOCR light
  • DropOCR
  • FineOCR
  • ifresco Transformer
  • FileConverter (pro)
  • ifresco Profiler + Plugins

FileConverter – automatically convert documents and e-mails from folders or e-mail boxes to PDF, PDF/A and TIFF

The FileConverter is an application, installable as a service under MS Windows (32- and 64-bit), that monitors folders and e-mail boxes and automatically converts the documents they contain to the PDF, PDF/A or TIFF file format. Multiple folders as well as MS Exchange and POP3 mailboxes can be configured and monitored.

The following input document formats are supported:

  • DOC, DOCX, RTF, TXT,
  • XLS, XLSX,
  • PPT, PPTX,
  • XFDF, FDF,
  • PNG, BMP, TIF, TIFF, JPG, JPEG
  • ZIP, RAR, 7Z,
  • MSG, EML,
  • PDF,
  • HTM, HTML, MHTML,
  • PMT, PMTX

File format features:

  • With ZIP/RAR/7Z containers, all contained and supported documents are automatically extracted and converted. The folder structure inside the container is recreated in the output directory.
  • PMT and PMTX are PDFMerge XML data formats which contain hierarchical structure information as well as links to the documents or the documents themselves. From these files the FileConverter produces, like the PDFMerge server, a single overall PDF file that is merged from the individual documents after their conversion to PDF. The structure defined in the XML is represented as PDF bookmarks.
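The container handling described above (extract only supported documents, keep the folder structure) could be sketched for ZIP files roughly as follows. This is an illustration only, not the FileConverter's code: `extract_supported` and the shortened `SUPPORTED` set are assumptions, and RAR/7Z would need other libraries.

```python
import zipfile
from pathlib import Path

# Illustrative subset of the supported input formats listed above.
SUPPORTED = {".doc", ".docx", ".pdf", ".txt", ".tif", ".jpg"}

def extract_supported(zip_path, out_dir):
    """Extract supported members, recreating the archive's folder structure."""
    extracted = []
    out_root = Path(out_dir).resolve()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if Path(info.filename).suffix.lower() not in SUPPORTED:
                continue  # unsupported types are ignored
            target = (out_root / info.filename).resolve()
            if out_root not in target.parents:
                continue  # skip members that would escape the output directory
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            extracted.append(target.relative_to(out_root).as_posix())
    return extracted
```

The returned relative paths can then be passed to the conversion step, so that each converted file ends up in the same subfolder it had inside the container.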

Conversion:

  • The PDF/TIFF conversion takes place directly, without using the source application, so no installation of MS Office or Adobe Acrobat is necessary for processing. Optionally, the PDFs can also be exported in the ISO-standardized PDF/A-1b format.
  • The standard scope also includes the iOCR engine for creating searchable PDF(/A)s from PDF or image documents. Optionally, Abbyy, currently the most efficient OCR engine, can also be installed. During OCR processing, PDF documents are analyzed page by page and only pages that do not yet contain text information are processed (intelligent OCR processing) – this saves resources and increases quality and processing speed.

Functions – general:

  • MS Windows service application for converting MS Office, PDF, image, HTML, ZIP, MSG and e-mail documents to PDF, PDF/A or TIFF
  • Multiple folders as well as MS Exchange and POP3 e-mail boxes can be monitored and processed in parallel.
  • Direct conversion without the need for additional source applications (MS Office, Adobe Acrobat) or printer drivers.
  • Flattening of filled PDF forms: PDF forms (XFDF, FDF) can be converted into normal PDF documents. The forms can either be stored permanently or loaded anew each time.
  • Parallel processing with a configurable number of processes – allows optimal use of the hardware and ensures fast processing.
  • Logging of all conversions, forwarding of failed e-mail conversions or sending of error e-mails via SMTP

In / out folder processing:

  • Processing of files and folders from configured in / out folders, triggered after a time delay or by a “ready” file, incl. subfolder processing (one level)
  • Creation of an index text file listing all files generated during a processing run.
  • After processing: deleting, moving to an archive folder, or renaming of the files or folders (.con / .err)
  • Configuration of the filename extensions that should not be converted – these are ignored and not processed. E-mails with attachments that have unidentifiable extensions are handled as errors and forwarded to an e-mail address.
  • Single-page output with a configurable number of digits for the page index
  • Configuration of the TIFF conversion – compression / color depth / resolution / JPEG-quality
  • Extensive parameters for OCR processing – iOCR or Abbyy – the FileConverter has the same OCR functions as AutoOCR
  • Parameters for HTML conversion – page size and margins – HTML documents and e-mails are scaled automatically.

Processing of e-mail boxes:

  • Processing of POP3 / MS Exchange e-mail boxes – forwarding or deleting after successful or failed processing, or moving to an archive / error folder under MS Exchange. Direct access to MS Exchange 2007/2010/2013 through the SOAP web service interface.
  • EML and MSG – body and attachments are converted – the e-mail header information (from, date, to, subject) is generated in the body document
  • Output of an XML file listing the processed e-mails with their metadata and file links – configurable: from, to, cc, bcc, received, subject, body, attachments
  • Output per e-mail in separate subfolders or “flat” in the destination folder.
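As a rough illustration of the metadata output described above, the following sketch reads the header fields of one EML message and puts them into an XML element. The element names (`email`, `from`, `to`, ...) and the function `email_to_xml` are hypothetical; the actual XML layout written by the FileConverter is not described here.

```python
from email import message_from_string
import xml.etree.ElementTree as ET

# Header fields to export; in the product this selection is configurable.
FIELDS = ("from", "to", "date", "subject")

def email_to_xml(raw_message):
    """Build an <email> element holding selected header fields of one message."""
    msg = message_from_string(raw_message)
    node = ET.Element("email")
    for field in FIELDS:
        # Header lookup is case-insensitive; missing headers become empty text.
        ET.SubElement(node, field).text = msg.get(field, "")
    return node
```

Calling `ET.tostring(email_to_xml(raw), encoding="unicode")` on the text of an EML file gives the XML fragment as a string. MSG files use a different, binary format and are not covered by this sketch.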

 

Screenshots: FileConverter general settings (e-mail & folder processing); processing options; service configuration; SMTP server configuration; folder processing configuration; e-mail box processing configuration; MS Exchange configuration; POP3 configuration; TIFF conversion settings; OCR settings; HTML conversion settings; log

  Download – FileConverter – documents & e-mails to PDF, PDF/A and TIFF >>>

ifresco Profiler – splitting of documents – manual, per page, area-OCR, per barcode

The ifresco Profiler offers easy-to-use functions for splitting document stacks quickly in various ways. The following functions are available:

  • Manual split – the page / thumbnail where the document should be split is selected; with a key combination the document is split at the current page and named automatically, and the new document is then selected for further split actions.
  • Split by page numbers – with this function the whole document can be split by a page count into single documents with the same number of pages.
  • Split with area OCR – an area is selected in the preview and the text is recognized via area OCR; the document is split at this page and the recognized text is used as the name.
  • Split by barcode – 1D barcodes are recognized and can be used to split the documents as well as for the file names. 18 different barcode types are supported; orientation and position on the page do not matter. Pages with barcodes can be deleted; filtering by strings, lists and valuation is also supported.
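The page arithmetic behind two of these split modes can be sketched as follows. This is a simplified illustration with assumed function names; it only computes page ranges (1-based, inclusive) for a document with at least one page and does not touch any PDF or TIFF data.

```python
def split_by_page_count(page_count, pages_per_document):
    """Split into documents of equal size; the last one may be shorter."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        (start, min(start + pages_per_document - 1, page_count))
        for start in range(1, page_count + 1, pages_per_document)
    ]

def split_at_pages(page_count, split_pages):
    """Start a new document at each given page (e.g. a barcode or selected page)."""
    starts = sorted({1, *(p for p in split_pages if 1 < p <= page_count)})
    ends = [s - 1 for s in starts[1:]] + [page_count]
    return list(zip(starts, ends))
```

For a 10-page stack, `split_by_page_count(10, 3)` gives four ranges, the last one containing only page 10, while `split_at_pages(10, [4, 8])` gives the ranges 1 to 3, 4 to 7 and 8 to 10.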

ifresco Profiler – demo plugin – capturing of incoming invoices through Barcode and OCR

There is now a demo plugin for the ifresco Profiler to show how easily and quickly incoming invoices can be captured in the Alfresco ECM/DMS. Invoices captured this way can, for example, be processed further in Alfresco through the Alfresco-SAP integration from IT-Novum. We recently presented this solution in an Alfresco webinar (next date: 8.10.2013) together with IT-Novum and Alfresco.

Functions demo plugin – incoming invoices capturing:

  • Capturing of incoming invoices via scan, PDF printer driver, folders, drag & drop (TIFF, PDF)
  • Manual splitting of document stacks
  • Barcode recognition with splitting of documents as well as a barcode filter function and deleting of barcode pages.
  • Capturing of metadata with a profile mask – Beleg-ID (“invoice ID” / = barcode), Lieferant (“supplier”), Straße (“street”), PLZ (“postal code”), Ort (“city”), Belegnummer (“invoice number”), Belegdatum (“invoice date”), Rechnungsbetrag (“invoice amount”) – search for supplier number and name in an external XLS table with selection of the linked information – street, postal code, city
  • Area OCR to adopt values from the displayed document into a field.
  • Retrieve and capture tags
  • Batch – background processing for PDF-OCR and Alfresco upload
  • AutoOCR integration to store searchable PDF documents into Alfresco
  • Automatic naming or building of the folder structure from the captured metadata – number, company, invoice type, year, invoice number, invoice date
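How a folder structure and file name might be derived from captured metadata can be sketched like this. The field names (`supplier`, `invoice_number`, `invoice_date`), the path layout and the ISO date format are assumptions chosen for illustration; the demo plugin's real naming rules are defined in the plugin itself.

```python
import re

def clean(value):
    """Replace characters that are problematic in file or folder names."""
    return re.sub(r'[\\/:*?"<>|]+', "_", str(value)).strip()

def build_target_path(meta):
    """Build 'supplier/year/invoice-number_date.pdf' from captured metadata."""
    year = meta["invoice_date"][:4]  # assumes ISO dates such as 2013-10-08
    name = f'{clean(meta["invoice_number"])}_{meta["invoice_date"]}.pdf'
    return "/".join([clean(meta["supplier"]), year, name])
```

Because the path is built only from profile field values, two invoices from the same supplier and year end up in the same folder without the user having to choose it.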

Screenshot: ifresco demo plugin – capturing of incoming invoices

The demo plugin is to be regarded as an example and can be adapted and extended to individual requirements, both functionally and in terms of the data model.

Description of the installation, the access data for the demo server and the sequence of working steps >>>

Download – ifresco Profiler base software >>>
Download – ifresco demo plugin >>>
Download – ifresco demo plugin add on >>>

ifresco Profiler – scan, edit, OCR, barcode, capture metadata, Alfresco integration

The ifresco Profiler provides important, easy-to-use, page-oriented document processing functions for PDF and image documents at any workplace. It makes it possible to store documents quickly and easily in Alfresco as searchable PDFs, together with metadata captured through individual, specific profiling masks. Essential features of the software include area OCR with an integrated OCR engine, creation of searchable PDFs on export with the integrated OCR or an external AutoOCR server, barcode recognition for file names and document splitting, and export to a folder, as an e-mail attachment, or via installed plugins to Alfresco together with metadata.

The application consists of two components: the Profiler base software, which contains all general functions, and one or more installable plugins. These plugins provide the interface to Alfresco and make it possible to use profiling masks tailored to the individual requirements and field of application. The complete logic for the metadata, filing structure and naming is implemented in a plugin.

ifresco Profiler base:

  • Processes PDF and image files – black & white, grayscale, color – without having to pay attention to file format and color; all functions work across all formats.
  • Integrated scan function to scan documents via locally connected scanners. The scan settings can be selected directly via preconfigured scan profiles.
  • Capturing of documents from folders – displayed as a document list, e.g. for documents from multifunction devices, network scanners with scan-to-folder function, documents created via printer driver, or documents received via e-mail.
  • Quick renaming of documents – with automatic selection of the next file in the list once the change is finished.
  • Area OCR by the locally integrated OCR engine to assign file names.
  • Deleting / cutting out areas
  • Page preview – zoom, paging, rotating – as well as thumbnails of the whole document
  • Page-oriented document processing – rotate pages left/right, delete pages, move pages in the thumbnail view via drag & drop.
  • Split a whole document – at marked pages, after x pages, by barcode.
  • Merge single documents into one – specifying the order, with automatic deletion of the single documents.
  • Export – to a folder, as an e-mail attachment, or to Alfresco with metadata via profiling – in the native format, as image PDF or as PDF-OCR
  • On export – generation of searchable PDF-OCR documents by the locally integrated iOCR engine or by the AutoOCR server with Abbyy OCR, integrated via web service
  • Intelligent OCR processing – only image pages are OCR processed – normal PDF pages are taken over unchanged.

ifresco Profiler plugins:

The profile form and the profiling logic for filing documents in Alfresco are implemented in the ifresco Profiler via plugins. Because every company has its own data model and filing logic, the plugins are developed and implemented individually according to specifications. Existing plugins can be used as a starting point. For testing and to illustrate the possibilities, a demo plugin as well as already implemented plugins are available.

  • Installable plugins – for profiling and capturing metadata for filing documents in Alfresco.
  • One or more plugins can be installed and selected, which also allows switching to other Alfresco servers – each plugin contains its own individual profiling logic as a separately installed .NET / C# application that plugs into the ifresco Profiler base framework and uses its functions.
  • Side-by-side display of the profile mask and the document preview while capturing the metadata.
  • Freely programmable logic and functions of the profiling mask, e.g. external XML template rules with dynamic fields so that the name / title is always built the same way; access to external data sources – MS XLS, SQL, web service (e.g. SugarCRM); linked tables and pre-filling of fields with values from the table; type-ahead substring search over single or combined fields; use of Alfresco categories as lookups; assignment of existing Alfresco tags; automatic creation of new tags; automatic creation of the Alfresco folder structure and file names from profile field values; searching for folders in Alfresco; counters via web service; stamping the document with information from the metadata before upload; searching for documents available in Alfresco and adopting their profile values; and so on.
  • Interactive processing – with OCR and upload – or alternatively
  • Background / batch processing – for PDF-OCR conversion and Alfresco upload – the user can continue working while OCR processing and the Alfresco upload take place in the background.
  • Preserve existing profile values / clear mask
  • Automatic loading of the next document in the list – the processed document is deleted or moved to the archive area after upload.

Download ifresco Profiler >>>

 

Intelligent PDF OCR processing via AutoOCR for Abbyy and iOCR

PDF documents can be generated in different ways. PDFs can combine various contents and sources in one document. Pages can be constructed from “normal” PDF content consisting of text, images, and vector graphics, and typically already have textual content that can be used for full-text indexing and search. However, a PDF document can also contain scanned pages in black and white or color. Such pages or documents must undergo OCR to insert the textual information for indexing and searching.

So there are PDF documents that should not be subjected to any OCR processing at all, and others in which individual pages or all pages have to be processed because they were generated by a scan process.

Normally, all these types of PDF documents occur in business processes, and the user cannot tell whether or not a document needs OCR – viewed from the outside in Adobe Reader or on a printout, this cannot be recognized immediately.

If every PDF document / page were processed in the same way, regardless of how it is structured and whether OCR processing makes sense, there would be some disadvantages:

Each PDF page is “rasterized” again, regardless of structure and content, i.e. converted into an image and then OCR processed. This is like printing the document, scanning it again and then subjecting it to OCR processing. The result is an image of a “normal” PDF page with underlying text recognized by the OCR engine.

  • the quality is not the same as before
  • the documents become bigger
  • special PDF properties are lost (bookmarks, links, etc.)
  • Processing time and resources are consumed
  • OCR page licenses are consumed unnecessarily

PDF OCR processing should therefore be “intelligent”, so that neither the process nor the user has to make the difficult decision whether a PDF document must be subjected to OCR processing or not. This is even more difficult when a single PDF document consists of mixed normal and scanned parts.

That’s why we’ve integrated intelligent OCR processing into AutoOCR, which works in the same way with both the Abbyy and the iOCR OCR engine. This can be controlled per input folder or, for the web service interface, via the OCR profile, and is available for both PDF > PDF and PDF > TXT processing.

Screenshots: AutoOCR Abbyy – intelligent OCR processing; iOCR – intelligent PDF processing

Highlights – Intelligent PDF OCR processing:

  • works for both PDF > PDF and PDF > TXT processing
  • for the Abbyy OCR and iOCR engine
  • for folder processing as well as web service processing
  • the PDF document as well as every single page is analyzed, and only those pages that do not contain any text are OCR processed – these are usually scanned pages that have not yet been processed by OCR.
  • existing normal PDF documents and pages are taken over unchanged and not processed
  • OCRed documents and pages are not processed again.
  • in the case of PDF > TXT processing, the text is extracted from the normal PDF pages and OCR is only performed on pages without text.
  • PDF functions and bookmarks are retained and are included in the target document.
  • Saves processing time and Abbyy OCR page licenses
  • the files are not enlarged
  • the quality of the PDF pages is preserved.
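The page-by-page decision described above can be sketched in a few lines. This is a simplified illustration and not AutoOCR's implementation: it assumes that the text of each page has already been extracted with some PDF library into `page_texts`, that a page counts as "scanned" when it has no extractable text, and that `ocr` is a caller-supplied function; all of these names are hypothetical.

```python
def plan_ocr(page_texts, min_chars=1):
    """Return the 1-based numbers of pages that carry no extractable text."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]

def to_text(page_texts, ocr):
    """PDF > TXT: keep existing page text, call ocr(page_number) only where needed."""
    needs_ocr = set(plan_ocr(page_texts))
    return [
        ocr(number) if number in needs_ocr else text
        for number, text in enumerate(page_texts, start=1)
    ]
```

For a four-page document in which only pages 2 and 3 are scans, `plan_ocr` returns `[2, 3]`, so the OCR engine is called twice instead of four times and the text of pages 1 and 4 is taken over unchanged.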

In addition to AutoOCR, “intelligent PDF OCR processing” can be found in all of our other software products that support OCR processing, e.g. ifresco Profiler, FileConverter, DropOCR, PDFMerge, etc.