PDF News

Author: Wolfgang May

ifresco AutoOCR – JavaScript Binding for Alfresco

2013-09-13

Once the AMPs are installed, Alfresco and AutoOCR are integrated through a REST web-service interface. Server-side JavaScript offers an easy, flexible and quickly implemented way to extend and adjust Alfresco functions.

Scripts can be started on a schedule as batch processes, e.g. to process a larger number of documents in the background. They can also be called from a client such as Alfresco Share and used as document actions on single or multiple documents.

The JavaScript Binding of the AutoOCR functions allows direct access to the AutoOCR service from Alfresco scripts. In repository JavaScripts (WebScript controller scripts, scripted actions) all functions of the AutoOCR API can be called. This API is completely independent of the integration of the AutoOCR service as an Alfresco transformer. It makes it possible to use OCR functions from scripts that are stored in Alfresco and executed directly on the server.

Download – Documentation JavaScript Binding for Alfresco >>>
Download – extensive demo script >>>

ifresco Client & Alfresco 4.2 CE – Test system online

2013-09-06

To test the current version of our ifresco Client or ifresco Profiler, you can use our test server:

ifresco – http://testalf.may.co.at/login
Alfresco Share – http://testalf.may.co.at:8080/share
ifresco Profiler plugin URL: http://testalf.may.co.at:8080/alfresco

FTP: ftp://testalf.may.co.at
WebDAV: http://testalf.may.co.at:8080/alfresco/webdav
User: admin / Password: admin

Low Cost OCR Server – AutoOCRLight – OCR processing without limits

2013-09-06

Based on AutoOCR, our OCR server that has been proven and tested for many years, we now offer a low-cost variant called “AutoOCRLight”. Compared to the full AutoOCR product it has a lower price but also limited functionality.

Differences between AutoOCRLight and AutoOCR:

only one in / out folder can be configured
only one OCR engine, iOCR, is available – the Abbyy OCR engine isn’t supported
no PDF/A support – only PDF and TXT output
no SOAP / REST web-service interface and therefore no use of the free AutoOCR add-on applications DropOCR, FineOCR and ifresco Transformer.

Advantages / Highlights AutoOCRLight:

Installable as a Windows service or as a normal application on 32- and 64-bit operating systems
Folder monitoring – newly added files are detected and processed automatically
Processes PDF or image files (TIFF, JPEG) – black & white, grayscale, color
iOCR – OCR engine without page limit for generating searchable PDF or TXT
Image processing functions for improving the source documents – automatic rotation with page-orientation detection, deskewing, edge cropping, removal of speckles, punch holes and lines.
Intelligent PDF-OCR processing of mixed documents – each page is checked to see whether OCR processing is necessary.
High throughput through parallel processing

Download – AutoOCRLight – Low Cost OCR Server >>>

PDF2TIFF – colored PDF to black & white TIFF Group 4 – color representation with a grayscale grid

2013-09-06

Normally a colored document cannot easily be converted to black and white, because the color information is lost in the conversion. Colors can only be represented as shades of gray, and in a b&w image shades of gray can in turn only be simulated with a corresponding grid pattern. Gridding is therefore the only way to represent colors in b&w documents. It has one small disadvantage: the files do not compress as well, which makes them larger than usual b&w documents with TIFF Group 4 compression.

In the course of a customer project, we developed the PDF2TIFF converter to convert colored PDFs to monochrome TIFF Group 4 documents with the highest possible quality. Colored images and graphics should also remain readable and printable at the highest possible quality.
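
The gridding idea can be illustrated with a small Python sketch. How PDF2TIFF actually builds its grid is not described here, so the example swaps in ordered dithering with a 4x4 Bayer matrix, one common way to simulate gray levels with black dots. The function names and the luminance weights are assumptions chosen for illustration.

```python
# 4x4 Bayer threshold matrix (values 0..15), a standard ordered-dithering grid.
# Assumption: PDF2TIFF may use a different grid; this only shows the principle.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def rgb_to_gray(r, g, b):
    """Approximate luminance (ITU-R BT.601 weights), result in 0..255."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def dither(gray_rows):
    """Turn rows of gray values (0 = black, 255 = white) into 1-bit rows.

    Returns rows of 0/1 where 1 means a black dot is set.
    """
    result = []
    for y, row in enumerate(gray_rows):
        out_row = []
        for x, value in enumerate(row):
            # Scale the matrix entry to a threshold strictly between 0 and 255.
            threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) * 255 / 16
            out_row.append(1 if value < threshold else 0)
        result.append(out_row)
    return result


# A uniform mid-gray area turns into a regular pattern of black and white dots.
for line in dither([[128] * 8 for _ in range(4)]):
    print("".join("#" if dot else "." for dot in line))
```

White stays white, black stays black, and everything in between becomes a dot pattern whose density follows the gray level. Such patterns contain many short runs of pixels, which is why gridded files compress less well with Group 4 than plain b&w scans.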

Functions PDF2TIFF:

Service under MS-Windows (32- and 64-bit OS) with folder/subfolder monitoring for PDF documents
Newly added PDF documents are gridded and stored in the out folder as b&w TIFF Group 4 single pages.
Parallel processing for high throughput and optimal use of resources.
Configuration of the output resolution and page format (default: 300 dpi / A4) for the created TIFF files, automatic detection of the page orientation.
Deleting or archiving of the PDFs after successful processing.
Service configuration – local system account or a dedicated user
Logging of the processing
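
To get a feeling for what the default of 300 dpi / A4 means for the created TIFF pages, the pixel dimensions can be worked out as follows. This is plain arithmetic in Python; the function name is made up for the example, and PDF2TIFF itself may round differently.

```python
MM_PER_INCH = 25.4


def page_size_in_pixels(width_mm, height_mm, dpi):
    """Return (width, height) in pixels for a page size given in millimetres."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


# A4 is 210 mm x 297 mm (ISO 216).
print(page_size_in_pixels(210, 297, 300))  # (2480, 3508)
```

A landscape page simply swaps the two values.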

Comparison of PDF2TIFF with normal b&w conversion without gridding:

Download – PDF2TIFF >>>
Download – PDF2TIFF – PDF Source document in color >>>
Download – PDF2TIFF – TIFF results in comparison >>>

ifresco OpenSource client for Alfresco – Download now at GitHub

2013-09-06

The current version of our ifresco client for Alfresco is now always available for download on GitHub at https://github.com/XKEYGmbH

ifresco OpenSource Client for Alfresco available as a pre-installed VM appliance

2013-09-06

To use our ifresco Client for Alfresco with minimal effort and without the otherwise required installation and configuration, there is now a pre-installed VMware appliance.

After the installation only the IP and port of the Alfresco server have to be entered, and the ifresco Client can be used together with the Alfresco server right away. With an optional update agreement you get access to the continuously updated versions directly from our SVN repository. In this variant the Alfresco server runs separately from ifresco. Alternatively we also offer a combined Alfresco Community Edition + ifresco Client appliance, with both systems already installed, configured and optimized on an Ubuntu 64-bit Linux VM server.

Pre-installed ifresco Client VMware appliance

Debian LINUX 32bit
PHP 5
Apache 2.2
MySQL 5
Installed PHP Extensions – PDO, MB-String, XML, SOAP, Iconv
All necessary PHP settings done

Requirements: VMWare Workstation, Player, etc. – min. 2GB RAM, 10GB HD

Optional: SVN Update for users with SW-maintenance

Configuration prompt at first start: Alfresco server IP and port. Afterwards, the IP address at which the ifresco Client can be reached in your browser (ideally Google Chrome) is shown.

Price information in our web shop >>>

New website – www.OCRServer.at – online

2013-09-05

We have brought together all of our OCR products on our newly created website.

You can get more information about the following products there:

AutoOCR
AutoOCR light
DropOCR
FineOCR
ifresco Transformer
FileConverter (pro)
ifresco Profiler + Plugins

FileConverter – automatically convert documents and e-mails from folders or e-mail boxes to PDF, PDF/A and TIFF

2013-08-29

The FileConverter is an application, installable as a service under MS-Windows (32- and 64-bit), that monitors folders and mailboxes and automatically converts the documents they contain to PDF, PDF/A or TIFF. Multiple folders as well as MS-Exchange and POP3 mailboxes can be configured and monitored.

The following input document formats are supported:

DOC, DOCX, RTF, TXT
XLS, XLSX
PPT, PPTX
XFDF, FDF
PNG, BMP, TIF, TIFF, JPG, JPEG
ZIP, RAR, 7Z
MSG, EML
PDF
HTM, HTML, MHTML
PMT, PMTX

File format features:

With ZIP/RAR/7Z containers, all contained and supported documents are extracted and converted automatically. The folder structure of the container is recreated in the output directory.
PMT and PMTX are PDFMerge XML data formats which contain hierarchical structure information as well as links to the documents or the documents themselves. From these files the FileConverter produces, like the PDFMerge server, a single overall PDF file merged from the individual documents after their conversion to PDF. The structure defined in the XML is rendered as PDF bookmarks.
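
How a container's folder structure can be carried over into the output directory is sketched below in Python. This is not the FileConverter's code: the function name, the reduced extension list and the choice of PDF as target format are assumptions for the example. The entry names are what Python's zipfile.ZipFile.namelist() would return for a ZIP container.

```python
from pathlib import PurePosixPath

# Assumed subset of the convertible extensions listed above.
SUPPORTED = {".doc", ".docx", ".xls", ".xlsx", ".pdf", ".tif", ".jpg", ".msg", ".eml"}


def plan_outputs(entry_names, out_dir, target_ext=".pdf"):
    """Map archive entry names to output paths, keeping the folder structure.

    Directory entries and unsupported file types are skipped.
    """
    plan = {}
    for name in entry_names:
        entry = PurePosixPath(name)
        if name.endswith("/") or entry.suffix.lower() not in SUPPORTED:
            continue
        if entry.is_absolute() or ".." in entry.parts:
            continue  # never write outside the output directory
        plan[name] = str(PurePosixPath(out_dir) / entry.with_suffix(target_ext))
    return plan


names = ["offers/2013/offer.docx", "offers/2013/", "scan.TIF", "notes.xyz"]
for source, target in plan_outputs(names, "out").items():
    print(source, "->", target)
```

Here offers/2013/offer.docx ends up as out/offers/2013/offer.pdf, while the unsupported notes.xyz is left out.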

Conversion:

The PDF/TIFF conversion takes place directly, without using the source application, so no installation of MS-Office or Adobe Acrobat is necessary for processing. Optionally, the PDFs can also be exported in the ISO-standardized PDF/A-1b format.
The standard scope also includes the iOCR engine for creating searchable PDF(/A) files from PDF or image documents. Optionally, Abbyy, currently the most efficient OCR engine, can also be installed. During OCR processing, PDF documents are analyzed page by page and only pages that do not yet contain text information are processed (intelligent OCR processing). This saves resources and increases quality and processing speed.
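
The page-by-page decision can be pictured with a short Python sketch. It assumes the text of each page has already been extracted by some PDF library (not shown), and it uses "no visible characters" as a simple stand-in for whatever criterion the OCR engines really apply. All names are hypothetical.

```python
def pages_needing_ocr(page_texts, min_chars=1):
    """Return the 0-based indexes of pages that carry no usable text layer.

    page_texts holds the text already extracted from each page
    (hypothetical input; the extraction itself is not part of this sketch).
    """
    todo = []
    for index, text in enumerate(page_texts):
        visible = "".join(text.split())  # ignore whitespace-only text layers
        if len(visible) < min_chars:
            todo.append(index)
    return todo


pages = ["Invoice 4711\nTotal: 120.00", "", "   \n", "Terms and conditions"]
print(pages_needing_ocr(pages))  # [1, 2]
```

Only the two scanned pages in the middle would be sent to the OCR engine; the pages that already contain text are passed through unchanged.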

Functions – general:

MS-Windows service application for converting MS-Office, PDF, image, HTML, ZIP, MSG and e-mail documents to PDF, PDF/A or TIFF
Multiple folders as well as MS-Exchange and POP3 mailboxes can be monitored and processed in parallel.
Direct conversion without requiring the source applications (MS-Office, Adobe Acrobat) or printer drivers.
Flattening of filled PDF forms: PDF forms (XFDF, FDF) can be converted into normal PDF documents. The forms can either be stored permanently or loaded anew each time.
Parallel processing with a configurable number of processes allows optimal use of the hardware and ensures fast processing.
Logging of all conversion runs, forwarding of failed e-mail conversions or sending of error e-mails via SMTP

In / out folder processing:

Processing of files and folders from configured in / out folders, triggered by a time delay or a “ready” file, incl. subfolder processing (one level)
Creation of an index text file listing all files generated during a processing run.
After processing: deleting, moving into an archive folder, or renaming of the files or folders (.con / .err)
Configuration of filename extensions that should not be converted – these are ignored and not processed. E-mails with attachments that have unidentifiable extensions are handled as errors and forwarded to an e-mail address.
Single-page output with a configurable number of digits for the page index
Configuration of the TIFF conversion – compression / color depth / resolution / JPEG quality
Extensive parameters for OCR processing – iOCR or Abbyy – the FileConverter has the same OCR functions as AutoOCR
Parameters for HTML conversion – page size and margins – HTML documents and e-mails are scaled automatically.
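
The two trigger variants for the in folder, and the renaming after processing, could look roughly like this in Python. The marker file name ("<file>.ready"), the default delay and the reading of .con / .err as "converted" / "error" are assumptions, not documented FileConverter behavior.

```python
import os
import time


def is_ready(path, min_age_seconds=10, now=None):
    """Decide whether a file in the in folder may be processed.

    A file counts as ready if a marker "<path>.ready" exists, or if it has
    not been modified for at least min_age_seconds (the time-delay variant).
    Marker name and default delay are assumptions for this sketch.
    """
    if os.path.exists(path + ".ready"):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) >= min_age_seconds


def result_name(path, ok):
    """Name a processed file would get: ".con" on success, ".err" on failure."""
    return path + (".con" if ok else ".err")


print(result_name("in/offer.docx", ok=True))   # in/offer.docx.con
print(result_name("in/broken.docx", ok=False))  # in/broken.docx.err
```

Waiting for a quiet period or an explicit marker avoids picking up files that another program is still writing.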

Processing of e-mail boxes:

Processing of POP3 / MS-Exchange mailboxes – forwarding or deleting after successful or failed processing, or moving into an archive / error folder under MS-Exchange. Direct access to MS-Exchange 2007/2010/2013 through the SOAP web-service interface.
EML and MSG – body and attachments are converted – the e-mail header information (from, date, to, subject) is generated in the body document
Output of an XML file listing the processed e-mails with metadata and file links – configurable: from, to, cc, bcc, received, subject, body, attachments
Output per e-mail in separate subfolders or “flat” in the destination folder.
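
As an illustration of the metadata output, the following Python sketch reads the header fields of an EML message with the standard email package and writes them into an XML element. The element names and the field selection are invented for the example; the XML layout the FileConverter really produces is not described here.

```python
from email import message_from_string
import xml.etree.ElementTree as ET

# Configurable selection of header fields (an assumption for this sketch).
FIELDS = ("from", "to", "cc", "subject", "date")


def mail_to_xml(raw_eml, fields=FIELDS):
    """Build an <email> element holding the selected header fields."""
    msg = message_from_string(raw_eml)
    root = ET.Element("email")  # element names are hypothetical
    for field in fields:
        value = msg.get(field)  # header lookup is case-insensitive
        if value is not None:
            ET.SubElement(root, field).text = value
    attachments = ET.SubElement(root, "attachments")
    for part in msg.walk():
        filename = part.get_filename()
        if filename:
            ET.SubElement(attachments, "file").text = filename
    return root


raw = (
    "From: supplier@example.com\n"
    "To: office@example.com\n"
    "Subject: Invoice 4711\n"
    "Date: Thu, 29 Aug 2013 10:00:00 +0200\n"
    "\n"
    "Please find the invoice attached.\n"
)
print(ET.tostring(mail_to_xml(raw), encoding="unicode"))
```

Fields that are missing in the message, such as cc in this example, are simply left out of the XML.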

Download – FileConverter – documents & e-mails to PDF, PDF/A and TIFF >>>

ifresco Profiler – splitting of documents – manual, per page, area-OCR, per barcode

2013-08-28

The ifresco Profiler offers easy-to-use functions to split document stacks quickly in various ways. The following functions are available:

Manual split – Select the page / thumbnail where the document should be split. A key combination splits the document at the current page, names it automatically and then selects the new document for further split actions.
Split by page numbers – With this function the whole document can be split by a page count into single documents with the same number of pages.
Split with area OCR – An area is selected in the preview and its text is recognized via area OCR. The document is split at this page and the recognized text is used as the name.
Split by barcode – 1D barcodes are recognized and can be used to split the documents as well as to name the files. 18 different barcode types are supported; orientation and position on the page do not matter. Pages with barcodes can be deleted; filtering by strings, lists and valuation is also supported.
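
The logic behind splitting by page count and by barcode can be sketched in a few lines of Python. The barcode recognition itself is left out: the sketch assumes a list with one entry per page that holds the decoded barcode value or None. Function names and the "unnamed" fallback are made up for the example.

```python
def split_by_page_count(total_pages, pages_per_document):
    """Return (first, last) page ranges, 1-based and inclusive."""
    ranges = []
    for first in range(1, total_pages + 1, pages_per_document):
        last = min(first + pages_per_document - 1, total_pages)
        ranges.append((first, last))
    return ranges


def split_by_barcode(page_barcodes, drop_barcode_pages=True):
    """Split a stack at every page that carries a barcode.

    page_barcodes has one entry per page: the decoded value, or None.
    Returns a list of (name, pages) with 1-based page numbers; pages before
    the first barcode are collected under the name "unnamed".
    """
    documents = []
    current_name, current_pages = "unnamed", []
    for number, barcode in enumerate(page_barcodes, start=1):
        if barcode is not None:
            if current_pages:
                documents.append((current_name, current_pages))
            current_name, current_pages = barcode, []
            if drop_barcode_pages:
                continue  # the separator page itself is not kept
        current_pages.append(number)
    if current_pages:
        documents.append((current_name, current_pages))
    return documents


print(split_by_page_count(7, 3))  # [(1, 3), (4, 6), (7, 7)]
print(split_by_barcode(["A1", None, None, "B2", None]))
# [('A1', [2, 3]), ('B2', [5])]
```

If the total is not a multiple of the page count, the last document in this sketch is simply shorter.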

ifresco Profiler – demo plugin – capturing of incoming invoices through Barcode and OCR

2013-08-06

There is now a demo plugin for the ifresco Profiler that shows how easily and quickly incoming invoices can be captured in the Alfresco ECM/DMS. Invoices captured this way can, for example, be processed further in Alfresco through the Alfresco-SAP integration from IT-Novum. We recently presented this solution in an Alfresco webinar (next date: 8.10.2013) together with IT-Novum and Alfresco.

Functions of the demo plugin – incoming invoice capture:

Capturing of incoming invoices via scan, PDF printer driver, folders, drag & drop (TIFF, PDF)
Manual splitting of document stacks
Barcode recognition with splitting of documents as well as barcode filter function and deleting of barcode pages.
Capturing of metadata with a profile mask – Beleg-ID (“Invoice-ID” / =Barcode), Lieferant (“supplier”), Straße (“street”), PLZ (“postal code”), Ort (“city”), Belegnummer (“invoice number”), Belegdatum (“invoice date”), Rechnungsbetrag (“invoice amount”) – search for supplier number and name through an external XLS table with selection of the linked information – street, postal code, city
Area OCR to adopt values from the displayed document into a field.
Retrieve and capture tags
Batch – background processing for PDF-OCR and Alfresco upload
AutoOCR integration to store searchable PDF documents into Alfresco
Automatic naming or building of the folder structure from the captured metadata – number, company, invoice type, year, invoice number, invoice date
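
Building a folder structure and file name from captured metadata might look like the following Python sketch. The path layout, the metadata keys and the character replacement rule are assumptions for illustration only; in the demo plugin the scheme is defined by its own configuration.

```python
import re


def safe(part):
    """Replace characters that are awkward in file and folder names."""
    return re.sub(r'[\\/:*?"<>|]+', "_", str(part)).strip()


def target_path(meta):
    """Build "<company>/<year>/<invoice type>/<number>_<date>.pdf".

    Layout and metadata keys are hypothetical.
    """
    year = meta["invoice_date"][:4]  # expects an ISO date such as 2013-08-06
    parts = [meta["company"], year, meta["invoice_type"]]
    name = f'{meta["invoice_number"]}_{meta["invoice_date"]}.pdf'
    return "/".join(safe(p) for p in parts + [name])


meta = {
    "company": "ACME GmbH",
    "invoice_type": "incoming",
    "invoice_number": "RE/4711",
    "invoice_date": "2013-08-06",
}
print(target_path(meta))  # ACME GmbH/2013/incoming/RE_4711_2013-08-06.pdf
```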

The demo plugin is intended as an example; both its functionality and its data model can be adjusted and extended to individual requirements.

Description of the installation, the access data for the demo server and the workflow steps >>>

Download – ifresco Profiler base software >>>
Download – ifresco demo plugin >>>
Download – ifresco demo plugin add on >>>