September 2013 – PDF News

Month: September 2013

ZUGFeRD – eDocPrintPro PDF/A-3 printer driver for electronic invoices available

2013-09-26

ZUGFeRD is a data format for electronic invoices in Germany, based on PDF/A level 3. The PDF document is used for archiving, printing and visual representation, and the invoice data is additionally embedded as XML. The XML contains sector-neutral information and metadata for the invoice.

A ZUGFeRD invoice therefore requires both a PDF/A-3 document and an XML file. The XML is usually generated by the ERP system. With a simple print process, the ZUGFeRD eDoc printer driver makes it possible to create such invoices from any application.

Possible workflows (in every case the XML is already available and printing is done via the ZUGFeRD eDoc driver):

Interactive selection of the XML file via a file dialog.
“Silent” processing of the XML with a predefined path and file name.
The XML can also be deleted automatically after embedding to leave a defined state.
Starting the print process via the eDoc SDK: the destination path and names of the PDFs are set through the SDK, the document is printed, the XML is embedded automatically, and afterwards an event signalling the end of the process is passed back to the application.
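
The “silent” variant with automatic deletion can be pictured roughly as follows. This is only a sketch of the control flow: the function name, its parameters and the `embed` callback are hypothetical and are not part of the eDocPrintPro driver or SDK, whose actual interface is not described here.

```python
from pathlib import Path
from typing import Callable


def embed_invoice_xml(
    pdf_path: Path,
    xml_path: Path,
    embed: Callable[[Path, bytes], None],
    delete_xml_after: bool = True,
) -> None:
    """Attach the XML found at a predefined path to a freshly printed PDF.

    `embed` is a hypothetical stand-in for whatever actually writes the
    attachment into the PDF/A-3 file; it is not an eDocPrintPro API.
    """
    if not xml_path.is_file():
        raise FileNotFoundError(f"invoice XML not found: {xml_path}")
    embed(pdf_path, xml_path.read_bytes())
    if delete_xml_after:
        # Removing the XML leaves a defined state: the next print job
        # cannot pick up stale invoice data by accident.
        xml_path.unlink()
```

Deleting the XML only after a successful embed means a failed job leaves the input in place for a retry.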

ZUGFeRD – customized – of particular interest to software developers:

The ZUGFeRD eDocPrintPro PDF/A-3 printer driver is aimed in particular at developers of software solutions, because it allows the creation of ZUGFeRD-compliant electronic invoices to be implemented quickly and easily. The software only has to create the ZUGFeRD-compliant XML file; the rest is done by the ZUGFeRD eDoc printer driver. Software vendors can ship the solution under their own name and use it without any additional license costs (royalty-free).

Scope of ZUGFeRD – customized:

ZUGFeRD – eDocPrintPro PDF/A-3 printer driver + setup with your own name / logo / links for license-free (royalty-free) use together with your own software solution
32 and 64bit versions – for MS-Windows XP / 7 / 8 / MS-Windows Server 2008 / CITRIX and MS-Terminalserver
eDocPrintPro SDK – to automate the print process and integrate it into your own application
ZUGFeRD – XML extractor – to extract the XML file from the PDF (C# .NET or command-line tool)

ZUGFeRD PDF printer driver

Download – eDocPrintPro ZUGFeRD – PDF/A-3 printer driver >>>
Download – ZUGFeRD – sample invoice >>>
Download – ZUGFeRD – sample XML >>>

ZUGFeRD – information package >>>

Overview of the PDF/A standards

2013-09-18

The PDF document format was developed by Adobe in the early 1990s on the basis of the page description language “PostScript”. At first it was a proprietary but publicly documented file format; since 2008 it has been, in version 1.7, the ISO standard 32000.

PDF/A – The PDF for archiving:

PDF/A is the name of the ISO standard 19005 and defines a standard document format for the long-term archiving of electronic documents. The standard specifies which PDF functions must or must not be used so that documents can be archived in the long term.

Important: The PDF/A standard is “cumulative”. If a document conforms to PDF/A-1, it is automatically also covered by the PDF/A-2 and PDF/A-3 standards; the higher standards simply allow more PDF functions. There is no “better” or “worse” PDF/A level; you choose the level and standard that provide the functions you need.

PDF/A-1 (since 2006)

For PDF/A-1 there are 2 levels:

PDF/A-1b: basic – this level ensures the unambiguous visual reproducibility of PDF/A documents.
PDF/A-1a: accessible – like 1b, but the document must also include its content structure (tagged PDF). In practice this level can’t be created automatically through direct conversion, scanning, OCR or printer drivers: it is technically possible, but the content structure usually has to be created and completed manually in the source application.

PDF/A-2 (since June 2011)

For PDF/A-2 there are 3 levels:

PDF/A-2b: basic – corresponds to 1b, with the extensions of level 2
PDF/A-2a: accessible – corresponds to 1a, with the extensions of level 2
PDF/A-2u: unicode – there is no equivalent in level 1; it corresponds to level 2b, but the embedded text must be represented in Unicode.

Extensions compared to PDF/A-1:

JPEG2000 compression
Transparency
Layers
OpenType fonts
digital signatures as PAdES (PDF Advanced Electronic Signatures)
Container: PDF/A-1 files can be embedded in PDF/A-2 files
the maximum page size was extended to 381 x 381 km
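
A short calculation shows where the 381 km figure comes from. It assumes the commonly cited implementation limits from the PDF specification: a page side of at most 14,400 units in default user space (1/72 inch each) and, from PDF 1.6 on, a `UserUnit` multiplier of up to 75,000. The constant and function names are chosen for this illustration only.

```python
POINTS_PER_INCH = 72          # default user space unit is 1/72 inch
MAX_UNITS_PER_SIDE = 14_400   # implementation limit per page side
MAX_USER_UNIT = 75_000        # largest UserUnit multiplier (PDF 1.6 and later)
METRES_PER_INCH = 0.0254


def max_page_side_m(user_unit: float = 1.0) -> float:
    """Largest page side in metres for a given UserUnit multiplier."""
    inches = MAX_UNITS_PER_SIDE * user_unit / POINTS_PER_INCH
    return inches * METRES_PER_INCH


print(round(max_page_side_m(), 2))                      # 5.08 (metres)
print(round(max_page_side_m(MAX_USER_UNIT) / 1000, 1))  # 381.0 (kilometres)
```

Without `UserUnit`, which is the situation for PDF/A-1 (based on PDF 1.4), the same limit works out to 200 inches, or about 5 m per side.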

PDF/A-3 (since October 2012)

The essential extension of PDF/A level 3 is that any file can be embedded into the PDF/A. For archiving, this means that the PDF file, used for searching, displaying and printing, can be combined with the source file. If you archived only the PDF version of an MS-EXCEL sheet, important additional information, such as the formulas it is based on, could be lost. The embedded (source) files can be extracted from the PDF at any time.

Further ISO-standardized PDF standards are:

PDF/E – PDF for Engineering: ISO 24517 – PDF/E documents support layers for installation and construction plans as well as three-dimensional models including predefined 3D views.
PDF/H (Healthcare) – PDF in the health system (best practice) for diagnostic imaging and for the storage of patient data and medical reports.
PDF/X (Exchange) for print masters: ISO 15929 / 15930 – The PDF/X standard was developed for the exchange of advertisement data for newspapers as well as for the transfer of print masters and print jobs. PDF/X is available in the following levels: 1a, 2, 3, 4, 5, 5g, 5pg, 5n
PDF/UA (Universal Accessibility) – ISO 14289 – for universally accessible documents, e.g. as a reading aid for visually impaired people.
PDF/VT (Variable and Transactional) – ISO 16612-2 – for the “printing of variable or transactional document content”.
PDF 1.7 – ISO 32000: The ISO has approved the Portable Document Format (PDF) 1.7 as an international standard.

GhostScript 9.10 – base for eDocPrintPro from version 3.19.0 on

2013-09-17

From version 3.19.0 of eDocPrintPro on, only the current version 9.10 of GhostScript is used. The eDocPrintPro setup detects whether the required version is already installed and, if not, automatically downloads GS from our FTP server and installs it. This requires an active internet connection as well as permission to perform an FTP download. If this isn’t possible, the GhostScript setup has to be downloaded and installed manually beforehand.
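
As an illustration of the kind of check involved, the sketch below compares version strings component by component. How the eDocPrintPro setup actually detects GhostScript is not described here; the function names are hypothetical, and the exact-match rule is an assumption based on the statement that only version 9.10 is used.

```python
from typing import Optional, Tuple

REQUIRED_GS = "9.10"  # version named in the post


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '9.10' into (9, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_install(installed: Optional[str], required: str = REQUIRED_GS) -> bool:
    """True if nothing is installed or the installed version differs."""
    if installed is None:
        return True
    return parse_version(installed) != parse_version(required)
```

Parsing into integer tuples matters because a naive numeric comparison would treat "9.10" and "9.1" as the same number.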

GhostScript 9.10 setup:

Download – GhostScript 9.10 MSI Setup – 32bit (ca. 16MB) >>>
Download – GhostScript 9.10 MSI Setup – 64bit (ca. 16MB) >>>

ifresco AutoOCR – Version 1.18 available

2013-09-13

Version 1.18 of ifresco AutoOCR, the OCR server integration for Alfresco, brings new functions and extensions:

Implementation of the new paging API for the job list of the AutoOCR server – paging (back/forth), deleting all jobs, deleting jobs older than x days, sorting jobs, selecting jobs by date.
Freely configurable run-time transformers. File-based as well as pipe-I/O-based command-line tools can be used to configure additional transformers.
Like the command-line based run-time transformers, transformers can also be implemented through JavaScript.
The AutoOCR content model extension for the OCR status (aspect) is installed so that the OCR status of a file can be stored and queried as metadata.
The optional ifresco Tools AMP allows background OCR processing at defined intervals, either for the initial processing of existing document collections or for the subsequent processing of newly added documents. Both the detection of the documents to be processed and the processing itself are done via JavaScript, executed on the server in timed batches in the background. Additional Alfresco Share document actions can also be configured and executed through JavaScript, e.g. to convert selected PDF and image documents to searchable PDF(/A) files through the AutoOCR server and automatically replace the input files with them. With the ifresco Tools, AutoOCR functions are available through JavaScript independently of the configured Alfresco transformer, for mass batch processing as well as for interactive processing of single documents.
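
The job-list operations named in the first item (paging, sorting, selecting jobs older than x days) amount to simple list handling. The following sketch shows the idea on in-memory records; the `Job` class and both functions are hypothetical and do not reflect the actual AutoOCR paging API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Job:
    job_id: int
    created: datetime


def older_than(jobs: List[Job], days: int, now: datetime) -> List[Job]:
    """Jobs created more than `days` days before `now` (deletion candidates)."""
    cutoff = now - timedelta(days=days)
    return [job for job in jobs if job.created < cutoff]


def page(jobs: List[Job], page_number: int, page_size: int) -> List[Job]:
    """Return one 1-based page of jobs, sorted newest first."""
    ordered = sorted(jobs, key=lambda job: job.created, reverse=True)
    start = (page_number - 1) * page_size
    return ordered[start:start + page_size]
```

Passing `now` explicitly keeps the age filter deterministic and easy to test.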

AMPs of version 1.18 are available for the following Alfresco versions: 4.0.1 EE, 4.0.2 EE, 4.0d CE, 4.1.1 EE, 4.1.2 EE, 4.1.3 EE, 4.1.4 EE, 4.2b CE, 4.2c CE
AMPs of the ifresco Tools 1.1 are available for: 4.2c CE, 4.2d CE

Download – ifresco AutoOCR – Runtime Transformer description >>>
Download – ifresco AutoOCR – Transformer through JavaScript description >>>
Download – ifresco AutoOCR – Example JavaScript Transformer >>>

ifresco AutoOCR – JavaScript Binding for Alfresco

2013-09-13

Once the AMPs are installed, Alfresco and AutoOCR are integrated through a REST web-service interface. Server-side JavaScript offers an easy, flexible and quickly implemented way to extend and adapt Alfresco functions.

Scripts can be started on a schedule as batch processes, e.g. to process a larger number of documents in the background. They can also be called from the client, e.g. Alfresco Share, and used as document actions for single or multiple documents.

The JavaScript Binding of the AutoOCR functions allows direct access to the AutoOCR service from Alfresco scripts. In repository JavaScripts (WebScript controller scripts, scripted actions) all functions of the AutoOCR API can be called. This API is completely independent of the integration of the AutoOCR service as an Alfresco transformer. It makes it possible to use OCR functions from scripts that are stored in Alfresco and executed directly on the server.

Download – Documentation JavaScript Binding for Alfresco >>>
Download – extensive demo script >>>

ifresco Client & Alfresco 4.2 CE – Testsystem online

2013-09-06

To test the current version of our ifresco Client or ifresco Profiler, you can use our test server:

ifresco – http://testalf.may.co.at/login
Alfresco Share – http://testalf.may.co.at:8080/share
ifresco Profiler plugin URL: http://testalf.may.co.at:8080/alfresco

FTP: ftp://testalf.may.co.at
WebDAV: http://testalf.may.co.at:8080/alfresco/webdav
User: admin / Password: admin

Low Cost OCR Server – AutoOCRLight – OCR processing without limits

2013-09-06

Building on AutoOCR, our OCR server that has been proven and tested for many years, we now offer a low-cost variant called “AutoOCRLight”. Compared to the full AutoOCR product it has a lower price but also reduced functionality.

Differences between AutoOCRLight and AutoOCR:

only one in / out folder can be configured
only one OCR engine, iOCR, is available – the Abbyy OCR engine isn’t supported
no PDF/A support – only PDF and TXT output
no SOAP / REST web-service interface and therefore no use of the free AutoOCR add-on applications DropOCR, FineOCR and ifresco Transformer.

Advantages / highlights of AutoOCRLight:

Can be installed as a Windows service or as a normal application on 32 and 64bit OS
Folder monitoring – newly added files are automatically detected and processed
Processes PDF or image files (TIFF, JPEG) – black&white, grayscale, color
iOCR – OCR engine without page limit for generating searchable PDF or TXT
Image processing functions to improve the source documents – automatic rotation with page orientation detection, deskewing, cropping edges, removing specks, removing punch holes, removing lines.
Intelligent PDF-OCR processing of mixed documents – each page is checked to see whether OCR processing is necessary.
High throughput through parallel processing
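
The page-by-page check for mixed documents can be sketched as follows. The criterion used here, whether a page already carries extractable text, is an assumption made for illustration; the post does not say how AutoOCRLight decides, and the `Page` class and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    number: int
    text: str  # text already present on the PDF page; empty for pure scans


def pages_needing_ocr(pages: List[Page], min_chars: int = 1) -> List[int]:
    """Page numbers that carry fewer than `min_chars` characters of text."""
    return [p.number for p in pages if len(p.text.strip()) < min_chars]
```

For a document whose first page was exported from a word processor and whose second page is a scan, only page 2 would be sent to the OCR engine.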

Download – AutoOCRLight – Low Cost OCR Server >>>

PDF2TIFF – color PDF to black&white – TIFF Group 4 – color representation with grayscale grid

2013-09-06

Normally a colored document cannot simply be converted to black and white, because the color information gets lost in such a conversion. Colors can only be rendered as grayscales, and in a pure b&w document grayscales in turn can only be simulated with a corresponding grid (halftone pattern). Grids are therefore the only way to represent colors in b&w documents, with one small disadvantage: the files compress less well, which makes them bigger than ordinary b&w documents with TIFF Gr. 4 compression.
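
One common way to produce such a grid is ordered dithering with a Bayer threshold matrix. The sketch below is a minimal pure-Python version of that technique; it is not the algorithm used by PDF2TIFF, which is not described here, and the function names are chosen for this example.

```python
# 4x4 Bayer threshold matrix; values 0..15 define the order in which
# dots turn white as a tone gets lighter.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def luminance(r, g, b):
    """Approximate perceived brightness (0..255) using Rec. 601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def halftone(rgb_rows):
    """Convert rows of (r, g, b) pixels to rows of 0 (black) / 1 (white)."""
    out = []
    for y, row in enumerate(rgb_rows):
        out_row = []
        for x, (r, g, b) in enumerate(row):
            # Scale the matrix entry to a threshold in the 0..255 range.
            threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) * 16
            out_row.append(1 if luminance(r, g, b) > threshold else 0)
        out.append(out_row)
    return out
```

A uniform mid-gray area comes out with half of its pixels white and half black in a regular pattern. Such patterns consist of many very short runs of identical pixels, which is why gridded pages compress less well with Group 4 than plain text pages.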

In the course of a customer project, we developed the PDF2TIFF converter to convert colored PDFs to monochrome TIFF Gr. 4 documents with the highest possible quality. Colored images and graphics should also remain readable and printable at the highest possible quality.

Functions of PDF2TIFF:

Service under MS-Windows (32 and 64bit OS) with folder/subfolder monitoring for PDF documents
Incoming PDF documents are gridded and stored in the out folder as b&w TIFF Gr. 4 single pages.
Parallel processing for high throughput and optimal use of resources.
Configuration of the output resolution and page format (default: 300dpi / A4) for the created TIFF files, automatic detection of the page orientation.
Deleting or archiving of the PDFs after successful processing.
Service configuration – local system account or own user
Logging of the processing
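
The combination of folder/subfolder monitoring and parallel processing can be pictured with a single polling pass like the one below. It is a sketch under assumptions: `convert` is a hypothetical callback standing in for the actual conversion, and a real service would call such a function in a loop and then delete or archive the processed files.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List


def process_folder(
    in_dir: Path,
    convert: Callable[[Path], None],
    workers: int = 4,
) -> List[Path]:
    """Run `convert` on every PDF below in_dir, several at a time."""
    pdfs = sorted(p for p in in_dir.rglob("*") if p.suffix.lower() == ".pdf")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all conversions and re-raises any error.
        list(pool.map(convert, pdfs))
    return pdfs
```

Threads are a reasonable choice when the conversion itself runs in an external process; for CPU-bound work done in Python itself, a process pool would be the more usual option.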

Comparison of PDF2TIFF with normal b&w conversion without gridding:

Download – PDF2TIFF >>>
Download – PDF2TIFF – PDF Source document in color >>>
Download – PDF2TIFF – TIFF results in comparison >>>

ifresco OpenSource client for Alfresco – Download now at GitHub

2013-09-06

The current version of our ifresco client for Alfresco can now always be downloaded from GitHub at https://github.com/XKEYGmbH

ifresco OpenSource Client for Alfresco available as pre-installed VM Appliance

2013-09-06

To use our ifresco Client for Alfresco with minimal effort and without the otherwise required installation and configuration, there is now a pre-installed VMWare Appliance.

After installation, only the IP and port of the Alfresco Server have to be entered, and the ifresco Client can be used together with the Alfresco Server right away. With an optional update agreement you get access to the continuously updated versions directly from our SVN repository. With this variant the Alfresco Server runs separately from ifresco. Alternatively, we also offer a combined Alfresco Community Edition + ifresco Client Appliance, with both systems already installed, configured and optimized on an Ubuntu 64bit LINUX VM Server.

Pre-installed ifresco Client VMWare Appliance

Debian LINUX 32bit
PHP 5
Apache 2.2
MySQL 5
Installed PHP Extensions – PDO, MB-String, XML, SOAP, Iconv
All necessary PHP settings done

Requirements: VMWare Workstation, Player, etc. – min. 2GB RAM, 10GB HD

Optional: SVN Update for users with SW-maintenance

Configuration query at the first start: Alfresco Server IP and port. After that, the IP address under which the ifresco Client can be reached in your browser (ideally Google Chrome) is shown.

Price information in our web shop >>>