Month: June 2019

ZFMerge – Combine a ZUGFeRD PDF and other PDFs into a single ZUGFeRD PDF

ZUGFeRD is a recognized and increasingly used standard for electronic invoices.

If, for example, an ERP system generates a ZUGFeRD-compliant invoice with embedded XML, some customers may require further documents to be attached (for example, a performance report) that were not created in the ERP system. In that case a new ZUGFeRD-compliant PDF has to be created (merged) that contains the XML of the original invoice and all other documents.

ZFMerge – Features:

  • Generates complete ZUGFeRD compliant PDF/A-3b files.
  • The source file already contains a ZUGFeRD XML; its ZUGFeRD level and profile are adopted.
  • Additional PDF files can be selected and added, and their order can be adjusted.
  • The path and name of the new ZUGFeRD file can be chosen.
  • A ZUGFeRD or PDF/A-3b compliant PDF is generated even if the source files do not conform to the PDF/A standard.

Download – ZFMerge Test Application >>>

OmniPage OCR engine for AutoOCR & AutoOCRLight from 2.0.7

Benefits of OmniPage OCR:

  • Recognition accuracy at the highest level, even for difficult documents
  • Fastest OCR processing – much faster and more powerful than anything we’ve tested and implemented so far. Creating a searchable PDF can take as little as 1-2 seconds per page.
  • Affordable – a 25,000-page license costs less than the previous 10,000-page Abbyy license
  • Easier activation of the (demo) license – the OmniPage OCR engine, including a 30-day demo version, can be activated via our license server together with the basic application.

Please note that for licensing reasons the OmniPage OCR engine can only be installed on client operating systems (Windows 7/10), not on Windows Server 2008, 2012, 2016 or 2019. The setup can only be run on Windows 7/10. In terms of performance and stability, this is not a disadvantage: OCR processes are performance-intensive and should run on dedicated hardware (e.g. an Intel NUC) with as many CPU cores as possible and an SSD for optimal throughput.

From version 2.0.7, the OmniPage OCR engine is available for AutoOCR and AutoOCRLight as an option in addition to IOCR (Tesseract OCR). It is included in the AutoOCR setup and can be activated there; for AutoOCRLight, the OmniPage OCR engine can be downloaded and installed separately.

Download – OmniPage OCR Engine as Option for AutoOCRLight (ca. 235MB) >>>

Download – AutoOCRLight – Low Cost OCR Server (ca. 410MB) >>>

Download – AutoOCR – OCR Server incl. OmniPage OCR (ca. 640MB) >>>

eDocPrintPro PDF/A – Version 3.29.0 – supports the e-billing standard ZUGFeRD 2.0 / EN 16931 / Factur-X 1.0

eDocPrintPro version 3.29.0 now also supports the current e-billing standards ZUGFeRD 2.0 / EN 16931 and Factur-X 1.0. These are based on PDF/A-3, with the invoice data embedded in the PDF as an XML file attachment.

Features:

  • Selection of e-billing standards – ZUGFeRD 1.0 / 2.0 / EN 16931 / Factur-X 1.0
  • Selection of a profile supported by the standard (MINIMUM, BASIC WL, BASIC, EN 16931, EXTENDED)
  • Use predefined (path, name) XML / select XML via file dialog
  • Automatically delete XML after embedding in PDF (Yes/No)

Requirement: An XML file that is valid according to the selected standard and profile must already exist before printing. The XML file is not extracted from the print data.
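Because the XML has to be ready before printing, a quick plausibility check can catch obvious problems early. The following is a minimal sketch, not part of eDocPrintPro: it assumes the UN/CEFACT Cross Industry Invoice namespaces used by ZUGFeRD 2.0 / Factur-X 1.0, checks only that the file is well-formed and reads the declared guideline (profile) ID. The function name and the shortened sample invoice are hypothetical, and the check does not replace a real schema validation.

```python
import xml.etree.ElementTree as ET

# Namespaces of the UN/CEFACT Cross Industry Invoice (CII) syntax
# (assumed here for ZUGFeRD 2.0 / Factur-X 1.0 invoices).
NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:"
           "ReusableAggregateBusinessInformationEntity:100",
}


def read_guideline_id(xml_text: str) -> str:
    """Return the guideline (profile) ID declared in a CII invoice XML.

    Raises ValueError if the XML is not well-formed or declares no ID.
    This is a plausibility check only, not a schema validation.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"XML is not well-formed: {exc}") from exc
    node = root.find(
        "rsm:ExchangedDocumentContext/"
        "ram:GuidelineSpecifiedDocumentContextParameter/ram:ID",
        NS,
    )
    if node is None or not (node.text or "").strip():
        raise ValueError("no guideline ID found")
    return node.text.strip()


# Hypothetical, heavily shortened invoice used only for illustration.
SAMPLE = """<rsm:CrossIndustryInvoice
    xmlns:rsm="urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"
    xmlns:ram="urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100">
  <rsm:ExchangedDocumentContext>
    <ram:GuidelineSpecifiedDocumentContextParameter>
      <ram:ID>urn:cen.eu:en16931:2017</ram:ID>
    </ram:GuidelineSpecifiedDocumentContextParameter>
  </rsm:ExchangedDocumentContext>
</rsm:CrossIndustryInvoice>"""

print(read_guideline_id(SAMPLE))
```

Comparing the returned ID with the profile selected in the printer settings is a cheap way to spot a mismatch before the XML is embedded.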

Download – eDocPrintPro PDF/A & ZUGFeRD

GhostScript 9.27 Setup

PDF2PDFA and the e-invoice standard ZUGFeRD 2.0 / EN 16931 / Factur-X 1.0

ZUGFeRD 2.0 / EN 16931 / Factur-X:

Important prerequisites for the acceptance of electronic invoices are, above all, a standardized data format and a legal framework that can be used by both the sender and the recipient.

In June 2017, the EU created a standard format for electronic invoices, the European standard EN 16931. The aim of this EU standard: Electronic invoice exchange is to be standardized and legally guaranteed throughout Europe.

The previous ZUGFeRD standard 1.0 had to be adapted to the new standard. ZUGFeRD 2.0 (published on March 11, 2019) was developed as part of a Franco-German collaboration in close coordination with the French standard Factur-X 1.0 and is technically identical to it. With ZUGFeRD 2.0, electronic invoices can be created that comply with EU standard EN 16931 and EU Directive 2014/55/EU. Details about the electronic invoice can also be found on the pages of the European Commission. ZUGFeRD 2.0 also uses the global standard UN/CEFACT XML in the form of the Cross Industry Invoice, which could even give ZUGFeRD 2.0 a worldwide perspective.

The new ZUGFeRD version 2.0, like version 1.0, combines a PDF/A-3 invoice file (visual representation of the invoice data as a digital format instead of the classic paper invoice) with an invoice file in XML format embedded in the PDF/A-3 document.

ZUGFeRD 2.0 profiles:

Several profiles are provided for ZUGFeRD 2.0 and Factur-X. The «EN 16931» profile replaces the ZUGFeRD 1.0 «Comfort» profile and complies fully with the EU standard, which, however, only defines the core elements of an invoice. With the «Extended» profile, additional information can be recorded in an electronic invoice, for example industry-specific data or data required by law. In addition, two more profiles, «Basic» and «Basic WL», are defined for smaller suppliers.

ZUGFeRD is suitable for organizations of all sizes and, thanks to EN-16931 compliance, has an expanded, international significance. The profile «EN 16931» is recognized by all European administrations. Since November 2018, electronic invoices have been required in Business to Government (B2G) traffic in the EU.

With version 1.2.0 of the PDF2PDFA converter component, ZUGFeRD 2.0 and Factur-X documents can now be generated from a PDF and the corresponding XML.

Download – Demo & Test Application – PDF2PDFA Converter >>>

PDFmdx – Read position data via group / subgroup fields

In addition to document fields, PDFmdx can also read position data. Position data are lists or tables with rows and columns, typically found on invoices to list the individual items. We use the term “sliding group / subgroup”. One or more columns (= fields) in one or more rows, on one or more pages, are searched for and read within a vertically delimited area.

As of PDFmdx version 3.5.0 there is a two-level structure: in addition to groups, a subgroup level is also possible. One or more subgroup records can be recognized and read for each group record. Some documents contain two-level position data, e.g. for textiles or clothing, where an item (number, description) can also have a “sub-level” with sizes or colors. The item itself is listed only once, and the level below it holds the quantities / prices for the individual variants.

Two-level readout of position data:

  • “Document/Group/Subgroup” fields define the detection level.

  • An area defined by 2 red horizontal boundary lines will be scanned on all pages of the document for the group (red boxes) and subgroup (green boxes) records.

  • The specified conditions are used to identify and read out the group (G) and subgroup (U) data records.

  • Along with the lowest-level records, the information of the group and document fields is also available.
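The two-level readout can be pictured with a small sketch. It is not PDFmdx code: the regular expressions stand in for the conditions of a template, and the field names and sample lines are hypothetical. It only illustrates how each subgroup record ends up carrying the fields of its group and of the document.

```python
import re

# Hypothetical conditions: a group line starts with a 4-digit item number,
# a subgroup line is indented and starts with a size code.
GROUP_RE = re.compile(r"^(?P<item>\d{4})\s+(?P<description>.+)$")
SUBGROUP_RE = re.compile(
    r"^\s+(?P<size>[A-Z]{1,3})\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$"
)


def read_positions(lines, document_fields):
    """Return one flat record per subgroup line.

    Each record combines the document fields, the fields of the most
    recent group line and the fields of the subgroup line itself.
    """
    records = []
    group = None
    for line in lines:
        match = GROUP_RE.match(line)
        if match:
            group = match.groupdict()  # start of a new group record
            continue
        match = SUBGROUP_RE.match(line)
        if match and group is not None:
            records.append({**document_fields, **group, **match.groupdict()})
    return records


# Hypothetical text lines from the scanned area of an invoice.
LINES = [
    "1001 T-Shirt basic",
    "   S   10   9.90",
    "   M   25   9.90",
    "1002 Hoodie",
    "   L    5  29.50",
    "Total 435.00",
]

for record in read_positions(LINES, {"invoice_no": "R-2019-001"}):
    print(record)
```

Lines that match neither condition, such as the total line in the sample, are simply skipped.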

As a starting point for your own tests, we have created two example templates with PDF test files. The *.pmdx templates only need to be imported into the PDFmdx Editor via drag & drop; the output path may need to be adjusted. For processing, create a job with input and error folders in the PDFmdx Processor and select the two test templates for the job.

Download – PDFmdx – Templates and examples for two-level reading of position data >>>
Download – PDFmdx Template Editor & Processor >>>

PDFmdx version 3.5.3 available

New features PDFmdx version 3.5.3:

  • Field / Area OCR / Invert area / Always execute OCR:

Normally, PDFmdx processing uses PDF files as input that already contain text: either “normal” PDFs or scanned PDFs that have received an additional text layer via a previous OCR process (e.g. via AutoOCR or FileConverterPro).

PDFmdx also has an integrated OCR function to determine the text in the areas of the positioned fields from the image information.

With the general PDFmdx OCR settings it is possible to specify how the texts from the PDF are to be obtained – “Original”, “OCR” or “SmartOCR”. With “Original” the text is always taken from the PDF, with “OCR” the text is always obtained via a PDFmdx OCR process, even if a text already exists in the PDF. With the “SmartOCR” setting, the PDFmdx OCR function is only executed if there is no text in the PDF, otherwise the existing text in the PDF is taken. These settings generally apply to the entire template and all associated layouts.
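The three settings boil down to a simple decision. The sketch below only restates the rule described above in code; the enum and function names are our own illustration, not PDFmdx identifiers.

```python
from enum import Enum


class TextSource(Enum):
    ORIGINAL = "Original"
    OCR = "OCR"
    SMART_OCR = "SmartOCR"


def should_run_ocr(mode: TextSource, pdf_has_text: bool) -> bool:
    """Decide whether OCR is executed for a given template setting."""
    if mode is TextSource.ORIGINAL:
        return False  # text is always taken from the PDF
    if mode is TextSource.OCR:
        return True  # OCR always runs, even if text already exists
    return not pdf_has_text  # SmartOCR: only if the PDF has no text


for mode in TextSource:
    for has_text in (True, False):
        print(mode.value, "| PDF has text:", has_text,
              "| run OCR:", should_run_ocr(mode, has_text))
```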

In this context, there are now 2 new functions that make it possible to recognize white text on a black background.

Individual areas with white text on a black background cannot be recognized by an automatic OCR process, because the area would first have to be inverted before OCR can recognize it. This can only be done interactively by selecting the area manually.

In the PDFmdx Editor it is now possible to activate the option “Invert Area” in the field configuration. In this case, the field area is inverted for the OCR processing. This creates black text on a white background which can be recognized by the OCR.

There is another new field function “Execute OCR always” with which the general setting “SmartOCR” can be overridden. OCR recognition is then always executed for this field, even if an underlying text already exists.
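How the two field options interact with the template setting can be sketched as follows. This is an illustration under assumptions, not PDFmdx code: the class and function names are hypothetical, the area is modelled as rows of 8-bit grayscale values, and we assume that the field override takes precedence over any template setting (the text above only states this for “SmartOCR”).

```python
from dataclasses import dataclass


@dataclass
class FieldConfig:
    # Hypothetical names mirroring the two editor options described above.
    invert_area: bool = False
    execute_ocr_always: bool = False


def field_needs_ocr(template_mode: str, pdf_has_text: bool,
                    field: FieldConfig) -> bool:
    """Return True if OCR should run for this field."""
    if field.execute_ocr_always:
        return True  # assumption: the field override wins over any mode
    if template_mode == "OCR":
        return True
    if template_mode == "SmartOCR":
        return not pdf_has_text
    return False  # "Original": text is taken from the PDF


def prepare_area(pixels, field: FieldConfig):
    """Invert an area given as rows of 8-bit grayscale values.

    0 is black and 255 is white, so white text on a black background
    becomes black text on a white background.
    """
    if not field.invert_area:
        return pixels
    return [[255 - value for value in row] for row in pixels]


white_on_black = [[0, 255, 0], [0, 0, 0]]
field = FieldConfig(invert_area=True, execute_ocr_always=True)
print(field_needs_ocr("SmartOCR", True, field))
print(prepare_area(white_on_black, field))
```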

  

  • PDFmdx Editor – find condition, call layout: There is now a function for searching the conditions forward and backward for a (partial) string, so that a line in the conditions can be jumped to directly. The linked layout can then be opened directly from the condition line. This makes it easier to work with a large number of conditions.

  • The web service functions have been revised. In the web service example the metadata can now also be downloaded as XML.
  • For the metadata XML, the new variables JobID, JobName, JobDescription and ProcessID have been added.

Download – PDFmdx Template Editor & Processor >>>