PDFmdx version 3.5.3 available

New features in PDFmdx version 3.5.3:

  • Field / Area OCR / Invert area / Always execute OCR:

Normally, PDFmdx processes PDF files that already contain text: either “normal” PDFs or scanned PDFs that received an additional text layer in a previous OCR step (e.g. via AutoOCR or FileConverterPro).

PDFmdx also has an integrated OCR function that extracts the text within the positioned field areas from the image data.

The general PDFmdx OCR settings specify how the text is obtained from the PDF: “Original”, “OCR” or “SmartOCR”. With “Original”, the text is always taken from the PDF. With “OCR”, the text is always obtained via the PDFmdx OCR process, even if the PDF already contains text. With “SmartOCR”, the PDFmdx OCR function runs only if the PDF contains no text; otherwise the existing text in the PDF is used. These settings apply globally to the entire template and all associated layouts.

In this context, two new functions now make it possible to recognize white text on a black background.

Individual areas with white text on a black background cannot be recognized by an automatic OCR process, because such an area must first be inverted before OCR can recognize it. This can only be done interactively, by selecting the area manually.

In the PDFmdx Editor, the option “Invert Area” can now be activated in the field configuration. The field area is then inverted before OCR processing, which produces black text on a white background that the OCR can recognize.
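
The inversion step itself is simple pixel arithmetic. The following is a minimal sketch, not PDFmdx code: it assumes an 8-bit grayscale image stored as a list of rows (0 = black, 255 = white), and `invert_area` is a hypothetical helper. Every pixel value v inside the field rectangle becomes 255 - v, so white text on black turns into black text on white.

```python
def invert_area(image, left, top, right, bottom):
    """Return a copy of a grayscale image (list of rows, values 0-255)
    with the rectangle [left, right) x [top, bottom) inverted.

    Hypothetical illustration of an "invert area" step before OCR;
    the image layout and the function name are assumptions.
    """
    # Copy each row so the caller's image is left untouched.
    result = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            # 255 (white) becomes 0 (black) and vice versa.
            result[y][x] = 255 - result[y][x]
    return result


# A 2x3 "field" whose top row holds white pixels on black.
field = [[0, 255, 0],
         [0, 0, 0]]
print(invert_area(field, 0, 0, 3, 1))
```

Only the selected rectangle is changed, which mirrors the idea of inverting a single field area rather than the whole page.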

The second new field function, “Execute OCR always”, overrides the general “SmartOCR” setting: OCR is then always executed for this field, even if the PDF already contains text in that area.
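
The decision between PDF text and OCR described above can be sketched as a small function. This is an illustration under stated assumptions, not the PDFmdx implementation: `TextSource` and `use_ocr_for_field` are hypothetical names, and the field-level override is applied only under “SmartOCR”, because that is the only case the description covers.

```python
from enum import Enum


class TextSource(Enum):
    """The three general OCR settings (names taken from the text above)."""
    ORIGINAL = "Original"
    OCR = "OCR"
    SMART_OCR = "SmartOCR"


def use_ocr_for_field(mode, pdf_text, always_ocr=False):
    """Return True if OCR should be run for a field, False if the
    existing PDF text should be used.

    Hypothetical sketch: `always_ocr` models the field option
    "Execute OCR always" and is only applied under SmartOCR (assumption).
    """
    if mode is TextSource.ORIGINAL:
        return False  # always take the text from the PDF
    if mode is TextSource.OCR:
        return True   # always run OCR, even if the PDF has text
    # SmartOCR: run OCR only when the field area has no PDF text,
    # unless the field forces OCR.
    has_text = bool((pdf_text or "").strip())
    return always_ocr or not has_text


print(use_ocr_for_field(TextSource.SMART_OCR, "Invoice 4711"))
print(use_ocr_for_field(TextSource.SMART_OCR, "Invoice 4711", always_ocr=True))
```

Under “SmartOCR”, a field with existing text is normally read from the PDF; enabling the field override forces OCR for that field alone, which is what allows an inverted area to be re-recognized.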

  • PDFmdx Editor – find condition, call layout: A new search function finds a (partial) string in the conditions, searching forward or backward, so you can jump directly to a specific condition line. The linked layout can then be called up directly from that line. This makes it much easier to work with a large number of conditions.
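
A forward/backward partial-string search over condition lines can be sketched as follows. This is only an illustration of the general technique: conditions are assumed to be plain strings, `find_condition` is a hypothetical name, the search is case-sensitive and does not wrap around (both assumptions), and the condition text format in the example is invented.

```python
def find_condition(conditions, needle, start, forward=True):
    """Return the index of the next (forward=True) or previous
    (forward=False) condition after/before index `start` that contains
    the partial string `needle`, or None if there is no match.

    Hypothetical sketch: case-sensitive substring match, no wrap-around.
    """
    step = 1 if forward else -1
    i = start + step
    while 0 <= i < len(conditions):
        if needle in conditions[i]:
            return i
        i += step
    return None


# Invented example conditions, each linked to a layout.
conditions = [
    "Invoice -> LayoutA",
    "Delivery note -> LayoutB",
    "Invoice credit -> LayoutC",
]
print(find_condition(conditions, "Invoice", -1))
```

Passing start=-1 begins a forward search at the first line, and start=len(conditions) begins a backward search at the last line; repeated calls with the previous hit as `start` step through all matches.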

  • The web service functions have been revised. In the web service example, the metadata can now also be downloaded as XML.
  • The new variables JobID, JobName, JobDescription and ProcessID have been added to the metadata XML.

Download – PDFmdx Template Editor & Processor >>>