New in PDFmdx Version 3.5.0:
- Subgroups – additional hierarchy for moving groups: A moving group is used, for example, to recognize invoice items that occur several times in a document or on a page, so that a separate data record can be formed for each item. Some documents, however, need a further hierarchy level below these records because several sub-records appear under one heading, e.g. to distinguish the variants of an article by color or size. Such variants can be laid out either as a list or as a matrix. To recognize and read out these additional characteristics, it is now possible to define “subgroups” for a moving group.
There are now 3 field levels: “Document fields”, “Group fields” and “Subgroup fields”. Subgroup records are defined by conditions, in the same way as group records. The output for a subgroup record also contains the information of its document and its group.
For the output, you can configure whether all data records are output or whether the group or document records are suppressed. The fields of the higher levels are also available in the group/subgroup record. To identify the level of a data record, the variable %RECORD_LEVEL% can be used with the values (D)ocument, (G)roup, (S)ubgroup.
The fields of the different levels are displayed in different colors in the PDFmdx Editor – document fields “Blue”, group fields “Red” and subgroup fields “Green”.
The working/search area for the moving group/subgroup is represented in the PDFmdx Editor by 2 horizontal red lines, which can be positioned vertically in the preview. The search for data records takes place only within the specified range.
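The three record levels and the configurable output can be pictured with a small Python sketch. It is only an illustration of the hierarchy described above, not PDFmdx code: the class names, the `flatten` function, the field names and the single-letter level values "D", "G" and "S" are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Subgroup:
    fields: dict


@dataclass
class Group:
    fields: dict
    subgroups: list = field(default_factory=list)


@dataclass
class Document:
    fields: dict
    groups: list = field(default_factory=list)


def flatten(doc, levels=("D", "G", "S")):
    """Return one output row per record of the selected levels.

    Rows of lower levels also carry the fields of the levels above them.
    The key RECORD_LEVEL stands in for the %RECORD_LEVEL% variable.
    """
    rows = []
    if "D" in levels:
        rows.append({"RECORD_LEVEL": "D", **doc.fields})
    for group in doc.groups:
        group_row = {**doc.fields, **group.fields}
        if "G" in levels:
            rows.append({"RECORD_LEVEL": "G", **group_row})
        for sub in group.subgroups:
            if "S" in levels:
                rows.append({"RECORD_LEVEL": "S", **group_row, **sub.fields})
    return rows


# Hypothetical invoice: one article that comes in two sizes.
invoice = Document(
    {"InvoiceNo": "4711"},
    [Group({"Article": "T-Shirt"},
           [Subgroup({"Size": "M", "Qty": "3"}),
            Subgroup({"Size": "L", "Qty": "5"})])],
)

# Suppress document and group records, keep only the subgroup records.
rows = flatten(invoice, levels=("S",))
```

With `levels=("S",)` the document and group records are suppressed, yet each subgroup row still contains the invoice number and the article.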
- MS-SQL Database Support for Metadata / Log & Error Log Function: In addition to exporting the metadata to an XLSX/CSV/XML file, there is now also the option to write the records to MS-SQL database tables. The extracted variables are written as document/group/subgroup records with configurable fields and contents; the log table has a fixed structure.
MS-SQL Export Functions:
- Configuration – MS-SQL Server / Database.
- Create / delete SQL tables / delete data from the tables.
- Create / delete SQL columns in the selected table.
- For each template, the SQL export can be activated and the SQL table can be selected. Fields (variables) or fixed text can be assigned to any SQL column.
- Enable SQL logging / error log. The name of the log table is configurable.
- The SQL log contains the following information: PROCESS_ID, computer name (WsName), user name (UserName), template (Template), layout, status (OK, ERROR), error code (ErrorCode), error message as text (ErrorMessage), information about the input/output file (InputPath, InputFileName, InputFolder, OutputPath, OutputFileName, OutputFolder), start/end of processing (StartTime, EndTime), processing time (ProcessingTime).
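The log fields listed above can be evaluated with ordinary SQL. The following Python sketch uses an in-memory SQLite database purely as a stand-in for MS-SQL. The table name `PDFMDX_LOG`, the column names `Layout` and `Status`, the TEXT column types and the two sample rows are assumptions invented for illustration; the actual table is created by PDFmdx.

```python
import sqlite3

# Column names taken from the log description; "Layout" and "Status"
# are assumed names, and all column types are assumptions.
LOG_COLUMNS = [
    "PROCESS_ID", "WsName", "UserName", "Template", "Layout", "Status",
    "ErrorCode", "ErrorMessage",
    "InputPath", "InputFileName", "InputFolder",
    "OutputPath", "OutputFileName", "OutputFolder",
    "StartTime", "EndTime", "ProcessingTime",
]


def create_log_table(conn, table="PDFMDX_LOG"):
    """Create a stand-in log table with one TEXT column per log field."""
    cols = ", ".join(f'"{c}" TEXT' for c in LOG_COLUMNS)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')


def failed_runs(conn, table="PDFMDX_LOG"):
    """Return (file name, error code, message) for every failed run."""
    cur = conn.execute(
        f'SELECT "InputFileName", "ErrorCode", "ErrorMessage" FROM "{table}" '
        'WHERE "Status" = ? ORDER BY "StartTime"',
        ("ERROR",),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
create_log_table(conn)

# Invented sample rows, only to make the query runnable.
sample = [
    {"PROCESS_ID": "1", "Status": "OK", "ErrorCode": "0",
     "InputFileName": "a.pdf", "StartTime": "10:00:00"},
    {"PROCESS_ID": "2", "Status": "ERROR", "ErrorCode": "6",
     "ErrorMessage": "No matching template/layout found",
     "InputFileName": "b.pdf", "StartTime": "10:01:00"},
]
for row in sample:
    names = ", ".join(f'"{k}"' for k in row)
    marks = ", ".join("?" for _ in row)
    conn.execute(f'INSERT INTO "PDFMDX_LOG" ({names}) VALUES ({marks})',
                 tuple(row.values()))

errors = failed_runs(conn)
```

Against a real MS-SQL server the same SELECT would be issued through a suitable database driver instead of `sqlite3`.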
PDFmdx error codes in the log:
- 0 = Successful processing.
- 1 = No pages remaining in the PDF.
- 2 = Configured stationery could not be found.
- 3 = Missing license.
- 4 = Error loading text plugin.
- 5 = Error writing the PDF file.
- 6 = No matching template/layout found for the specified criteria.
- 7 = Error writing printer (PCF) configuration file.
- 8 = Processing error.
- 9 = Error creating the output folder.
- 10 = Error creating the output file.
- 11 = Error when overlaying/underlaying the stationery.
- 12 = Error while signing.
- 13 = Error when sending emails.
- 14 = Error writing metadata.
- 15 = Error generating the XML file.
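When evaluating the log in a script, the codes above can be kept in a lookup table. This is a minimal Python sketch; the names `ERROR_CODES` and `describe` are hypothetical helpers and not part of PDFmdx.

```python
# Error codes and descriptions as listed in the release notes.
ERROR_CODES = {
    0: "Successful processing",
    1: "No pages remaining in the PDF",
    2: "Configured stationery could not be found",
    3: "Missing license",
    4: "Error loading text plugin",
    5: "Error writing the PDF file",
    6: "No matching template/layout found for the specified criteria",
    7: "Error writing printer (PCF) configuration file",
    8: "Processing error",
    9: "Error creating the output folder",
    10: "Error creating the output file",
    11: "Error when overlaying/underlaying the stationery",
    12: "Error while signing",
    13: "Error when sending emails",
    14: "Error writing metadata",
    15: "Error generating the XML file",
}


def describe(code):
    """Translate an ErrorCode value from the log into readable text."""
    return ERROR_CODES.get(int(code), f"Unknown error code {code}")
```

For example, `describe(6)` returns the text for a missing template/layout match, and a code outside the list yields an "Unknown error code" message instead of raising an exception.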
- PDFmdx Editor – Test Functions: The test function in the PDFmdx Editor and the processing in the PDFmdx Processor are now based on the same component. This ensures that the “Test” in the PDFmdx Editor yields the same results for recognition, splitting and reading as the processing by the PDFmdx Processor.
In a PDFmdx template you can configure whether and how a layout is identified by conditions. The “Test” function in the PDFmdx Editor checks the conditions, identifies the matching layout and reads out the fields specified in that layout. The test dialog now has a checkbox to ignore the layout recognition/criteria. The fields are then read out and displayed using only the manually selected layout.
- Field substring from the end: The substring field function can now be applied not only from the beginning of a field but also from the end (switchable).
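The difference between the two directions can be illustrated in Python. The function below is a sketch under assumptions: the parameter names and the exact counting rule (a 0-based offset and a length, counted from the chosen side) are invented for this example and may differ from the PDFmdx setting.

```python
def substring(value, offset, length, from_end=False):
    """Cut `length` characters out of `value`.

    offset is 0-based and counted from the start, or from the end when
    from_end is True. The result always keeps the original reading order.
    """
    if not from_end:
        return value[offset:offset + length]
    end = max(0, len(value) - offset)   # position just after the last wanted character
    begin = max(0, end - length)
    return value[begin:end]


number = "INV-2019-00042"
prefix = substring(number, 0, 3)                  # first 3 characters
serial = substring(number, 0, 5, from_end=True)   # last 5 characters
year = substring(number, 6, 4, from_end=True)     # 4 characters, skipping the last 6
```

Counting from the end is useful when the trailing part of a value has a fixed length while the leading part varies.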
- New OCR version, several recognition languages: The OCR function for fields has been updated and is now based on Tesseract Version 4.0. As a result, it is now possible to recognize multiple languages.
- Default values for fields – layout related: In addition to the general default value, an individual default value can now be assigned to a field for each layout. A variable receives the default value if the field was not positioned on a layout, or if the field was positioned but nothing can be read because the area is empty (blank). This makes it possible to assign a fixed value to a variable based on the recognized layout, e.g. a customer number that cannot be read directly from the document.
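The fallback rule can be sketched as follows. The function name, its parameters and the layout names are hypothetical, and the precedence of the layout-specific default over the general default is an assumption that the notes above do not state explicitly.

```python
def resolve_field(read_value, layout, layout_defaults, general_default=""):
    """Return the value read from the document, or a default value.

    read_value is None when the field is not positioned on the layout,
    and empty or whitespace-only when the positioned area is blank.
    Assumption: a layout-specific default wins over the general default.
    """
    if read_value is not None and read_value.strip():
        return read_value
    return layout_defaults.get(layout, general_default)


# Hypothetical per-layout defaults for a customer number field.
customer_defaults = {"SupplierA": "C-1001", "SupplierB": "C-2002"}

not_positioned = resolve_field(None, "SupplierA", customer_defaults)
blank_area = resolve_field("   ", "SupplierB", customer_defaults)
read_ok = resolve_field("C-7", "SupplierA", customer_defaults)
fallback = resolve_field(None, "Other", customer_defaults, "UNKNOWN")
```

In this sketch the recognized layout alone decides which customer number is assigned when nothing can be read from the page.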
- New “Composite” field type: The “Composite” type allows you to create combined fields that consist of several other fields or text. Such composite fields are available for output (folder, filename, metadata), but not for conditions. They can be composed of variables of the document, group and subgroup levels.
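How a composite field combines variables and fixed text can be sketched in Python. The `%NAME%` placeholder syntax is borrowed from the %RECORD_LEVEL% variable mentioned earlier; whether composite fields are configured with this syntax is an assumption, as are the `compose` function, the field names and the rule that unknown variables become empty text.

```python
import re


def compose(pattern, variables):
    """Replace every %NAME% placeholder with the value of that variable.

    Assumption: variables that are not defined expand to an empty string.
    """
    def replace(match):
        return str(variables.get(match.group(1), ""))

    return re.sub(r"%([A-Za-z0-9_]+)%", replace, pattern)


# Hypothetical variables from the document, group and subgroup levels.
variables = {"InvoiceNo": "4711", "Article": "T-Shirt", "Size": "M"}

file_name = compose("%InvoiceNo%_%Article%_%Size%.pdf", variables)
folder = compose("Invoices/%InvoiceNo%", variables)
```

The resulting strings could then serve as an output file name or folder, which matches the output-only use of composite fields described above.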
- Option – No remaining pages – Do not move document to the error folder: When splitting, deleting pages (cover pages) and deleting blank pages, it can happen that the document has no pages left for processing. This option determines whether the “remaining document” is retained and moved to the error folder, or whether it is discarded and the event is only recorded in the error log.
- Export of additional formats, selectable for – “Successful / Error / Both”: It is now also possible to convert PDF files that have been moved to the error folder into other formats (e.g. TXT) for further evaluation.