PDFmdx – Page 3 – PDF News

Category: PDFmdx

pdfFM – PDF Folder Merge – Convert documents with the same name to a total PDF (/A)

2017-07-06

With PDFmdx, document stacks can easily be split into single documents according to a wide range of criteria and named using content read from defined areas. Sometimes, however, it is also necessary to automatically merge documents with the same name from different sources, in a defined sequence, into one overall document.

For a customer project, we developed pdfFM, an application in which 3 folders are specified. During processing, the folders are searched for documents with the same name; matching documents are appended to a new overall PDF in the order of the specified folders and stored in a destination folder. If a file is missing from one of the folders, the documents that were found are moved to the error folder. A log file records the processing. Processing can be run either interactively or via a command-line call.
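
The matching step described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not pdfFM's actual implementation: the function name `plan_folder_merge` is hypothetical, files are matched by exact file name, and the PDF merge itself is left out.

```python
from pathlib import Path


def plan_folder_merge(folders):
    """Group PDF file names across the given folders, keeping folder order.

    Returns (complete, incomplete):
      complete   - names found in every folder, mapped to their paths in merge order
      incomplete - names missing from at least one folder (error folder candidates)
    """
    found = {}
    for folder in folders:
        for path in sorted(Path(folder).glob("*.pdf")):
            found.setdefault(path.name, []).append(path)
    complete = {name: paths for name, paths in found.items() if len(paths) == len(folders)}
    incomplete = {name: paths for name, paths in found.items() if len(paths) < len(folders)}
    return complete, incomplete
```

Each entry in `complete` lists the source files in the order in which they would be appended to the overall PDF; the entries in `incomplete` are the ones that would go to the error folder.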

In addition to merging into an overall PDF, the output file can also be converted to an ISO PDF/A-1b, 2b or 3b file.

PDFmdx – Video – Automatically send invoices via EMail

2017-06-08

This PDFmdx application example shows how areas of a PDF document are read out and how the information is then used to send the finished invoice automatically by email.

Fields and areas are defined to read the company, the invoice number, the invoice date and the e-mail address from the document.

The input file is named based on the information read out. A PDF stationery is applied. In addition, the read-out invoice number is applied to the invoice as a 1D bar code and as a 2D QR code with a web link.

As a last step, an email message is generated via an HTML EMail template. Variables which have been inserted in the subject and in the message text are replaced with the read-out information. The PDF invoice as well as additional files are inserted as attachments and then automatically sent via an SMTP EMail server.
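
The variable replacement in the subject and message text can be pictured as follows. This is a minimal sketch, assuming placeholders in the `%NAME%` style used by the built-in variables such as %COUNTER%; the field names `INVOICE_NO` and `COMPANY` and the function `fill_template` are made up for the example.

```python
import re


def fill_template(text, values):
    """Replace %NAME% placeholders with read-out values.

    Placeholders without a matching value are left unchanged.
    """
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"%(\w+)%", substitute, text)


# Hypothetical field values read from the invoice
fields = {"INVOICE_NO": "4711", "COMPANY": "ACME"}
subject = fill_template("Invoice %INVOICE_NO% for %COMPANY%", fields)
```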

PDFmdx Version 3.2.5 available

2017-06-08

Innovations PDFmdx Version 3.2.5:

New option for sending HTML emails – Until now, pictures in the message could only be included as external links, which also had to be reachable by the recipient. Now the images can be embedded directly into the HTML message, either “all images” or “only the local images”. This means that no external resources accessible to all recipients are needed.

If the option to preserve the creation date / time is activated, this information is now also transferred from the source file to files or subfiles that are moved to the error folder.

The %COUNTER% variable now supports values > 9999

If the “Delete Blank Pages” function is active and a document is processed with only one blank page, it now correctly lands in the error folder and not in the destination folder.

Download – PDFmdx Template Editor & Processor >>>

PDFmdx Version 3.2.4 available

2017-05-29

Innovations PDFmdx Version 3.2.4:

PDFmdx Editor – New HTML editor for the message text of the email function.

E-mail sending – Fixed: the option in the PDFmdx editor to activate e-mail transmission was not visible and therefore could not be activated.

The PDF/A converter has been updated.

Bug fix in the PDFmdx Editor – An error in automatic saving when creating templates was fixed. The problem occurred only when the first aliases were created.

Download – PDFmdx Template Editor & Processor >>

PDFmdx – Version 3.2.2 available

2017-05-18

Innovations PDFmdx Version 3.2.2:

PDFmdx is now a 64-bit application and can only be installed on 64-bit Windows versions. For existing installations, the license must be “moved”, that is, returned to our license server and then retrieved again. The new version also requires the .NET Runtime version 4.

New basic routines for PDF, Image, Barcode and OCR processing as well as for the extraction of text from the PDF

Extended list of supported 1D and 2D bar codes that can be detected and applied to the document.

PDF/A – 1b, 2b, 3b – Conversion and document output

New and improved feature to automatically detect and remove blank pages in black-and-white and color PDF documents. The percentage of blackening serves as the parameter. In addition, information about text in the background can be used as a criterion. The test function now also displays the blank pages identified in the selected sample document as well as their degree of blackening. Blank pages are removed at the very beginning of PDFmdx processing.

The evaluation list of the blank page detection can be sorted by the displayed columns in ascending or descending order.

In the test function of the PDFmdx editor, the name of the layout identified by the D condition is now displayed. This shows whether a layout is recognized for the document being tested, and which one.

Simplified creation and modification of conditions in the PDFmdx editor, e.g. an AND / OR condition can now be inserted at the start node.

In conditions, the page range definition is now processed correctly.

Fuzzy (approximate) search for conditions and anchor fields. A parameter specifies how many characters may deviate from the specified string for a match to still be accepted. It is available when the substring search is deactivated.
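
One common way to count deviating characters is the Levenshtein edit distance. The sketch below assumes that measure purely for illustration; the release notes do not state which distance PDFmdx uses, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of inserted, deleted or replaced characters."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # replace (or keep if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_equal(text, pattern, max_deviation):
    """Accept text if it differs from pattern by at most max_deviation characters."""
    return edit_distance(text, pattern) <= max_deviation
```

With a tolerance of 1, a typical OCR misread such as “lnvoice” would still match the search string “Invoice”.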

Text areas / fields will now also be read out if the text box in the PDF exceeds the visible page margin.

Text search and selection / copy function: In the preview of the PDFmdx editor, a text can be searched forward or backward in the entire document. The text location found is highlighted. It is now also possible to highlight text in the editor and copy it to the clipboard.

Function to take over and preserve the creation date and time of the source file for the target file. This information is also available as variables for the path / file name and for the metadata output, for example via XLS.

When the variable for the file name of the input file is used, the upper / lower case of the file name is now retained. Until now, the file name was always converted to lowercase.

In the PDFmdx service processor, the maximum number of parallel processes can now be selected from 1, 2, 3, 4, 5, 10. Previously, the minimum value was 5.

The Web service interface via REST / SOAP is activated by default during installation.

New web service functions (REST / SOAP) for user and job template management.

The new functions are covered by the bundled .NET / C# sample project and can be tested with it. These extensions are required to implement the future PDFmdx command-line application.

User management – Create new users, delete users, reset or set passwords. Until now there was only one “admin” user; now additional users can be created. Jobs and job templates are managed per user. Additional users are created via the “admin” user. The “admin” password can be reset by the PDFmdx service processor.

Job template function – To create new jobs via the web service interface without much configuration effort, job templates can now be used. Job templates serve as a reference for new jobs. An existing job can be turned into a job template via a checkbox. Jobs created from a template reference it.

Download – PDFmdx Template Editor & Processor >>>

PDFmdx – Version 2.8.1 available

2016-10-19

Innovations PDFmdx Version 2.8.1:

Template synchronization:

The PDFmdx template editor can now synchronize locally created templates and layouts with one or more PDFmdx servers via a web service connection. This allows templates to be developed and tested locally and replicated to the processing servers later. Communication uses SOAP over http / https. This considerably simplifies and accelerates the synchronization and distribution of new and updated templates.

Textstamp with rotation angle:

To apply a text not only horizontally, but at any angle, there is now the additional “angle” parameter.

Anchor Field Search – New Features:

Until now, the string used to position the anchor field was searched across the entire page (from the top), and the first occurrence was taken as the position of the anchor field. However, it can happen that the required position is not the first occurrence of the term but a later one, and that there is no other unique way to position the field via a search string. The function was therefore extended.

By default, the anchor field search now starts from the field position in the template. The next matching string is taken as the position of the anchor field. A new addition is the ‘hit’ option. If it is activated and a number is specified, the page is scanned from top to bottom and left to right for the anchor text. The number indicates which hit is used as the anchor field position. For example, the 2nd hit on a page can be used as the anchor field position.
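
The ‘hit’ option can be illustrated with a short Python sketch. The data layout is an assumption made for the example: each word is a `(text, x, y)` tuple with `y` measured from the top of the page, and `nth_anchor_hit` is a hypothetical helper, not a PDFmdx function.

```python
def nth_anchor_hit(words, anchor_text, hit=1):
    """Return the hit-th occurrence of anchor_text in reading order, or None.

    words: iterable of (text, x, y) tuples, y measured from the top of the page.
    Reading order here means top to bottom first, then left to right.
    """
    hits = sorted(
        (word for word in words if word[0] == anchor_text),
        key=lambda word: (word[2], word[1]),
    )
    if 1 <= hit <= len(hits):
        return hits[hit - 1]
    return None


page_words = [
    ("Total", 120, 700),
    ("Total", 400, 200),
    ("Total", 50, 200),
    ("Net", 10, 10),
]
second_total = nth_anchor_hit(page_words, "Total", hit=2)
```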

AutoScale-function:

Especially with scanned documents, the contents may vary not only in their horizontal or vertical position on the page; documents can also differ in scaling and size, for example when a printout was scanned at different scales. Although the relative position and size of the fields to be read are the same across the documents, the absolute values differ. The layout for reading the fields is created from a typical document and until now only took the absolute distances and sizes of the fields into account. A document that appears about 10% smaller on the A4 page could not be processed, because the fields matched the created layout neither in position nor in size. For this, we have now implemented an AutoScale function, which can automatically compensate for such differences in scaling to a certain extent.

What is to be considered:

The layout should be created from the “largest” version
An anchor field must be used that can be found without partial string search, e.g. via the string “Invoice” but not via “* Invoice *”
The “AutoScale” option must be activated.
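
How such a compensation could work is sketched below. The release notes do not describe the internal method, so this is purely an assumption: the scale factor is taken from the ratio of the anchor width found in the document to the anchor width in the layout, and every field rectangle is repositioned relative to the anchor. The function name and the `(x, y, width, height)` rectangle format are hypothetical.

```python
def rescale_field(field, layout_anchor, found_anchor):
    """Map a layout field rectangle onto a differently scaled document.

    All rectangles are (x, y, width, height).
    Assumption: the scale factor is found anchor width / layout anchor width.
    """
    scale = found_anchor[2] / layout_anchor[2]
    x = found_anchor[0] + (field[0] - layout_anchor[0]) * scale
    y = found_anchor[1] + (field[1] - layout_anchor[1]) * scale
    return (x, y, field[2] * scale, field[3] * scale)


# Layout created from the "largest" version; the scanned document is 10% smaller.
layout_anchor = (100, 100, 50, 10)
found_anchor = (90, 90, 45, 9)
amount_field = rescale_field((200, 300, 80, 20), layout_anchor, found_anchor)
```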

Detect and remove blank pages:

When scanning documents, duplex scans may contain blank pages (partially unprinted backs) in the document. Scanners do not always have a function to remove them automatically during scanning. For further processing and archiving, blank pages are a nuisance and should be removable. With the current PDFmdx version 2.8.1 there is now a function to automatically detect and remove blank pages. The criterion for detecting a blank page is a threshold value, which is set to 95% by default. We recommend a value between 95 and 98%. The value specifies the percentage of “white pixels” on a page. A page is identified as “empty” as soon as the proportion of white pixels is greater than or equal to the set value, e.g. 95%. Blank pages are removed before all other PDFmdx processing steps are started.
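
The threshold rule can be written down directly. The sketch assumes the page has already been rendered to 8-bit grayscale values; the `white_level` cut-off that decides what counts as a “white pixel” is an assumption for the example and is not taken from PDFmdx.

```python
def is_blank_page(pixels, threshold=0.95, white_level=250):
    """Return True if the share of white pixels reaches the threshold.

    pixels: flat iterable of 8-bit grayscale values (0 = black, 255 = white).
    white_level is an assumed cut-off for treating a pixel as white.
    A page without any pixels is treated as blank.
    """
    pixels = list(pixels)
    if not pixels:
        return True
    white = sum(1 for value in pixels if value >= white_level)
    return white / len(pixels) >= threshold
```

A page with exactly 95% white pixels is classified as blank at the default setting, because the comparison is “greater than or equal to”.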

Remove pages / blank pages after separator pages:

If a document is split, the separator page found can also be deleted. A new addition is a function to delete the pages that follow the separator page. Either a fixed number of subsequent pages to remove can be defined, or the automatic blank page detection / removal with threshold values can be used.

Regular Expression Parameters to selectively extract numbers from a field:

The RegEx expression “\d+” can be used to return numbers from a field. If no parameter is specified, the “first of the longest of the found numbers” is returned automatically (e.g. if the read-out field content is “page 15/110”, “110” is returned). Together with the “Hits” parameter, the number at a specific position can be extracted from the string: with parameter = 1 the first number found in the string, “15”, is returned, with 2 the second, “110”, and so on.
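
The behaviour described above can be reproduced with Python's `re` module. `extract_number` is a hypothetical helper written for this illustration; it mirrors the two cases from the text (no parameter, or a 1-based hit number).

```python
import re


def extract_number(text, hit=None):
    """Extract a number from text using the pattern \\d+.

    hit=None: return the first of the longest numbers found.
    hit=n:    return the n-th number found (1-based), or None if there is none.
    """
    numbers = re.findall(r"\d+", text)
    if not numbers:
        return None
    if hit is None:
        # max() returns the first element among equally long matches
        return max(numbers, key=len)
    if 1 <= hit <= len(numbers):
        return numbers[hit - 1]
    return None
```

For the field content “page 15/110” this yields “110” without a parameter, “15” for hit 1 and “110” for hit 2.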

RegEx can also be used in combination with the other string processing functions:

Until now, either RegEx processing or the other string processing functions could be used, but not both. Now the two can be combined: RegEx can be used together with the functions partial string, remove left / right / space / leading zeros, as well as the character and type selection. The RegEx processing is executed first, regardless of the type of the field.

%TIME% Variable – Now in 24 hour format

Update to SQL Compact Version 4.x – This version is now included in the setup and no longer has to be downloaded and installed separately, as was necessary with version 3.5.

Download – PDFmdx Template Editor & Processor >>>

PDFmdx – Version 2.7.1 – with SOAP / REST interface and extensive enhancements and improvements

2016-09-20

Innovations PDFmdx Version 2.7.1:

1. Web interface via SOAP / REST – PDFmdx can thus be integrated into other applications and services. To test which functions are available via the web services, a test client is installed by the setup together with its source code as a C# project.

Features Web-Service Test-Client:

List of available templates – refresh / upload / download / delete
Edit job – description, selection of layouts, processing without conditions (yes / no), pre-split (by templates, by pages, by field change, or when the layout or template changes).
With the “Upload” function, a created job is “filled” with PDF files for processing.
“Start” – processes the uploaded PDFs with the job settings on the PDFmdx server.
“Download” – After processing, the results can be downloaded completely, or viewed in the job info details and the result files downloaded separately.
For the “Download”, you can select whether only the PDF output files, only the metadata, or both should be downloaded. Further files can be added to an existing job via “Upload” and are processed with “Restart”.
“Delete” deletes existing jobs.
Jobs can be queried and displayed as a list showing ID, status (created, uploaded, started, finished, downloaded), user and description. The job list can be filtered by user and status.

2. EMail attachment negative list – External list of email recipients to whom the additional selected attachment should not be sent. The recipients listed only receive the resulting document from the processing. An additionally configured attachment is not sent. All other e-mail recipients receive the additional attachments.

3. EMail replacement list – An e-mail address in “To:”, “CC:” and “BCC:” can be replaced with another e-mail address via an external ASCII / TXT list.

The attachment negative list and the replacement list can be defined generally or per template. Lists defined per template override the general lists set in the e-mail configuration.
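
How the two lists interact for a single recipient could look like the sketch below. Several points are assumptions made for the example: the lists are already loaded into a set and a dict, addresses are compared case-insensitively, and the negative list is checked against the original address rather than the replaced one. `prepare_mail` is a hypothetical name.

```python
def prepare_mail(recipient, result_pdf, extra_attachments, negative_list, replacements):
    """Return (address, attachments) for one recipient.

    negative_list: set of lowercase addresses that must not get extra attachments.
    replacements:  dict mapping lowercase original addresses to replacement addresses.
    """
    key = recipient.strip().lower()
    address = replacements.get(key, recipient)
    attachments = [result_pdf]
    if key not in negative_list:
        attachments = attachments + list(extra_attachments)
    return address, attachments


negative = {"a@example.com"}
replace_map = {"b@example.com": "c@example.com"}
```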

4. The Condition Editor has been extended with copy / cut and paste for existing conditions. Conditions can therefore be exchanged between templates.

5. The 1D / 2D barcode types used for detection can now be set individually for each template and field. Until now, the barcode types could only be set once, globally for the whole application and therefore for all templates and all fields.

6. 2D barcode recognition for fields or across the entire page. 2D barcodes of the types DataMatrix, PDF417, Micro QR Code and QR Code can now be recognized and read across the whole page or within a field area.

7. Preserve the PDF information fields and date / time – The PDF information fields as well as the creation and modification date of the source file can be copied to the output file. Without this option, this information is rewritten.

8. The document record can be suppressed in the XLS / CSV output, because all document fields can also be output with the “gliding group” fields (line-item records).

9. The XML output format has been improved and implemented correctly – The document fields and the records for the fields of “gliding group” (item fields) are output separately in the XML structure. The used unit “mm” is output as an XML element.

10. Search function in the PDF preview – Text can be searched for and highlighted in the PDF prototype file currently displayed. The search can run forward or backward and always works as a substring search. This feature makes it easier to check the conditions and the anchor search function.

11. The default preview now shows the upper edge of the document at maximum width – previously the default view was the vertical center of the document.

12. The test function has been revised and extended. The defined condition can be switched off for the test, and the layout to be used can be selected manually. The read-out fields can thus be displayed regardless of the defined condition, which makes analysis and fine-tuning easier. The document fields and the fields of the gliding group are now evaluated and displayed correctly in the test function.

13. Default values for fields – They can be accessed and used via the % variables. Values can thus be assigned even if the field is not positioned or if the read-out area is empty.

14. Extensive corrections and bug fixes have been made to the page ranges in conditions and anchor field searches, and to the page function of the anchor search definitions. The first search string found on the specified page is now reliably used as the anchor reference for the fields. Text extraction was also improved, especially with regard to the correct order of the extracted words. With these improvements, PDFmdx has been optimized for processing scanned or OCR documents.

15. Show / hide jobs in the startable PDFmdx processor. To keep a large number of configured jobs manageable, job definitions can be marked with a “hidden” checkbox and are then hidden in the user interface.

Download – PDFmdx Template Editor & Processor >>>
Download – PDFmdx REST Interface >>>

PDFmdx – Version 2.5.0 available

2016-07-04

Innovations PDFmdx Version 2.5.0:

pmdx template export / import – The PDF prototype files of all layouts are now included in the PMDX file. Previously, when a template was exported from the PDFmdx Editor and imported on another computer, the PDF prototype file had to be copied separately and be available on the target computer in the correct path before opening. If it was missing, the positions were lost and the fields had to be re-inserted. The PDF prototypes are now included in the PMDX file and are no longer stored externally. This simplifies the exchange of templates as well as running the PDFmdx Editor and PDFmdx processor on different computers.

Delete pages and separator pages (in advance) – Pages and separator pages can be detected via conditions and deleted before or during processing. If the page to be deleted is not needed for layout detection or splitting, there is a separate function to delete the recognized page “before processing”. In this case, all pages to be deleted are searched for first, and the further detection and splitting is carried out on the “remaining document” in a second pass. The criterion can be applied to a selected field or to “the whole page”. The search can also be restricted to a particular page, a range of pages, or all pages. It is possible to work with an exact string or with the wildcards #, ? and *.
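
Wildcard matching of this kind is often implemented by translating the pattern into a regular expression. The sketch below does that under an assumption about the wildcard meanings, which the release notes do not spell out: `#` stands for one digit, `?` for one arbitrary character and `*` for any run of characters. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression.

    Assumed meanings: # = one digit, ? = one character, * = any run of characters.
    """
    parts = []
    for char in pattern:
        if char == "#":
            parts.append(r"\d")
        elif char == "?":
            parts.append(".")
        elif char == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


def wildcard_match(pattern, text):
    """Return True if the whole text matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None
```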

Pre-split of a document stack by layout or template – If a stack of documents contains different layouts or templates, and the various documents are to be recognized on the basis of criteria and split into individual documents, this should be done in the PDFmdx processor using the option “pre-split if the layout or template changes”. An “S = Split” condition defined in a layout only affects documents in the stack with the same layout, not across different layouts and templates. If the documents to be processed already exist as separate individual documents, this option is not needed.

Barcode recognition – The field types now include the option “barcode”, so 1D barcodes can be recognized within the marked field area. The barcode value is returned as the field content. The barcode settings determine which barcode types are detected. Barcode recognition is performed directly at the field position, also in the test function, so it is immediately visible whether the barcode is recognized and what its value is.

Date and time format of the output variables configurable – The format for the output variables %DATE% and %TIME% can now be specified individually using the placeholders “dd”, “mm”, “yyyy” and “hh”, “mm”, “ss”.
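
Because “mm” appears in both lists, a simple way to picture the placeholders is to handle the date and time patterns separately. The sketch below does exactly that; it is an illustration only, the default patterns and function names are assumptions, and hours are written in 24-hour form.

```python
from datetime import datetime


def format_date(moment, pattern="dd.mm.yyyy"):
    """Fill the date placeholders dd, mm (month) and yyyy."""
    return (pattern.replace("yyyy", f"{moment.year:04d}")
                   .replace("dd", f"{moment.day:02d}")
                   .replace("mm", f"{moment.month:02d}"))


def format_time(moment, pattern="hh:mm:ss"):
    """Fill the time placeholders hh (24-hour), mm (minute) and ss."""
    return (pattern.replace("hh", f"{moment.hour:02d}")
                   .replace("mm", f"{moment.minute:02d}")
                   .replace("ss", f"{moment.second:02d}"))


stamp = datetime(2016, 7, 4, 15, 5, 9)
date_text = format_date(stamp, "yyyy-mm-dd")
time_text = format_time(stamp, "hh:mm:ss")
```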

Test function displays the extracted and prepared field contents – The test function shows both the text extracted from the PDF and the field contents after preparation according to the field definition, which PDFmdx uses for further processing.

Apply text or 1D/2D barcode stamps – PDFmdx can not only read text from PDF documents but also apply text or barcode stamps to the pages. Fixed text, standard variables such as date, time, computer, user, template name, layout page number and overall page number, as well as read-out field values can be used via configured variables. For text stamps, font, font size, color and formatting (bold, italic, outline, underline, strikethrough) are available as parameters. The text alignment within the position box can be selected: Left / Center / Right, Top / Center / Bottom. A value can also be applied as a 1D / 2D barcode stamp. There are 36 barcode types to choose from. Another parameter determines whether the stamp is applied to specific pages in a page range, to all pages or only to the last page.

Download – PDFmdx Template Editor & Processor >>>

PDFmdx – Version 2.4.3 – Barcode recognition and improved anchor / search field function

2016-05-23

Innovations Version 2.4.3:

Field type barcode: With the new field type “barcode”, fields (areas) can now also be defined as barcode fields. Instead of reading text from the specified area, 1D barcode recognition is carried out for the selected barcode types and the recognized value is returned.

Improved anchor / search field function: Fields can be positioned relative to an “anchor / search” field. The anchor field can be located on the page via a text or substring search. If the term is found, the other fields are read relative to this field. For example, “total amount” can be found and all related fields (net, VAT, gross) read. This feature is important for processing scanned documents, because scanning does not guarantee that the fields to be read are always in the same position. Barcode fields can also be positioned and read relative to anchor fields.

Download – PDFmdx Template Editor & Processor >>>

PDFmdx – Version 2.2.1 – with many enhancements and new features

2015-06-02

With version 2.x and now with version 2.2.1 a variety of enhancements and new features have been implemented to PDFmdx.

Innovations PDFmdx Template Editor version 2.x:

Completely redesigned template editor – Several layouts can now be assigned to a template. Until now, a separate template had to be created per layout. A template now groups all layouts that contain the same fields and have to be processed in the same way. Example: To process incoming invoices, there is a single invoice template with the fields company, number, date, amount. Per supplier there is just one layout to reflect the different invoice forms. Layouts are recognized via conditions and define the fields on the form.

Stationery control via information from the document: PDFmdx can also apply a PDF stationery as an overlay or underlay to the generated documents. Field contents and criteria select different letterheads based on the document content. The fields can also be used to generate the name and path of the PDF stationery.

Anchor fields – Substring search and page ranges: Anchor fields are needed to find reference points in a document relative to which other fields are positioned. Some information is neither always in the same place nor always on the same page. A typical example is the final amount of an invoice, which can be on the first page, the last page or any page in between. In addition, depending on the number of invoice line items, the vertical position of the final amount may vary. A fixed definition would not help here. Scanned documents are another challenge. Here, every field can shift horizontally and vertically from document to document; depending on how the document was placed in the scanner and scanned, shifts and distortions may result. A fixed position definition would cause a high error rate.

In order to read and process the fields reliably, we have implemented “anchor fields”. These fields are fixed points to which other fields can refer, e.g. letterheads with company names or texts such as “Total” or “invoice amount”.

One or more anchor fields can be defined per layout, and different fields can refer to them. The search for the anchor fields can use substrings or fixed texts, and it can be specified whether all pages or only certain pages should be searched.

Delete pages – Sometimes it is necessary to delete certain pages rather than carry them over into the target document, e.g. cover sheets or slip sheets. Like the layouts, the pages are identified via conditions on the content.

Split & Merge New: PDFmdx can split stacks of documents into individual documents according to criteria, e.g. via page number, via a change in a specific field (e.g. invoice number) or via freely definable AND / OR criteria. A new addition is a function to reassemble these split documents into new documents according to other criteria. The resulting document can be re-sorted and structured via bookmarks according to information read from the documents. Thus, overall reports can be split into single reports and then reassembled and structured according to other criteria.
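
Splitting on a field change amounts to grouping consecutive pages that share the same field value. A minimal Python sketch, assuming each page is represented as a dict of read-out field values (the key names and the function name are made up for the example):

```python
from itertools import groupby


def split_on_field_change(pages, field):
    """Split a page sequence into documents wherever the field value changes.

    pages: list of dicts with read-out field values, in page order.
    Only consecutive pages with the same value end up in the same document.
    """
    return [list(group) for _, group in groupby(pages, key=lambda page: page[field])]


stack = [
    {"page": 1, "invoice_no": "A1"},
    {"page": 2, "invoice_no": "A1"},
    {"page": 3, "invoice_no": "B7"},
    {"page": 4, "invoice_no": "A1"},
]
documents = split_on_field_change(stack, "invoice_no")
```

Page 4 forms its own document here even though its invoice number matches pages 1 and 2; bringing such parts back together would be the job of the separate merge step.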

Test function, significantly extended: Allows the field search, extraction and preparation, the layout recognition, the page deletion function and document splitting to be tested in advance using a sample or template file. A PDF file other than the one used for the layout can also be selected for the test.

Fields – assign values: Normally, information is read from the document and assigned to fields / variables. But there are cases where certain information that should be stored cannot be read out. For example, in a PDF invoice the supplier may have used only images for the design of the form, so there is no area from which the company name can be read. The layout can still be recognized by other criteria and is clearly assigned. In this case, the “company” field is not positioned on the layout but is assigned the name of the company directly, so that the information can subsequently be used as a variable or in the metadata.

Innovations PDFmdx processor version 2.x:

Copy to..: To process a single input file in several different ways, there is the function “Copy to ..”. Only the names of the target jobs have to be selected from the list. The transfer can also go to multiple jobs in parallel. All input files of a job are automatically copied into the input folders of the other jobs. Several processing variants with different parameters can thus be performed automatically at once; see also the function “Split & Merge New”.

Pre-split when the layout / template changes: This function splits a stack of documents into individual documents based on the detected layouts or templates before further processing. A new document is started as soon as the detected layout changes. Layouts and templates are recognized via the configured criteria. Splitting only occurs at pages where a layout is detected.

PDFmdx service processor – scheduled processing: By default, processing begins as soon as a new file is detected in any of the input folders. In addition, scheduled processing was so far possible only with the startable PDFmdx Processor: at a set interval of x minutes, at a specific date and time, or every day at a specific time. This option is now available for the PDFmdx service processor as well.

Download – PDFmdx Template Editor & Processor >>>