PDFmdx – version 1.6.0 – OCR, regular expressions, XLS lookup for e-mail addresses, scheduled start of processing by date & time

New in PDFmdx version 1.6.0:

1.) Area OCR: marked areas / field contents can now be read from the document via OCR. For every field, a mode can be set that decides how the text is read from the PDF:

  • Native: the underlying text is read directly from the PDF, as before.
  • OCR: the marked area is rendered to an image, which is then processed with OCR. This helps because not every word in a marked area can be read natively from every PDF: sometimes the PDF contains only an image, or the text is stored as a whole sentence or paragraph, so a single term at a specific position cannot be read. With OCR, the marked area can be read regardless of how the PDF is constructed or how it was created.
  • SmartOCR: switches automatically between native and OCR.

The OCR settings define the OCR language and the resolution used to convert the PDF to an image. The default resolution is 300 dpi and can be raised up to 600 dpi for difficult fonts and documents.
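
The interplay of the three modes can be sketched roughly as follows. This is only an illustration, not PDFmdx code: `read_field`, `read_native` and `read_ocr` are hypothetical names, and the rule that SmartOCR falls back to OCR when the native read returns no text is an assumption, since the text above only says that it switches automatically.

```python
from typing import Callable, Optional

DEFAULT_DPI = 300  # default rendering resolution named in the text
MAX_DPI = 600      # upper limit named in the text


def read_field(
    mode: str,
    read_native: Callable[[], Optional[str]],
    read_ocr: Callable[[int], str],
    dpi: int = DEFAULT_DPI,
) -> str:
    """Return the text of one marked area according to the field's mode.

    read_native and read_ocr are hypothetical callbacks that stand in for
    the real PDF text extraction and the render-then-OCR step.
    """
    if not 0 < dpi <= MAX_DPI:
        raise ValueError(f"resolution must be between 1 and {MAX_DPI} dpi")

    mode = mode.lower()
    if mode == "native":
        return read_native() or ""
    if mode == "ocr":
        return read_ocr(dpi)
    if mode == "smartocr":
        # Assumption: try the native text layer first and use OCR
        # only when it yields nothing usable.
        text = read_native()
        if text and text.strip():
            return text
        return read_ocr(dpi)
    raise ValueError(f"unknown mode: {mode!r}")
```

Passing the two readers in as callbacks keeps the sketch independent of any particular PDF or OCR library.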

2.) Regular expressions: RegEx rules can be defined for every field to prepare and filter the extracted field contents, e.g.:

  • “[ab]+” matches “a”, “b”, “aa”, “bbaab”, etc.
  • “[0-9]{2,5}” matches two, three, four or five digits in a row, e.g. “42” or “54072”, but not the strings “0”, “1.1” or “a1a1”
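
The two patterns above can be checked quickly with Python's `re` module. This is a sketch for experimenting with the rules; the regex dialect used inside PDFmdx may differ in details, and `matches_whole` and `filter_field` are hypothetical helper names.

```python
import re
from typing import Optional

LETTERS = re.compile(r"[ab]+")
DIGITS = re.compile(r"[0-9]{2,5}")


def matches_whole(pattern: re.Pattern, text: str) -> bool:
    """True if the entire string matches the pattern."""
    return pattern.fullmatch(text) is not None


def filter_field(pattern: re.Pattern, raw: str) -> Optional[str]:
    """Return the first part of an extracted field that matches, or None."""
    found = pattern.search(raw)
    return found.group(0) if found else None


# The examples from the list above.
accepted = [t for t in ["a", "b", "aa", "bbaab"] if matches_whole(LETTERS, t)]
numbers = [t for t in ["42", "54072", "0", "1.1", "a1a1"] if matches_whole(DIGITS, t)]

# Filtering a made-up field value down to the number it contains.
customer_no = filter_field(DIGITS, "Customer no. 54072")
```

`fullmatch` tests the whole string, which is what the list describes; `search` is the variant that pulls a matching part out of a longer field value.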

3.) XLS lookup for e-mail addresses: With this feature, an XLS(X) file with two columns can be used to look up an e-mail address that is not contained in the document. A key value from the document, such as a customer ID, selects the address, which can then be used as a variable in the “To” field for e-mail transmission.
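
Conceptually, the lookup is a key-to-address table. The sketch below assumes the two columns have already been read from the spreadsheet into (key, e-mail) pairs; the helper names, the sample rows and the first-entry-wins rule for duplicate keys are illustrative assumptions, not PDFmdx behavior.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_lookup(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a key -> e-mail table from two-column rows."""
    lookup: Dict[str, str] = {}
    for key, email in rows:
        key = str(key).strip()
        email = str(email).strip()
        if key and email:
            # Assumption: the first entry wins if a key appears twice.
            lookup.setdefault(key, email)
    return lookup


def find_recipient(lookup: Dict[str, str], key: str) -> Optional[str]:
    """Return the address for a key read from the document, or None."""
    return lookup.get(key.strip())


# Made-up sample rows: column 1 = customer ID, column 2 = e-mail address.
rows = [
    ("10001", "billing@example.com"),
    ("10002", "orders@example.org"),
    ("10001", "duplicate@example.com"),
]
table = build_lookup(rows)
to_field = find_recipient(table, " 10002 ")
```

Returning `None` for an unknown key leaves it to the caller to decide what to do with a document that has no matching address.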

4.) Start of processing at a specific date & time: Until now, processing could be started manually or repeated at a defined interval, but not at a specified date and time. This is useful, for example, for sending a large number of e-mails at night so that normal work is not disturbed. The files can be prepared in advance, and processing starts at the defined time.
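
The idea of a delayed start can be sketched as "wait until the chosen date and time, then run once". The function names below are hypothetical, and the clock and sleep function are passed in as parameters only so that the example can be tried without actually waiting.

```python
import time
from datetime import datetime
from typing import Callable, TypeVar

T = TypeVar("T")


def seconds_until(start_at: datetime, now: datetime) -> float:
    """Seconds to wait until start_at; 0 if that moment has already passed."""
    return max(0.0, (start_at - now).total_seconds())


def run_at(
    start_at: datetime,
    job: Callable[[], T],
    now: Callable[[], datetime] = datetime.now,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Wait until start_at, then run the job once and return its result."""
    delay = seconds_until(start_at, now())
    if delay > 0:
        sleep(delay)
    return job()
```

A call such as `run_at(datetime(2024, 1, 2, 2, 0), send_prepared_mails)` would block until 02:00 on that date and then start the (hypothetical) `send_prepared_mails` job.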

1_Field definition with regular expression and OCR support  2_OCR settings - language selection  3_OCR settings - selecting the resolution for PDF rendering  4_E-mail address extraction also possible via OCR  5_E-mail lookup possible via XLS table  6_Start of processing at a predefined time, e.g. to run the processing at night

 

Download – PDFmdx template editor & processor >>>