What’s new in PDFmdx version 3.21.0:
Split pages by barcode:
There are use cases, e.g. rolling cards from freight forwarders, where documents should not be split at whole-page boundaries; instead, pages are to be split horizontally in order to create individual new documents from the parts. The new function “Split pages by barcode” has been implemented for this.
A barcode and a condition on the barcode value read are used as criteria. Every page is searched for barcodes. If a barcode matching the condition is found (the wildcards *, ? and # can be used for substring search), the page is split horizontally at this point. The existing document “ends” here. The remainder of the page becomes a new document, starting at the top of a new page, and continues until a matching barcode is encountered again or the end of the source file is reached.
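The wildcard condition can be pictured as a pattern that is translated into a regular expression. The following is only a minimal sketch: the text names the characters *, ? and # but not their exact meaning, so the semantics used here (`*` = any run of characters, `?` = one character, `#` = one digit) are an assumption, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression.

    Assumed semantics (not confirmed by the product documentation):
    '*' = any run of characters, '?' = exactly one character,
    '#' = exactly one digit. Everything else is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        elif ch == "#":
            parts.append(r"\d")
        else:
            parts.append(re.escape(ch))  # literal character, e.g. '.' or '-'
    return re.compile("".join(parts), re.DOTALL)


def barcode_matches(value: str, pattern: str) -> bool:
    """Return True if the whole barcode value fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None
```

With these assumptions, `barcode_matches("INV-2023-0042", "INV-*")` is true, while `barcode_matches("ABx2", "AB##")` is false because `x` is not a digit.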
Functions – Split pages by barcode:
- The input file is searched page by page for specific barcode types with a specific value; substring wildcards (*, ?, #) can be used.
- If a corresponding barcode is found on a page, the page is divided horizontally at this point. A new file is created from this position.
- The barcode value found on the pages can be used as a variable for file name / folder or metadata.
- Delete – Upper / lower edge of the input file in mm: These areas (e.g. letterheads and footers) are “deleted” and not included in the result file.
- The page size of the output file is automatically based on the input file or can be set to A4 portrait format.
- In order to line up the divided blocks continuously, white areas (before and after the block) are detected and removed. (Black percentage in % as a parameter)
- To format the result file, the upper/lower margin and the distance between the blocks can be specified in mm.
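The horizontal split and the deleted upper / lower edges can be pictured as cutting a page into bands. This is a sketch under stated assumptions, not the PDFmdx implementation: positions are measured in mm from the top of the page, the cut is assumed to be at the top edge of each matching barcode, and `split_page` is a hypothetical name.

```python
def split_page(page_height_mm, barcode_tops_mm,
               delete_top_mm=0.0, delete_bottom_mm=0.0):
    """Return the (start, end) bands of one page, in mm from the top.

    The deleted upper and lower edges are excluded. The first band still
    belongs to the document that was open before this page; every later
    band starts a new document. Cutting at the top edge of the barcode
    is an assumption made for this sketch.
    """
    top = delete_top_mm
    bottom = page_height_mm - delete_bottom_mm
    # Only barcodes inside the usable area produce a cut.
    cuts = sorted(y for y in barcode_tops_mm if top < y < bottom)
    edges = [top] + cuts + [bottom]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]
```

For an A4 page (297 mm) with a 20 mm upper edge and a 17 mm lower edge removed, barcodes at 100 mm and 200 mm give the bands (20, 100), (100, 200) and (200, 280).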
Disable PDF password support: PDFmdx version 3.19.0 introduced a function to store the PDF “Open” password in the layout. As a side effect, the PDFmdx Editor displayed the password dialog for all protected files, even for PDF files that only have an “Admin” password and no “Open” password. To prevent this, the function can now be deactivated globally using a button in the PDFmdx Editor.
New features in eDocPrintPro version 5.5.0:
- eDocPrintPro now uses the current GhostScript Version 9.56.1.
- PDF/A and ZUGFeRD have been updated and now offer the same functionality as the PDF2PDFA Konverter.
- eDocPrintPro PDF/A now also supports the creation of PDF/A-2u and PDF/A-3u files.
- eDocPrintPro PDF/A now supports the current ZUGFeRD / XRechnung standard 2.2.0.
- The configurable PDF page size has been expanded to the maximum possible format of 5080 x 5080mm supported by PS / GhostScript.
- Several “oversize” page formats have been added to the standard paper size selection list (914×1300, 914×1600, 914×2100, 914×2500, 914×3250 mm).
- For the file name and the output path, in addition to the variables that can be selected directly, all system variables can now be used, both those provided by the operating system and those you have defined yourself. They are read and resolved at runtime when the PDF file is output.
- The output path is checked for the maximum length of 256 characters supported by the operating system and truncated if necessary.
- Error correction in the “Insert at the beginning” / “Append at the end” function for PDF and TIFF files.
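Resolving system variables at runtime and checking the path length can be sketched as follows. This is an illustration only: how eDocPrintPro truncates is not described, so keeping the file extension intact is an assumption, and `resolve_output_path` is a hypothetical name. `os.path.expandvars` is the Python standard library call for expanding environment variables.

```python
import os

MAX_PATH_LENGTH = 256  # limit named in the release notes


def resolve_output_path(template: str, max_length: int = MAX_PATH_LENGTH) -> str:
    """Expand environment variables, then shorten an over-long path.

    Assumption for this sketch: the file extension is preserved and the
    part before it is cut. The real truncation rule is not documented here.
    """
    expanded = os.path.expandvars(template)
    if len(expanded) <= max_length:
        return expanded
    root, ext = os.path.splitext(expanded)
    keep = max(max_length - len(ext), 0)
    return root[:keep] + ext
```

A template such as `$USERPROFILE/invoices/out.pdf` would be expanded with the value the variable has at the moment the file is written, which matches the "read out at runtime" behaviour described above.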
Download – eDocPrintPro free 64bit Version
Download – eDocPrintPro 64bit PDF/A & ZUGFeRD
Download GhostScript 9.56.1 64bit Setup
Download – eDocPrintPro 32bit Version 4.0.2
Our default OCR engine, iOCR / vsOCR, has been updated and is now based on the latest Google Tesseract OCR version 5.0.
Info about additional languages for iOCR5 / vsOCR5 – see >>>
Download – GenOCR – OCR test application for iOCR (approx. 680MB) >>>
Download – iOCR5 (vsOCR5) Setup – Basic Languages (approx. 400MB) >>>
Download – iOCR5 (vsOCR5) Setup – additional languages (approx. 1200MB) >>>
The current version 1.9.0 of our PDF to PDF/A converter now also supports the latest ZUGFeRD / XRechnung standard version 2.2.2
In addition, an option was implemented to validate the XML file to be embedded against a corresponding XSD.
PDF2PDFA Konverter test & demo application >>>
PDF2PDFA-CL – PDF/A – command line converter >>>
PDF2PDFA-FM – PDF to PDF/A converter with folder monitoring >>>
PDF2PDFA-CS – PDF to PDFA converter Service >>>
eDocPrintPro PDF/A & ZUGFeRD >>>
News / improvements PDFmdx version 3.20.0:
- Repetitions & waiting time for locked files: If a file in one of the monitored input folders is still locked by another application when processing starts, a *.lock file is automatically created for this file. This happens, for example, if a scanner stores its scans directly in a watched folder. Often the PDF file is created first, locked, and then extended page by page. Depending on the size of the batch of documents, it can take a few minutes until the file has been written completely and is unlocked. To handle such situations, two parameters were implemented to automatically remove *.lock files again. This allows initially locked files to be processed automatically after they have been released.
The “Number of repetitions” and the “Waiting time between repetitions” can be configured. PDFmdx repeatedly checks whether the file is still locked; once it has been released, the *.lock file is deleted and the PDF is processed.
- Open PDF password – XML export of templates / layouts: During the XML export of templates / layouts, the passwords stored in the PDFmdx layout for opening protected PDF files can now also be exported.
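The retry logic for locked files, with its two parameters, can be sketched roughly like this. It is a minimal sketch, not the PDFmdx code: the lock probe (trying to open the file for appending) is an assumption that mainly works on Windows, where files opened exclusively cannot be opened by others, and all names are hypothetical.

```python
import time
from pathlib import Path


def is_locked(path) -> bool:
    """Best-effort lock probe: try to open the file for appending."""
    try:
        with open(path, "ab"):
            return False
    except OSError:
        return True


def wait_until_released(path, repetitions=5, wait_seconds=30.0,
                        probe=is_locked, sleep=time.sleep) -> bool:
    """Check up to `repetitions` times whether `path` has been released.

    While the file is locked, a '<name>.lock' marker exists next to it.
    Once the file is free, the marker is removed and True is returned,
    so the caller can process the PDF. False means it stayed locked.
    """
    lock_marker = Path(str(path) + ".lock")
    for attempt in range(repetitions):
        if not probe(path):
            lock_marker.unlink(missing_ok=True)
            return True
        lock_marker.touch()
        if attempt < repetitions - 1:
            sleep(wait_seconds)  # "Waiting time between repetitions"
    return False
```

Passing `probe` and `sleep` as parameters keeps the sketch testable without real locked files or real waiting times.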
SplitBarcode is used to recognize 1D / 2D barcodes in PDF documents or to split them into individual documents. The barcode information read out can be used via variables for naming and filing the newly created files. Metadata about the recognized barcodes can be generated individually for each PDF or as an XML, XLSX, CSV overall file.
Supported 1D / 2D Barcode Types: Australian Post, Aztec, Codabar, Code11, Code128, Code16K, Code39, Code93, DataMatrix, DotCode, DutchKIX, EAN13, EAN13Plus2, EAN13Plus5, EAN8, EAN8Plus2, EAN8Plus5, HanXinCode, IATA2of5, IntelligentMail, Interleaved2of5, MailMark4StateC, MailMark4StateL, Matrix2of5, MatrixCode, MicroPDF417, MicroQR, MSI, PatchCode, PDF417, PDF417Compact, Pharmacode, Planet, Plus2, Plus5, Postnet, QR, RoyalMail, RSS14, RSS14Stacked, RSSExpanded, RSSExpandedStacked, RSSLimited, Standard2of5, Telepen, UPCA, UPCAPlus2, UPCAPlus5, UPCE, UPCEPlus2, UPCEPlus5.
- Recognition and reading of 1D / 2D barcodes (51 barcode types).
- Split batches of documents into separate files.
- Delete blank pages.
- Generate XML, CSV, XLSX metadata files.
- Configuration is done through the SplitBarcode Editor to create templates (different settings and configurations), with multiple layouts linked to the template (different positions of the fields).
SplitBarcode Editor: One template contains all settings and configurations (fields, conditions, output, PDF info fields, barcodes). One or more layouts are associated with the template. A layout requires a PDF sample file on which the size and position of the fields defined by the template are specified.
SplitBarcode Template Editor – Features:
- Template – add, rename, copy, delete, import/export, Layout – add, rename, copy, delete.
- Definition of fields to be able to use the read out barcode information via variables.
- Assignment of the barcode types to be recognized to the fields. For each field you can specify which of the 51 barcode types should be recognized.
- Preparation of the read barcode values via RegEx, remove: right / left / spaces / zeros, replace text via table or via external CSV / XLS file.
- In addition to the read value, further information is also available as “sub-variables” for each barcode field: bc_read – the originally read barcode, bc_processed – the barcode processed according to the configuration value, bc_page – the page on which the barcode was found, bc_type – the barcode type, bc_dir – the barcode orientation in degrees, bc_left – the barcode position on the left in mm, bc_top – the barcode position at the top in mm, bc_width – the barcode width in mm, bc_height – the barcode height in mm.
- Identification and assignment of the documents to be processed via template / layout, based on conditions (AND / OR / NOT) and substring (*, #, ?) criteria.
- Split a batch of documents into individual files – by page number, by changing a field value or by criteria.
- Recognition and removal of blank pages by setting a maximum percentage of black.
- Filling out the PDF information fields (title, author, subject, keywords) with text or via read-out field values.
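Blank-page removal by a maximum percentage of black, as listed above, can be illustrated with a small sketch. It assumes each page is already rendered to a black-and-white bitmap (rows of 0 = white, 1 = black); the function names and the default threshold are hypothetical and not taken from SplitBarcode.

```python
def black_percentage(page) -> float:
    """Share of black pixels on a page, in percent (0 = white, 1 = black)."""
    total = sum(len(row) for row in page)
    if total == 0:
        return 0.0
    black = sum(sum(row) for row in page)
    return 100.0 * black / total


def drop_blank_pages(pages, max_black_percent=0.5):
    """Keep only pages whose black share exceeds the configured maximum.

    Pages at or below the threshold count as blank, so a few specks of
    scanner noise do not prevent a page from being removed.
    """
    return [p for p in pages if black_percentage(p) > max_black_percent]
```

A threshold slightly above zero is the point of the parameter: scanned "empty" pages are rarely perfectly white.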
SplitBarcode Layout Editor – Features:
- Select / save PDF sample file (prototype), select page, up, down, zoom, view selection.
- For each layout, the defined fields can be positioned visually / interactively on a loaded PDF sample file and their size can be specified.
- Display for positioned fields: position, size, barcode value, barcode type
- Field Functions: Automatically detect barcode type, create condition from field and insert into condition editor.
- SplitBarCode Layout Editor test function – tests the current configuration against the PDF prototype or against other PDF files – tests the criteria for recognition and splitting. The recognized layout, the field values read out and prepared, the pages on which the document is divided and the pages to be deleted are displayed.
- The templates / layouts to be used are activated for processing and can be managed via profiles.
- The PDF document to be processed is validated against all activated templates / layouts according to the defined criteria. If a criterion applies, the document is processed according to the template definition. If no criterion applies, an error is thrown or the file is moved to the error folder.
- PDF file output – Configuration of destination path and filename using the defined fields/variables. Overwrite, filename count and append can be configured for existing files.
- Metadata Output: Configuration of target path and filename using the defined fields/variables. Output as XML, XLSX, CSV file, as a complete file or as a single file per generated PDF, column definition for the XLSX / CSV file via field / variable selection.
- SplitBarCode – .NET / C# test application to test the functions interactively. Drag & Drop processing one or more PDF files.
- SplitBarCode-CL – Command line application for client / server.
- SplitBarcode-FM – MS-Windows folder monitoring service.
- SplitBarCode-CS – REST / SOAP web service – result is returned as a single ZIP file.
- DropSplit – Interactive workspace application for drag & drop or via a single watched folder.
- eDocPrintPro SplitBarCode Plugin – Barcode recognition and splitting into individual files via the eDocPrintPro printer driver.
Innovations / improvements PDFmdx version 3.19.0:
- Improved barcode performance through preprocessing: There is now a new option to speed up barcode recognition and processing. The previous barcode processing is still available because there may be use cases where the previous processing offers an advantage. Both implementations have their advantages depending on the situation. Barcode recognition requires an image and not a normal PDF structure with text and lines. The PDF must therefore be rendered beforehand for barcode recognition.
In the previous implementation, only the area marked by the field is rendered. Since this does not take long, and in order to achieve a better result even with bad scans, the rendering is carried out three times: at 200, 300 and 400 dpi. Advantage: suitable if barcode recognition is only to take place in a single small area, so that not everything has to be rendered, or if scans of inferior quality are processed.
With the new “Barcode preprocessing” function, the entire PDF is rendered in advance at 300 dpi, all barcode types defined in the template are recognized, and the results are cached for later use. This is faster because rendering and barcode recognition are carried out only once, at 300 dpi, and the recognized values are saved for further processing. Advantage: suitable if there are several areas with barcodes on a page, or if the document is of good quality and multiple rendering is not required.
- Process protected PDF – Open password: Previously, password-protected PDFs could not be processed. Now there is a function to store the “Open PDF” password in the layout. All passwords stored for the layouts selected in the job are loaded for processing. If PDFmdx recognizes a protected PDF file, the passwords in the list are tried one after the other to open and decrypt the PDF. If a password matches, the PDF is opened and decrypted. A new, unprotected PDF is then created from it and processed normally by PDFmdx.
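Trying the stored passwords one after the other can be pictured as a simple loop. This is only a sketch: `opener` stands for whatever PDF library call actually opens and decrypts the file, and both it and the `WrongPassword` exception are hypothetical placeholders, not part of PDFmdx.

```python
class WrongPassword(Exception):
    """Raised by the (hypothetical) opener when a password does not fit."""


def open_with_password_list(path, passwords, opener):
    """Try each stored password in order; return the first opened document.

    `opener(path, password)` is a placeholder for a real PDF library call.
    It must return the opened document or raise WrongPassword.
    None is returned if no password in the list matches.
    """
    for password in passwords:
        try:
            return opener(path, password)
        except WrongPassword:
            continue  # try the next stored password
    return None
```

If `None` comes back, the file cannot be decrypted with the passwords stored for the selected layouts and would need separate handling.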
Document scanners and multifunction devices for capturing, archiving and forwarding of documents can be found in almost every company today. All modern devices are able to generate documents in color and as PDF. Color document capture and processing is becoming more common and popular. The disadvantage: The file of a color scan with the usual JPEG compression for color is many times larger than a comparable black and white scan. JPEG compression is good for color images/photos, but JPEG compression is very poor for text due to the artifacts created by the compression. Text needs sharp edges to be legible. With JPEG compression, these edges become blurred, making smaller fonts in particular illegible.
The PDF-MRC (Mixed Raster Content) compression solves the problem – it creates very small PDF files from color scans and enables texts to be read easily.
MRC compression is also known as “Hyper Compression” and uses the method of image segmentation. It is very efficient and well applicable for typical business documents consisting of text and images that are scanned in color.
The PDF-MRC compression in brief:
The essential point is that with PDF-MRC compression, page areas of the scan are divided into separate images, so-called “layers”. Each of these layers is adjusted accordingly and optimally compressed. The PDF format makes it possible to reassemble the original appearance of a page from these separate layers as an overall picture. See also Wikipedia >>>
The 4 layers of PDF-MRC compression:
- Background layer (color image) – contains the background, background “smudges” and all other graphic elements which cannot be identified as text, line graphic or image.
- Image layer (set of color images) – contains all color images of the page.
- Mask layer (black and white image) – contains the text and line art.
- Foreground layer (picture palette) – contains the information about the colors of the mask layer, thereby preserving the color of the text and line graphics.
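How the layers combine back into one page can be shown with a toy example. It is a simplified sketch, not the PDFCompressor implementation: the image layer is assumed to be already merged into the background, all layers are assumed to have the same size, and pixels are plain RGB tuples.

```python
def compose_mrc(background, mask, foreground):
    """Rebuild a page from MRC-style layers of identical size.

    Where the mask is 1 (text or line art), the pixel takes its color
    from the foreground layer; everywhere else the background shows.
    """
    return [
        [fg if m else bg for bg, m, fg in zip(bg_row, mask_row, fg_row)]
        for bg_row, mask_row, fg_row in zip(background, mask, foreground)
    ]


WHITE, BLUE = (255, 255, 255), (0, 0, 255)
background = [[WHITE, WHITE], [WHITE, WHITE]]
mask = [[1, 0], [0, 1]]                      # black-and-white text mask
foreground = [[BLUE, BLUE], [BLUE, BLUE]]    # color of the text
page = compose_mrc(background, mask, foreground)
```

Because the mask is purely black and white, it can be stored at full sharpness with a bilevel codec, while the color layers tolerate stronger, lossy compression. That separation is what keeps text edges crisp in a small file.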
PDFCompressor with PDF-MRC compression:
The current PDFCompressor component used in PDFCompressor-CL, -FM and -CS-Service, as well as in our other applications (e.g. AutoOCR, FileConverterPro…), now also supports PDF-MRC compression. PDF-MRC compression is applicable for color documents/scans, but not for black and white and grayscale scans.
Supported file formats: PDF, JPEG, PNG, BMP, TIFF, JPEG2000, JBIG2, ICO, PCX, GIF, WMF, EMF – for file format with multiple pages, a multi-page PDF is automatically generated.
Predefined PDF-MRC profiles: PDF documents are “rendered” before MRC compression at the set resolution (150, 200, 300 dpi), i.e. converted into a color image and then subjected to MRC compression. To make it easier for the user, the essential MRC compression settings can be selected as predefined profiles: “MRC – text only” or “MRC – text and images” for 150 or 200 dpi resolution.
Example – PDF-MRC compression:
PDF-MRC compression is specifically designed for compressing color document scans and allows file size reduction by a factor of 8 to 10 compared to traditional JPEG compression. “Normal” PDFs that only consist of lines and text can also be MRC compressed, but it should be noted that the PDF is rendered before MRC compression, i.e. it is converted into an image file in order to then be output as MRC PDF. The text previously contained in the PDF is reinserted into the MRC PDF. Normally it only makes sense to subject scanned color files or PDF files that have been OCR processed to MRC compression.
The text display in an MRC-PDF file is clearly legible and without artefacts, even with small fonts, despite a low resolution of 150-200dpi. The PDF-MRC compression is therefore the ideal solution for archiving color documents generated in the company, whereby the file size is not significantly larger than that of black-and-white scans, the texts remain legible and the advantage of the color representation exists.
So far we have only used FileConverterPro (FCpro) as a Windows service with a REST/SOAP web service interface. Now we have also made it an independent C# / .NET component. This allows us to implement all document conversion functions available in FileConverterPro (FCpro) directly in other applications. Based on this, we will soon create the FCpro converter as an independent application with a command line and as a Windows service with directory monitoring (hotfolder). We will also integrate this converter component directly into PDFMerge / EasyMerge or in EMailArchiver and other applications.
This FCpro C# / .NET component is not offered by us as an independent product and is only used in our applications and in individual software projects. An example & test application is available for the FCpro converter component, with which all FCpro conversion functions can be tried out interactively.
Functions of the FCpro component test application:
- Standalone test application to interactively test the FCpro PDF & PDF/A conversion functions
- Creates – PDF, PDF/A, text, image preview, thumbnail view from various file and container formats (ZIP, EML, MSG…)
- FCpro convert profile functions: create, copy, edit, export/import profile
- Configuration of the number of parallel processes for conversion
- One or more files can be dragged onto the drag & drop area. The resulting files (PDF, TXT, JPEG, PNG) are placed in the source file folder.