Document scanners and multifunction devices for capturing, archiving and forwarding documents can be found in almost every company today. All modern devices can scan in color and output PDF, and color document capture and processing is becoming increasingly common. The disadvantage: a color scan stored with the usual JPEG compression is many times larger than a comparable black-and-white scan. JPEG compression works well for color images and photos but poorly for text, because it introduces artifacts. Text needs sharp edges to be legible; JPEG compression blurs these edges, which makes small fonts in particular illegible.
PDF-MRC (Mixed Raster Content) compression solves this problem: it creates very small PDF files from color scans while keeping the text easy to read.
MRC compression is also known as “Hyper Compression” and is based on image segmentation. It is very efficient and well suited to typical business documents that consist of text and images and are scanned in color.
The PDF-MRC compression in brief:
The essential point is that PDF-MRC compression divides the areas of a scanned page into separate images, so-called “layers”. Each layer is prepared and compressed in the way that suits its content best. The PDF format then makes it possible to display these separate layers on top of each other, so that the page appears as it did in the original.
The 4 layers of PDF-MRC compression:
- Background layer (color image) – contains the background, background “smudges” and all other graphic elements which cannot be identified as text, line art or image.
- Image layer (set of color images) – contains all color images of the page.
- Mask layer (black and white image) – contains the text and line art.
- Foreground layer (color palette) – contains the color information for the mask layer, thereby preserving the color of the text and line art.
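The layer idea above can be illustrated with a minimal sketch. It is not the PDFCompressor implementation: the function names, the brightness threshold used to detect "text" and the white filler color are all assumptions made for illustration, the image layer is omitted, and real segmentation is considerably more elaborate. The sketch only shows how a mask decides, pixel by pixel, whether the foreground or the background color is displayed.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Image = List[List[Pixel]]
Mask = List[List[bool]]        # True = text / line art pixel

WHITE: Pixel = (255, 255, 255)  # assumed filler for "don't care" pixels


def luminance(p: Pixel) -> float:
    """Approximate brightness of an RGB pixel (Rec. 601 weights)."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def split_layers(page: Image, threshold: float = 128.0) -> Tuple[Mask, Image, Image]:
    """Split a page into mask, foreground and background layers.

    Assumption: every pixel darker than `threshold` counts as text.
    """
    mask = [[luminance(p) < threshold for p in row] for row in page]
    # Foreground keeps only the colors of the masked (text) pixels.
    foreground = [[p if m else WHITE for p, m in zip(row, mrow)]
                  for row, mrow in zip(page, mask)]
    # Background keeps everything else; text pixels are filled in.
    background = [[WHITE if m else p for p, m in zip(row, mrow)]
                  for row, mrow in zip(page, mask)]
    return mask, foreground, background


def recompose(mask: Mask, foreground: Image, background: Image) -> Image:
    """Rebuild the page: the mask selects foreground or background per pixel."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]


# Tiny 3x2 "scan": cream paper, a dark blue stroke, one light red image pixel.
page = [
    [(250, 245, 230), (20, 20, 120), (250, 245, 230)],
    [(250, 245, 230), (20, 20, 120), (230, 150, 150)],
]
mask, fg, bg = split_layers(page)
restored = recompose(mask, fg, bg)
```

In this toy version the recomposed page is identical to the input. The size savings of MRC come from the next step, which is not shown here: each layer is compressed separately with settings suited to its content, so the sharp mask can stay crisp while the color layers are compressed much more aggressively.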
PDFCompressor with PDF-MRC compression:
The current PDFCompressor component used in PDFCompressor-CL, -FM and -CS-Service, as well as in our other applications (e.g. AutoOCR, FileConverterPro…), now also supports PDF-MRC compression. PDF-MRC compression can be applied to color documents and scans, but not to black-and-white or grayscale scans.
Supported file formats: PDF, JPEG, PNG, BMP, TIFF, JPEG2000, JBIG2, ICO, PCX, GIF, WMF, EMF – for file formats with multiple pages, a multi-page PDF is automatically generated.
Predefined PDF-MRC profiles: PDF documents are “rendered” at the set resolution (150, 200, 300dpi) before MRC compression, i.e. converted to a color image that is then subjected to MRC compression. To make things easier for the user, the essential MRC compression settings can be selected as predefined profiles: “MRC – text only” or “MRC – text and images”, each for 150 or 200dpi resolution.
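To get a feeling for what the rendering resolution means, the following short sketch computes the pixel dimensions and the uncompressed 24-bit RGB size of a rendered page. It assumes an A4 page (210 × 297 mm) purely as an example; the function names are hypothetical and not part of PDFCompressor.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # assumed page size (width, height) in millimetres


def rendered_size(dpi: int, page_mm=A4_MM):
    """Pixel dimensions of a page rendered at the given resolution."""
    width_mm, height_mm = page_mm
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def raw_rgb_megabytes(dpi: int, page_mm=A4_MM) -> float:
    """Uncompressed size of the rendered page at 3 bytes per pixel."""
    width_px, height_px = rendered_size(dpi, page_mm)
    return width_px * height_px * 3 / 1_000_000


for dpi in (150, 200, 300):
    w, h = rendered_size(dpi)
    print(f"{dpi} dpi: {w} x {h} px, {raw_rgb_megabytes(dpi):.1f} MB uncompressed")
```

Because the pixel count grows with the square of the resolution, a page rendered at 300dpi contains about four times as many pixels as the same page at 150dpi, which is one reason to pick a lower-resolution profile when the text stays legible.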
Example – PDF-MRC compression:
PDF-MRC compression is specifically designed for compressing color document scans and reduces the file size by a factor of 8 to 10 compared to traditional JPEG compression. “Normal” PDFs that consist only of lines and text can also be MRC compressed, but note that the PDF is rendered before MRC compression, i.e. it is converted into an image and then output as an MRC PDF. The text previously contained in the PDF is reinserted into the MRC PDF. Normally it only makes sense to apply MRC compression to scanned color files or to PDF files that have been OCR processed.
Text in an MRC PDF file is clearly legible and free of artifacts, even in small fonts and despite a low resolution of 150-200dpi. PDF-MRC compression is therefore the ideal solution for archiving the color documents generated in a company: the file size is not significantly larger than that of black-and-white scans, the text remains legible and the advantages of color are retained.