July 2013 – PDF News – PDF/A, Archivierung, OCR, DMS, Dokumentenmanagment, Scan to PDF, ECM, PDF Convert, Free PDF printerdriver, freier PDF Druckertreiber, SDK, API, PDF softwaredevelopment

Month: July 2013

ifresco Profiler – scan, edit, OCR, barcode, capture metadata, Alfresco integration

2013-07-31

The ifresco Profiler provides important, easy-to-use page oriented document processing functions for PDF and image document, on any workplace. It enables the possibility of quickly and easily storing documents with metadata with individual and specific profiling masks in Alfresco as searchable PDF’s. Area OCR with an integrated OCR Engine, creating of searchable PDF’s on export with the integrated OCR or extern AutoOCR Server, barcode recognition for file names and document split, export to folder, as e-mail attachment or via installed plugin’s to Alfresco together with metadata, are some essential features of the Software.

The application consists of 2 components – the profiler basic software, which contains all general functions, and one or more installable plugins. These plugins represent the interface to Alfresco and allow to use individually to the requirements and field of application adjusted profiling masks. The complete logic for the metadata, filing structure and naming is displayed in a plugin.

ifresco Profiler base:

Processes PDF and image files – black & white, grayscale, color – without having to keep an eye on file format and color – all functions are implemented across.
Integrated scan function to scan documents via local connected scanners. The scan settings can be chosen directly by preconfigured scan-profiles.
Capturing of documents out of folders – display as document list e.g. for multifunction devices, network scanners wiht scan to folder function or via printer driver created, also as to process via e-mail received documents.
Quick changing of the document names – with automatic selection of the next file in the list, after finish of the change.
Area OCR by local integrated OCR engine, to assign file names.
Deleting / cutting out areas
Page preview – zoom, paging, turning – as well as thumbnail miniatures of the whole document
Page oriented document processing – turn pages left/right, delete pages, page moving in the thumbnail view via drag&drop.
split total document – at marked pages, after x pages, after barcode.
Merge single documents to a total one – specifying the order, automatic deleting of the single documents.
Export – into a folder, send as e-mail attachment or store into Alfresco with metadata via profiling – in the native format as PDF image or PDF-OCR
Within the export – generating of searchable PDF-OCR documents by the local integrated iOCR engine or by the via web-service integrated AutoOCR Server with Abbyy OCR
Intelligent OCR processing – only image pages get OCR processed – normal PDF pages get taken over without changes .

ifresco Profiler plugins:

The profile form and the logic of the profiling for the deposit of the documents in Alfresco gets realized from the ifresco Profiler via plugins. Because every company has it’s own data model and deposit logic, the plugins get developed and implemented individually after specifications. Here it’s possible to fall back to already realized plugins. For testing and illustration of the possibilities there is a demo plugin as well as already realized plugins available.

installable plugins – for profiling and capturing of metadata for filing of documents in Alfresco.
One or more plugins can be installed, chosen and with that, also switched to other Alfresco servers – each plugin contains its own individual logic for the profiling as stand-alone installed .NET / C# application which inserts itself into the ifresco Profiler base framework and uses its functions.
parallel displaying of the profile mask and the document preview with the capturing of the metadata.
free programmable logic and functions of the profiling mask with e.g. extern XML template rules with dynamic fields to always build the name / title the same, access to external data sources – MS-XLS, SQL, web-service (e.g. SugarCRM), linked tables and pre-assignment of fields with values of the table, type ahead part-string search over single or combined fields, usage of Alfresco categories as lookup’s, assignment of existing Alfresco tags, automatic new applyment of tags, automatic creation of the Alfresco folder structure as well as the file names out of profile field values, searching for folders in Alfresco, counter via web-service, stamping of the document with informations from the metadata before the upload, searching for in Alfresco available documents and takeover of profile values and so on.
interactive processing – with OCR and upload or alternative
background / batch processing – for PDF-OCR conversion and Alfresco upload – the user is already able to continue working while the OCR processing and the Alfresco upload takes place in the background.
preserve existing profile values / delete mask
automatic loading of the next document in the list – processed document gets deleted ore moved into the archiving area after upload.

Download ifresco Profiler >>>

Intelligent PDF OCR processing via AutoOCR for Abbyy and iOCR

2013-07-11

PDF documents can be generated in different ways. PDFs are able to summarize various contents and sources in one document. Pages can be constructed from “normal” PDF content consisting of text, images, and vector graphics, and typically already have textual content that can be used for full text indexing and search. However, a PDF document can also contain scanned pages in black and white or color. Such pages or documents must undergo OCR recognition to insert the textual information for indexing and searching.

So there are certain PDF documents which either should not be subjected to any OCR processing, or only individual pages or all of them have to be processed because they were generated by a scan process.

Normally, all these types of PDF documents occur in business processes and the user can not distinguish whether or not a document needs to become OCR – viewed from the outside via the Adobe Reader or on the printer, this can not be immediately recognized and distinguished.

If you would generally process every PDF document / page in the same way, regardless of how they are structured and whether an OCR processing makes sense or not, there would be some disadvantages:

Each PDF page is “rasterized” again, regardless of the structure and content, ie converted into an image and then processed OCR. This is like printing the document, scanning it again and then subjecting it to OCR processing. This gives you a picture from a “normal” PDF page with underlying text recognized by the OCR engine.

the quality is not the same as before
the documents become bigger
special PDF properties are lost (bookmarks, links, etc.)
Processing time and resources are consumed
OCR page licenses are consumed unnecessarily

A PDF OCR processing should therefore be “intelligent”, so that in the process and by the user does not have to decide with difficulty whether a PDF document must be subjected to OCR processing or not. Even more difficult is when a single PDF document consists of mixed normal and scanned parts.

That’s why we’ve integrated intelligent OCR processing into AutoOCR, which works in the same way with both the Abbyy and the iOCR OCR engine. This can be controlled per input folder or for the web service interface via the OCR profile and is available for both PDF> PDF and PDF> TXT processing.

Highlights – Intelligent PDF OCR processing:

works for both PDF> PDF and PDF> TXT processing
for the Abbyy OCR and iOCR engine
at the folder as well as for the web service processing
the PDF document as well as every single page are analyzed and only those pages OCR are processed that do not contain any text – these are usually scanned pages that have not yet been processed by OCR.
existing normal PDF documents and pages are taken over unchanged and not processed
OCRed documents and pages are not processed again.
in the case of PDF> TXT processing, the text is extracted from the normal PDF pages and OCR is only performed on pages without text.
PDF functions and bookmarks are retained and are included in the target document.
Saves processing time and Abbyy OCR page licenses
the files are not enlarged
the quality of the PDF pages is preserved.

The “intelligent PDF OCR processing” can be found in addition to AutoOCR in all other of our software products that support OCR processing z.b. ifresco Profiler, FileConverter, DropOCR, PDFMerge, etc.

ifresco Transformer for AutoOCR – JavaDoc for the AutoOCR Client available

2013-07-11

To use the AutoOCR Client functions of the Alfresco AutoOCR integration – ifresco Transformer for AutoOCR – out of Java applications now there also is a JavaDoc description of the available functions.

Download JavaDoc for AutoOCR Client >>>