Applies to version: 2020.1.x; author: Wojciech Kołodziej
The OCR AI Learn mechanism allows you to create dedicated templates for recognizing documents in a process. Depending on the type of documents processed, different distinguishers are used to indicate unambiguously which template should be applied to a document. For example, in an invoice process, the contractor’s Tax ID is the most commonly used distinguisher. If the process handles several document types, it is a good idea to create a distinguisher based on the Tax ID and the acronym of the form type. During teaching, the mechanism selects the areas where values are most likely to occur and the keywords most likely to be found near each value.
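The idea of a distinguisher can be illustrated with a small Python sketch. It assumes a hypothetical key format (`<TaxID>_<FORM_TYPE>`) and hypothetical names (`Document`, `distinguisher`, `templates`); the format WEBCON BPS actually uses internally is not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    tax_id: str     # contractor's Tax ID as read from the document
    form_type: str  # acronym of the form type, e.g. "INV" (hypothetical)


def distinguisher(doc: Document, multiple_types: bool) -> str:
    """Build the key that picks a dedicated template (hypothetical format)."""
    # Keep only letters and digits so "123-456-78-90" and "1234567890" give the same key
    tax_id = "".join(ch for ch in doc.tax_id if ch.isalnum()).upper()
    if multiple_types:
        # Several document types in one process: combine Tax ID and form type acronym
        return f"{tax_id}_{doc.form_type.upper()}"
    return tax_id


# Hypothetical store of dedicated templates, keyed by distinguisher
templates = {"1234567890_INV": "invoice template", "1234567890_CN": "credit note template"}

doc = Document(tax_id="123-456-78-90", form_type="inv")
key = distinguisher(doc, multiple_types=True)
print(key)                                    # 1234567890_INV
print(templates.get(key, "no template yet"))  # invoice template
```

Adding the form type to the key keeps an invoice and a credit note from the same contractor in separate templates, even though both carry the same Tax ID.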
This article describes the “OCR AI learn” process. For more information about OCR, check out the following articles:
If the teaching process is started for the first time for a given contractor, a new dedicated template is created. The list of all templates can be found in the “System settings” tab in WEBCON BPS Designer Studio.
Fig. 1. OCR AI project in the “System settings” tab
If a template already exists for the contractor, it is retrained during teaching and a new version of the template is created.
Fig. 2. OCR projects management
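The create-or-retrain behavior described above can be summarized in a short Python sketch. The `Template` class, the `teach` function and the integer version counter are illustrative assumptions, not the WEBCON BPS data model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Template:
    distinguisher: str
    version: int = 1


def teach(projects: Dict[str, Template], distinguisher: str) -> Template:
    """First teaching creates a dedicated template; later runs add a new version."""
    template = projects.get(distinguisher)
    if template is None:
        # No template for this contractor yet: create a new dedicated one
        template = Template(distinguisher)
        projects[distinguisher] = template
    else:
        # Template exists: it is retrained and a new version is created
        template.version += 1
    return template


projects: Dict[str, Template] = {}
teach(projects, "1234567890")
teach(projects, "1234567890")
print(projects["1234567890"].version)  # 2
```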
How to create a new document recognition template?
The process of creating or teaching OCR AI templates consists of several stages:
1. Selection of elements to teach
The user verifies the recognized values. If a value has been corrected, you can select it for teaching by checking the box in the “Teach” column.
All selected fields will be used to correct the template for the given contractor.
Fig. 3. Teaching based on the contractor’s invoice
If no field is selected for teaching, the process will not be started.
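A minimal Python sketch of this selection step, assuming a hypothetical `FieldResult` record that mirrors one row of the verification form (recognized value, verified value and the state of the “Teach” checkbox):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldResult:
    name: str
    recognized: str  # value proposed by OCR AI
    verified: str    # value after the user's verification
    teach: bool      # state of the "Teach" checkbox


def start_teaching(fields: List[FieldResult]) -> Optional[List[str]]:
    """Return the names of fields used for teaching, or None if teaching is not started."""
    selected = [f.name for f in fields if f.teach]
    if not selected:
        return None  # nothing checked in the "Teach" column
    return selected


fields = [
    FieldResult("Gross amount", "1000.00", "1230.00", teach=True),  # corrected and selected
    FieldResult("Invoice number", "FV/1/2020", "FV/1/2020", teach=False),
]
print(start_teaching(fields))  # ['Gross amount']
```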
2. Creating a new template based on existing documents
The teaching process uses at most the 100 most recent documents that:
If you need a non-standard way of selecting documents for teaching (e.g. if you want to teach based on two document types, or only on documents from the verification step), you can change the attachment selection criteria with an SQL query.
Fig. 4. Example attachment selection criteria
Remember to include the distinguisher (if the dedicated network mechanism is used) and the document type in the query; otherwise, all attachments that have passed OCR AI recognition will be used.
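To show why the distinguisher and document type matter in a custom query, here is a Python sketch using an in-memory SQLite database. The `ocr_attachments` table and its columns are hypothetical and do not reflect the WEBCON BPS database schema; the sketch only demonstrates filtering by distinguisher and document type together with the limit of 100 most recent documents.

```python
import sqlite3
from typing import List, Sequence

# Hypothetical table; the real WEBCON BPS schema is different
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ocr_attachments ("
    "id INTEGER PRIMARY KEY, distinguisher TEXT, doc_type TEXT, created_at INTEGER)"
)
# 600 sample attachments from two contractors and two document types
conn.executemany(
    "INSERT INTO ocr_attachments VALUES (?, ?, ?, ?)",
    [
        (i, "1234567890" if i % 2 else "9876543210", "INV" if i % 3 else "CN", i)
        for i in range(1, 601)
    ],
)


def select_for_teaching(distinguisher: str, doc_types: Sequence[str], limit: int = 100) -> List[int]:
    """Return ids of the most recent attachments matching distinguisher and document types."""
    placeholders = ",".join("?" for _ in doc_types)
    sql = (
        "SELECT id FROM ocr_attachments "
        f"WHERE distinguisher = ? AND doc_type IN ({placeholders}) "
        "ORDER BY created_at DESC LIMIT ?"
    )
    return [row[0] for row in conn.execute(sql, (distinguisher, *doc_types, limit))]


ids = select_for_teaching("1234567890", ["INV"])
print(len(ids), ids[0], ids[-1])  # 100 599 301
```

Passing two document types (e.g. `["INV", "CN"]`) corresponds to teaching on two form types at once. Dropping the `distinguisher = ?` condition would mix attachments from both contractors into a single training set.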
You can see the created templates in the “OCR AI Learn Queue” report in WEBCON BPS Designer Studio.
Fig. 5. OCR AI learn queue
3. The new template will appear in OCR AI projects in the “System settings” tab
By clicking on the template from the selected provider, you can see which version of the template has been created.
Fig. 6. An example OCR AI project
Once a new template appears, it is used to recognize new documents. If you want to restore an earlier version:
Fig. 7. An example scan that is not suitable for teaching
If the correct area was identified during recognition but the copied characters are wrong, the template should not be taught. Incorrect characters are the result of poor scan quality or unusual fonts.
Fig. 8. An example invoice
Suppose that on the first 10 invoices the gross amount was mistakenly taken from the Taxable Amount field. Even if we correct this on the 11th invoice and indicate the gross amount in the Invoice Total line, the template will continue to read the value from the Taxable Amount field.
This is because the teaching process will use 10 incorrectly verified invoices and only one correctly verified invoice. Therefore, thorough verification of every document is crucial. The recognition area for the value will be corrected only when the number of correctly verified documents in the database is significantly greater than the number of incorrect ones.
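The effect can be pictured with a simplified majority vote in Python. This is only an analogy: it assumes the template reads a value from the area that most verified documents point to, while the actual OCR AI learning algorithm is more complex and is not described in this article.

```python
from collections import Counter
from typing import List


def learned_area(verified_areas: List[str]) -> str:
    """Return the area indicated by most verified documents (simplified majority vote)."""
    return Counter(verified_areas).most_common(1)[0][0]


# 10 invoices verified with the wrong area, 1 corrected invoice
history = ["Taxable Amount"] * 10 + ["Invoice Total"]
print(learned_area(history))  # Taxable Amount

# Only after many more correctly verified invoices does the result change
history += ["Invoice Total"] * 15
print(learned_area(history))  # Invoice Total
```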
Fig. 9. An example invoice
For example, if the due date was entered as:
Fig. 10. An example invoice
it will not be possible to copy it to the date field on the form.
The same applies if values are incorrectly formatted on the document. In the example below, the date is written together with the year abbreviation "r", so both the date and the "r" (Description field) are copied, which prevents the value from being entered into a date field.
Fig. 11. An example invoice
Fig. 12. An example invoice
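Why a date followed by extra characters cannot be copied into a date field can be shown with strict date parsing in Python. The sketch assumes the date field expects the `DD.MM.YYYY` format; the `to_date_field` helper and that format are illustrative assumptions.

```python
from datetime import date, datetime
from typing import Optional


def to_date_field(text: str, fmt: str = "%d.%m.%Y") -> Optional[date]:
    """Try to convert recognized text into a date; None means the value cannot be copied."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        # strptime rejects any extra characters, e.g. a trailing "r"
        return None


print(to_date_field("15.01.2020"))   # 2020-01-15
print(to_date_field("15.01.2020r"))  # None
```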