OCR Programmer's Guide

EMDK For Android 13.0

Overview

Optical Character Recognition (OCR) is a feature that converts images of text into machine-encoded text. In EMDK 7.5 (and higher), the Barcode API can configure the device scanner so that an app can capture various OCR font types as text. This functionality is modeled as decoder types (OCRA, OCRB, MICRE13B and USCurrency) exposed through the Barcode API. The captured OCR data is included in the data returned to the application from a scan event through the onData callback.

Enable OCR

Before an application can capture data using OCR, the decoder that corresponds to the OCR font type (OCRA, OCRB, MICRE13B, USCurrency) must be enabled. To do so, first get an instance of a scanner object (see the Barcode Scanning API Programmer's Guide for details). Note: For OCR-A and OCR-B, selecting the most appropriate font variant optimizes performance and accuracy.

Once the scanner is initialized, modify its configuration as follows:

    
    ScannerConfig config = scanner.getConfig();
    config.decoderParams.ocrA.enabled = true; //enable OCRA decoder
    config.decoderParams.ocrA.ocrAVariant = ScannerConfig.OcrAVariant.FULL_ASCII; //select required variant
    scanner.setConfig(config);

Configure Parameters

Set the parameters based on specific app requirements.

Default values for OCR parameters:

    
    ScannerConfig config = scanner.getConfig();
    config.ocrParams.inverseOcr = ScannerConfig.InverseOcr.REGULAR_ONLY;
    config.ocrParams.checkDigitModulus = 1;
    config.ocrParams.checkDigitMultiplier = "121212121212";
    config.ocrParams.checkDigitValidation = ScannerConfig.OcrCheckDigitValidation.NONE;
    config.ocrParams.ocrLines = ScannerConfig.OcrLines.ONE_LINE;
    config.ocrParams.maxCharacters = 100;
    config.ocrParams.minCharacters = 3;
    config.ocrParams.orientation = ScannerConfig.OcrOrientation.DEGREE_0;
    config.ocrParams.quietZone = 50;
    config.ocrParams.subset = "!\"#$%()*+,-./0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\\^|";
    config.ocrParams.template = "99999999";
    scanner.setConfig(config);

OCR Parameters

Inverse OCR

Inverse OCR is white or light text on a black or dark background. This parameter selects which type of OCR string is decoded.

Possible values:

  • Regular Only - Decode regular OCR (black on white) strings only (default)
  • Inverse Only - Decode inverse OCR (white on black) strings only
  • Auto-discriminate - Decode regular and inverse OCR strings

OCR Check Digit Modulus

Sets the modulus used in the OCR check digit calculation. The check digit is the right-most character in an OCR string and helps improve the accuracy of the collected data. It is the end product of a calculation made on the incoming data. In a check digit calculation such as modulus 10, alpha and numeric characters are assigned numeric weights. The calculation is applied to the character weights, and the resulting check digit is added to the end of the data. If the incoming data does not match the check digit, the data is considered corrupt. The selected check digit option does not take effect until the OCR Check Digit Validation is set.

Possible values:

  • Low - 1 (default)
  • High - 99

OCR Check Digit Multiplier

Sets OCR check digit multipliers for the character positions. For check digit validation, each character in scanned data has an equivalent weight used in the check digit calculation.

Possible values:

  • Minimum length - 1
  • Maximum Length - 100
  • Default = 121212121212

OCR Check Digit Validation

Protects against scanning errors by applying a check digit validation scheme.

Possible values:

  • None - 0 (default)

Product Add Left to Right - Each character in the scanned data is assigned a numeric value. Each value representing a character in the scanned data is multiplied by its corresponding digit in the multiplier, and the sum of these products is computed. The check digit passes if this sum modulo Check Digit Modulus is zero.

Example: Scanned data numeric value is 132456 (check digit is 6). Check digit multiplier string is 123456.

  • Digits: 1 3 2 4 5 6
  • Multipliers: 1 2 3 4 5 6
  • Products: 1 6 6 16 25 36
  • Sum of products: 1+6+6+16+25+36 = 90

If the Check Digit Modulus is 10, it passes because 90 is divisible by 10 (the remainder is zero).

Product Add Right to Left - Each character in the scanned data is assigned a numeric value. The check digit multiplier is reversed in order. Each value representing a character in the scanned data is multiplied by its corresponding digit in the reversed multiplier, resulting in a product for each character in the scanned data. The sum of these products is computed. The check digit passes if this sum modulo Check Digit Modulus is zero.

Example: Scanned data numeric value is 132459 (check digit is 9). Check digit multiplier string is 123456.

  • Digits: 1 3 2 4 5 9
  • Multipliers: 6 5 4 3 2 1
  • Products: 6 15 8 12 10 9
  • Sum of products: 6+15+8+12+10+9 = 60

If the Check Digit Modulus is 10, it passes because 60 is divisible by 10 (the remainder is 0).
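The two Product Add schemes above can be sketched in plain Java to make the arithmetic concrete. This is an illustration only, not the scanner's implementation: the class and method names are hypothetical, the sketch handles numeric characters only (each digit is its own value), and it assumes the data and the multiplier string have the same length, as they do in the examples.

```java
class ProductAddCheckSketch {

    /** Sums value * weight per position; reverseMultiplier selects the right-to-left scheme. */
    static int productSum(String digits, String multiplier, boolean reverseMultiplier) {
        if (digits.length() != multiplier.length()) {
            // Assumption of this sketch: one multiplier digit per data character.
            throw new IllegalArgumentException("data and multiplier must have equal length");
        }
        String weights = reverseMultiplier
                ? new StringBuilder(multiplier).reverse().toString()
                : multiplier;
        int sum = 0;
        for (int i = 0; i < digits.length(); i++) {
            int value = Character.digit(digits.charAt(i), 10);
            int weight = Character.digit(weights.charAt(i), 10);
            if (value < 0 || weight < 0) {
                throw new IllegalArgumentException("sketch supports digits only");
            }
            sum += value * weight;
        }
        return sum;
    }

    /** The check passes when the sum of products leaves no remainder. */
    static boolean passes(String digits, String multiplier, boolean reverseMultiplier, int modulus) {
        return productSum(digits, multiplier, reverseMultiplier) % modulus == 0;
    }

    public static void main(String[] args) {
        System.out.println(productSum("132456", "123456", false)); // 90
        System.out.println(productSum("132459", "123456", true));  // 60
        System.out.println(passes("132456", "123456", false, 10)); // true
    }
}
```

With the default modulus of 1, every sum passes, because any integer modulo 1 is zero.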

Digit Add Left to Right - Each character in the scanned data is assigned a numeric value. Each value representing a character in the scanned data is multiplied by its corresponding digit in the multiplier, resulting in a product for each character in the scanned data. The sum of each individual digit in all of the products is then calculated. The check digit passes if this sum modulo Check Digit Modulus is zero.

Example: Scanned data numeric value is 132456 (check digit is 6). Check digit multiplier string is 123456.

  • Digits: 1 3 2 4 5 6
  • Multipliers: 1 2 3 4 5 6
  • Products: 1 6 6 16 25 36
  • Sum of product digits: 1+6+6+1+6+2+5+3+6 = 36

If the Check Digit Modulus is 12, it passes because 36 is divisible by 12 (the remainder is 0).

Digit Add Right to Left - Each character in the scanned data is assigned a numeric value. The check digit multiplier is reversed in order. Each value representing a character in the scanned data is multiplied by its corresponding digit in the reversed multiplier, resulting in a product for each character in the scanned data. The sum of each individual digit in all of the products is then calculated. The check digit passes if this sum modulo Check Digit Modulus is zero.

Example: Scanned data numeric value is 132456 (check digit is 6). Check digit multiplier string is 123456.

  • Digits: 1 3 2 4 5 6
  • Multipliers: 6 5 4 3 2 1
  • Products: 6 15 8 12 10 6
  • Sum of product digits: 6+1+5+8+1+2+1+0+6 = 30

The Check Digit Modulus is 10. It passes because 30 is divisible by 10 (the remainder is 0).
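The Digit Add schemes differ from Product Add only in that the individual digits of each product are summed. A minimal Java sketch of that step follows; the class and method names are hypothetical, only numeric characters are handled, and equal-length data and multiplier strings are assumed.

```java
class DigitAddCheckSketch {

    /** Sums the decimal digits of each value * weight product. */
    static int digitSum(String digits, String multiplier, boolean reverseMultiplier) {
        if (digits.length() != multiplier.length()) {
            // Assumption of this sketch: one multiplier digit per data character.
            throw new IllegalArgumentException("data and multiplier must have equal length");
        }
        String weights = reverseMultiplier
                ? new StringBuilder(multiplier).reverse().toString()
                : multiplier;
        int sum = 0;
        for (int i = 0; i < digits.length(); i++) {
            int value = Character.digit(digits.charAt(i), 10);
            int weight = Character.digit(weights.charAt(i), 10);
            if (value < 0 || weight < 0) {
                throw new IllegalArgumentException("sketch supports digits only");
            }
            int product = value * weight;       // at most 9 * 9 = 81, so two digits
            sum += product / 10 + product % 10; // tens digit plus units digit
        }
        return sum;
    }

    static boolean passes(String digits, String multiplier, boolean reverseMultiplier, int modulus) {
        return digitSum(digits, multiplier, reverseMultiplier) % modulus == 0;
    }

    public static void main(String[] args) {
        System.out.println(digitSum("132456", "123456", false)); // 36
        System.out.println(digitSum("132456", "123456", true));  // 30
    }
}
```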

Product Add Right to Left Simple Remainder - Each character in the scanned data is assigned a numeric value. The check digit multiplier is reversed in order. Each value representing a character in the scanned data is multiplied by its corresponding digit in the reversed multiplier, resulting in a product for each character in the scanned data. The sum of these products except for the check digit's product is computed. The check digit passes if this sum modulo Check Digit Modulus is equal to the check digit's product.

Example: Scanned data numeric value is 122456 (check digit is 6). Check digit multiplier string is 123456.

  • Digits: 1 2 2 4 5 6
  • Multipliers: 6 5 4 3 2 1
  • Products: 6 10 8 12 10 6
  • Sum of products: 6+10+8+12+10 = 46

The Check Digit Modulus is 10. It passes because 46 divided by 10 leaves a remainder of 6.

Digit Add Right to Left Simple Remainder - Each character in the scanned data is assigned a numeric value. The check digit multiplier is reversed in order. Each value representing a character in the scanned data is multiplied by its corresponding digit in the reversed multiplier, resulting in a product for each character in the scanned data. The sum of each individual digit in all of the products except for the check digit's product is then calculated. The check digit passes if this sum modulo Check Digit Modulus is equal to the check digit's product.

Example: Scanned data numeric value is 122459 (check digit is 9). Check digit multiplier string is 123456.

  • Digits: 1 2 2 4 5 9
  • Multipliers: 6 5 4 3 2 1
  • Products: 6 10 8 12 10 9
  • Sum of product digits: 6+1+0+8+1+2+1+0 = 19

The Check Digit Modulus is 10. It passes because 19 divided by 10 leaves a remainder of 9.
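The two Simple Remainder schemes can be sketched the same way: the check digit's own product is left out of the sum and compared with the remainder instead. The Java below is an assumption-laden illustration (hypothetical class and method names, digits only, equal-length data and multiplier), not the decoder's actual code.

```java
class SimpleRemainderCheckSketch {

    private static int digitAt(String s, int index) {
        int d = Character.digit(s.charAt(index), 10);
        if (d < 0) {
            throw new IllegalArgumentException("sketch supports digits only");
        }
        return d;
    }

    /**
     * Right-to-left simple remainder check.
     * digitAdd = false: sum the products; digitAdd = true: sum the digits of the products.
     */
    static boolean passes(String digits, String multiplier, int modulus, boolean digitAdd) {
        if (digits.isEmpty() || digits.length() != multiplier.length()) {
            // Assumption of this sketch: one multiplier digit per data character.
            throw new IllegalArgumentException("data and multiplier must be non-empty and equal in length");
        }
        String reversed = new StringBuilder(multiplier).reverse().toString();
        int last = digits.length() - 1; // position of the check digit
        int sum = 0;
        for (int i = 0; i < last; i++) {
            int product = digitAt(digits, i) * digitAt(reversed, i);
            sum += digitAdd ? product / 10 + product % 10 : product;
        }
        int checkProduct = digitAt(digits, last) * digitAt(reversed, last);
        return sum % modulus == checkProduct;
    }

    public static void main(String[] args) {
        System.out.println(passes("122456", "123456", 10, false)); // true: 46 % 10 == 6
        System.out.println(passes("122459", "123456", 10, true));  // true: 19 % 10 == 9
    }
}
```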

Health Industry - HIBCC43 - The health industry modulus 43 check digit standard. The check digit is the modulus 43 sum of all the character values in a given message and is printed as the last character of that message.
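A modulus 43 check character of this kind can be sketched as follows. The sketch assumes the usual Code 39 / HIBC value table (0-9 as 0-9, A-Z as 10-35, then "-", ".", space, "$", "/", "+", "%" as 36-42); whether the OCR decoder uses exactly this table is not stated in this guide. The class name, method names and sample message are illustrative.

```java
class Hibc43Sketch {

    // Position in this string is the character's value (assumed Code 39 / HIBC table).
    private static final String CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%";

    /** Returns the modulus 43 check character for a message without its check character. */
    static char checkCharacter(String message) {
        int sum = 0;
        for (char c : message.toCharArray()) {
            int value = CHARSET.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("character not in the modulus 43 set: " + c);
            }
            sum += value;
        }
        return CHARSET.charAt(sum % 43);
    }

    /** Validates scanned data whose last character is the check character. */
    static boolean passes(String scanned) {
        if (scanned.length() < 2) {
            return false;
        }
        String body = scanned.substring(0, scanned.length() - 1);
        return checkCharacter(body) == scanned.charAt(scanned.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(checkCharacter("+A123BJC5D6E71")); // G (values sum to 145; 145 % 43 = 16)
        System.out.println(passes("+A123BJC5D6E71G"));        // true
    }
}
```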

OCR Lines

Selects the number of OCR lines to decode.

Possible values:

  • 1 Line (default)
  • 2 Lines
  • 3 Lines

OCR Maximum Characters

Select the maximum number of OCR characters (including spaces) per line to decode.

Possible values:

  • Low - 3
  • High - 100 (default)

OCR Minimum Characters

Select the minimum number of OCR characters (not including spaces) per line to decode.

Possible values:

  • Low - 3 (default)
  • High - 100

OCR Orientation

Select the orientation of the OCR string to be read. Setting an incorrect orientation can cause mis-decodes.

Possible values:

  • 0 degrees - to the imaging engine (default)
  • 270 degrees - clockwise (or 90 degrees counterclockwise) to the imaging engine
  • 180 degrees - (upside down) to the imaging engine
  • 90 degrees - clockwise to the imaging engine
  • Omni-directional

OCR Quiet Zone

Sets the width of blank space at which the scanner stops scanning a field during OCR reading.

Possible values:

  • Low - 20
  • High - 99
  • Default = 50

OCR Subset

Defines a custom group of characters in place of a preset font variant. For example, if scanning only numerals and the letters A, B, and C, create a subset of just these characters to speed decoding. This applies a designated OCR Subset across all enabled OCR fonts.

Possible values:

  • Minimum length - 1
  • Maximum Length - 100
  • Default = !"#$%()*+,-./0123456789<>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\^|
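One way to arrive at a subset string is to collect the distinct characters that appear in representative sample labels. The Java sketch below does that and checks the documented length limits of 1 to 100 characters. The class and method names are hypothetical, and skipping spaces is an assumption based on the default subset, which contains no space character.

```java
import java.util.TreeSet;

class OcrSubsetSketch {

    /** Builds a sorted, duplicate-free subset string from sample label texts. */
    static String subsetFor(String... samples) {
        TreeSet<Character> characters = new TreeSet<>();
        for (String sample : samples) {
            for (char c : sample.toCharArray()) {
                if (c != ' ') { // assumption: spaces are not listed in the subset
                    characters.add(c);
                }
            }
        }
        StringBuilder subset = new StringBuilder();
        for (char c : characters) {
            subset.append(c);
        }
        if (subset.length() < 1 || subset.length() > 100) {
            throw new IllegalArgumentException("subset length must be between 1 and 100");
        }
        return subset.toString();
    }

    public static void main(String[] args) {
        // Numerals plus the letters A, B and C, as in the example above.
        System.out.println(subsetFor("A123", "B456", "C7890")); // 0123456789ABC
    }
}
```

The resulting string could then be assigned to config.ocrParams.subset before calling scanner.setConfig(config).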

OCR Template

Creates a template for precisely matching scanned OCR characters to a desired input format. A carefully constructed OCR template helps eliminate scanning errors. The template expression is formed by numbers and letters. The default is 99999999, which accepts any numeric character OCR string. If there are fewer than 8 '9' characters, the '9' represents only numerical values.

Possible values:

  • Minimum length - 3
  • Maximum Length - 100
  • Default = 99999999

OCR Template Operators

The template operators in the following table can assist in capturing, delimiting and formatting scanned OCR data. OCR template expressions are formed by numbers and letters arranged in a sequence. Refer to the Zebra DS36X8 Reference Guide Chapter 15 for more information.

Each entry below gives the operator name and symbol, its description, a sample template, data that is valid for that template, and either data that is invalid for it or the outgoing data it produces.

Required Digit (9) - Accepts a numeric character only in this position.

  • Template: 99999
  • Valid data: 12987
  • Invalid data: 123AB

Required Alpha (A) - Accepts an alpha character only in this position.

  • Template: AAA
  • Valid data: ABC
  • Invalid data: 12F

Require and Suppress (0) - Any character in this position is suppressed from the output, including space and reject.

  • Template: 990AA
  • Valid data: 12QAB
  • Outgoing data: 12AB

Optional Alphanumeric (1) - Accepts an alpha-numeric character, if present. Optional characters are not allowed as the first character(s) in a field of like characters.

  • Template: 99991
  • Valid data: 1234A
  • Invalid data: 1234<

Optional Alpha (2) - Accepts an alpha character, if present. Optional characters are not allowed as the first character(s) in a field of like characters.

  • Template: AAAA2
  • Valid data: ABCDE
  • Invalid data: ABCD6

Alpha or Digit (3) - An alpha-numeric character is required in this position to validate the incoming data.

  • Template: 33333
  • Valid data: 12ABC
  • Invalid data: 12AB<

Any Including Space and Reject (4) - Accepts any character in this position, including spaces and rejects. Rejects are represented by an underscore (_) character in the output. This is a good selection for troubleshooting.

  • Template: 99499
  • Valid data: 12$34, 34_98

Any except Space and Reject (5) - Accepts any character in this position except a space or reject.

  • Template: 55999
  • Valid data: A.123, *Z456
  • Invalid data: A BCD

Optional Digit (7) - Accepts a numeric character, if present. Optional characters are not allowed as the first character(s) in a field of like characters.

  • Template: 99977
  • Valid data: 12345, 789
  • Invalid data: 789AB

Digit or Fill (8) - Accepts any numeric or fill character in this position.

  • Template: 88899
  • Valid data: 12345, >>789, <<789

Alpha or Fill (F) - Accepts any alpha or fill character in this position.

  • Template: AAAFF
  • Valid data: ABCXY, LMN>>
  • Invalid data: ABC<5

Optional Space ( ) - Accepts a space, if present. Optional characters are not allowed as the first character(s) in a field of like characters.

  • Template: 99 99
  • Valid data: 12 34, 1234
  • Invalid data: 67891

Optional Small Special (.) - Accepts a special character, if present. Optional characters are not allowed as the first character(s) in a field of like characters. Small special characters are - (dash) and . (dot).

  • Template: AA.99
  • Valid data: MN.35, XY98
  • Invalid data: XYZ12

Other Template Operators

These template operators assist in capturing, delimiting and formatting scanned OCR data.

Literal String (" and +) - Use either of these delimiting characters surrounding alphanumeric characters to define a literal string within a template that must be present in scanned OCR data. There are two characters used to delimit required literal strings; if one of the delimiter characters is already present in the desired literal string, use the other delimiter.

  • Template: "35+BC"
  • Valid data: 35+BC
  • Invalid data: AB+22

New Line (E) - To create a template of multiple lines, add an "E" between the template of each single line.

  • Template: 999EAAAA
  • Valid data: 321 on the first line, BCAD on the second line
  • Invalid data: XYZW on the first line, 12 on the second line

String Extract (C) - This operator combined with others defines a string of characters to extract from the scanned data. The string extract is structured as follows:

CbPe

Where:

  • "C" is the string extract operator
  • "b" is the string begin delimiter
  • "P" is the category (one or more numeric or alpha characters) describing the string representation
  • "e" is the string end delimiter

Values for "b" and "e" can be any character that can be scanned and are included in the output stream.

  • Template: C>A>
  • Valid data: XQ3>ABCDE>, ->ATRU>123
  • Outgoing data (in the same order): >ABCDE>, >ATRU>

Ignore to End of Field (D) - This operator causes all characters after a template to be ignored. Use this as the last character in a template expression.

  • Template: 999D
  • Valid data: 123-PED, 357298
  • Outgoing data (in the same order): 123, 357

Skip Until (P1) - This operator allows skipping over characters until a specific character type or a literal string is detected. It can be used in two ways:

P1ct

Where:

  • "P1" is the "Skip Until" operator
  • "c" is the type of character that triggers the start of output
  • "t" is one or more template characters

P1"s"t

Where:

  • "P1" is the "Skip Until" operator
  • "s" is one or more literal string characters that trigger the start of output
  • "t" is one or more template characters

The trigger character or literal string is included in output from a "Skip Until" operator, and the first character in the template should accommodate this trigger.

  • Template: P1"PN"AA9999
  • Valid data: 123PN9876, X-PN3592
  • Outgoing data (in the same order): PN9876, PN3592

Skip Until Not (P0) - This operator allows skipping over characters until a specific character type or a literal string is not matched in the output stream. It can be used in two ways:

P0ct

Where:

  • "P0" is the "Skip Until Not" operator
  • "c" is the type of character that triggers the start of output
  • "t" is one or more template characters

P0"s"t

Where:

  • "P0" is the "Skip Until Not" operator
  • "s" is one or more literal string characters that trigger the start of output
  • "t" is one or more template characters

The trigger character or literal string is included in output from a "Skip Until Not" operator.

  • Template: P0A9999
  • Valid data: BPN3456, X-PN3592
  • Invalid data / outgoing data: 5341, No output

Repeat Previous (R) - This operator allows a template character to repeat one or more times, allowing the capture of variable-length scanned data. The following example captures two required alpha characters followed by one or more required digits.

  • Template: AA9R
  • Valid data: AB3, PN12345, 32RM52700
  • Outgoing data (in the same order): AB3, PN12345, No output

Scroll Until Match (S) - This operator steps through scanned data one character at a time until the data matches the template.

  • Template: S99999
  • Valid data: AB3, PN12345, 32RM52700
  • Outgoing data (in the same order): No output, 12345, 52700
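As a rough illustration of how the positional operators described above constrain data, the following Java sketch translates a template made only of the operators 9, A, 1, 2, 3, 4, 5 and 7 into a regular expression. The class name, the treatment of "alpha" as A-Z and a-z, and the use of an underscore for a reject are assumptions made for this sketch. It does not reproduce the scanner's decoding logic, and it ignores rules such as optional characters not being allowed first in a field.

```java
import java.util.regex.Pattern;

class OcrTemplateRegexSketch {

    /** Translates a template of basic positional operators into a regular expression. */
    static Pattern toPattern(String template) {
        StringBuilder regex = new StringBuilder();
        for (char operator : template.toCharArray()) {
            switch (operator) {
                case '9': regex.append("[0-9]"); break;         // required digit
                case 'A': regex.append("[A-Za-z]"); break;      // required alpha
                case '3': regex.append("[A-Za-z0-9]"); break;   // alpha or digit
                case '1': regex.append("[A-Za-z0-9]?"); break;  // optional alphanumeric
                case '2': regex.append("[A-Za-z]?"); break;     // optional alpha
                case '7': regex.append("[0-9]?"); break;        // optional digit
                case '4': regex.append("."); break;             // any, including space and reject
                case '5': regex.append("[^ _]"); break;         // any except space and reject
                default:
                    throw new IllegalArgumentException("operator not covered by this sketch: " + operator);
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True when the whole data string fits the template. */
    static boolean matches(String template, String data) {
        return toPattern(template).matcher(data).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("99999", "12987")); // true
        System.out.println(matches("99999", "123AB")); // false
        System.out.println(matches("99977", "789"));   // true
        System.out.println(matches("55999", "A BCD")); // false
    }
}
```

A sketch like this can be handy for checking a template against sample strings on a workstation before configuring the scanner, keeping in mind that the device remains the authority on what a template accepts.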

Multiple Templates

The multiple templates feature sets up two or more templates for OCR decoding, with a capital letter "X" as the separator between the templates in the string. For example, setting the OCR Template to "99999XAAAAA" decodes OCR strings of either "12345" or "ABCDE". Additional sample template strings are shown below with descriptions of data that would be valid for each template.

  • "M99977" - injects a capital letter M followed by three required numerical characters (numerals) and two optional numerals to be acquired.

  • "X997777X" - begins with a capital X followed by two required numerals, four optional numerals and another X.

  • "9959775599" - defines two numerals followed by any character, another required numeral, two optional numerals, any two characters and two additional numerals.

  • "A55-999-99" - requires an alpha character followed by any two characters, a dash, three numerals, a dash, and two more numerals.

  • "33A.99" - defines two alpha-numeric characters followed by a letter, a "dot" (period) and two required numerals.

  • "999992991" - defines five numerals followed by an optional alpha character, two numerals and an optional alpha-numeric character.

  • "PN98" - is a literal field.
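The "X" separator described at the start of this section can be pictured with a small Java sketch that splits a multiple-template string and accepts data matching any one of the parts. To stay short it supports only the 9 (required digit) and A (required alpha) operators; the class and method names are hypothetical, "alpha" is assumed to mean A-Z and a-z, and the sketch does not attempt to model how the decoder treats an "X" that is meant as literal data.

```java
class MultiTemplateSketch {

    /** Matches data against one template made of 9 (digit) and A (alpha) operators. */
    static boolean matchesSingle(String template, String data) {
        if (template.length() != data.length()) {
            return false;
        }
        for (int i = 0; i < template.length(); i++) {
            char operator = template.charAt(i);
            char c = data.charAt(i);
            if (operator == '9') {
                if (c < '0' || c > '9') {
                    return false;
                }
            } else if (operator == 'A') {
                boolean alpha = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
                if (!alpha) {
                    return false;
                }
            } else {
                throw new IllegalArgumentException("operator not covered by this sketch: " + operator);
            }
        }
        return true;
    }

    /** Splits on the "X" separator and accepts data that fits any one template. */
    static boolean matchesAny(String multiTemplate, String data) {
        for (String template : multiTemplate.split("X")) {
            if (!template.isEmpty() && matchesSingle(template, data)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("99999XAAAAA", "12345")); // true
        System.out.println(matchesAny("99999XAAAAA", "ABCDE")); // true
        System.out.println(matchesAny("99999XAAAAA", "12ABC")); // false
    }
}
```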


Also See

Zebra DS36X8 Reference Guide (PDF) | Chapter 15 covers OCR programming