Data Types

Overview

This guide outlines the data types used in the AI Data Capture SDK.

BBox

The BBox (bounding box) class, a subclass of the Rect class, represents a rectangular frame that encloses a detected object within an image and defines the object's position and dimensions. Each bounding box includes:

Probability (Confidence) - The confidence score of the detection.
Class - The label of the detected object.
Coordinates - The corner coordinates xmin, ymin, xmax, and ymax.

The class label (cls) and probability (prob) attributes are essential for identifying detected objects and assessing how reliable each detection is.

The following sections cover the attributes for BBox.

cls

    public int cls

Description: An integer that represents the class label of the detected object, corresponding to a specific category or class in the object detection model.

prob

    public float prob

Description: A float value that indicates the probability or confidence score associated with the detection of the object, typically ranging from 0.0 to 1.0.

xmin

    public float xmin

Description: The x-coordinate of the lower-left corner.

ymin

    public float ymin

Description: The y-coordinate of the lower-left corner.

xmax

    public float xmax

Description: The x-coordinate of the upper-right corner.

ymax

    public float ymax

Description: The y-coordinate of the upper-right corner.
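Because each BBox exposes its confidence and corner coordinates as plain public fields, common post-processing needs only simple arithmetic. The following Java sketch keeps detections at or above a confidence threshold and computes each box's width, height, and area. It is an illustration under stated assumptions, not SDK code: SimpleBBox is a hypothetical stand-in that mirrors only the documented fields (cls, prob, xmin, ymin, xmax, ymax) so the example compiles without the SDK, and the 0.5 threshold is an arbitrary example value.

```java
import java.util.ArrayList;
import java.util.List;

public class BBoxExample {

    // Hypothetical stand-in mirroring the documented public fields of BBox.
    // It is NOT the SDK class; it exists only so this sketch compiles on its own.
    static class SimpleBBox {
        public int cls;
        public float prob;
        public float xmin, ymin, xmax, ymax;

        SimpleBBox(int cls, float prob, float xmin, float ymin, float xmax, float ymax) {
            this.cls = cls;
            this.prob = prob;
            this.xmin = xmin;
            this.ymin = ymin;
            this.xmax = xmax;
            this.ymax = ymax;
        }
    }

    // Width and height follow directly from the min/max coordinates.
    static float width(SimpleBBox b) {
        return b.xmax - b.xmin;
    }

    static float height(SimpleBBox b) {
        return b.ymax - b.ymin;
    }

    static float area(SimpleBBox b) {
        return width(b) * height(b);
    }

    // Keeps only detections whose confidence score meets the threshold, in input order.
    static List<SimpleBBox> filterByConfidence(List<SimpleBBox> boxes, float minProb) {
        List<SimpleBBox> kept = new ArrayList<>();
        for (SimpleBBox b : boxes) {
            if (b.prob >= minProb) {
                kept.add(b);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SimpleBBox> detections = new ArrayList<>();
        detections.add(new SimpleBBox(1, 0.92f, 10f, 20f, 110f, 70f));
        detections.add(new SimpleBBox(2, 0.30f, 5f, 5f, 15f, 15f));

        // 0.5f is an arbitrary example threshold.
        for (SimpleBBox b : filterByConfidence(detections, 0.5f)) {
            System.out.printf("cls=%d prob=%.2f size=%.0fx%.0f area=%.0f%n",
                    b.cls, b.prob, width(b), height(b), area(b));
        }
    }
}
```

In an application, the same arithmetic would apply to the fields of the BBox objects returned by the SDK; the helper method names here are hypothetical.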

Complex BBox

The ComplexBBox (complex bounding box) class represents a bounding polygon that encloses detected text within an image, defining the text's position and dimensions. Each complex bounding box includes:

Probability (Confidence) - The confidence score of the detection.
Coordinates - The coordinates of the bounding polygon.

The following sections cover the attributes for ComplexBBox.

x[ ]

    public float[] x

Description: An array of floats representing the x-coordinates of the vertices of the complex bounding box, allowing for polygons beyond simple rectangles.

y[ ]

    public float[] y

Description: An array of floats representing the y-coordinates corresponding to the x array, defining the vertices of the complex bounding box.

prob

    public float prob

Description: A float representing the probability or confidence score associated with the bounding box, indicating the confidence level of the detected text.
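Because a ComplexBBox stores its vertices as parallel x and y arrays, a caller can derive simpler geometry from it. The following Java sketch computes the smallest axis-aligned rectangle that contains the polygon, and the polygon's area using the shoelace formula. SimpleComplexBBox is a hypothetical stand-in mirroring the documented fields. The sketch also assumes that x[i] and y[i] describe the same vertex and that the vertices are listed in order around the polygon, which the area calculation requires.

```java
public class ComplexBBoxExample {

    // Hypothetical stand-in mirroring the documented ComplexBBox fields (not the SDK class).
    static class SimpleComplexBBox {
        public float[] x;
        public float[] y;
        public float prob;

        SimpleComplexBBox(float[] x, float[] y, float prob) {
            this.x = x;
            this.y = y;
            this.prob = prob;
        }
    }

    // Rejects polygons whose coordinate arrays cannot describe matching vertices.
    static void checkVertices(SimpleComplexBBox box) {
        if (box.x.length == 0 || box.x.length != box.y.length) {
            throw new IllegalArgumentException("x and y must be non-empty and of equal length");
        }
    }

    // Returns {minX, minY, maxX, maxY}: the smallest axis-aligned rectangle containing every vertex.
    static float[] enclosingRect(SimpleComplexBBox box) {
        checkVertices(box);
        float minX = box.x[0], maxX = box.x[0];
        float minY = box.y[0], maxY = box.y[0];
        for (int i = 1; i < box.x.length; i++) {
            minX = Math.min(minX, box.x[i]);
            maxX = Math.max(maxX, box.x[i]);
            minY = Math.min(minY, box.y[i]);
            maxY = Math.max(maxY, box.y[i]);
        }
        return new float[] {minX, minY, maxX, maxY};
    }

    // Polygon area via the shoelace formula; assumes vertices are listed in order.
    static double polygonArea(SimpleComplexBBox box) {
        checkVertices(box);
        int n = box.x.length;
        double twiceArea = 0.0;
        for (int i = 0; i < n; i++) {
            int j = (i + 1) % n; // next vertex, wrapping back to the first
            twiceArea += (double) box.x[i] * box.y[j] - (double) box.x[j] * box.y[i];
        }
        return Math.abs(twiceArea) / 2.0;
    }

    public static void main(String[] args) {
        // Sample data: a slightly skewed quadrilateral around a line of text.
        SimpleComplexBBox box = new SimpleComplexBBox(
                new float[] {10f, 110f, 112f, 12f},
                new float[] {20f, 18f, 48f, 50f},
                0.88f);
        float[] r = enclosingRect(box);
        System.out.printf("rect=(%.0f, %.0f)-(%.0f, %.0f) area=%.1f prob=%.2f%n",
                r[0], r[1], r[2], r[3], polygonArea(box), box.prob);
    }
}
```

The enclosing rectangle is useful when downstream code only handles rectangles (for example, cropping), at the cost of including some area outside the original polygon.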

DecodedText

The DecodedText class represents recognized text along with an associated confidence score, allowing applications to choose the best interpretation based on context or additional processing.

The following sections cover the attributes for DecodedText.

content

    public String content

Description: A String that contains the recognized or decoded text.

confidence

    public float confidence

Description: A float representing the confidence level in the accuracy of the recognized text, typically ranging from 0.0 (no confidence) to 1.0 (full confidence).
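As one way to choose an interpretation based on context, the following Java sketch picks the highest-confidence candidate whose content matches a pattern the application expects (here, digits only). SimpleDecodedText is a hypothetical stand-in mirroring the documented content and confidence fields, and the pattern and sample values are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class DecodedTextExample {

    // Hypothetical stand-in mirroring the documented DecodedText fields (not the SDK class).
    static class SimpleDecodedText {
        public String content;
        public float confidence;

        SimpleDecodedText(String content, float confidence) {
            this.content = content;
            this.confidence = confidence;
        }
    }

    // Returns the highest-confidence candidate whose content fully matches the
    // expected pattern, or null if none matches. The pattern encodes application
    // context, for example "this field holds only digits".
    static SimpleDecodedText bestMatching(SimpleDecodedText[] candidates, Pattern expected) {
        SimpleDecodedText best = null;
        for (SimpleDecodedText c : candidates) {
            boolean fits = c.content != null && expected.matcher(c.content).matches();
            if (fits && (best == null || c.confidence > best.confidence)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SimpleDecodedText[] candidates = {
            new SimpleDecodedText("O123", 0.61f), // letter O misread in place of zero
            new SimpleDecodedText("0123", 0.58f),
        };
        SimpleDecodedText best = bestMatching(candidates, Pattern.compile("\\d+"));
        System.out.println(best == null ? "no match" : best.content);
    }
}
```

Here the context check overrides the raw scores: the digits-only candidate wins even though another candidate has a slightly higher confidence.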

Descriptor

The Descriptor class represents the feature descriptors generated by the FeatureExtractor as arrays of float values, along with the version of the model used to generate them.

The following sections cover the methods for Descriptor.

getModelVersionInfo()

    public String getModelVersionInfo()

Description: Returns the version of the model used to generate the feature descriptors.

getDescriptorInfo()

    public float[][] getDescriptorInfo()

Description: Returns the feature descriptors generated by the FeatureExtractor, as a two-dimensional array of float values.
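Feature descriptors are typically compared by measuring the similarity between vectors. The following Java sketch uses cosine similarity, one common choice, to find which stored descriptor is closest to a query descriptor. This guide does not specify how descriptors should be compared or how the rows of the float[][] are organized, so the sketch assumes that each row returned by getDescriptorInfo() is one descriptor vector; the sample values are made up. Comparisons are only meaningful between descriptors produced by the same model, which getModelVersionInfo() can help confirm.

```java
public class DescriptorExample {

    // Cosine similarity between two feature vectors: 1.0 means same direction,
    // 0.0 means orthogonal. Both vectors must have the same length.
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("descriptor lengths differ");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // undefined for zero vectors; treated here as no similarity
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the index of the gallery row most similar to the query, or -1 if empty.
    // Assumption: each row of the float[][] from getDescriptorInfo() is one descriptor.
    static int mostSimilarRow(float[] query, float[][] gallery) {
        int bestIndex = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < gallery.length; i++) {
            double score = cosineSimilarity(query, gallery[i]);
            if (score > bestScore) {
                bestScore = score;
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for getDescriptorInfo() results.
        float[][] gallery = {
            {1f, 0f, 0f},
            {0.6f, 0.8f, 0f},
        };
        float[] query = {0.5f, 0.9f, 0.1f};
        System.out.println("closest row: " + mostSimilarRow(query, gallery));
    }
}
```

Cosine similarity ignores vector magnitude and compares only direction; if the SDK documents a preferred distance measure, that should take precedence over this sketch.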

OCRResult

The OCRResult class represents the result of an Optical Character Recognition (OCR) operation. It provides information about the recognized text and its location within an image. This is essential for applications that require precise localization and extraction of text from images, such as in automated document processing and image-based data entry.

The following sections cover the attributes for OCRResult.

bbox

    public ComplexBBox bbox

Description: An instance of the ComplexBBox class that defines the bounding polygon enclosing the detected text in the image.

text

    public String text

Description: The recognized text within the bounding polygon, as produced by the OCR process. Applications can use this value directly for further analysis or processing.
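A typical first step after OCR is to discard low-confidence detections. The following Java sketch keeps the text of each result whose bounding-polygon confidence (bbox.prob) meets a threshold. SimpleOCRResult and SimpleComplexBBox are hypothetical stand-ins mirroring the documented fields, the quad() helper only builds sample data, and the 0.5 threshold is an arbitrary example value.

```java
import java.util.ArrayList;
import java.util.List;

public class OCRResultExample {

    // Hypothetical stand-in mirroring the documented ComplexBBox fields (not the SDK class).
    static class SimpleComplexBBox {
        public float[] x;
        public float[] y;
        public float prob;

        SimpleComplexBBox(float[] x, float[] y, float prob) {
            this.x = x;
            this.y = y;
            this.prob = prob;
        }
    }

    // Hypothetical stand-in mirroring the documented OCRResult fields (not the SDK class).
    static class SimpleOCRResult {
        public SimpleComplexBBox bbox;
        public String text;

        SimpleOCRResult(SimpleComplexBBox bbox, String text) {
            this.bbox = bbox;
            this.text = text;
        }
    }

    // Collects the non-empty text of every result whose bounding-polygon
    // confidence (bbox.prob) meets the threshold, preserving input order.
    static List<String> confidentText(SimpleOCRResult[] results, float minProb) {
        List<String> kept = new ArrayList<>();
        for (SimpleOCRResult r : results) {
            boolean confident = r.bbox != null && r.bbox.prob >= minProb;
            if (confident && r.text != null && !r.text.isEmpty()) {
                kept.add(r.text);
            }
        }
        return kept;
    }

    // Builds a rectangle-shaped stand-in polygon for sample data.
    static SimpleComplexBBox quad(float x0, float y0, float x1, float y1, float prob) {
        return new SimpleComplexBBox(new float[] {x0, x1, x1, x0}, new float[] {y0, y0, y1, y1}, prob);
    }

    public static void main(String[] args) {
        SimpleOCRResult[] results = {
            new SimpleOCRResult(quad(10f, 10f, 120f, 30f, 0.93f), "INVOICE"),
            new SimpleOCRResult(quad(10f, 40f, 60f, 55f, 0.21f), "#$%"),
            new SimpleOCRResult(quad(10f, 60f, 200f, 80f, 0.88f), "No. 000123"),
        };
        for (String line : confidentText(results, 0.5f)) {
            System.out.println(line);
        }
    }
}
```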

TextLine

The TextLine class represents a single line of detected text, consisting of the line's bounding polygon and the words it contains.

The following sections cover the attributes and methods for TextLine.

bbox

    public ComplexBBox bbox

Description: A ComplexBBox instance that defines the bounding polygon of the text line.

decodes

    public Word[] decodes

Description: An array of Word objects representing the words in the line.

toString()

    public String toString()

Description: Returns a string representation of the TextLine, concatenating the content of all words with spaces. Returns an empty string if there are no words.

TextParagraph

The TextParagraph class represents a paragraph of text detected within an image, providing a structured method to access the individual lines of text and their bounding information.

The following sections cover the attributes and methods for TextParagraph.

bbox

    public ComplexBBox bbox

Description: A ComplexBBox instance that defines the bounding polygon of the paragraph.

lines

    public TextLine[] lines

Description: An array of TextLine objects representing the lines of text in the paragraph. This structure allows for detailed analysis and processing of each line, enabling applications to reconstruct the paragraph’s text content, understand its structure, and perform further operations like editing or translation.

toString()

    public String toString()

Description: Returns a string representation of the TextParagraph, with the words of each line concatenated and the lines separated by newline characters. Returns an empty string if there are no lines.
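The TextParagraph, TextLine, Word, and DecodedText classes form a hierarchy: a paragraph holds lines, a line holds words, and a word holds candidate decodes. The following Java sketch walks that hierarchy to rebuild the paragraph text in the way the toString() descriptions outline: words joined with spaces, lines joined with newline characters, and an empty string when there is nothing to join. It is not the SDK's toString() implementation. The Simple* classes are hypothetical stand-ins (with the bbox fields omitted), and choosing each word's highest-confidence decode is an assumption, since this guide does not state which decode toString() uses.

```java
import java.util.ArrayList;
import java.util.List;

public class TextParagraphExample {

    // Hypothetical stand-ins for DecodedText, Word, TextLine, and TextParagraph.
    // The bbox fields are omitted because this sketch only deals with text.
    static class SimpleDecodedText {
        public String content;
        public float confidence;

        SimpleDecodedText(String content, float confidence) {
            this.content = content;
            this.confidence = confidence;
        }
    }

    static class SimpleWord {
        public SimpleDecodedText[] decodes;
        SimpleWord(SimpleDecodedText... decodes) { this.decodes = decodes; }
    }

    static class SimpleTextLine {
        public SimpleWord[] decodes; // mirrors the documented TextLine field holding its words
        SimpleTextLine(SimpleWord... words) { this.decodes = words; }
    }

    static class SimpleTextParagraph {
        public SimpleTextLine[] lines;
        SimpleTextParagraph(SimpleTextLine... lines) { this.lines = lines; }
    }

    // Assumption: a word's text is its highest-confidence candidate ("" if none).
    static String wordText(SimpleWord w) {
        String best = "";
        float bestConf = Float.NEGATIVE_INFINITY;
        for (SimpleDecodedText d : w.decodes) {
            if (d.confidence > bestConf) {
                bestConf = d.confidence;
                best = d.content;
            }
        }
        return best;
    }

    // Words joined with spaces; String.join yields "" when the line has no words.
    static String lineText(SimpleTextLine line) {
        List<String> parts = new ArrayList<>();
        for (SimpleWord w : line.decodes) {
            parts.add(wordText(w));
        }
        return String.join(" ", parts);
    }

    // Lines joined with newlines; "" when the paragraph has no lines.
    static String paragraphText(SimpleTextParagraph p) {
        List<String> parts = new ArrayList<>();
        for (SimpleTextLine line : p.lines) {
            parts.add(lineText(line));
        }
        return String.join("\n", parts);
    }

    public static void main(String[] args) {
        SimpleTextParagraph p = new SimpleTextParagraph(
                new SimpleTextLine(
                        new SimpleWord(new SimpleDecodedText("Total", 0.97f)),
                        new SimpleWord(new SimpleDecodedText("due:", 0.91f))),
                new SimpleTextLine(
                        new SimpleWord(new SimpleDecodedText("$1O.00", 0.40f),
                                       new SimpleDecodedText("$10.00", 0.85f))));
        System.out.println(paragraphText(p));
    }
}
```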

Word

The Word class represents a word extracted from a document, including spatial information about the word's location through a bounding box and multiple potential decoded text interpretations.

The following sections cover the attributes for Word.

bbox

    public ComplexBBox bbox

Description: A ComplexBBox instance that defines the bounding polygon of the word.

decodes

    public DecodedText[] decodes

Description: An array of DecodedText objects, each holding one candidate interpretation of the word and its confidence score.
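Because a Word can carry several candidate decodes, an application may want to present them as ranked alternatives, for example in a correction screen. The following Java sketch returns a word's DecodedText candidates ordered from most to least confident without modifying the original array. SimpleWord and SimpleDecodedText are hypothetical stand-ins mirroring the documented fields, and the sample values are made up.

```java
import java.util.Arrays;
import java.util.Comparator;

public class WordExample {

    // Hypothetical stand-in mirroring the documented DecodedText fields (not the SDK class).
    static class SimpleDecodedText {
        public String content;
        public float confidence;

        SimpleDecodedText(String content, float confidence) {
            this.content = content;
            this.confidence = confidence;
        }
    }

    // Hypothetical stand-in mirroring the documented Word.decodes field; bbox is omitted.
    static class SimpleWord {
        public SimpleDecodedText[] decodes;

        SimpleWord(SimpleDecodedText... decodes) {
            this.decodes = decodes;
        }
    }

    // Returns a copy of the candidates sorted by descending confidence;
    // the word's own array is left unchanged.
    static SimpleDecodedText[] rankedCandidates(SimpleWord word) {
        SimpleDecodedText[] sorted = Arrays.copyOf(word.decodes, word.decodes.length);
        Arrays.sort(sorted,
                Comparator.comparingDouble((SimpleDecodedText d) -> d.confidence).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        SimpleWord word = new SimpleWord(
                new SimpleDecodedText("c1ear", 0.35f),
                new SimpleDecodedText("clear", 0.90f),
                new SimpleDecodedText("dear", 0.50f));
        for (SimpleDecodedText d : rankedCandidates(word)) {
            System.out.printf("%s (%.2f)%n", d.content, d.confidence);
        }
    }
}
```

Sorting a copy keeps the SDK-provided order available in case it carries meaning that this guide does not describe.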
