Overview
This guide outlines the data types used in the AI Data Capture SDK.
BBox
The BBox (Bounding Box) class, a subclass of the Rect class, represents a bounding box: a rectangular frame that encloses a detected object within an image and defines its position and dimensions. Each bounding box includes:
- Probability (Confidence) - The confidence score of the detection.
- Class - The label of the detected object.
- Coordinates - The coordinates: xmin, ymin, xmax, and ymax.
The BBox class contains attributes such as class label (cls) and probability (prob), which are essential for identifying and assessing detected objects.
The following sections cover the attributes for BBox.
cls
public int cls
Description: An integer that represents the class label of the detected object, corresponding to a specific category or class in the object detection model.
prob
public float prob
Description: A float value that indicates the probability or confidence score associated with the detection of the object, typically ranging from 0.0 to 1.0.
xmin
public float xmin
Description: The x-coordinate of the lower-left corner.
ymin
public float ymin
Description: The y-coordinate of the lower-left corner.
xmax
public float xmax
Description: The x-coordinate of the upper-right corner.
ymax
public float ymax
Description: The y-coordinate of the upper-right corner.
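The attributes above are usually consumed together: an application keeps detections of the class it cares about, drops low-confidence ones, and derives a size from the corner coordinates. The following is a minimal sketch under stated assumptions. It uses a hypothetical stand-in BBox class with the documented fields (not the SDK class, which extends Rect). The helper names area() and filter() and the 0.5 threshold are illustrative choices, not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;

public class BBoxSketch {
    // Hypothetical stand-in with the documented BBox fields. The real SDK class
    // extends Rect; a plain class is used here so the sketch compiles on its own.
    static class BBox {
        int cls;
        float prob;
        float xmin, ymin, xmax, ymax;

        BBox(int cls, float prob, float xmin, float ymin, float xmax, float ymax) {
            this.cls = cls;
            this.prob = prob;
            this.xmin = xmin;
            this.ymin = ymin;
            this.xmax = xmax;
            this.ymax = ymax;
        }
    }

    // Area from the corner coordinates; degenerate (inverted) boxes yield 0.
    static float area(BBox b) {
        return Math.max(0f, b.xmax - b.xmin) * Math.max(0f, b.ymax - b.ymin);
    }

    // Keeps detections of one class whose confidence is at least minProb.
    // The threshold is an application choice, not an SDK default.
    static List<BBox> filter(List<BBox> boxes, int wantedCls, float minProb) {
        List<BBox> kept = new ArrayList<>();
        for (BBox b : boxes) {
            if (b.cls == wantedCls && b.prob >= minProb) {
                kept.add(b);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<BBox> detections = List.of(
                new BBox(1, 0.92f, 10f, 20f, 110f, 70f),
                new BBox(1, 0.40f, 0f, 0f, 50f, 50f),
                new BBox(2, 0.88f, 5f, 5f, 25f, 45f));
        for (BBox b : filter(detections, 1, 0.5f)) {
            System.out.println("cls=" + b.cls + " prob=" + b.prob + " area=" + area(b));
        }
    }
}
```

With real SDK objects, the same loop would read cls, prob, and the four coordinates directly from each BBox returned by the detector.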
Complex BBox
The ComplexBBox (Complex Bounding Box) class represents a bounding polygon that encloses detected text within an image, defining its position and dimensions. Each complex bounding box includes:
- Probability (Confidence) - The confidence score of the detection.
- Coordinates - The coordinates of the bounding polygon.
The following sections cover the variables for ComplexBBox.
x[ ]
public float[] x
Description: An array of floats representing the x-coordinates of the vertices of the complex bounding box, allowing for polygons beyond simple rectangles.
y[ ]
public float[] y
Description: An array of floats representing the y-coordinates corresponding to the x array, defining the vertices of the complex bounding box.
prob
public float prob
Description: A float representing the probability or confidence score associated with the bounding box, indicating the confidence level of the detected text.
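Because the vertices are stored as two parallel arrays, x[i] and y[i] together form the i-th vertex. The minimal sketch below shows two common derived values: the smallest axis-aligned rectangle enclosing the polygon, and the polygon's area via the shoelace formula. It assumes a hypothetical stand-in class with the documented fields, that both arrays have the same non-zero length, and that vertices are listed in order around the polygon; the SDK itself is not claimed to provide these helpers.

```java
import java.util.Arrays;

public class ComplexBBoxSketch {
    // Hypothetical stand-in with the documented ComplexBBox fields.
    static class ComplexBBox {
        float[] x;
        float[] y;
        float prob;

        ComplexBBox(float[] x, float[] y, float prob) {
            this.x = x;
            this.y = y;
            this.prob = prob;
        }
    }

    // Smallest axis-aligned rectangle {minX, minY, maxX, maxY} containing every vertex.
    // Assumes at least one vertex and x.length == y.length.
    static float[] enclosingRect(ComplexBBox b) {
        float minX = Float.POSITIVE_INFINITY, minY = Float.POSITIVE_INFINITY;
        float maxX = Float.NEGATIVE_INFINITY, maxY = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < b.x.length; i++) {
            minX = Math.min(minX, b.x[i]);
            maxX = Math.max(maxX, b.x[i]);
            minY = Math.min(minY, b.y[i]);
            maxY = Math.max(maxY, b.y[i]);
        }
        return new float[] {minX, minY, maxX, maxY};
    }

    // Polygon area via the shoelace formula; vertex i connects to vertex (i + 1) mod n.
    static double area(ComplexBBox b) {
        double twice = 0.0;
        int n = b.x.length;
        for (int i = 0; i < n; i++) {
            int j = (i + 1) % n;
            twice += (double) b.x[i] * b.y[j] - (double) b.x[j] * b.y[i];
        }
        return Math.abs(twice) / 2.0;
    }

    public static void main(String[] args) {
        // A slightly skewed quadrilateral around a word of text.
        ComplexBBox b = new ComplexBBox(
                new float[] {10f, 60f, 62f, 12f},
                new float[] {10f, 12f, 32f, 30f},
                0.97f);
        System.out.println("rect=" + Arrays.toString(enclosingRect(b)) + " area=" + area(b));
    }
}
```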
DecodedText
The DecodedText class represents recognized text along with an associated confidence score, allowing applications to choose the best interpretation based on context or additional processing.
The following sections cover the attributes for DecodedText.
content
public String content
Description: A String that contains the recognized or decoded text.
confidence
public float confidence
Description: A float representing the confidence level in the accuracy of the recognized text, typically ranging from 0.0 (no confidence) to 1.0 (full confidence).
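Choosing the best interpretation, as described above, can be as simple as taking the candidate with the highest confidence. The sketch below assumes a hypothetical stand-in DecodedText class with the documented content and confidence fields; the helper best() is illustrative only, and real applications may instead apply context such as an expected format before falling back to confidence.

```java
public class DecodedTextSketch {
    // Hypothetical stand-in with the documented DecodedText fields.
    static class DecodedText {
        String content;
        float confidence;

        DecodedText(String content, float confidence) {
            this.content = content;
            this.confidence = confidence;
        }
    }

    // Returns the highest-confidence candidate, or null if there are none.
    // On ties, the earlier candidate is kept.
    static DecodedText best(DecodedText[] candidates) {
        if (candidates == null) {
            return null;
        }
        DecodedText top = null;
        for (DecodedText d : candidates) {
            if (top == null || d.confidence > top.confidence) {
                top = d;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        DecodedText[] candidates = {
            new DecodedText("I0O", 0.41f),
            new DecodedText("100", 0.87f),
            new DecodedText("l00", 0.12f)
        };
        DecodedText top = best(candidates);
        System.out.println(top.content + " (" + top.confidence + ")");
    }
}
```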
Descriptor
The Descriptor class represents the feature descriptors generated by the FeatureExtractor as a vector of float values, along with the model version used to generate these descriptors.
The following sections cover the methods for Descriptor.
getModelVersionInfo()
public String getModelVersionInfo()
Description: Returns the model version used to generate feature descriptors.
getDescriptorInfo()
public float[][] getDescriptorInfo()
Description: Returns feature descriptors that are generated by the FeatureExtractor.
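A typical use of descriptors is comparing two items by vector similarity. The sketch below is a minimal illustration under explicit assumptions: that each row of the float[][] returned by getDescriptorInfo() is one descriptor vector, that descriptors are only comparable when getModelVersionInfo() reports the same version, and that cosine similarity is an appropriate measure. None of these is confirmed by this guide. The Descriptor class here is a hypothetical stand-in exposing only the two documented methods, and similarity() and cosine() are illustrative helpers.

```java
public class DescriptorSketch {
    // Hypothetical stand-in exposing the two documented Descriptor methods.
    static class Descriptor {
        private final String modelVersion;
        private final float[][] descriptors;

        Descriptor(String modelVersion, float[][] descriptors) {
            this.modelVersion = modelVersion;
            this.descriptors = descriptors;
        }

        public String getModelVersionInfo() { return modelVersion; }

        public float[][] getDescriptorInfo() { return descriptors; }
    }

    // Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("descriptor lengths differ");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // ASSUMPTION: row i of getDescriptorInfo() is one descriptor vector, and vectors
    // from different model versions must not be compared.
    static double similarity(Descriptor p, int i, Descriptor q, int j) {
        if (!p.getModelVersionInfo().equals(q.getModelVersionInfo())) {
            throw new IllegalArgumentException("descriptors come from different model versions");
        }
        return cosine(p.getDescriptorInfo()[i], q.getDescriptorInfo()[j]);
    }

    public static void main(String[] args) {
        Descriptor reference = new Descriptor("v1", new float[][] {{0.2f, 0.9f, 0.1f}});
        Descriptor candidate = new Descriptor("v1", new float[][] {{0.25f, 0.85f, 0.05f}});
        System.out.println("similarity=" + similarity(reference, 0, candidate, 0));
    }
}
```

Checking the model version first guards against silently comparing vectors produced by different models, which would make the similarity score meaningless.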
OCRResult
The OCRResult class represents the result of an Optical Character Recognition (OCR) operation. It provides information about the recognized text and its location within an image. This is essential for applications that require precise localization and extraction of text from images, such as in automated document processing and image-based data entry.
The following sections cover the attributes for OCRResult.
bbox
public ComplexBBox bbox
Description: An instance of the ComplexBBox class that defines the polygonal area of the image where the text was detected.
text
public String text
Description: Contains the recognized text within the bounding box. This attribute provides direct access to the textual content obtained from the OCR process, allowing applications to utilize this data for further analysis or processing.
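Applications often post-process a list of OCR results, for example keeping only text whose detection confidence is high enough. The minimal sketch below assumes hypothetical stand-in OCRResult and ComplexBBox classes with the documented fields, and uses the bounding polygon's prob as the filter criterion; confidentText(), result(), and the 0.5 threshold are illustrative, not SDK API.

```java
import java.util.ArrayList;
import java.util.List;

public class OCRResultSketch {
    // Hypothetical stand-ins with the documented fields.
    static class ComplexBBox {
        float[] x;
        float[] y;
        float prob;

        ComplexBBox(float[] x, float[] y, float prob) {
            this.x = x;
            this.y = y;
            this.prob = prob;
        }
    }

    static class OCRResult {
        ComplexBBox bbox;
        String text;

        OCRResult(ComplexBBox bbox, String text) {
            this.bbox = bbox;
            this.text = text;
        }
    }

    // Illustrative factory: wraps text in a unit-square polygon with the given confidence.
    static OCRResult result(String text, float prob) {
        return new OCRResult(
                new ComplexBBox(new float[] {0f, 1f, 1f, 0f}, new float[] {0f, 0f, 1f, 1f}, prob),
                text);
    }

    // Keeps the trimmed, non-blank text of results whose polygon confidence is at least minProb.
    static List<String> confidentText(List<OCRResult> results, float minProb) {
        List<String> out = new ArrayList<>();
        for (OCRResult r : results) {
            if (r.bbox != null && r.bbox.prob >= minProb && r.text != null && !r.text.isBlank()) {
                out.add(r.text.trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<OCRResult> results = List.of(
                result("INVOICE", 0.95f),
                result("n0ise", 0.20f),
                result("Total: 42.00", 0.90f));
        System.out.println(confidentText(results, 0.5f));
    }
}
```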
TextLine
The TextLine class represents a single line of detected text, made up of Word objects and enclosed by a bounding polygon.
The following sections cover the attributes and methods for TextLine.
bbox
public ComplexBBox bbox
Description: The ComplexBBox that encloses the line of text.
decodes
public Word[] decodes
Description: An array of Word objects that make up the line.
toString()
public String toString()
Description: Returns a string representation of the TextLine, concatenating the content of all words with spaces. Returns an empty string if there are no words.
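The documented toString() behavior can be sketched with hypothetical stand-in classes. This guide does not state which DecodedText supplies a word's "content"; the sketch ASSUMES it is the highest-confidence candidate in Word.decodes, and bestContent() is an illustrative helper, not SDK API. The bbox fields are omitted because toString() does not use them.

```java
import java.util.StringJoiner;

public class TextLineSketch {
    // Hypothetical stand-ins for the documented classes (bbox fields omitted).
    static class DecodedText {
        String content;
        float confidence;

        DecodedText(String content, float confidence) {
            this.content = content;
            this.confidence = confidence;
        }
    }

    static class Word {
        DecodedText[] decodes;

        Word(DecodedText... decodes) {
            this.decodes = decodes;
        }
    }

    static class TextLine {
        Word[] decodes;

        TextLine(Word... words) {
            this.decodes = words;
        }

        // Word contents joined by single spaces; an empty line yields "".
        @Override
        public String toString() {
            StringJoiner joiner = new StringJoiner(" ");
            if (decodes != null) {
                for (Word w : decodes) {
                    joiner.add(bestContent(w));
                }
            }
            return joiner.toString();
        }
    }

    // ASSUMPTION: a word's content is its highest-confidence DecodedText.
    static String bestContent(Word w) {
        String best = "";
        float bestConfidence = Float.NEGATIVE_INFINITY;
        if (w.decodes != null) {
            for (DecodedText d : w.decodes) {
                if (d.confidence > bestConfidence) {
                    bestConfidence = d.confidence;
                    best = d.content;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TextLine line = new TextLine(
                new Word(new DecodedText("Hello", 0.9f)),
                new Word(new DecodedText("w0rld", 0.3f), new DecodedText("world", 0.8f)));
        System.out.println("[" + line + "]");
    }
}
```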
TextParagraph
The TextParagraph class represents a paragraph of text detected within an image, providing a structured method to access the individual lines of text and their bounding information.
The following sections cover the attributes and methods for TextParagraph.
bbox
public ComplexBBox bbox
Description: The ComplexBBox that encloses the paragraph.
lines
public TextLine[] lines
Description: An array of TextLine objects representing the lines of text in the paragraph. This structure allows for detailed analysis and processing of each line, enabling applications to reconstruct the paragraph’s text content, understand its structure, and perform further operations like editing or translation.
toString()
public String toString()
Description: Returns a string representation of the TextParagraph, concatenating the content of all words in each line with newline characters. Returns an empty string if there are no lines.
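The paragraph-level toString() composes the line-level one: each line's words are joined by spaces, and the lines are joined by newline characters. The sketch below reproduces that documented behavior with hypothetical stand-in classes. As with TextLine, it ASSUMES a word's content is its highest-confidence DecodedText, which this guide does not confirm; bestContent() and word() are illustrative helpers, not SDK API.

```java
import java.util.StringJoiner;

public class TextParagraphSketch {
    // Hypothetical stand-ins for the documented classes (bbox fields omitted).
    static class DecodedText {
        String content;
        float confidence;

        DecodedText(String content, float confidence) {
            this.content = content;
            this.confidence = confidence;
        }
    }

    static class Word {
        DecodedText[] decodes;

        Word(DecodedText... decodes) {
            this.decodes = decodes;
        }
    }

    static class TextLine {
        Word[] decodes;

        TextLine(Word... words) {
            this.decodes = words;
        }

        // Word contents joined by single spaces.
        @Override
        public String toString() {
            StringJoiner joiner = new StringJoiner(" ");
            if (decodes != null) {
                for (Word w : decodes) {
                    joiner.add(bestContent(w));
                }
            }
            return joiner.toString();
        }
    }

    static class TextParagraph {
        TextLine[] lines;

        TextParagraph(TextLine... lines) {
            this.lines = lines;
        }

        // Line strings joined by newline characters; no lines yields "".
        @Override
        public String toString() {
            StringJoiner joiner = new StringJoiner("\n");
            if (lines != null) {
                for (TextLine line : lines) {
                    joiner.add(line.toString());
                }
            }
            return joiner.toString();
        }
    }

    // ASSUMPTION: a word's content is its highest-confidence DecodedText.
    static String bestContent(Word w) {
        String best = "";
        float bestConfidence = Float.NEGATIVE_INFINITY;
        if (w.decodes != null) {
            for (DecodedText d : w.decodes) {
                if (d.confidence > bestConfidence) {
                    bestConfidence = d.confidence;
                    best = d.content;
                }
            }
        }
        return best;
    }

    // Illustrative helper: a word with a single, fully confident candidate.
    static Word word(String content) {
        return new Word(new DecodedText(content, 1.0f));
    }

    public static void main(String[] args) {
        TextParagraph paragraph = new TextParagraph(
                new TextLine(word("Hello"), word("world")),
                new TextLine(word("second"), word("line")));
        System.out.println(paragraph);
    }
}
```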
Word
The Word class represents a word extracted from a document, including spatial information about the word's location through a bounding box and multiple potential decoded text interpretations.
The following sections cover the variables for Word.
bbox
public ComplexBBox bbox
Description: The ComplexBBox that encloses the word.
decodes
public DecodedText[] decodes
Description: An array of DecodedText objects, each a candidate interpretation of the word with its own confidence score.