Overview
This guide outlines the data types used in the AI Data Capture SDK.
BBox
The BBox (Bounding Box) class, a subclass of the Rect class, is a core component for handling bounding boxes: rectangular frames that enclose detected objects within an image and define their position and dimensions. Each bounding box includes:
- Probability (Confidence) - The confidence score of the detection.
- Class - The label of the detected object.
- Coordinates - The coordinates: xmin, ymin, xmax, and ymax.
The BBox class contains attributes such as the class label (cls) and probability (prob), which are essential for identifying and assessing detected objects.
The following sections cover the attributes for BBox.
cls
public int cls
Description: An integer that represents the class label of the detected object, corresponding to a specific category or class in the object detection model.
prob
public float prob
Description: A float value that indicates the probability or confidence score associated with the detection of the object, typically ranging from 0.0 to 1.0.
xmin
public float xmin
Description: The x-coordinate of the lower-left corner.
ymin
public float ymin
Description: The y-coordinate of the lower-left corner.
xmax
public float xmax
Description: The x-coordinate of the upper-right corner.
ymax
public float ymax
Description: The y-coordinate of the upper-right corner.
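As an illustration of how these attributes might be consumed, the sketch below filters detections by confidence and derives each box's width and height from its corner coordinates. The nested BBox class is a minimal stand-in declared only so the example compiles on its own; the SDK class extends Rect and may carry more members. The helper names keepConfident, width, and height are hypothetical and are not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;

class BBoxSketch {
    // Stand-in for the SDK's BBox, limited to the documented public fields.
    static class BBox {
        public int cls;
        public float prob;
        public float xmin, ymin, xmax, ymax;

        BBox(int cls, float prob, float xmin, float ymin, float xmax, float ymax) {
            this.cls = cls;
            this.prob = prob;
            this.xmin = xmin;
            this.ymin = ymin;
            this.xmax = xmax;
            this.ymax = ymax;
        }
    }

    // Hypothetical helper: keep detections whose confidence meets the threshold.
    static List<BBox> keepConfident(BBox[] boxes, float minProb) {
        List<BBox> kept = new ArrayList<>();
        for (BBox box : boxes) {
            if (box.prob >= minProb) {
                kept.add(box);
            }
        }
        return kept;
    }

    // Box dimensions follow directly from the corner coordinates.
    static float width(BBox box) {
        return box.xmax - box.xmin;
    }

    static float height(BBox box) {
        return box.ymax - box.ymin;
    }

    public static void main(String[] args) {
        BBox[] detections = {
            new BBox(0, 0.92f, 10f, 20f, 110f, 70f),
            new BBox(1, 0.35f, 0f, 0f, 50f, 50f)
        };
        for (BBox box : keepConfident(detections, 0.5f)) {
            System.out.println("class=" + box.cls + " prob=" + box.prob
                    + " size=" + width(box) + "x" + height(box));
        }
    }
}
```

The 0.5 threshold is arbitrary; a suitable value depends on the model and on how costly false detections are for the application.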
Complex BBox
The ComplexBBox (Complex Bounding Box) class is a core component for handling bounding polygons that enclose detected text within an image, defining its position and dimensions. Each complex bounding box includes:
- Probability (Confidence) - The confidence score of the detection.
- Coordinates - The coordinates of the bounding polygon.
The following sections cover the variables for ComplexBBox.
x[ ]
public float[] x
Description: An array of floats representing the x-coordinates of the vertices of the complex bounding box, allowing for polygons beyond simple rectangles.
y[ ]
public float[] y
Description: An array of floats representing the y-coordinates corresponding to the x array, defining the vertices of the complex bounding box.
prob
public float prob
Description: A float representing the probability or confidence score associated with the bounding box, indicating the confidence level of the detected text.
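Because the vertices are stored as parallel x and y arrays, a common first step is to reduce the polygon to a simple axis-aligned rectangle, for example to crop or highlight the text region. The sketch below shows one way to do that. The nested ComplexBBox class is a stand-in with only the documented fields, and enclosingRect is a hypothetical helper, not an SDK method.

```java
class ComplexBBoxSketch {
    // Stand-in for the SDK's ComplexBBox, limited to the documented public fields.
    static class ComplexBBox {
        public float[] x;
        public float[] y;
        public float prob;

        ComplexBBox(float[] x, float[] y, float prob) {
            this.x = x;
            this.y = y;
            this.prob = prob;
        }
    }

    // Hypothetical helper: returns {xmin, ymin, xmax, ymax} of the smallest
    // axis-aligned rectangle that contains every vertex of the polygon.
    static float[] enclosingRect(ComplexBBox box) {
        if (box.x == null || box.y == null
                || box.x.length == 0 || box.x.length != box.y.length) {
            throw new IllegalArgumentException(
                    "x and y must be non-empty arrays of equal length");
        }
        float xmin = box.x[0], xmax = box.x[0];
        float ymin = box.y[0], ymax = box.y[0];
        for (int i = 1; i < box.x.length; i++) {
            xmin = Math.min(xmin, box.x[i]);
            xmax = Math.max(xmax, box.x[i]);
            ymin = Math.min(ymin, box.y[i]);
            ymax = Math.max(ymax, box.y[i]);
        }
        return new float[] { xmin, ymin, xmax, ymax };
    }
}
```

The length check reflects the description above, where each entry of y corresponds to the entry of x at the same index.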
CompletableFuture
CompletableFuture is a Java class used for asynchronous programming. It represents the future result of an asynchronous computation and provides a flexible way to write non-blocking code. With CompletableFuture, an operation runs in the background while the main thread continues executing other tasks.
Many APIs in the AI Data Capture SDK return a CompletableFuture object so that data can be processed asynchronously.
For more details, refer to the official documentation.
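The following sketch shows the general pattern using only the standard java.util.concurrent API. The recognizeAsync method is a hypothetical stand-in for an SDK call that returns a CompletableFuture; it simply upper-cases its input so the example can run without the SDK.

```java
import java.util.concurrent.CompletableFuture;

class FutureSketch {
    // Hypothetical stand-in for an SDK call that works in the background.
    static CompletableFuture<String> recognizeAsync(String input) {
        return CompletableFuture.supplyAsync(() -> input.toUpperCase());
    }

    // Chains a transformation that runs once the asynchronous result is ready.
    static CompletableFuture<Integer> lengthOfResult(String input) {
        return recognizeAsync(input).thenApply(String::length);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> done = recognizeAsync("hello")
                .thenAccept(text -> System.out.println("Result: " + text))
                .exceptionally(error -> {
                    System.err.println("Failed: " + error);
                    return null;
                });

        System.out.println("Main thread continues while the task runs.");

        // Block only at the very end so the demo does not exit early.
        done.join();
    }
}
```

Registering callbacks with thenAccept or thenApply keeps the calling thread free. Calling join() or get() blocks until the result is available, so it is best avoided on a UI thread.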
DecodedText
The DecodedText class represents recognized text along with an associated confidence score, allowing applications to choose the best interpretation based on context or additional processing.
The following sections cover the attributes for DecodedText.
content
public String content
Description: A String that contains the recognized or decoded text.
confidence
public float confidence
Description: A float representing the confidence level in the accuracy of the recognized text, typically ranging from 0.0 (no confidence) to 1.0 (full confidence).
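One simple way to act on the confidence score is to accept the text only when it clears an application-defined threshold and otherwise fall back to a placeholder or a manual-review path. The sketch below assumes a stand-in DecodedText class with the two documented fields; contentOrFallback is a hypothetical helper, not part of the SDK.

```java
class DecodedTextSketch {
    // Stand-in for the SDK's DecodedText, limited to the documented public fields.
    static class DecodedText {
        public String content;
        public float confidence;

        DecodedText(String content, float confidence) {
            this.content = content;
            this.confidence = confidence;
        }
    }

    // Hypothetical helper: returns the decoded content when it is confident
    // enough, otherwise the caller-supplied fallback value.
    static String contentOrFallback(DecodedText decoded, float minConfidence, String fallback) {
        if (decoded == null || decoded.content == null
                || decoded.confidence < minConfidence) {
            return fallback;
        }
        return decoded.content;
    }
}
```

For example, contentOrFallback(decoded, 0.8f, "?") yields the recognized text only when its confidence is at least 0.8; the threshold itself is an application choice.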
Descriptor
The Descriptor class represents the feature descriptors generated by the FeatureExtractor as a vector of float values, along with the model version used to generate these descriptors.
The following sections cover the methods for Descriptor.
getModelVersionInfo()
public String getModelVersionInfo()
Description: Returns the model version used to generate feature descriptors.
getDescriptorInfo()
public float[][] getDescriptorInfo()
Description: Returns the feature descriptors generated by the FeatureExtractor.
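Feature descriptors are typically compared with a vector similarity measure. The sketch below uses cosine similarity, which is a common choice but an assumption here: the source does not say which metric the SDK expects. It also assumes that descriptors produced by different model versions should not be compared and that the first row of getDescriptorInfo() holds the vector of interest. The nested Descriptor class is a stand-in exposing only the two documented methods.

```java
class DescriptorSketch {
    // Stand-in for the SDK's Descriptor, exposing only the documented methods.
    static class Descriptor {
        private final String modelVersion;
        private final float[][] vectors;

        Descriptor(String modelVersion, float[][] vectors) {
            this.modelVersion = modelVersion;
            this.vectors = vectors;
        }

        public String getModelVersionInfo() {
            return modelVersion;
        }

        public float[][] getDescriptorInfo() {
            return vectors;
        }
    }

    // Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Hypothetical helper: compares the first vector of each descriptor after
    // checking that both came from the same model version.
    static double compareFirst(Descriptor p, Descriptor q) {
        if (!p.getModelVersionInfo().equals(q.getModelVersionInfo())) {
            throw new IllegalArgumentException(
                    "descriptors come from different model versions");
        }
        return cosine(p.getDescriptorInfo()[0], q.getDescriptorInfo()[0]);
    }
}
```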
ImageData
The ImageData class acts as a container for image data sourced from the process() API. It provides a unified interface for detectors to access image information, handling orientation adjustments and standardizing the format required for AI processing.
The following sections outline the methods available in this class.
getBitmap()
Bitmap ImageData.getBitmap()
Description: Retrieves the bitmap representation of the image data.
Return Value: Returns the Bitmap representation of the image.
getRotationDegrees()
int ImageData.getRotationDegrees()
Description: Retrieves the rotation of the image in degrees.
Return Value: Returns an integer representing the rotation in degrees (0, 90, 180, or 270).
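Since the rotation is limited to quarter turns, one practical consequence is that a rotation of 90 or 270 degrees swaps the image's width and height once it is turned upright. The sketch below illustrates that arithmetic in plain Java so it runs without Android. The uprightSize helper is hypothetical, and it rejects any value other than the four listed above.

```java
class RotationSketch {
    // Hypothetical helper: returns {width, height} of the image after it has
    // been rotated by the given number of degrees.
    static int[] uprightSize(int width, int height, int rotationDegrees) {
        switch (rotationDegrees) {
            case 0:
            case 180:
                // Half turns and no rotation keep the original dimensions.
                return new int[] { width, height };
            case 90:
            case 270:
                // Quarter turns swap width and height.
                return new int[] { height, width };
            default:
                throw new IllegalArgumentException(
                        "Unsupported rotation: " + rotationDegrees);
        }
    }
}
```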
fromImageProxy(ImageProxy imageProxy)
static ImageData ImageData.fromImageProxy(ImageProxy imageProxy) throws InvalidInputException
Description: Creates an ImageData instance from an ImageProxy. This method converts a CameraX ImageProxy to an ImageData object, using the rotation information from the ImageProxy itself.
Parameters:
- imageProxy - The ImageProxy to convert.
Return Value:
- ImageData - A new ImageData instance containing the image data.
Exceptions:
- InvalidInputException - Thrown when the input imageProxy is null, indicating an invalid input parameter.
fromMediaImage(Image image, int rotationDegrees)
static ImageData ImageData.fromMediaImage(Image image, int rotationDegrees) throws InvalidInputException
Description: Creates an ImageData instance from a media Image. This factory method converts an android.media.Image into an ImageData object, applying the specified rotation to ensure correct orientation.
Parameters:
- image - The media Image to convert.
- rotationDegrees - The rotation to apply in degrees.
Return Value:
- ImageData - A new ImageData instance containing the image data.
Exceptions:
- InvalidInputException - Thrown if the image is null or the orientation is not supported.
fromBitmap(Bitmap bitmap, int rotationDegrees)
static ImageData ImageData.fromBitmap(Bitmap bitmap, int rotationDegrees) throws InvalidInputException
Description: Creates an ImageData instance from a given bitmap with the specified rotation. This factory method transforms a bitmap into an ImageData object, applying the specified rotation to ensure the image is correctly oriented for processing.
Parameters:
- bitmap - The bitmap from which to create the ImageData.
- rotationDegrees - The rotation to apply to the image in degrees.
Return Value:
- ImageData - A new ImageData instance containing the bitmap data.
Exceptions:
- InvalidInputException - Thrown if the bitmap is null or the rotation is unsupported.
OCRResult
The OCRResult class represents the result of an Optical Character Recognition (OCR) operation. It provides information about the recognized text and its location within an image. This is essential for applications that require precise localization and extraction of text from images, such as automated document processing and image-based data entry.
The following sections cover the attributes for OCRResult.
bbox
Description: An instance of the ComplexBBox class, which defines the region of the image where the text was detected.
text
public String text
Description: Contains the recognized text within the bounding box. This attribute provides direct access to the textual content obtained from the OCR process, allowing applications to utilize this data for further analysis or processing.
TextLine
The TextLine class represents a single line of text within a paragraph. A TextParagraph holds an array of TextLine objects representing the lines of text in that paragraph.
The following sections cover the attributes and methods for TextLine.
bbox
public ComplexBBox bbox
Description: The ComplexBBox that encloses the line of text.
decodes
public Word[] decodes
Description: An array of Word objects, one for each word in the line.
toString()
public String toString()
Description: Returns a string representation of the TextLine, concatenating the content of all words with spaces. Returns an empty string if there are no words.
TextParagraph
The TextParagraph class represents a paragraph of text detected within an image, providing a structured way to access the individual lines of text and their bounding information.
The following sections cover the attributes and methods for TextParagraph.
bbox
public ComplexBBox bbox
Description: The ComplexBBox that encloses the paragraph.
lines
public TextLine[] lines
Description: An array of TextLine objects representing the lines of text in the paragraph. This structure allows for detailed analysis and processing of each line, enabling applications to reconstruct the paragraph’s text content, understand its structure, and perform further operations like editing or translation.
toString()
public String toString()
Description: Returns a string representation of the TextParagraph, with the words of each line concatenated and the lines separated by newline characters. Returns an empty string if there are no lines.
Word
The Word class represents a word extracted from a document, including spatial information about the word's location through a bounding box and multiple potential decoded text interpretations.
The following sections cover the variables for Word.
bbox
public ComplexBBox bbox
Description: The ComplexBBox that encloses the word.
decodes
public DecodedText[] decodes
Description: An array of DecodedText objects, each holding one candidate interpretation of the word together with its confidence score.
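To show how TextParagraph, TextLine, Word, and DecodedText nest, the sketch below walks the hierarchy and rebuilds the text by picking the highest-confidence candidate for each word. The nested classes are stand-ins that keep only the documented array fields (the bbox fields are omitted for brevity), and bestText, lineText, and paragraphText are hypothetical helpers. This is not the SDK's toString() implementation, which may select candidates differently.

```java
import java.util.StringJoiner;

class OcrTextSketch {
    // Stand-ins for the SDK classes; bbox fields are left out for brevity.
    static class DecodedText {
        public String content;
        public float confidence;

        DecodedText(String content, float confidence) {
            this.content = content;
            this.confidence = confidence;
        }
    }

    static class Word {
        public DecodedText[] decodes;

        Word(DecodedText... decodes) {
            this.decodes = decodes;
        }
    }

    static class TextLine {
        public Word[] decodes;

        TextLine(Word... decodes) {
            this.decodes = decodes;
        }
    }

    static class TextParagraph {
        public TextLine[] lines;

        TextParagraph(TextLine... lines) {
            this.lines = lines;
        }
    }

    // Picks the candidate with the highest confidence, or "" if there is none.
    static String bestText(Word word) {
        DecodedText top = null;
        for (DecodedText candidate : word.decodes) {
            if (top == null || candidate.confidence > top.confidence) {
                top = candidate;
            }
        }
        return top == null ? "" : top.content;
    }

    // Joins the best candidate of each word with single spaces.
    static String lineText(TextLine line) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Word word : line.decodes) {
            joiner.add(bestText(word));
        }
        return joiner.toString();
    }

    // Joins the lines with newline characters; empty if there are no lines.
    static String paragraphText(TextParagraph paragraph) {
        StringJoiner joiner = new StringJoiner("\n");
        for (TextLine line : paragraph.lines) {
            joiner.add(lineText(line));
        }
        return joiner.toString();
    }
}
```

Choosing the top-confidence candidate is the simplest policy. An application with domain knowledge, such as a list of expected part numbers, could instead scan all candidates for one that matches.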