Data Types

AI Data Capture SDK

Overview

This guide outlines the data types used in the AI Data Capture SDK.


BBox

The BBox (Bounding Box) class, a subclass of the Rect class, represents a bounding box: a rectangular frame that encloses a detected object within an image and defines its position and dimensions. Each bounding box includes:

  • Probability (Confidence) - The confidence score of the detection.
  • Class - The label of the detected object.
  • Coordinates - The coordinates: xmin, ymin, xmax, and ymax.

The BBox class contains attributes such as class label (cls) and probability (prob), which are essential for identifying and assessing detected objects.

The following sections cover the attributes for BBox.


cls

    public int cls

Description: An integer that represents the class label of the detected object, corresponding to a specific category or class in the object detection model.


prob

    public float prob

Description: A float value that indicates the probability or confidence score associated with the detection of the object, typically ranging from 0.0 to 1.0.


xmin

    public float xmin

Description: The x-coordinate of the lower-left corner.


ymin

    public float ymin

Description: The y-coordinate of the lower-left corner.


xmax

    public float xmax

Description: The x-coordinate of the upper-right corner.


ymax

    public float ymax

Description: The y-coordinate of the upper-right corner.
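
As an illustration of how these attributes might be consumed, the sketch below filters detections by confidence and derives the width and height of each box from its corner coordinates. The Box class is a hypothetical stand-in that mirrors the documented public fields of BBox so that the example runs without the SDK; the helper names and the 0.5 threshold are assumptions, not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;

class BBoxSketch {
    // Hypothetical stand-in mirroring the documented public fields of BBox,
    // so the sketch compiles without the SDK on the classpath.
    static class Box {
        public int cls;
        public float prob;
        public float xmin, ymin, xmax, ymax;

        Box(int cls, float prob, float xmin, float ymin, float xmax, float ymax) {
            this.cls = cls;
            this.prob = prob;
            this.xmin = xmin;
            this.ymin = ymin;
            this.xmax = xmax;
            this.ymax = ymax;
        }
    }

    // Keep only detections whose confidence is at least minProb.
    static List<Box> filterByConfidence(Box[] boxes, float minProb) {
        List<Box> kept = new ArrayList<>();
        for (Box b : boxes) {
            if (b.prob >= minProb) {
                kept.add(b);
            }
        }
        return kept;
    }

    // Width and height follow directly from the corner coordinates.
    static float width(Box b) {
        return b.xmax - b.xmin;
    }

    static float height(Box b) {
        return b.ymax - b.ymin;
    }

    public static void main(String[] args) {
        Box[] detections = {
            new Box(0, 0.92f, 10f, 20f, 110f, 70f),
            new Box(1, 0.31f, 0f, 0f, 5f, 5f)
        };
        // 0.5 is an arbitrary threshold chosen for illustration.
        for (Box b : filterByConfidence(detections, 0.5f)) {
            System.out.printf("cls=%d prob=%.2f width=%.1f height=%.1f%n",
                    b.cls, b.prob, width(b), height(b));
        }
    }
}
```

In an application, the same logic would operate on the BBox objects returned by a detector rather than on the stand-in class.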


Complex BBox

The ComplexBBox (Complex Bounding Box) class represents a bounding polygon that encloses detected text within an image, defining its position and dimensions. Each complex bounding box includes:

  • Probability (Confidence) - The confidence score of the detection.
  • Coordinates - The coordinates of the bounding polygon.

The following sections cover the variables for ComplexBBox.


x[ ]

    public float[] x

Description: An array of floats representing the x-coordinates of the vertices of the complex bounding box, allowing for polygons beyond simple rectangles.


y[ ]

    public float[] y

Description: An array of floats representing the y-coordinates corresponding to the x array, defining the vertices of the complex bounding box.


prob

    public float prob

Description: A float representing the probability or confidence score associated with the bounding box, indicating the confidence level of the detected text.
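
Because the x and y arrays are parallel, vertex i of the polygon is the pair (x[i], y[i]). The sketch below uses that pairing to reduce a polygon to its smallest enclosing axis-aligned rectangle, which can be useful when only a simple crop region is needed. The Polygon class is a hypothetical stand-in mirroring the documented fields of ComplexBBox, and the bounds() helper is an assumption for illustration rather than an SDK method.

```java
class PolygonSketch {
    // Hypothetical stand-in mirroring the documented public fields of ComplexBBox.
    static class Polygon {
        public float[] x;
        public float[] y;
        public float prob;

        Polygon(float[] x, float[] y, float prob) {
            this.x = x;
            this.y = y;
            this.prob = prob;
        }
    }

    // Returns {xmin, ymin, xmax, ymax} of the smallest axis-aligned
    // rectangle that contains every vertex of the polygon.
    static float[] bounds(Polygon p) {
        if (p.x.length == 0 || p.x.length != p.y.length) {
            throw new IllegalArgumentException(
                    "x and y must be non-empty and of equal length");
        }
        float xmin = p.x[0], xmax = p.x[0];
        float ymin = p.y[0], ymax = p.y[0];
        for (int i = 1; i < p.x.length; i++) {
            xmin = Math.min(xmin, p.x[i]);
            xmax = Math.max(xmax, p.x[i]);
            ymin = Math.min(ymin, p.y[i]);
            ymax = Math.max(ymax, p.y[i]);
        }
        return new float[] {xmin, ymin, xmax, ymax};
    }

    public static void main(String[] args) {
        // A slightly skewed quadrilateral, as might enclose tilted text.
        Polygon p = new Polygon(
                new float[] {10f, 50f, 60f, 5f},
                new float[] {20f, 15f, 40f, 45f},
                0.8f);
        float[] b = bounds(p);
        System.out.printf("xmin=%.1f ymin=%.1f xmax=%.1f ymax=%.1f%n",
                b[0], b[1], b[2], b[3]);
    }
}
```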


CompletableFuture

CompletableFuture is a Java class used for asynchronous programming. It represents the future result of an asynchronous computation. By using CompletableFuture, operations can be executed in a non-blocking manner, allowing the main thread to continue executing other tasks while the asynchronous operation is in progress.

Many AI Data Capture SDK APIs return a CompletableFuture object so that data can be processed asynchronously.

For more details, refer to the official documentation.
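
The non-blocking pattern described above can be sketched with the standard java.util.concurrent.CompletableFuture API alone. In the example below, recognizeAsync() is a hypothetical stand-in for an SDK call that returns a CompletableFuture; it is not an SDK method, and the upper-casing it performs is placeholder work.

```java
import java.util.concurrent.CompletableFuture;

class FutureSketch {
    // Hypothetical stand-in for an SDK call that returns a CompletableFuture.
    // The work runs on a background thread from the common pool.
    static CompletableFuture<String> recognizeAsync(String input) {
        return CompletableFuture.supplyAsync(() -> input.toUpperCase());
    }

    // Chain a transformation onto the pending result without blocking,
    // and map any failure to -1 instead of propagating the exception.
    static CompletableFuture<Integer> lengthOfResult(String input) {
        return recognizeAsync(input)
                .thenApply(String::length)
                .exceptionally(ex -> -1);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> future = lengthOfResult("zebra");
        System.out.println("The calling thread continues while the task runs.");
        // join() blocks until the result is available; used here only
        // so that the demo prints the value before exiting.
        System.out.println("Result length: " + future.join());
    }
}
```

On Android, blocking calls such as join() or get() are generally kept off the main thread; attaching a callback with thenAccept() or thenApply() avoids the wait.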


DecodedText

The DecodedText class represents recognized text along with an associated confidence score, allowing applications to choose the best interpretation based on context or additional processing.

The following sections cover the attributes for DecodedText.


content

    public String content

Description: A String that contains the recognized or decoded text.


confidence

    public float confidence

Description: A float representing the confidence level in the accuracy of the recognized text, typically ranging from 0.0 (no confidence) to 1.0 (full confidence).
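
One simple way to choose among several interpretations is to take the candidate with the highest confidence. The sketch below shows that selection; the Decoded class is a hypothetical stand-in mirroring the documented fields of DecodedText, and best() is an illustrative helper rather than an SDK method. Applications may prefer other criteria, such as matching an expected format.

```java
class DecodeSketch {
    // Hypothetical stand-in mirroring the documented public fields of DecodedText.
    static class Decoded {
        public String content;
        public float confidence;

        Decoded(String content, float confidence) {
            this.content = content;
            this.confidence = confidence;
        }
    }

    // Returns the candidate with the highest confidence,
    // or null when there are no candidates.
    static Decoded best(Decoded[] candidates) {
        Decoded top = null;
        for (Decoded d : candidates) {
            if (top == null || d.confidence > top.confidence) {
                top = d;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        Decoded[] candidates = {
            new Decoded("O0", 0.40f),
            new Decoded("00", 0.85f),
            new Decoded("OO", 0.60f)
        };
        Decoded top = best(candidates);
        System.out.println(top.content + " (" + top.confidence + ")");
    }
}
```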


Descriptor

The Descriptor class represents the feature descriptors generated by the FeatureExtractor as a vector of float values, along with the model version used to generate these descriptors.

The following sections cover the methods for Descriptor.


getModelVersionInfo()

    public String getModelVersionInfo()

Description: Returns the model version used to generate feature descriptors.


getDescriptorInfo()

    public float[][] getDescriptorInfo()

Description: Returns feature descriptors that are generated by the FeatureExtractor.


ImageData

The ImageData class acts as a container for image data sourced from the process() API. It provides a unified interface for detectors to access image information, handling orientation adjustments and standardizing the format required for AI processing. The following sections outline the methods available in this class.

getBitmap()

    Bitmap ImageData.getBitmap() 

Description: Retrieves the bitmap representation of the image data.

Return Value: Returns the Bitmap representation of the image.

getRotationDegrees()

    int ImageData.getRotationDegrees() 

Description: Retrieves the rotation of the image in degrees.

Return Value: Returns an integer representing the rotation in degrees (0, 90, 180, or 270).

fromImageProxy(ImageProxy imageProxy)

    static ImageData ImageData.fromImageProxy(ImageProxy imageProxy) throws InvalidInputException 

Description: Creates an ImageData instance from an ImageProxy. This method converts a CameraX ImageProxy to an ImageData object, using the rotation information from the ImageProxy itself.

Parameters:

  • imageProxy - The ImageProxy to convert.

Return Value:

  • ImageData - A new ImageData instance containing the image data.

Exceptions:

  • InvalidInputException - Thrown when the input imageProxy is null, indicating an invalid input parameter.

fromMediaImage(Image image, int rotationDegrees)

    static ImageData ImageData.fromMediaImage(Image image, int rotationDegrees) throws InvalidInputException

Description: Creates an ImageData instance from a media Image. This factory method converts an android.media.Image into an ImageData object, applying the specified rotation to ensure correct orientation.

Parameters:

  • image - The media Image to convert.
  • rotationDegrees - The rotation to apply in degrees.

Return Value:

  • ImageData - A new ImageData instance containing the image data.

Exceptions:

  • InvalidInputException - Thrown if the image is null or the orientation is not supported.

fromBitmap(Bitmap bitmap, int rotationDegrees)

    static ImageData ImageData.fromBitmap(Bitmap bitmap, int rotationDegrees) throws InvalidInputException

Description: Creates an ImageData instance from a given bitmap with specified rotation degrees. This factory method transforms a bitmap into an ImageData object, applying the specified rotation to ensure the image is correctly oriented for processing.

Parameters:

  • bitmap - The bitmap from which to create ImageData.
  • rotationDegrees - The rotation to apply to the image in degrees.

Return Value:

  • ImageData - A new ImageData instance containing the bitmap data.

Exceptions:

  • InvalidInputException - Thrown if the bitmap is null or the rotation is unsupported.
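
Since the factory methods reject unsupported rotations, it can help to validate a rotation value before passing it in. The sketch below assumes that the supported values are the four listed for getRotationDegrees() (0, 90, 180, and 270); this guide does not confirm that the factory methods accept exactly that set. Both helpers are hypothetical and not part of the SDK.

```java
class RotationSketch {
    // Maps any angle, including negative ones, into the range [0, 360).
    static int normalize(int degrees) {
        return ((degrees % 360) + 360) % 360;
    }

    // Assumption: only quarter turns are supported, matching the values
    // documented for getRotationDegrees().
    static boolean isSupported(int degrees) {
        return degrees == 0 || degrees == 90 || degrees == 180 || degrees == 270;
    }

    public static void main(String[] args) {
        int[] samples = {-90, 0, 45, 450};
        for (int raw : samples) {
            int normalized = normalize(raw);
            System.out.println(raw + " -> " + normalized
                    + " supported=" + isSupported(normalized));
        }
    }
}
```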

OCRResult

The OCRResult class represents the result of an Optical Character Recognition (OCR) operation. It provides information about the recognized text and its location within an image. This is essential for applications that require precise localization and extraction of text from images, such as in automated document processing and image-based data entry.

The following sections cover the attributes for OCRResult.


bbox

Description: An instance of the ComplexBBox class, which defines the region of the image where the text was detected.


text

    public String text

Description: Contains the recognized text within the bounding box. This attribute provides direct access to the textual content obtained from the OCR process, allowing applications to utilize this data for further analysis or processing.


TextLine

The TextLine class represents a single line of text detected within an image, providing access to the words in the line and the line's bounding information.

The following sections cover the attributes and methods for TextLine.


bbox

    public ComplexBBox bbox

Description: The ComplexBBox that encloses the line of text.


decodes

    public Word[] decodes

Description: An array of Word objects representing the words in the line.


toString()

    public String toString()

Description: Returns a string representation of the TextLine, concatenating the content of all words with spaces. Returns an empty string if there are no words.
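
The documented behavior can be approximated as follows. This is a sketch of the described result, not the SDK's implementation: Decoded and WordStub are hypothetical stand-ins for DecodedText and Word, and taking the first entry of decodes as each word's content is an assumption, as is skipping words that have no decodes.

```java
class TextLineSketch {
    // Hypothetical stand-in for DecodedText.
    static class Decoded {
        public String content;
        public float confidence;

        Decoded(String content, float confidence) {
            this.content = content;
            this.confidence = confidence;
        }
    }

    // Hypothetical stand-in for Word, keeping only the decodes field.
    static class WordStub {
        public Decoded[] decodes;

        WordStub(Decoded... decodes) {
            this.decodes = decodes;
        }
    }

    // Joins the content of each word with single spaces.
    // Assumption: the first decode of a word is used as its content.
    static String joinWords(WordStub[] words) {
        StringBuilder sb = new StringBuilder();
        for (WordStub w : words) {
            if (w.decodes == null || w.decodes.length == 0) {
                continue; // Assumption: words without decodes are skipped.
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(w.decodes[0].content);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        WordStub[] words = {
            new WordStub(new Decoded("HELLO", 0.9f)),
            new WordStub(new Decoded("WORLD", 0.8f), new Decoded("W0RLD", 0.4f))
        };
        System.out.println(joinWords(words));
    }
}
```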


TextParagraph

The TextParagraph class represents a paragraph of text detected within an image, providing a structured method to access the individual lines of text and their bounding information.

The following sections cover the attributes and methods for TextParagraph.


bbox

    public ComplexBBox bbox

Description: The ComplexBBox that encloses the paragraph.


lines

    public TextLine[] lines

Description: An array of TextLine objects representing the lines of text in the paragraph. This structure allows for detailed analysis and processing of each line, enabling applications to reconstruct the paragraph’s text content, understand its structure, and perform further operations like editing or translation.


toString()

    public String toString()

Description: Returns a string representation of the TextParagraph, concatenating the content of all words in each line with newline characters. Returns an empty string if there are no lines.
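
A minimal sketch of the described result, under the assumption that each line contributes its own string form and that lines are separated by a single newline with no trailing newline. The Line class is a hypothetical stand-in for TextLine, and joinLines() is an illustrative helper, not the SDK's implementation.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class ParagraphSketch {
    // Hypothetical stand-in for TextLine; only its string form matters here.
    static class Line {
        private final String text;

        Line(String text) {
            this.text = text;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    // Joins the string form of each line with newline characters.
    // An empty array yields an empty string.
    static String joinLines(Line[] lines) {
        return Arrays.stream(lines)
                .map(Line::toString)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        Line[] lines = {new Line("SHIP TO"), new Line("ZEBRA")};
        System.out.println(joinLines(lines));
    }
}
```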


Word

The Word class represents a word extracted from a document, including spatial information about the word's location through a bounding box and multiple potential decoded text interpretations.

The following sections cover the variables for Word.


bbox

    public ComplexBBox bbox

Description: The ComplexBBox that encloses the word.


decodes

    public DecodedText[] decodes

Description: An array of DecodedText objects, each a potential interpretation of the word with its associated confidence score.