Skip to main content

Image OCR

This Integration is part of the Image OCR Pack.#

Use the Image OCR integration to extract text from images. The integration utilizes the open-source tesseract OCR engine.

Use Cases#

  • Extract text from images included in emails during a phishing investigation.
  • Extract text from images included in an html page.

Configure Image OCR in Cortex#

ParameterDescriptionRequired
A CSV of language codes of the language to use for OCR (leave empty to use defaults).The default language used for OCR is English. Use this parameter to specify a list of additional languages. For example, eng,fra. To see all supported language codes, use the image-ocr-list-languages command.False
Skip on corrupted imagesIf true, will not raise an error if the image is corrupted and could not be processed.False

Commands#

You can execute these commands from the CLI, as part of an automation, or in a playbook. After you successfully execute a command, a DBot message appears in the War Room with the command details.

image-ocr-list-languages#


Lists supported languages for which the integration can extract text.

Base Command#

image-ocr-list-languages

Input#

There are no input arguments for this command.

Command Example#

!image-ocr-list-languages

Human Readable Output#

Image OCR Supported Languages#

  • ara
  • chi_sim
  • chi_sim_vert
  • chi_tra
  • chi_tra_vert
  • deu
  • eng
  • fra
  • heb
  • ita
  • jpn
  • jpn_vert
  • pol
  • por
  • rus
  • spa
  • swe
  • tur

image-ocr-extract-text#


Extracts text from an image.

Base Command#

image-ocr-extract-text

Input#

Argument NameDescriptionRequired
entryidA comma-separated list of Entry IDs of image files to process.Required
langsA CSV of language codes of the language to use for OCR. Overrides the default configured language list.Optional
verboseTurn on verbose flag to display tesseract and other used libraries versions.Optional

Context Output#

PathTypeDescription
File.TextStringExtracted text from the passed image file.

Command Example#

!image-ocr-extract-text entryid="922@e84104f7-b235-4d82-860a-ea09f5dc0559"

Context Example#

{
"File": {
"Text": "The quick brown fox\njumped over the 5\nlazy dogs!\n\f",
"EntryID": "922@e84104f7-b235-4d82-860a-ea09f5dc0559"
}
}

Human Readable Output#

Image OCR Extracted Text for Entry ID 1613@1e6b4a55-33e7-433b-8f6f-2c0751c8c444#

The quick brown fox jumped over the 5 lazy dogs!