Image OCR
This Integration is part of the Image OCR Pack.
Use the Image OCR integration to extract text from images. The integration utilizes the open-source Tesseract OCR engine.
Use Cases
- Extract text from images included in emails during a phishing investigation.
- Extract text from images included in an HTML page.
Configure Image OCR on Cortex XSOAR
- Navigate to Settings > Integrations > Servers & Services.
- Search for Image OCR.
- Click Add instance to create and configure a new integration instance.
- Name: a textual name for the integration instance.
- Languages: a CSV list of language codes to use for OCR (leave empty to use the default). The default language is English.
- Click Test to validate the configuration.
Note: The default language used for OCR is English. To configure additional languages, specify a CSV list of language codes in the Languages option. For example, to configure the integration for English and French, set this value to eng,fra. To see all supported language codes, use the following command:
!image-ocr-list-languages
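Tesseract's own command line takes multiple languages joined with `+` (for example `-l eng+fra`), so a CSV value such as eng,fra has to be normalized before it reaches the engine. The Python sketch below shows one way that conversion could look. The helper name `csv_to_tesseract_lang` is hypothetical, and this is an illustration rather than the integration's actual code.

```python
def csv_to_tesseract_lang(csv_value, default="eng"):
    """Convert a CSV of language codes (e.g. "eng, fra") to Tesseract's "eng+fra" form.

    Hypothetical helper for illustration; returns `default` when no codes are given.
    """
    codes = [code.strip() for code in (csv_value or "").split(",")]
    codes = [code for code in codes if code]  # drop empty entries such as "eng,,fra"
    return "+".join(codes) if codes else default


print(csv_to_tesseract_lang("eng,fra"))  # eng+fra
print(csv_to_tesseract_lang(""))         # eng
```

An empty value falls back to English here, mirroring the documented default.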
Commands
You can execute these commands from the Cortex XSOAR CLI, as part of an automation, or in a playbook. After you successfully execute a command, a DBot message appears in the War Room with the command details.
- Get a list of supported OCR languages: image-ocr-list-languages
- Extract text from an image: image-ocr-extract-text
1. Get a list of supported OCR languages
Lists supported languages for which the integration can extract text.
Base Command
image-ocr-list-languages
Input
There are no input arguments for this command.
Context Output
There is no context output for this command.
Command Example
image-ocr-list-languages
Human Readable Output
Image OCR Supported Languages
- ara
- chi_sim
- chi_sim_vert
- chi_tra
- chi_tra_vert
- deu
- eng
- fra
- heb
- ita
- jpn
- jpn_vert
- osd
- rus
- spa
- tur
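A list like the one above can also be obtained directly from a local Tesseract installation with `tesseract --list-langs`. The following Python sketch parses that output. It assumes the first non-empty line is a header (such as "List of available languages (3):") and that the `tesseract` binary is on the PATH; the function names are hypothetical, and whether the integration gathers its list this way is an assumption.

```python
import subprocess


def parse_list_langs(output):
    """Return the language codes from `tesseract --list-langs` style output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the first non-empty line is a header, not a language code.
    return sorted(lines[1:])


def list_installed_languages():
    """Run the tesseract binary (must be on PATH) and parse its language list."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    return parse_list_langs(result.stdout)


# Parsing a captured sample avoids needing Tesseract installed to try the sketch.
sample = "List of available languages (3):\neng\nfra\nosd\n"
print(parse_list_langs(sample))  # ['eng', 'fra', 'osd']
```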
2. Extract text from an image
Extracts text from an image.
Base Command
image-ocr-extract-text
Input
Argument Name | Description | Required |
---|---|---|
entryid | Entry ID of the image file to process. | Required |
langs | A CSV list of language codes to use for OCR. Overrides the default configured languages. | Optional |
Context Output
Path | Type | Description |
---|---|---|
File.Text | String | Extracted text from the passed image file. |
Command Example
image-ocr-extract-text entryid="922@e84104f7-b235-4d82-860a-ea09f5dc0559"
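The example above uses the instance's default languages. To illustrate how the optional langs argument fits into the same command, here is a small Python sketch that assembles the CLI string. The helper `build_extract_command` is hypothetical and only formats text; it does not call Cortex XSOAR.

```python
def build_extract_command(entry_id, langs=None):
    """Build the CLI string for image-ocr-extract-text (hypothetical helper)."""
    command = f'!image-ocr-extract-text entryid="{entry_id}"'
    if langs:
        # langs is optional; passed as a CSV, it overrides the instance default.
        langs_csv = ",".join(langs)
        command += f' langs="{langs_csv}"'
    return command


print(build_extract_command("922@e84104f7-b235-4d82-860a-ea09f5dc0559", ["eng", "fra"]))
```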
Context Example
{
    "File": {
        "Text": "The quick brown fox\njumped over the 5\nlazy dogs!\n\f",
        "EntryID": "922@e84104f7-b235-4d82-860a-ea09f5dc0559"
    }
}
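The extracted text ends with a form feed (`\f`), which Tesseract emits as a page separator, so downstream code usually wants to strip it before processing the lines. The Python sketch below reads File.Text from a context shaped like the example above and returns clean lines. It assumes File is a single object, as shown here; the helper name `extracted_lines` is hypothetical.

```python
import json

# Context copied from the example above (raw string keeps the JSON escapes intact).
context_json = r'''
{
    "File": {
        "Text": "The quick brown fox\njumped over the 5\nlazy dogs!\n\f",
        "EntryID": "922@e84104f7-b235-4d82-860a-ea09f5dc0559"
    }
}
'''


def extracted_lines(context):
    """Return the non-empty lines of File.Text with the form feed removed."""
    text = context.get("File", {}).get("Text", "")
    cleaned = text.replace("\f", "")  # drop the page-separator character
    return [line for line in cleaned.splitlines() if line.strip()]


lines = extracted_lines(json.loads(context_json))
print("\n".join(lines))
```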
Human Readable Output
Image OCR Extracted Text
The quick brown fox
jumped over the 5
lazy dogs!