Use the Image OCR integration to extract text from images. The integration uses the open-source Tesseract OCR engine.
- Extract text from images included in emails during a phishing investigation.
- Extract text from images included in an HTML page.
Navigate to Settings > Integrations > Servers & Services.
Search for Image OCR.
Click Add instance to create and configure a new integration instance.
| Parameter | Description | Required |
| --- | --- | --- |
| A CSV of language codes of the language to use for OCR (leave empty to use defaults) | The default language used for OCR is English. Use this parameter to specify a list of additional languages. For example, eng,fra. To see all supported language codes, use the image-ocr-list-languages command. | False |
| Skip on corrupted images | If true, the integration does not raise an error when an image is corrupted and cannot be processed. | False |
Click Test to validate the configuration.
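As an illustration of how a language list such as eng,fra could reach the OCR engine, the following minimal Python sketch turns the CSV value into a Tesseract command line. The helper names parse_langs and build_tesseract_command and the file name invoice.png are hypothetical, and this is not necessarily how the integration builds its command. The only detail taken from Tesseract itself is that its CLI accepts several languages joined with + after the -l option.

```python
from typing import List, Optional


def parse_langs(langs_csv: Optional[str]) -> List[str]:
    """Split a CSV such as 'eng,fra' into clean language codes."""
    if not langs_csv:
        return []
    return [code.strip() for code in langs_csv.split(",") if code.strip()]


def build_tesseract_command(image_path: str, langs_csv: Optional[str] = None) -> List[str]:
    """Build a Tesseract CLI invocation that prints recognized text to stdout."""
    command = ["tesseract", image_path, "stdout"]
    langs = parse_langs(langs_csv)
    if langs:
        # Tesseract expects multiple languages joined with '+', e.g. 'eng+fra'.
        command += ["-l", "+".join(langs)]
    return command


# Hypothetical file name, used only to show the resulting argument list.
print(build_tesseract_command("invoice.png", "eng, fra"))
# ['tesseract', 'invoice.png', 'stdout', '-l', 'eng+fra']
```

Leaving the CSV empty adds no -l option in this sketch, so Tesseract falls back to its own default language.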
You can execute these commands from the Cortex XSOAR CLI, as part of an automation, or in a playbook. After you successfully execute a command, a DBot message appears in the War Room with the command details.
Lists supported languages for which the integration can extract text.
There are no input arguments for this command.
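For reference, the Tesseract CLI reports its installed languages with tesseract --list-langs, printing a header line followed by one language code per line. The sketch below parses output of that shape. The function name parse_list_langs and the sample text are illustrative, and whether the integration obtains its list this way is an assumption.

```python
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Return the language codes from 'tesseract --list-langs' style output."""
    lines = output.strip().splitlines()
    # The first line is a header such as 'List of available languages (3):'.
    return [line.strip() for line in lines[1:] if line.strip()]


# Illustrative sample; real output depends on the installed language packs.
sample = "List of available languages (3):\neng\nfra\nosd\n"
print(parse_list_langs(sample))
# ['eng', 'fra', 'osd']
```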
Extracts text from an image.
Input arguments:
- A comma-separated list of entry IDs of image files to process.
- A CSV of language codes to use for OCR. Overrides the default configured language list.
Output:
- Extracted text from the passed image file.
The quick brown fox jumped over the 5 lazy dogs!
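Because the command accepts several image files at once, the "Skip on corrupted images" setting decides whether one unreadable file aborts the whole run. The following Python sketch shows one way that behavior could work. It is an assumption about the logic, not the integration's actual code: extract_text_from_images and tesseract_ocr are hypothetical names, and the sketch works on local file paths rather than entry IDs.

```python
import subprocess
from typing import Callable, Dict, Iterable


def tesseract_ocr(path: str) -> str:
    """Run the Tesseract CLI on one image and return the recognized text.

    Assumes the 'tesseract' binary is on PATH; check=True raises
    CalledProcessError when Tesseract exits with a non-zero status.
    """
    completed = subprocess.run(
        ["tesseract", path, "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def extract_text_from_images(
    image_paths: Iterable[str],
    ocr: Callable[[str], str] = tesseract_ocr,
    skip_corrupted: bool = False,
) -> Dict[str, str]:
    """Map each image path to its extracted text.

    With skip_corrupted=True, images that fail OCR are left out of the
    result; otherwise the first failure is re-raised to the caller.
    """
    results: Dict[str, str] = {}
    for path in image_paths:
        try:
            results[path] = ocr(path)
        except Exception:
            if not skip_corrupted:
                raise
            # Skipping: continue with the remaining images.
    return results
```

Passing the OCR step in as a callable keeps the skip logic separate from the engine call, so the function can be exercised with a stub instead of a real Tesseract installation.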