Domain Extraction
The Cortex XSOAR domain indicator type is built using a regular expression and a formatting script. This page describes the domain extraction components and the output to expect when extracting indicators of type domain.
Domain Extraction Components
There are two components when extracting domain indicators:
- Regular expression
- Formatting script
Regular Expression
Given input text, the domain regular expression tries to match valid domains with the following characteristics:
- A domain with ASCII and non-ASCII characters.
- Escaped and unescaped domains.
The regular expression can extract domains from one of the following:
- Explicit domain.
- URL.
- Email address.
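The three sources above can be illustrated with a minimal Python sketch. It is not the regular expression that Cortex XSOAR ships; the pattern `DOMAIN_RE` and the helper `domain_from_text` are hypothetical names introduced here, and the pattern is deliberately simplified (for example, it does not accept punycode top-level domains).

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical, simplified pattern (an assumption, not the product regex):
# one or more labels separated by "." or the escaped "[.]", followed by an
# alphabetic top-level domain. In Python 3, \w is Unicode-aware, so
# non-ASCII labels such as "testö" also match.
DOMAIN_RE = re.compile(r"(?:[\w-]+(?:\[\.\]|\.))+[^\W\d_]{2,}")


def domain_from_text(text: str) -> Optional[str]:
    """Return a domain candidate from an explicit domain, URL, or email address."""
    text = text.strip()
    if "://" in text:
        # URL: take the host part only.
        try:
            host = urlparse(text).hostname
        except ValueError:
            return None
    elif "@" in text:
        # Email address: take everything after the last "@".
        host = text.rsplit("@", 1)[1]
    else:
        # Explicit domain.
        host = text
    if host and DOMAIN_RE.fullmatch(host):
        return host
    return None
```

In this sketch an escaped candidate such as `www[.]evil.com` is returned unchanged, since unescaping is left to the formatting step.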
Formatting Script
After the regular expression extracts the domains, the ExtractDomainAndFQDNFromUrlAndEmail formatting script iterates over each extracted domain and does the following:
- Replaces "[.]" with ".". For example: www[.]evil.com --> www.evil.com
- Validates the top-level domain (TLD) to avoid file extension false positives.
- Excludes the ".zip" TLD by default.
- Returns the formatted domain.
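The formatting steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the source of ExtractDomainAndFQDNFromUrlAndEmail: the function `format_domain` and the sets `KNOWN_TLDS` and `EXCLUDED_TLDS` are hypothetical, and the TLD allow-list is a tiny stand-in for a complete list of valid top-level domains.

```python
from typing import Optional

# Hypothetical stand-in for a complete list of valid TLDs (assumption).
KNOWN_TLDS = {"com", "net", "org", "uk", "io"}
# TLDs rejected even if valid; ".zip" is excluded by default per the text above.
EXCLUDED_TLDS = {"zip"}


def format_domain(raw: str) -> Optional[str]:
    """Unescape a domain candidate and return it only if its TLD is acceptable."""
    # Step 1: replace the escaped separator "[.]" with ".".
    domain = raw.strip().replace("[.]", ".")
    if "." not in domain:
        return None
    # Step 2: validate the TLD so file names such as "report.pdf" are dropped.
    tld = domain.rsplit(".", 1)[1].lower()
    if tld not in KNOWN_TLDS:
        return None
    # Step 3: apply the default exclusion list.
    if tld in EXCLUDED_TLDS:
        return None
    # Step 4: return the formatted domain.
    return domain
```

With this sketch, `format_domain("www[.]evil.com")` yields `www.evil.com`, while `report.pdf` and `archive.zip` are rejected.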
Common Domain Structures
The following are some of the most common domain structures that Cortex XSOAR supports:
- test.com
- www.test.com
- xn--t1e2s3t4.com
- www.xn--t1e2s3t4.com
- www.test.co.uk
- test.co.uk
- subtest.test.com
- www.test.test.com
- ötest.com
- testö.com
- www.testö.com
- www.teöst.com
For more information about indicator extraction, see the Cortex XSOAR Administrator's Guide.