Domain Extraction
The Cortex XSOAR domain indicator type is built using a regular expression and a formatting script. This section describes the domain extraction components and the output to expect when extracting indicators of type domain.
Domain Extraction Components
There are two components when extracting domain indicators:
- Regular expression
- Formatting script
Regular Expression
Given input text, the domain regular expression tries to match a valid domain based on the following characteristics:
- A domain with ASCII and non-ASCII characters.
- Escaped and unescaped domains.
The regular expression can extract domains from one of the following:
- Explicit domain.
- URL.
- Email address.
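The three sources above can be illustrated with a deliberately simplified pattern. This is a sketch only: `DOMAIN_RE` and `extract_domains` are hypothetical names, and the pattern is an assumption made for illustration, not the regular expression that ships with Cortex XSOAR. It will produce false positives that a production pattern would avoid (for example, a dotted local part of an email address).

```python
import re

# Hypothetical, simplified pattern (NOT the Cortex XSOAR regex).
# - A label is a run of word characters or hyphens; in Python 3, \w also
#   matches non-ASCII letters such as "ö".
# - Labels are joined by "." or by the escaped form "[.]".
# - The final label (the TLD) must be at least two letters.
DOMAIN_RE = re.compile(r"(?:[\w-]+(?:\[\.\]|\.))+[^\W\d_]{2,}")


def extract_domains(text: str) -> list[str]:
    """Return every substring of text that looks like a domain."""
    return [match.group(0) for match in DOMAIN_RE.finditer(text)]


sample = (
    "Contact admin@test.co.uk or visit "
    "https://www.testö.com/login and www[.]evil.com"
)
print(extract_domains(sample))
```

Note that the escaped domain is returned still escaped; in this sketch, as in the flow described here, unescaping is left to the formatting step.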
Formatting Script
After the regular expression extracts the domains, the ExtractDomainAndFQDNFromUrlAndEmail formatting script iterates over each extracted domain and does the following:
- Replaces "[.]" with ".". For example: www[.]evil.com --> www.evil.com
- Validates the top-level domain (TLD) to avoid file-extension false positives. The ‘.zip’ TLD is excluded by default.
- Returns the formatted domain.
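The formatting steps can be sketched roughly as follows. This is not the source of ExtractDomainAndFQDNFromUrlAndEmail: `format_domain`, `KNOWN_SUFFIXES`, and `EXCLUDED_SUFFIXES` are hypothetical names, and the tiny suffix set is an assumption standing in for whatever complete TLD data the real script uses.

```python
# Illustrative sketch only; names and data below are assumptions.

# Hypothetical, tiny set of recognized suffixes. A real validator would rely
# on a complete source such as the Public Suffix List.
KNOWN_SUFFIXES = {"com", "net", "org", "co.uk", "zip"}

# Suffixes rejected by default to avoid file-extension false positives.
EXCLUDED_SUFFIXES = {"zip"}


def format_domain(candidate: str) -> str:
    """Return the formatted domain, or an empty string if it is rejected."""
    # Step 1: unescape, e.g. www[.]evil.com -> www.evil.com
    domain = candidate.replace("[.]", ".")

    # Step 2: look for the longest recognized suffix, leaving at least one
    # label in front of it.
    labels = domain.split(".")
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            if suffix in EXCLUDED_SUFFIXES:
                return ""  # e.g. report.zip is treated as a file name
            # Step 3: return the formatted domain.
            return domain
    return ""  # unknown suffix, e.g. invoice.pdf


print(format_domain("www[.]evil.com"))
```

Returning an empty string for rejected candidates is a choice made for this sketch so that callers can filter results with a simple truthiness check.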
Common Domain Structures
The following are some of the most common domain structures that Cortex XSOAR supports:
test.com
www.test.com
xn--t1e2s3t4.com
www.xn--t1e2s3t4.com
www.test.co.uk
test.co.uk
subtest.test.com
www.test.test.com
ötest.com
testö.com
www.testö.com
www.teöst.com
For more information about indicator extraction, see the Cortex XSOAR Administrator's Guide.