Troubleshooting Guide
This guide provides common troubleshooting steps. When reporting an issue to Cortex XSOAR Support, always include all information obtained from running the following troubleshooting steps.
Reverting a Pack to a Previous Version
If you encounter an issue after upgrading a Pack, you can revert to a previous version by going to Installed Content Packs -> Pack Name -> Version History and choosing Revert to this version.
Network Troubleshooting
Examples of common errors that probably indicate a networking issue:
[Errno -2] Name does not resolve
[Errno 110] Operation timed out
Failed to establish a new connection: [Errno -3] Try again
dial tcp: lookup ****: no such host
connect: operation timed out
connect: connection refused
ERR_CONNECTION_REFUSED
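When scanning logs for these signatures, a small matcher can help separate networking failures from other errors. The following Python sketch is only an illustration: the patterns are copied from the list above, and the function name is hypothetical.

```python
import re

# Signatures taken from the error list above; matching is case-insensitive.
NETWORK_ERROR_PATTERNS = [
    r"\[Errno -2\] Name does not resolve",
    r"\[Errno 110\] Operation timed out",
    r"\[Errno -3\] Try again",
    r"no such host",
    r"operation timed out",
    r"connection refused",
    r"ERR_CONNECTION_REFUSED",
]

_NETWORK_ERROR_RE = re.compile("|".join(NETWORK_ERROR_PATTERNS), re.IGNORECASE)


def looks_like_network_error(message: str) -> bool:
    """Return True if the message contains one of the known signatures."""
    return _NETWORK_ERROR_RE.search(message) is not None


if __name__ == "__main__":
    print(looks_like_network_error("dial tcp: lookup example.com: no such host"))
```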
When troubleshooting networking issues, it is important to first understand what type of networking the integration or automation is using. Cortex XSOAR integrations and automations can be classified into two main types regarding their networking use:
Host Based Networking
Integrations/automations running within the server/engine use the networking stack provided by the host machine of the server/engine. Such integrations/automations include native integrations (part of the server binary) such as the RemoteAccess integration and JavaScript integrations such as VirusTotal and http. Native integrations can be identified by the fact that they are shipped as part of the server and are not associated with a Content Pack. JavaScript integrations/automations can be identified by checking the integration/automation settings to see that the Language Type is JavaScript. JavaScript integrations/automations run within the Cortex XSOAR server/engine process using a JavaScript virtual environment and therefore use the same network stack as the server/engine. The source IP addresses for these integrations/automations are the same as those used by the server/engine.
If the integration/automation uses HTTP-based communication, we recommend first testing locally with the curl utility to verify that network communication with the HTTP endpoint is possible. Log in to the server or engine machine via SSH and run the curl command there. Common curl command variants (httpbin.org is used as an example URL):
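As a rough illustration, the Python helper below assembles a few typical curl invocations and prints them so they can be pasted into an SSH session. The flags used (-v, -k, -I, -x) are standard curl options; the function name and the proxy address are placeholders, not values from this guide.

```python
import shlex
from typing import List, Optional


def curl_command(url: str, *, insecure: bool = False,
                 proxy: Optional[str] = None,
                 head_only: bool = False) -> List[str]:
    """Build a verbose curl invocation as an argument list."""
    cmd = ["curl", "-v"]          # -v prints connection and TLS details
    if insecure:
        cmd.append("-k")          # skip TLS certificate verification
    if head_only:
        cmd.append("-I")          # send a HEAD request only
    if proxy:
        cmd += ["-x", proxy]      # route the request through a proxy
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    url = "https://httpbin.org/status/200"
    variants = [
        curl_command(url),
        curl_command(url, insecure=True),
        # Placeholder proxy address; replace with your own proxy.
        curl_command(url, proxy="http://proxy.example.com:8080"),
    ]
    for cmd in variants:
        print(shlex.join(cmd))
```

Comparing the plain variant with the -k variant helps tell a certificate problem apart from a connectivity problem, and the -x variant checks whether the endpoint is reachable only through a proxy.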
More information about curl is available at Everything curl.
If you are not able to perform a basic curl request from the machine to the target HTTP endpoint, the issue is probably not with the integration/automation but with the networking setup of the server/engine machine. Resolve the networking issue first, so that a basic curl command succeeds, before continuing to test the integration/automation. In many cases the cause turns out to be a firewall, NAT, or proxy issue.
Docker Based Networking
Docker Based integrations/automations are written in Python or PowerShell. They can be identified by inspecting the integration/automation settings, where the Language Type is Python or PowerShell. Docker creates its own networking, so these integrations/automations use a different networking stack from the Cortex XSOAR server/engine. Their source IP addresses are different and are assigned according to the Docker networking configuration.
As with Host Based Networking, for integrations/automations that use HTTP endpoints we recommend testing with curl from within a Docker container as a first step. This can be done by logging in to the server/engine machine via SSH and running the following command:
For example:
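A minimal Python sketch of how such a command could be assembled is shown below. It assumes a Docker image that has curl installed; the image name used here is a placeholder, and the helper function is hypothetical. The optional host_network flag corresponds to the --network=host mode discussed later in this section.

```python
import shlex
from typing import List


def docker_curl_command(image: str, url: str, *,
                        host_network: bool = False,
                        use_sudo: bool = False) -> List[str]:
    """Build a 'docker run ... curl <url>' invocation as an argument list."""
    cmd = ["sudo"] if use_sudo else []
    cmd += ["docker", "run", "--rm"]   # --rm removes the container afterwards
    if host_network:
        cmd.append("--network=host")   # share the host's network stack
    cmd += [image, "curl", "-v", url]
    return cmd


if __name__ == "__main__":
    # Placeholder image name; use an image that actually contains curl.
    image = "my-registry/image-with-curl:latest"
    print(shlex.join(docker_curl_command(image, "https://httpbin.org/status/200")))
```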
For additional curl sample commands, see the Host Based Networking section.
Note: You may need to run docker with sudo or log in as root if your user doesn't have sufficient permissions to execute the docker command.
If running curl from within Docker fails with networking errors, we recommend checking whether the same curl command succeeds when run directly on the host machine, without Docker. If the curl command succeeds on the host machine and fails within Docker, you are probably experiencing a Docker networking issue caused by how the Docker networking stack is configured.
We recommend that you use the Docker networking stack because it provides networking isolation. Try to resolve the Docker networking issue and consult the Docker networking docs.
If running with Docker's networking stack continues to cause issues, there is an option to run Docker containers with host networking. In this mode, the container shares the host's network stack and all interfaces from the host are available to the container. The container's hostname matches the hostname on the host system. You can test this mode by running the curl command via docker with the --network=host option.
If running with --network=host succeeds, you can configure the server to use host networking for Docker by adding the following advanced server configuration in Cortex XSOAR:
Key | Value |
---|---|
python.pass.extra.keys | --network=host |
It is also possible to configure only a specific Docker image to use host networking by stating python.pass.extra.keys.<docker-image> as the key. For example:
Key | Value |
---|---|
python.pass.extra.keys.demisto/smbprotocol | --network=host |
After you add the server configuration, run the /reset_containers command from the Cortex XSOAR CLI to reset all containers and begin using the new configuration.
Notes:
- For multi-tenant deployments, you need to add this setting to each tenant.
- When using engines, you need to add this setting to each engine.
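The decision flow of this section can be summarized in a short Python sketch. It is an illustration only: the function and enum names are hypothetical, and the three inputs are the outcomes of the curl tests described above (on the host, in Docker, and in Docker with --network=host).

```python
from enum import Enum
from typing import Optional


class Finding(Enum):
    HOST_NETWORK_ISSUE = "Fix host networking first (firewall, NAT, or proxy)."
    CONNECTIVITY_OK = "Basic connectivity works from both the host and Docker."
    DOCKER_NETWORK_ISSUE = "Review the Docker networking configuration."
    HOST_NETWORKING_WORKAROUND = (
        "Consider python.pass.extra.keys = --network=host, "
        "then run /reset_containers."
    )


def diagnose(host_curl_ok: bool, docker_curl_ok: bool,
             docker_host_network_ok: Optional[bool] = None) -> Finding:
    """Map curl test outcomes to the next troubleshooting step."""
    if not host_curl_ok:
        # curl fails even without Docker: the host machine is the problem.
        return Finding.HOST_NETWORK_ISSUE
    if docker_curl_ok:
        return Finding.CONNECTIVITY_OK
    if docker_host_network_ok:
        # Works only when the container shares the host's network stack.
        return Finding.HOST_NETWORKING_WORKAROUND
    # Host works, Docker fails, host networking untested or also failing.
    return Finding.DOCKER_NETWORK_ISSUE


if __name__ == "__main__":
    print(diagnose(True, False, True).value)
```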
Read Timeout
If you encounter a ReadTimeout error, such as ReadTimeout: HTTPSConnectionPool(host='www.google.com', port=443): Read timed out. (read timeout=10), it means that the server (or network) failed to deliver any data within 10 seconds. This might be due to a large response size.
Starting from Base Content Pack version 1.17.6, we support controlling the read timeout value via server advanced configuration, as follows:
System wide
Key | Value |
---|---|
python.pass.extra.keys | --env=REQUESTS_TIMEOUT=<TIMEOUT> |
Per Integration
Key | Value |
---|---|
python.pass.extra.keys | --env=REQUESTS_TIMEOUT.<INTEGRATION-ID>=<TIMEOUT> |
Examples:
- Set the read timeout value to 120 seconds system wide: --env=REQUESTS_TIMEOUT=120
- Set the read timeout value to 75 seconds for the Palo Alto Networks WildFire v2 integration: --env=REQUESTS_TIMEOUT.WildFire-v2=75
Note: The REQUESTS_TIMEOUT setting only affects integrations that use the BaseClient class from CommonServerPython.
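To make the two settings concrete, the Python sketch below formats the configuration value and shows one way a client could resolve the effective timeout from its environment. This is not the BaseClient implementation: the function names are hypothetical, the precedence (per-integration value before the system-wide value) is an assumption, and the 10-second default is taken from the example error message only for illustration.

```python
from typing import Mapping, Optional


def timeout_config_value(seconds: int, integration_id: Optional[str] = None) -> str:
    """Format the value to store under python.pass.extra.keys."""
    name = "REQUESTS_TIMEOUT"
    if integration_id:
        name = f"{name}.{integration_id}"
    return f"--env={name}={seconds}"


def resolve_read_timeout(env: Mapping[str, str], integration_id: str,
                         default: float = 10.0) -> float:
    """Pick the effective timeout from environment-style variables.

    Assumption: a per-integration variable overrides the system-wide one.
    """
    for name in (f"REQUESTS_TIMEOUT.{integration_id}", "REQUESTS_TIMEOUT"):
        raw = env.get(name)
        if raw:
            return float(raw)
    return default


if __name__ == "__main__":
    print(timeout_config_value(120))
    print(timeout_config_value(75, "WildFire-v2"))
```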
TLS/SSL Troubleshooting
Examples of common errors indicating that there is an issue with trusting a TLS/SSL networking connection:
SSLCertVerificationError
SSL_CERTIFICATE_VERIFY_FAILED
SSL: CERTIFICATE_VERIFY_FAILED
SSLError: certificate verify failed
These errors usually result from a server using an untrusted certificate or from a proxy (possibly a transparent one) that performs TLS/SSL termination.
Notes
- Most integrations provide a configuration option of Trust any certificate, which will cause the integration to ignore TLS/SSL certificate validation errors. You can use this option to test the connection and verify that in fact the issue is certificate related.
- To trust custom certificates in the Cortex XSOAR server or engines, follow these instructions.
CertificatesTroubleshoot Automation
Use the CertificatesTroubleshoot Automation to retrieve and decode an endpoint certificate. Additionally, use it to retrieve, decode, and validate the custom certificates deployed in Docker containers. The automation is part of the Troubleshoot Pack.
Common reasons for TLS/SSL issues and resolutions
Endpoint certificate issues:
- Expiration date - The certificate has a start date and an end date, and it is not valid outside that range.
  - Identify: Endpoint certificate -> General -> NotValidBefore/NotValidAfter
  - Resolution: If the certificate has expired, renew it at the target endpoint.
- Common name / Alt name - A certificate is signed only for specific URIs. For example, if the certificate is signed for test.com and the integration accesses the endpoint using test1.com, the certificate validation will fail.
  - Identify: Endpoint certificate -> Subject -> CommonName and Endpoint certificate -> Extentions -> SubjectAlternativeName
  - Resolution: If the endpoint URI does not match the names in the certificate (regex match), try to access the endpoint with one of the alt names/common names. If the endpoint isn't accessible via trusted names, sign the certificate with the correct common name or add an additional alt name.
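The two checks above can be sketched in Python as follows. This is a simplified illustration, not the logic used by the CertificatesTroubleshoot automation: the function names are hypothetical, and the wildcard handling covers only a leading "*." label rather than the full certificate name matching rules.

```python
from datetime import datetime, timezone
from typing import Iterable


def is_within_validity(not_before: datetime, not_after: datetime,
                       now: datetime) -> bool:
    """True if 'now' falls between NotValidBefore and NotValidAfter."""
    return not_before <= now <= not_after


def hostname_matches(hostname: str, cert_names: Iterable[str]) -> bool:
    """True if hostname equals a common/alt name or matches a '*.' wildcard."""
    host = hostname.lower().rstrip(".")
    for name in cert_names:
        pattern = name.lower().rstrip(".")
        if pattern == host:
            return True
        # Simplified rule: "*.example.com" covers exactly one extra label.
        if pattern.startswith("*."):
            suffix = pattern[1:]  # for example ".example.com"
            head = host[: -len(suffix)] if host.endswith(suffix) else ""
            if head and "." not in head:
                return True
    return False


if __name__ == "__main__":
    # The example from the text: signed for test.com, accessed as test1.com.
    print(hostname_matches("test1.com", ["test.com"]))
    print(is_within_validity(
        datetime(2023, 1, 1, tzinfo=timezone.utc),
        datetime(2024, 1, 1, tzinfo=timezone.utc),
        datetime.now(timezone.utc),
    ))
```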
Fetch Incidents Troubleshooting
Fetch History
In Cortex XSOAR versions 6.8 and above, it is possible to observe the results of the last fetch-incidents/fetch-indicators runs using the Fetch History modal. To view the modal, click the button with the history icon next to the Integration Instance settings.
The following fields are stored for each record:
- Pulled At - The date and time the fetch run was completed.
- Duration - The length of time the fetch run took to complete.
- Last Run - The contents of the last run object.
- Message - Depending on the fetch run status, will be one of the following:
  - If successfully finished, how many Incidents/Indicators were pulled or dropped. If nothing was pulled or dropped, the message will be "Completed".
  - In case of an error, the error details.
  - In long-running integrations, the info/error message forwarded to demisto.updateModuleHealth(). The is_error boolean argument of this method determines the message type.
- Source IDs - If available, displays the incident IDs as they appear in the 3rd-party product. The IDs are collected from incidents that contain the dbotMirrorId field. Note: the dbotMirrorId field should be determined at the integration level rather than the mapping level.
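As a hedged illustration of the Source IDs note, the Python sketch below sets dbotMirrorId while building the incident in the integration code and then collects the IDs from incidents that carry the field. The incident dictionary layout, the alert fields, and both function names are assumptions made for this example.

```python
import json
from typing import Any, Dict, List


def build_incident(alert: Dict[str, Any]) -> Dict[str, Any]:
    """Build an incident and set dbotMirrorId at the integration level."""
    return {
        "name": f"Alert {alert['id']}",      # hypothetical alert layout
        "rawJSON": json.dumps(alert),
        "dbotMirrorId": str(alert["id"]),    # ID as it appears in the product
    }


def collect_source_ids(incidents: List[Dict[str, Any]]) -> List[str]:
    """Collect IDs only from incidents that contain dbotMirrorId."""
    return [str(i["dbotMirrorId"]) for i in incidents if i.get("dbotMirrorId")]


if __name__ == "__main__":
    incidents = [build_incident({"id": 101}), {"name": "no mirror id"}]
    print(collect_source_ids(incidents))
```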
Server Configurations
Key | Description | Default Value |
---|---|---|
fetch.history.size | The number of records stored for every instance. | 20 |
fetch.history.enabled | Whether or not the feature is enabled. | true |
Debugging
If an issue recurs with a fetching instance, follow these steps to produce a debug log of a single fetch run.
If the issue does not reproduce consistently:
- Set the log level of the specific instance for more convenient tracking of the fetch logs over time.
- Keep track of the Fetch History of this instance. Consider temporarily increasing the fetch.history.size server configuration to store more records.
Debug Mode
Cortex XSOAR (Server 5.0+) supports running Python integration commands and automation scripts in debug-mode from the Cortex XSOAR CLI. When a command is run in debug-mode, a log file of the command execution is created and attached to the War Room. When you encounter an issue related to an integration or an automation, make sure to reproduce the command in debug-mode and inspect the generated log file. The debug-mode log file contains information that is not available in the Server logs and can provide additional insight into the root cause of the issue. Additionally, some integrations have specific code to include extra debug information when run in debug-mode.
Important Note
The debug mode feature prints extended data from an integration's configuration and settings, which may include sensitive information. Before sharing the generated log files, make sure sensitive information has been removed.
Run a command in debug-mode
In the Cortex XSOAR CLI, run the command with all the arguments that cause the issue and append the following argument: debug-mode=true. For example:
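The following Python sketch shows the idea of appending the argument to an existing CLI command. The helper name is hypothetical, and the ad-search arguments in the example are illustrative only.

```python
def with_debug_mode(command: str) -> str:
    """Append debug-mode=true to a CLI command unless it is already present."""
    command = command.strip()
    if "debug-mode=true" in command.split():
        return command
    return f"{command} debug-mode=true"


if __name__ == "__main__":
    # Illustrative arguments; use the exact arguments that cause the issue.
    print(with_debug_mode('!ad-search filter="(cn=Guest)"'))
```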
Screenshot of running a command with debug-mode=true and the resulting log file (ad-search.log):
Test Integration Module in debug-mode
Starting with Cortex XSOAR 6.2, when you Test an integration module and it fails, you can download a debug-mode full report from the integration configuration dialog by following the link: Run advanced test and download a full report.
If you require a debug-mode log when the Test from the integration configuration dialog succeeds, it is possible to run the test integration module command from the Cortex XSOAR CLI with debug-mode=true. This is done by issuing a command of the form:
For example, for an integration instance named Cortex_XDR_instance_1, run the following from the CLI:
Note:
- If the instance name contains spaces, replace each space with an underscore (_).
- The "Do not use by default" checkbox should be unchecked on the integration instance you are testing.
Screenshot of running a test-module command with debug-mode=true and the resulting log file (test-module.log):
Fetch Incidents in debug-mode
Starting with Cortex XSOAR 6.0, it is possible to run the fetch incidents command from the Cortex XSOAR CLI with debug-mode=true. This is done by issuing a command of the form:
For example, for an integration instance named Cortex_XDR_instance_1, run the following from the CLI:
Note: If the instance name contains spaces, replace each space with an underscore (_).
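The instance-name rule applies to both the test-module and the fetch commands. The Python sketch below builds either command from an instance name. It assumes the CLI form is !<instance_name>-test-module and !<instance_name>-fetch; verify this form against your Cortex XSOAR version. The helper name is hypothetical.

```python
def instance_debug_command(instance_name: str, action: str) -> str:
    """Build a debug-mode CLI command for an integration instance.

    Assumed command form: !<instance_name>-<action> debug-mode=true
    """
    if action not in ("test-module", "fetch"):
        raise ValueError(f"unsupported action: {action}")
    # Spaces in the instance name are replaced with underscores.
    safe_name = instance_name.strip().replace(" ", "_")
    return f"!{safe_name}-{action} debug-mode=true"


if __name__ == "__main__":
    print(instance_debug_command("Cortex_XDR_instance_1", "test-module"))
    print(instance_debug_command("Cortex XDR instance 1", "fetch"))
```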
Screenshot of running a fetch command with debug-mode=true and the resulting log file (fetch-incidents.log):
Integration Debug Logs
Important Note
The Integration Debug feature prints extended data from an integration's configuration and settings, which may include sensitive information. Before sharing the generated Integration-Instance log files, make sure sensitive information has been removed.
Starting with version 6.2, it is possible to create logs for an instance of an integration in order to get debug information for a specific instance over a period of time.
This mode is especially useful for long running integrations such as EDL or TAXII-Server. It helps with troubleshooting when it is not possible to run the desired command in debug-mode from the playground, either because the integration is long running or because the issue occurs only from time to time, such as with the fetch-incidents command.
For example, if you have an integration instance running the fetch-incidents command, and the integration misses some of the incidents, you may want to get debug level information for each fetch-incidents command (or any other command executed by this instance) even if the server log level is set to Info. If you move the server log level to Debug, the server log would contain a lot of irrelevant information for integration troubleshooting. For this reason, the Log Level configuration parameter was added to the integration configuration.
There are three options for this parameter:
- Off
- Debug
- Verbose
In Debug mode, the server will run all the commands of this instance with a Debug log level and log the information in the Integration-Instance log.
In Verbose mode, additional information such as connections coming off device handling, the raw response, and all parameters and headers are logged in addition to the debug level information.
For example, if an integration fails and the instance log level is Debug, the Integration-Instance log will contain the error stack trace. If the log level is Verbose, the Integration-Instance log will contain the error stack trace, but also a copy of the HTTP request, the parameters used in the integration, what the response was, etc.
By default, the Log Level configuration parameter is set to Off.
The Integration-Instance.log is located in /var/log/demisto/.
These log level modes are only for the configured instance and do not affect the log for the entire server.
Note that the log level configuration for an integration instance may affect the performance of that instance. Use this feature only for troubleshooting, and set it back to Off once the log contains the information you need.