Common issues when installing or updating the agent

This article looks at three of the most common issues that can occur when installing or updating agents.

General helpful links

https://help.deepsecurity.trendmicro.com/aws/welcome.html

https://success.trendmicro.com/product-support/deep-security-20-0

1. Anti-Malware engine offline (Windows)

This problem typically occurs on Windows machines where the Anti-Malware module has not installed properly or a driver/service is not running. On the Agent side, the Deep Security Notifier app in the taskbar shows a status of “Driver Offline/Not Installed.” If the server reporting this error has not had the initial root certificate updates installed through Microsoft updates, the server must be patched, the Agent uninstalled, the server rebooted, and the Agent re-installed and re-activated.

Most of the time, this problem is resolved by uninstalling the Agent, restarting the server, and re-installing/re-activating the Agent, as described in the troubleshooting steps of the first article referenced below.

For a full walkthrough of cleaning up the Deep Security Agent from a Windows machine, refer to the third article linked below, which includes instructions for manually uninstalling the Deep Security Agent. It’s not always necessary to manually uninstall the Agent, but the instructions list the file locations, registry entries, and services to clean up after a normal uninstall and reboot have been completed.

Helpful links:

https://success.trendmicro.com/solution/1104241-updating-the-verisign-and-digicert-certificate-on-deep-security

https://help.deepsecurity.trendmicro.com/aws/anti-malware-event-engine-offline.html

https://success.trendmicro.com/solution/1096150-manually-uninstalling-deep-security-agent-relay-and-notifier-from-windows

2. Security update failed

If a Deep Security Agent is unable to communicate with the designated Deep Security Relay in the environment, the server risks running outdated Anti-Malware patterns, so this can be a higher-priority issue.

When troubleshooting security update failures, the most common cause is a network connectivity problem between the Deep Security Agent and the Deep Security Relay. The article linked below gives a few steps for checking that connectivity and confirming that TCP communication is functioning between the two components.

A utility such as Test-NetConnection in PowerShell, or telnet/curl on a Linux server, can help confirm that TCP communication between the Agent and the Relay is open. If TCP connectivity is open, a device between the two may be performing SSL inspection or otherwise interfering with the encrypted connection between the two points.
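On a Linux Agent host, the TCP check described above can also be sketched with bash's built-in /dev/tcp path. This is a minimal sketch rather than an official tool: the function name check_tcp and the host name relay.example.internal are hypothetical, and port 4122 is assumed to be the Relay's listening port, so confirm the port configured in your environment.

```shell
# Sketch: report whether a TCP port on a remote host accepts connections.
# Assumptions: bash and the coreutils `timeout` command are available.
check_tcp() {
    local host="$1" port="$2" wait_s="${3:-5}"
    # bash's /dev/tcp path opens a TCP socket; `timeout` bounds the attempt.
    if timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "unreachable"
    fi
}

# Example (hypothetical host name; 4122 is an assumed Relay port):
# check_tcp relay.example.internal 4122
```

A result of "open" only confirms that the TCP handshake succeeds. It does not rule out SSL inspection by a device in the path.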

The ds_agent.log file on the Agent normally records the reason a security update could not be performed. The relevant entries are identified by the word Error or Warning at the start of the line. Correlate the time of the update attempt with the timestamps in the log file to identify the underlying reason why updates are failing.

Log file location:

  • Windows – C:\ProgramData\Trend Micro\Deep Security Agent\diag
  • Linux – /var/opt/ds_agent/diag
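On Linux, the log review described above can be sketched with grep. This is a minimal sketch: the function name find_update_errors is hypothetical, the exact layout of each log line is not assumed (the words are matched anywhere on the line), and the timestamp text in the example is a placeholder that should be copied in the form it appears in your own log.

```shell
# Sketch: list Error/Warning lines from an agent log,
# optionally limited to lines containing a given timestamp text.
find_update_errors() {
    local log_file="$1" when="${2:-}"
    # Keep Error/Warning lines first, then keep those containing the
    # timestamp text. An empty second argument matches every line.
    grep -E 'Error|Warning' "$log_file" | grep -F -e "$when"
}

# Example (Linux path from this article; timestamp text is a placeholder):
# find_update_errors /var/opt/ds_agent/diag/ds_agent.log '2024-05-01 14:3'
```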

Helpful links:

https://help.deepsecurity.trendmicro.com/aws/security-update-connectivity.html

https://www.trendmicro.com/en_us/business/products/downloads.html

3. Performance/Application issues introduced after installing the Deep Security Agent (Anti-Malware and Module Isolation)

Before deploying a Deep Security Agent, apply the appropriate security configuration to the server. This is common practice with any Anti-Malware/security software and ensures that the server and its installed applications are not negatively impacted by the increased inspection of their activity.

Although this section does not refer directly to a status in the Deep Security console, this is one of the more common configuration adjustments that will require troubleshooting after deploying the Deep Security Agent to a new server. If a server’s performance is impacted, or an application’s functionality is impacted, you should first identify which Deep Security module could be contributing to the problem.

Performance issues can be identified first by checking which processes on the server are using more CPU/RAM than others. On Windows machines, two processes are typically the culprit: dsa.exe or coreServiceShell.exe. dsa.exe is the core Agent process running on the machine, and coreServiceShell.exe is part of the Anti-Malware module. On Linux servers, these processes are named ds_agent and ds_am, respectively.
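On Linux servers, a quick look at the resource use of these two processes can be sketched by filtering ps output. This is a minimal sketch: the function name filter_agent_procs and the output layout are hypothetical, and the process names are the ones given in this article.

```shell
# Sketch: from "pid pcpu pmem comm" rows on standard input,
# keep only the Deep Security Agent processes named in this article.
filter_agent_procs() {
    awk '$4 == "ds_agent" || $4 == "ds_am" {
        printf "%s pid=%s cpu=%s%% mem=%s%%\n", $4, $1, $2, $3
    }'
}

# Example: feed it the current process table (header lines suppressed with "=").
# ps -eo pid=,pcpu=,pmem=,comm= | filter_agent_procs
```

Running the example several times while the problem is occurring gives a rough baseline to compare against as modules are turned off.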

Regardless of which process is consuming resources, you’ll want to narrow down which protection module(s) are contributing to the increased use of resources. By turning off individual modules one by one from the Deep Security Manager console, you can watch for a decrease in resource utilization and attribute the behavior to the most recently disabled module.

When coreServiceShell.exe or ds_am processes are utilizing a high amount of CPU, this is usually indicative of the Real-Time Anti-Malware engine scanning a high number of read/write transactions on the server, requiring a higher amount of resources to complete its job.

This high amount of activity can be reduced by adding exclusions for data/applications that are known to be safe. The most common method for reducing resource utilization, or for resolving other application issues introduced by the Anti-Malware module, is to identify safe applications running on the server and implement Process Image exclusions. A Process Image exclusion is a pointer to the full path of a process running on the server that you know to be safe, such as sqlservr.exe for Microsoft SQL Server. Once this process is excluded, files accessed by the sqlservr.exe process are not scanned by the Real-Time engine. To make these adjustments, edit the Scan Configuration for the machine/policy in the Deep Security Manager to include the processes to be excluded.
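Because a Process Image exclusion needs the full path of the process, it can help to look that path up for a running process. On Linux servers this can be sketched by reading the /proc filesystem. This is a minimal sketch: the function name process_image_path is hypothetical, mysqld is only an example process name, and reading another user's process may require root.

```shell
# Sketch: print the full path of the executable behind a running PID (Linux).
process_image_path() {
    # /proc/<pid>/exe is a symbolic link to the executable of that process.
    readlink -f "/proc/$1/exe"
}

# Example (hypothetical process name):
# for pid in $(pgrep -x mysqld); do process_image_path "$pid"; done
```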

Applications that are impacted by the Anti-Malware module may require additional troubleshooting after exclusions are applied, including collecting additional information from the server. On the server encountering Anti-Malware related application issues, additional debug logging can be enabled by editing the C:\Program Files\Trend Micro\AMSP\AmspConfig.ini file: change the line DebugLevel=0 to DebugLevel=1 or DebugLevel=2 (2 logs more information). Restart the Trend Micro Deep Security Agent and Solution Platform services for the changes to take effect. To revert these logging options, set DebugLevel back to 0 and perform the same service restarts.

On Linux servers, identify the PID of the ds_am process:

ps aux | grep ds_am

Increase the debug level (run the command multiple times to increase the level by 1 each time):

kill -USR1 <PID_for_ds_am>

Decrease the debug level (run the command multiple times to decrease the level by 1 each time):

kill -USR2 <PID_for_ds_am>

Reproduce the problem, and then collect a diagnostic package from the command line (link). The package will include the additional information from the adjusted logging level. Note that a diagnostic package collected from the Deep Security Manager includes additional information that is not collected via the command line. Provide the diagnostic package to the support team so they can review it and help identify the underlying problem.

Helpful links:

https://help.deepsecurity.trendmicro.com/aws/high-cpu-usage.html