System monitoring notifications
ASMS tracks various system metrics and triggers notifications when their thresholds are exceeded. Notifications can be sent as syslog messages or as events in the Issues Center.
AFA admins can modify the thresholds for each metric and the types of notifications triggered.
For more details, see Sending outgoing syslog messages and Manage ASMS issues.
Configure system notifications
This procedure describes how to configure the JSON file that determines which AFA system notifications are sent, and how they are sent.
Do the following:
- Browse to and open the /data/algosec-ms/config/watchdog_configuration.json file for editing.
The watchdog_configuration.json file includes the following properties:
metrics: An array that specifies the AFA metrics.
For more details, see Metric element.
actions: An array of possible actions to take upon a metric status change.
Supported actions include:
- publish_syslog
- publish_issues_center
Note: While all metrics can trigger syslog messages, only some can trigger messages in the AFA Issues Center.
For more details, see System notifications enabled by default.
metricsActions: An array of objects, each of which defines when a specific status change triggers an action.
For more details, see MetricsAction.
- Modify the JSON file as needed, and save your changes.
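The edit-and-save step can also be scripted. The following is a minimal Python sketch, not an AlgoSec tool: it assumes the file is a JSON object with the three top-level properties listed above, and the helper names (edit_config, set_metric_enabled) and the .bak backup suffix are hypothetical choices made for this illustration.

```python
import json
import shutil
from pathlib import Path

# Path taken from the procedure above; key names come from the property list.
CONFIG_PATH = Path("/data/algosec-ms/config/watchdog_configuration.json")
EXPECTED_KEYS = ("metrics", "actions", "metricsActions")


def edit_config(path, mutate):
    """Back up the file, apply mutate(config) in memory, then write it back."""
    path = Path(path)
    # json.loads raises an error if the file is not valid JSON.
    config = json.loads(path.read_text(encoding="utf-8"))

    missing = [key for key in EXPECTED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing top-level properties: {missing}")

    # Hypothetical backup name: the original file name plus ".bak".
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)

    mutate(config)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return backup


def set_metric_enabled(config, metric_name, enabled):
    """Set the 'enabled' flag of the metric with the given name."""
    for metric in config["metrics"]:
        if metric.get("name") == metric_name:
            metric["enabled"] = enabled
            return
    raise KeyError(metric_name)
```

For example, `edit_config(CONFIG_PATH, lambda c: set_metric_enabled(c, "suite_cpu_usage", False))` would disable the CPU usage metric while keeping a copy of the previous file next to it.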
The Metric element in the watchdog_configuration.json file has the following properties:
Property | Description |
---|---|
enabled | Boolean. Determines whether the metric is enabled. |
name | String. Read-only. A unique name for the metric. For details, see System notifications enabled by default. |
description | String. A description of the metric. For details, see System notifications enabled by default. |
frequency | A frequency object, which specifies the frequency for checking the metric. Each frequency object includes the following properties: Default = 10 SECONDS. |
hostTypes | Array. List of appliances that check the metric. One of the following: If you do not have a distributed architecture, this is always defined as [MASTER]. |
thresholdPolicy | An options object that specifies the metric's thresholds. The options object is an array of objects that each specify a threshold for a specific status. For more details, see Options object and Threshold sample configuration. |
Each options object includes the following properties:
Property | Description |
---|---|
status | String. Determines the status of the metric if the threshold is met. One of the following: |
type | String. Determines the type of result returned by the metric collection. One of the following: |
condition | String. The comparison operator to use on the metric collection result. One of the following: |
value | Of the type specified in the type property. The value to compare to the metric collection result. |
timeCondition | A timeCondition object, which determines the time period for which the threshold must be met in order for the metric status to change. Set the timeCondition value to zero (0) to cause the status to change if the threshold is met even once. The timeCondition object includes the following properties: |
Threshold sample configuration
The example below defines actions to take for PASS and FAIL statuses:
- The metric status will change to PASS if the result is OK for more than 1 minute.
- The metric status will change to FAIL if the result is not OK even once.
"thresholdPolicy": { "options": [ { "status": "PASS", "type": "STRING", "condition": "EQ", "value": "OK", "timeCondition": { "value": 1, "unit": "MINUTE" } }, { "status": "FAIL", "type": "STRING", "condition": "NOT", "value": "OK", "timeCondition": { "value": 0, "unit": "MINUTE" } } ] }
The MetricsAction element is an array that defines, for each metric and action, which of the statuses set in the threshold definition trigger the action.
For example, the code sample shown above defines thresholds for the PASS and FAIL statuses, but not for the WARNING status. In this scenario, the WARNING status should be disabled in the MetricsAction array.
Each object in the MetricsAction array includes the following properties:
Property | Description |
---|---|
metric | String. The name of the metric, as stated in the metric's object in the metrics array. |
action | String. The name of the action, as stated in the action's object in the actions array. |
pass | Boolean. Determines whether the action should be triggered when the metric's status changes to pass. |
warning | Boolean. Determines whether the action should be triggered when the metric's status changes to warning. |
fail | Boolean. Determines whether the action should be triggered when the metric's status changes to fail. |
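The relationship between the three arrays can be checked before saving the file. The following Python sketch is an illustration only: it assumes that each object in the actions array carries a name property, that thresholds live under thresholdPolicy.options as in the sample configuration, and that the pass, warning, and fail flags correspond to the PASS, WARNING, and FAIL statuses. The function name check_metrics_actions is hypothetical.

```python
# Assumed mapping between MetricsAction flags and threshold statuses.
STATUS_FLAGS = {"pass": "PASS", "warning": "WARNING", "fail": "FAIL"}


def check_metrics_actions(config):
    """Return a list of human-readable problems found in metricsActions."""
    problems = []
    metrics = {m["name"]: m for m in config.get("metrics", [])}
    # Assumption: each action object has a "name" property.
    actions = {a["name"] for a in config.get("actions", [])}

    for entry in config.get("metricsActions", []):
        metric_name = entry.get("metric")
        action_name = entry.get("action")

        if action_name not in actions:
            problems.append(f"{metric_name}: unknown action {action_name!r}")

        metric = metrics.get(metric_name)
        if metric is None:
            problems.append(f"unknown metric {metric_name!r}")
            continue

        # Statuses that actually have a threshold defined for this metric.
        options = metric.get("thresholdPolicy", {}).get("options", [])
        defined = {option.get("status") for option in options}

        for flag, status in STATUS_FLAGS.items():
            if entry.get(flag) and status not in defined:
                problems.append(
                    f"{metric_name}/{action_name}: {flag} is enabled "
                    f"but no {status} threshold is defined"
                )
    return problems
```

An empty list means that every entry refers to a known metric and action, and that no status is enabled without a matching threshold.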
System notifications enabled by default
Some AFA messages can be triggered as syslog or Issues Center messages, and others can be triggered as syslog messages only.
The following table lists the notifications enabled in AFA by default:
Metric name | Description | Syslog | Issues Center |
---|---|---|---|
suite_disk_space_available | Available disk space in root partition. Notifications triggered: | ✔ | ✔ |
suite_nas_disk_space_available | Available disk space in NAS partition. Notifications triggered: | ✔ | ✔ |
suite_data_disk_space_available | Available disk space in data partition. Notifications triggered: | ✔ | ✔ |
suite_open_file_descriptors | Open file descriptors. Notifications triggered: Warning if more than 4000 for the last 5 minutes. | ✔ | ✔ |
suite_memory_available | Available memory. Notifications triggered: Warning if less than 10% for the last 3 hours. | ✔ | ✔ |
suite_cpu_usage | CPU usage. Notifications triggered: Warning if 90% or more for the last 16 hours. | ✔ | ✔ |
The following: | Essential Linux daemons. Notifications triggered: | ✔ | ✖ |
The following: | Java processes health checks - shallow. Notifications triggered: | ✔ | ✖ |
The following: | Java processes health checks - deep. Notifications triggered: | ✔ | ✖ |
hadr_db_replication_health | Database replication health check, between primary and secondary nodes in a cluster. Relevant only when HA/DR and/or distributed architecture is enabled. Notifications triggered: | ✔ | ✖ |
dfs_connectivity_health_check | Distributed file system health check. Notifications triggered: | ✔ | ✖ |
suite_dist_elements_connection_health | Connection health check between Central Manager and Remote Agents or Load Units in a distributed architecture. Relevant only when HA/DR and/or distributed architecture is enabled. Notifications triggered: | ✔ | ✖ |
suite_cyberark_aim_service | Status of the CyberArk AIM service running on the ASMS host. Notifications triggered: | ✔ | ✖ |
cyberark_connectivity_health_check | Connection health check between ASMS and CyberArk vault. Notifications triggered: | ✔ | ✔ |
Analysis | Analysis results. Notifications triggered: Note: Always retrieved, even if this metric is disabled in the configuration file. | ✔ | ✖ |
Monitor | Monitoring results. Notifications triggered: Note: Always retrieved, even if this metric is disabled in the configuration file. | ✔ | ✖ |
Log Collection | Traffic log collection results. Notifications triggered: Note: Always retrieved, even if this metric is disabled in the configuration file. | ✔ | ✖ |
suite_traffic_logs_folder_size | Size of the traffic log collection folder. Notifications triggered: | ✔ | ✔ |
Audit logs | Audit log collection results. Notifications triggered: Note: Always retrieved, even if this metric is disabled in the configuration file. | ✔ | ✖ |
Scheduled Backup | System backup service. Notifications triggered: Note: Always retrieved, even if this metric is disabled in the configuration file. | ✔ | ✖ |
SNMP platform monitoring
SNMP platform monitoring can be achieved using the Linux snmp.d service. Configuring SNMP is the customer's responsibility.