System monitoring notifications

ASMS tracks various system metrics and triggers notifications when their thresholds are exceeded. These notifications can be sent as syslog messages or as events in the Notification Center.

AFA admins can modify the thresholds for each metric and the types of notifications triggered.

For more details, see Sending outgoing syslog messages and Manage ASMS notifications.

Configure system notifications

This procedure describes how to configure the JSON file that determines which AFA system notifications are sent, and how.

Do the following:

Open a terminal and log in as user afa.

Browse to and open the /data/algosec-ms/config/watchdog_configuration.json file for editing.

The watchdog_configuration.json file includes the following properties:

metrics

An array that specifies AFA metrics.

For more details, see Metric element.

actions

An array of possible actions to take upon a metric status change.

Supported actions include:

publish_syslog
publish_issues_center

Note: While all metrics can trigger syslog messages, only some can trigger messages in the AFA Notification Center.

For more details, see System notifications enabled by default.

metricsActions

An array of objects that each define when a specific status change triggers an action.

For more details, see MetricsAction.

Modify the JSON file as needed, and save your changes.
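
As a rough orientation, the top-level layout described in this procedure can be sketched as follows. This is a minimal skeleton only: the three property names come from the list above, the arrays are left empty for illustration, and the file on your appliance contains populated entries.

```json
{
  "metrics": [],
  "actions": [],
  "metricsActions": []
}
```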

Metric element

The Metric element in the watchdog_configuration.json file has the following properties:

Property	Description
enabled	Boolean. Determines whether the metric is enabled.
name	String. Read-only. A unique name for the metric. For details, see System notifications enabled by default.
description	String. A description of the metric. For details, see System notifications enabled by default.
frequency	A frequency object, which specifies how often the metric is checked. Each frequency object includes the following properties: value (Integer): determines how often the metric is checked; 0 = the metric is checked every time the collection service runs. unit (String): one of the following time units: SECOND, MINUTE, HOUR, DAY. Default = 10 SECONDS.
hostTypes	Array. List of appliances that check the metric. One or more of the following: MASTER, SLAVE, REMOTE_MANAGER. If you do not have a distributed architecture, this is always defined as [MASTER].
thresholdPolicy	An object that specifies the metric's thresholds. It contains an options array of objects that each specify a threshold for a specific status. For more details, see Options object and Threshold sample configuration.
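
To show how these properties fit together, here is a minimal sketch of a single Metric element. It uses only the properties listed in the table above. The description text, the "OK" result string, and the pairing of this metric name with this threshold are assumptions made for illustration; the entry in your own file may differ.

```json
{
  "enabled": true,
  "name": "suite_httpd_service",
  "description": "Illustrative description only",
  "frequency": {
    "value": 10,
    "unit": "SECOND"
  },
  "hostTypes": ["MASTER"],
  "thresholdPolicy": {
    "options": [
      {
        "status": "FAIL",
        "type": "STRING",
        "condition": "NOT",
        "value": "OK",
        "timeCondition": {
          "value": 0,
          "unit": "MINUTE"
        }
      }
    ]
  }
}
```

Because name is read-only, a sketch like this is best used as a guide to which fields can be adjusted in an existing entry (for example enabled, frequency, or the thresholds), not as a template for new metrics.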

Options object

Each object in the options array includes the following properties:

Property	Description
status	String. Determines the status of the metric if the threshold is met. One of the following: PASS, FAIL, WARNING.
type	String. Determines the type of result returned by the metric collection. One of the following: STRING, INTEGER, FLOAT, BOOLEAN.
condition	String. The comparison operator to use on the metric collection result. One of the following: EQ (=), LT (<), LTE (<=), GT (>), GTE (>=), NOT (!=).
value	A value of the type specified in the type property. The value to compare to the metric collection result.
timeCondition	A timeCondition object, which determines the time period for which the threshold must be met in order for the metric status to change. The timeCondition object includes the following properties: value (Integer): the length of the time period; set the value to zero (0) to cause the status to change if the threshold is met even once. unit (String): one of the following time units: SECOND, MINUTE, HOUR, DAY.
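
As an illustration of a numeric threshold, the following sketch expresses "Warning if more than 4000 for the last 5 minutes", the default behavior listed for suite_open_file_descriptors in the table of default notifications. It is an assumption about how such a rule could be written with the properties above, not a copy of the shipped configuration.

```json
{
  "status": "WARNING",
  "type": "INTEGER",
  "condition": "GT",
  "value": 4000,
  "timeCondition": {
    "value": 5,
    "unit": "MINUTE"
  }
}
```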

Threshold sample configuration

The example below defines thresholds for the PASS and FAIL statuses:

The metric status will change to PASS if the result is OK for more than 1 minute.
The metric status will change to FAIL if the result is not OK even once.

"thresholdPolicy": {
 "options": [
  {
   "status": "PASS",
   "type": "STRING",
   "condition": "EQ",
   "value": "OK",
   "timeCondition": {
    "value": 1,
    "unit": "MINUTE"
   }
  },
  {
   "status": "FAIL",
   "type": "STRING",
   "condition": "NOT",
   "value": "OK",
   "timeCondition": {
    "value": 0,
    "unit": "MINUTE"
   }
  }
 ]
}

MetricsAction

The MetricsAction element (the metricsActions property) is an array of objects that each define which status changes of a metric trigger an action.

For example, the threshold sample configuration above defines thresholds for the PASS and FAIL statuses, but not for the WARNING status. In this scenario, the WARNING status should be disabled in the MetricsAction array.

Each object in the MetricsAction array includes the following properties:

Property	Description
metric	String. The name of the metric, as stated in the metric's object in the metrics array.
action	String. The name of the action, as stated in the action's object in the actions array.
pass	Boolean. Determines whether the action should be triggered when the metric's status changes to pass.
warning	Boolean. Determines whether the action should be triggered when the metric's status changes to warning.
fail	Boolean. Determines whether the action should be triggered when the metric's status changes to fail.
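
For the scenario described above, where thresholds exist only for PASS and FAIL, a single MetricsAction entry might look like the following sketch. The metric name is chosen for illustration only, and the action name is one of the supported actions listed earlier in this topic.

```json
{
  "metric": "suite_httpd_service",
  "action": "publish_syslog",
  "pass": true,
  "warning": false,
  "fail": true
}
```

With an entry like this, a change to PASS or FAIL would trigger a syslog message, while a change to WARNING would trigger nothing.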

System notifications enabled by default

Some AFA messages can be triggered as syslog or Notification Center messages, and others can be triggered as syslog messages only.

The following table lists the notifications enabled in AFA by default:

Metric name	Description	Syslog	Notification Center
suite_disk_space_available	Available disk space in the root partition. Notifications triggered: Fail if < 5%; Warning if >= 5% and < 10%; Pass if > 10%.	✔	✔
suite_nas_disk_space_available	Available disk space in the NAS partition. Notifications triggered: Fail if < 5%; Warning if >= 5% and < 10%; Pass if > 10%.	✔	✔
suite_data_disk_space_available	Available disk space in the data partition. Notifications triggered: Fail if < 5%; Warning if >= 5% and < 10%; Pass if > 10%.	✔	✔
suite_open_file_descriptors	Open file descriptors. Notifications triggered: Warning if more than 4000 for the last 5 minutes.	✔	✔
suite_memory_available	Available memory. Notifications triggered: Warning if less than 10% for the last 3 hours.	✔	✔
suite_cpu_usage	CPU usage. Notifications triggered: Warning if 90% or more for the last 16 hours.	✔	✔
The following: suite_logstash_service, suite_crond_service, suite_elasticsearch_service, suite_httpd_service, suite_kibana_service, suite_metro_service, suite_mongo_service, suite_postgresql_service, suite_tomcat_service	Essential Linux daemons. Notifications triggered: Fail if down; Pass if up.	✔	✖
The following: afa_shallow_health_check, abf shallow health check, aff_shallow_health_check	Java processes health checks (shallow). Notifications triggered: Fail if the check does not work for 20 seconds; Pass if it works for 30 seconds.	✔	✖
The following: afa_deep_health_check, abf deep health check, aff_deep_health_check	Java processes health checks (deep). Notifications triggered: Fail if at least one item fails for 10 minutes; Pass (immediately) if everything works.	✔	✖
hadr_db_replication_health	Database replication health check between primary and secondary nodes in a cluster. Relevant only when HA/DR and/or distributed architecture is enabled. Notifications triggered: Fail if replication failed; Pass if replication succeeded.	✔	✖
dfs_connectivity_health_check	Distributed file system health check. Notifications triggered: Fail if down; Pass if up.	✔	✖
suite_dist_elements_connection_health	Connection health check between the Central Manager and Remote Agents or Load Units in a distributed architecture. Relevant only when HA/DR and/or distributed architecture is enabled. Notifications triggered: Fail if down for 2 minutes; Pass if up for 1 minute.	✔	✖
suite_cyberark_aim_service	Status of the CyberArk AIM service running on the ASMS host. Notifications triggered: Fail if down; Pass if up.	✔	✖
cyberark_connectivity_health_check	Connection health check between ASMS and the CyberArk vault. Notifications triggered: Fail if the check failed; Pass if the check succeeded.	✔	✔
Analysis	Analysis results. Notifications triggered: Fail if a device analysis failed; Pass if a device analysis succeeded. Note: Always retrieved, even if this metric is disabled in the configuration file.	✔	✖
Monitor	Monitoring results. Notifications triggered: Fail if a device monitoring cycle failed; Pass if a device monitoring cycle succeeded. Note: Always retrieved, even if this metric is disabled in the configuration file.	✔	✖
Log Collection	Traffic log collection results. Notifications triggered: Fail if a device traffic log collection failed; Pass if a device traffic log collection succeeded. Note: Always retrieved, even if this metric is disabled in the configuration file.	✔	✖
suite_traffic_logs_folder_size	Size of the traffic log collection folder (/home/afa/.fa/syslog). Notifications triggered: Pass if the folder size is less than or equal to 4000 MB; Warning if the folder size is greater than 4000 MB; Fail if the folder size is greater than 8000 MB.	✔	✔
Audit logs	Audit log collection results. Notifications triggered: Fail if a device audit log collection failed; Pass if a device audit log collection succeeded. Note: Always retrieved, even if this metric is disabled in the configuration file.	✔	✖
Scheduled Backup	System backup service. Notifications triggered: Fail if a scheduled backup failed; Pass if a scheduled backup succeeded. Note: Always retrieved, even if this metric is disabled in the configuration file.	✔	✖

SNMP platform monitoring

You can monitor the platform over SNMP using the Linux snmpd service. The customer is responsible for the SNMP configuration.
