System sanity checks
This section describes how to perform system sanity checks. Run these checks after making changes to your environment, such as configuring clusters or distributed architectures, or upgrading.
These sanity checks also define standards for basic ASMS functionality, and enable you to verify that your environment is functioning as expected.
Note: This topic includes sub-sections for advanced sanity checks and tests. Advanced tests are intended to be run by experienced ASMS system administrators.
If you have questions about these advanced tests, contact AlgoSec support.
ASMS basic functionality
Basic functionality for ASMS is defined as follows:
Hardware or VM | Basic functionality on virtual machines deployed with ASMS, or on AlgoSec Hardware Appliances, includes all necessary processes running. For details, see Test ASMS processes.
AFA | For details on basic AFA functionality, see Test AFA functionality.
FireFlow | For details on basic FireFlow functionality, see Test basic FireFlow functionality.
AppViz | For details on basic AppViz functionality, see Test basic AppViz functionality.
Test machine installation and configuration
This section describes how to test that your ASMS machines are installed and configured correctly. Do this after making changes to your configuration, deploying a new system, or upgrading.
Do the following:
Open a browser, and browse to the IP address of your AlgoSec machine.
If the AlgoSec home page appears, your machine is connected and configured correctly.
If the home page or a similar page does not appear, check that the basic configuration of your machine is correct.
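The browser check above can also be approximated from a shell, for example right after an upgrade. The following is a minimal sketch, not an AlgoSec tool: the function names are hypothetical, it assumes that curl is installed, and it assumes that the home page is served over HTTPS at the root path.

```shell
# Hypothetical helper, not part of ASMS: classify an HTTP status code.
# Treating 2xx and 3xx as "reachable" is an assumption; the exact response
# of the AlgoSec home page is not specified in this topic.
classify_http_status() {
    case "$1" in
        2??|3??) echo "reachable" ;;
        *)       echo "unreachable" ;;
    esac
}

# Request the root page of the given host and report the result.
# -k skips certificate validation, in case the machine uses a self-signed certificate.
# curl reports code 000 when no HTTP response was received at all.
check_home_page() {
    code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$1/")
    echo "$1: HTTP $code ($(classify_http_status "$code"))"
}

# Usage (replace the address with the IP address of your AlgoSec machine):
#   check_home_page 192.0.2.10
```

A "reachable" result only shows that the web server answered. It does not replace viewing the home page in a browser.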
Test ASMS processes
This procedure describes how to test that basic ASMS processes are running on your machines.
(Note: Starting from ASMS A32.60, the MongoDB database is no longer used.)
Do the following:
-
Connect to the Administration Interface. For details, see Connect to and Utilize the Administration Interface.
-
Enter 17. System health. Select 1. Check services status to verify service status.
Output similar to the following should appear, confirming that all of these services are running:
|======================================|
| xxx.xxx.xx.xx |
| |
| (Mon May 12 07:29:43 EDT 2021) |
|--------------------------------------|
| crond | OK |
| httpd | OK |
| ntpd | OK |
| postgresql | OK |
| activemq | OK |
| syslog-ng | OK |
| AlgoSec microservices | OK |
| aff-boot | OK |
| batch application | OK |
| ABF | OK |
| cloudlicensing | OK |
| configuration | OK |
| device driver AWS | OK |
| device driver Azure | OK |
| device manager | OK |
| map diagnostics | OK |
| policy-optimizations | OK |
| trafficlogmanager | OK |
| Vulnerabilities | OK |
| watchdog | OK |
| validation | OK |
| ms-metro | OK |
| initial-plan | OK |
|======================================|
If any service is not running, try to restart it. Run the following for each service:
algosec_test_service -n <SYSTEM SERVICE NAME> -f
For example:
algosec_test_service -n postgresql -f
System service names to use when restarting a service:
FRIENDLY NAME | SYSTEM SERVICE NAME
AAD Log Sensor | ms-aad-log-sensor
AAD Network Sensor | networksensor
AAD Azure Sensor | ms-aad-azure-sensor
AAD Server | ms-autodiscovery
AppViz | ms-bflow
activemq | activemq
aff-boot | aff-boot
AlgoSec microservices | algosec-ms
backup/restore | ms-backuprestore
batch application | ms-batch-application
chisel | chisel.target
cloudflow-broker | ms-cloudflow-broker
cloudlicensing | ms-cloudlicensing
configuration | ms-configuration
crond | crond
device driver AWS | ms-devicedriver-aws
device driver Azure | ms-devicedriver-azure
device manager | ms-devicemanager
elasticsearch | elasticsearch
HA/DR microservice | ms-hadr
httpd | httpd
iptables | iptables
kibana | kibana
logstash | logstash
map diagnostics | ms-mapDiagnostics
initial-plan | initial-plan
ms-metro | ms-metro, metro
ntpd | ntpd
policy-optimizations | ms-policy-optimizations
postgresql | postgresql
syslog-ng | syslog-ng
trafficlogmanager | ms-trafficlogmanager
validation | ms-validation
vulnerabilities | ms-vulnerabilities
watchdog | ms-watchdog
If services do not restart, contact AlgoSec support.
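When the status table is long, it can help to filter it for services that are not OK. The following is a minimal sketch with hypothetical function names. It assumes that you have saved the status table to a file, and that every service row has the form "| name | status |" as in the sample output. The status text shown for a failed service is not documented here, so anything other than OK is reported.

```shell
# Hypothetical helper: read the services status table on stdin and print
# the friendly name of every service whose status column is not "OK".
# Separator, IP address, and date rows have fewer columns and are skipped.
list_failed_services() {
    awk -F'|' 'NF >= 4 {
        name = $2; status = $3
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        gsub(/^[ \t]+|[ \t]+$/, "", status)
        if (name != "" && status != "" && status != "OK") print name
    }'
}

# Print (do not run) the restart command for each system service name given.
# Translate friendly names to system service names with the table above first.
print_restart_commands() {
    for svc in "$@"; do
        echo "algosec_test_service -n $svc -f"
    done
}

# Usage, assuming the table was saved to a file named services_status.txt:
#   list_failed_services < services_status.txt
#   print_restart_commands postgresql ms-devicemanager
```

The restart commands are only printed, so that you can review them before running them.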
-
Enter 17. System health. Select 2. Check system health.
To check your system's health, choose one of the following:
-
Quick check: An abridged check. The quick check does not include checks for disk speed, disk space for the DB schema upgrade, device complexity levels, or whether all devices in firewall_data.xml are also in the DB.
-
Full check: A complete check. This takes longer because it includes all checks.
For the report file of your system health, see /var/log/algosec_toolbox/system_check_output.json.
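After running a health check, you may want to confirm that the report file was actually rewritten. The following is a minimal sketch with a hypothetical function name. It assumes a find implementation that supports -maxdepth and -mmin, such as GNU find, and the 10-minute window in the usage line is an arbitrary choice.

```shell
# Hypothetical helper: succeed if file $1 exists and was modified
# less than $2 minutes ago.
report_is_recent() {
    [ -n "$(find "$1" -maxdepth 0 -type f -mmin "-$2" 2>/dev/null)" ]
}

# Usage:
#   report_is_recent /var/log/algosec_toolbox/system_check_output.json 10 \
#       && echo "report is recent" || echo "report is missing or old"
```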
-
The following tests are intended for advanced users to verify detailed information about system health.
For more details, contact AlgoSec customer support.
Do the following:
-
Verify that the date and time on your ASMS machine are valid.
Run:
# date +'Date: %x | Time %X | Time Zone %:z'
-
Verify that the NTP daemon (ntpd) is enabled on the ASMS machine. Run:
# service ntpd status
# ntpq -p
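In the ntpq -p peer list, a line that starts with an asterisk marks the peer currently selected for synchronization. The following is a minimal sketch with a hypothetical function name that checks for such a line. It does not evaluate offset or jitter values.

```shell
# Hypothetical helper: succeed if the "ntpq -p" output on stdin contains
# a peer line starting with "*", the tally code for the selected peer.
has_selected_peer() {
    grep -q '^\*'
}

# Usage:
#   ntpq -p | has_selected_peer && echo "peer selected" || echo "no peer selected"
```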
-
Verify that the installation or upgrade logs contain no errors. Find these log files in the following locations:
- /var/log/appliance-install.log
- /var/log/fa-install.log
- /var/log/fireflow-install.log
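The log review can start with a simple text search. The following is a minimal sketch with a hypothetical function name. The search pattern error|fail is an assumption, not an official list of error strings, so review the logs manually as well.

```shell
# Hypothetical helper: for each log file given, print matching lines
# prefixed with file name and line number.
# The pattern 'error|fail' (case-insensitive) is an assumption.
scan_install_logs() {
    for log in "$@"; do
        [ -r "$log" ] || { echo "skipped (not readable): $log"; continue; }
        grep -H -n -i -E 'error|fail' "$log" || echo "no matches: $log"
    done
}

# Usage:
#   scan_install_logs /var/log/appliance-install.log \
#       /var/log/fa-install.log /var/log/fireflow-install.log
```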
-
Ensure that the AlgoSec software RPM versions are up to date. Run:
# rpm -q fa algosec-appliance fireflow
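The rpm -q command prints one line per package, and for a package that is not installed it prints "package NAME is not installed". The following is a minimal sketch with a hypothetical function name that lists such packages. It assumes English-language rpm messages, and it does not judge whether an installed version is current.

```shell
# Hypothetical helper: read "rpm -q" output on stdin and print the name
# of each package that rpm reports as not installed.
missing_packages() {
    sed -n 's/^package \(.*\) is not installed$/\1/p'
}

# Usage:
#   rpm -q fa algosec-appliance fireflow 2>&1 | missing_packages
```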
-
Verify that no processes are performing excessive I/O operations. Run one of the following:
-
Verify I/O operations using iotop:
iotop -o
-
Verify I/O operations without iotop:
# watch -n 1 -d 'iostat -Nxtz -p ALL'
-
Verify that the /data partition on the ASMS machine has enough free disk space. Run:
# df --human-readable --print-type
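To turn the df output into a pass or fail result, you can compare the used percentage against a threshold. The following is a minimal sketch with hypothetical function names. The 80 percent threshold in the usage line is an assumption, not an AlgoSec requirement.

```shell
# Hypothetical helper: print the used-space percentage (digits only)
# of the file system that holds the given path.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if the usage of path $1 is below $2 percent.
has_free_space() {
    [ "$(usage_percent "$1")" -lt "$2" ]
}

# Usage (80 is an arbitrary example threshold):
#   has_free_space /data 80 || echo "/data is at least 80% full"
```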
-
Check for errors in system logs. Run:
egrep -i "kernel:.*(error|crit|fatal|blocked|fail)" /var/log/kern
egrep -i '(error|warn|status|init|blocked|stopped)' /var/log/dmesg
egrep -i "kernel:.*(error|crit|fatal|blocked|fail|setting system clock to)" /var/log/messages*
-
Check for errors in PostgreSQL logs. Run:
egrep -i '(deadlock|error|fatal|constraint|denied)' /data/pgsql/data/pg_log/*
-
Check for errors in GLOBAL logs. Run:
egrep -H '(duplicate key value violates unique constraint|has too many parameters|Cannot assign requested address at|Socket: connect: Connection refused at|Resource temporarily unavailable|Too many open files in system|Thread creation failed|unable to create new native thread|Permission denied, please try again|incorrect username or password|Wrong username or password|username or password is incorrect|Wrong user|value too long for type|java.lang.OutOfMemoryError|javax.persistence|No space left on device|Error: Migration failed for change set classpath)' /home/afa/.fa-*
-
Verify that no PHP out-of-memory errors have been logged. Run:
grep -i 'bytes exhausted (tried to allocate' /etc/httpd/logs/* /home/afa/public_html/algosec/.ht-fa-history*
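The log searches above can print many lines. For a first overview, a count of matching lines per file may be easier to read. The following is a minimal sketch with a hypothetical function name. It takes the pattern as its first argument, so any of the patterns above can be reused.

```shell
# Hypothetical helper: print "<count> <file>" for each readable regular file,
# counting lines that match the extended regular expression in $1
# (case-insensitive).
count_log_matches() {
    pattern=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        echo "$(grep -c -i -E "$pattern" "$f") $f"
    done
}

# Usage:
#   count_log_matches '(deadlock|error|fatal|constraint|denied)' /data/pgsql/data/pg_log/*
```

Files with a count of 0 are still listed, which confirms that they were searched.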
-
If you are working with an HA configuration, verify that folders are being synchronized to the secondary node. Do the following:
- Run an analysis of a single device. On the device's OVERVIEW tab, click Analyze. For details, see View AFA device data.
- Check that the relevant folders for the device you analyzed are synchronized to the HA's secondary server.
Test AFA functionality
This procedure describes how to test AFA functionality.
Do the following:
-
Prepare for your test
- Define an email server.
- Define a user with permissions for all devices. Specify that the user receives email notifications for all reports and configuration / policy changes.
-
Test device definition and analysis
-
Define a new device and assign a user with permissions for it, or use an existing device to test AFA functionality.
-
Run a manual analysis on the device.
-
Verify that all sections of the new report have valid results.
In the report, on the Policy Optimization tab, in the Rule Usage Statistics area, click All Rule Usage.
Check the first text line to verify that the report is based on logs collected today.
Advanced report tests
Verify that no issues occurred during report generation. Run:
find /data/algosec/firewalls/ -xdev -type f -mtime -1 -name "fwa.history*" -exec egrep -H '(duplicate key value violates unique constraint|has too many parameters|Cannot assign requested address at|Socket: connect: Connection refused at|Resource temporarily unavailable|Too many open files in system|Thread creation failed|unable to create new native thread|Permission denied, please try again|incorrect username or password|Wrong username or password|username or password is incorrect|PSQLException|No routes found|Wrong user|Permission denied|value too long for type|java.lang.OutOfMemoryError|javax.persistence)' {} \;
If you have a distributed architecture, run:
LR=$(awk '{print $1}' /home/afa/algosec/firewalls/lastjob); FR=$((LR - 300)); for ((j=FR; j<=LR; j++)); do grep -H "Request no longer runs on slave\|Analysis stopped running on the slave" "/home/afa/algosec/firewalls/afa-$j/fwa.history"; done
-
Test change monitoring
- Add a rule to the device's policy.
- Wait for the next monitoring cycle to run. By default, this runs every 20 minutes.
- View the device's Monitoring tab and verify that the change was detected.
- In the DEVICES tree, navigate to your device and verify that the log collection status is green.
Advanced monitoring tests
Verify that no monitoring issues occurred. Run:
find /data/algosec/monitor/ -xdev -type f -mtime -1 -name "fwa.history*" -exec egrep -H '(Failed to calculate|Device or resource busy|duplicate key value violates unique constraint|Exception while starting monitor subflow|has too many parameters|PSQLException|Wrong user|value too long for type|java.lang.OutOfMemoryError|javax.persistence|Current thread is not owner of the lock|Adding the monitor folder for HADR failed|java.lang.RuntimeException: external command failed \[ command = /usr/share/fa/bin/dist_dfs_conf.sh|Failed to create virtual work directories for monitor worker)' {} \;
If you have a distributed architecture, run:
find /data/algosec/monitor/ -xdev -type f -mtime -1 -name "fwa.history*" -exec egrep -H '(Failed to create virtual work directories for monitor|Adding the monitor folder for HADR failed|java.lang.RuntimeException: external command failed \[ command = /usr/share/fa/bin/dist_dfs_conf.sh|Current thread is not owner of the lock)' {} \;
On your Central Manager only, run:
find /data/algosec/monitor/ -xdev -type f -mtime -1 -name "fwa.history*" -exec egrep -H '(State machine execution was aborted)' {} \;
On your Remote Agent only, run:
find /data/algosec/monitor/ -xdev -type f -mtime -1 -name "fwa.history*" -exec egrep -H '(Updating RM return code)' {} \;
Advanced log server tests
Check for issues in the log server after monitoring. Run:
find /home/afa/.fa/firewalls/ -xdev -type f -mtime -1 -name "fwa.history*" -exec egrep -H '(duplicate key value violates unique constraint|has too many parameters|Cannot assign requested address at|Socket: connect: Connection refused at|Resource temporarily unavailable|Too many open files in system|Thread creation failed|unable to create new native thread|Permission denied, please try again|incorrect username or password|Wrong username or password|username or password is incorrect|is not defined in firewall_data.xml|aborted due to compilation errors|collect_lock more than |Wrong user|known_hosts|collector initialization failed)' {} \;
Advanced syslog processing tests
Check for errors in syslog messaging. Run:
egrep -H '(is not unique. Assigning to|failed to parse log line from)' /home/afa/algosec/syslog_processor/logs/*
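Most of the advanced find commands above share one shape: search fwa.history files changed within the last day for a list of error strings. The following is a minimal sketch of that shape with a hypothetical function name. It uses grep -E, which is equivalent to egrep, and ends -exec with + so that grep is run on batches of files rather than once per file.

```shell
# Hypothetical helper: search fwa.history* files modified within the last
# day under directory $1 for the extended regular expression $2.
# Prints each match as "<file>:<line>".
scan_recent_history() {
    find "$1" -xdev -type f -mtime -1 -name 'fwa.history*' \
        -exec grep -E -H "$2" {} +
}

# Usage with one of the shorter patterns from the text:
#   scan_recent_history /data/algosec/monitor/ '(State machine execution was aborted)'
```

No output means that no recent history file contains the pattern.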
-
Test email alerts
- Check that the user you defined in Prepare for your test received an email alert about the analysis completed in Test device definition and analysis.
- Check that the same user received an alert about the change you made to the device in Test change monitoring.
-
Test the mapping topology
- Add more devices to AFA, and then run an analysis on ALL_FIREWALLS.
- Navigate to the ALL_FIREWALLS > MAP tab, and verify that the map generated successfully.
-
Run a traffic simulation query and use the Topology Advisor.
Save your results for comparison later. For details, see Run traffic simulation queries and Improve the map.
-
Schedule recurring analysis.
Schedule a device analysis job for the ALL_FIREWALLS group, and verify that it runs as configured.
For details, see Schedule analysis.
Test basic FireFlow functionality
This procedure describes how to test basic FireFlow functionality.
Do the following:
-
Test change request submission
Do the following as a Requestor user, and then again as a privileged user:
- Log in to FireFlow and submit a change request. If your organization uses a customized template or workflow, use the custom version.
- Verify that the change request was submitted successfully.
- Verify that the configured user received an email for the new change request. For details, see Manage FireFlow emails and notifications.
-
Test workflow functionality and validation
-
Locate one of the change requests you created in Test change request submission, and move it through the various stages of the workflow.
-
Verify that the following stages produce valid results:
- Initial Plan: Shows the relevant devices for the change request.
- Risk Check: Shows a list of risks.
- Work Order: Shows a valid suggestion to implement the requested change.
-
When you get to the Work Order stage in the change request, implement the change on the device.
-
After the next monitoring cycle is complete, browse to the Validation stage of the workflow, and verify that accurate validation results are shown.
-
In AFA, run an analysis on the device. Wait 2 hours, and then browse to the AutoMatching FireFlow stage, and verify that the change request and change are listed in the correct section.
Test basic AppViz functionality
This procedure describes how to test basic AppViz functionality.
Do the following:
-
Test new applications
- Create a new application, and add flows to it. Add at least one flow that is currently blocked by the organization's firewalls.
- Verify that the application is created successfully.
-
Test connectivity and change requests
- Apply the application draft and check the application connectivity.
- Verify the connectivity for each flow, and that the connectivity of the entire application updates automatically.
- In the Change Requests tab, verify that a change request was created for the new flows.
-
Test application decommissioning
- Decommission the application you created in Test new applications.
- Verify that the application's status changes to Decommissioned.
- Verify that the relevant change requests were opened to drop the application's traffic.
Note: If the application contains flows that are in use by other applications, change requests for this traffic will not be opened.