Monitoring
Monit
Monit is a complete service monitoring utility. Its web interface is available by default on localhost, port 2812. Monit can detect when a service is down and restart it, and it can also be configured to send automatic alerts.
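As a minimal sketch, the web interface is typically enabled with a SET HTTPD statement such as the one below. The localhost-only access rule is an assumption chosen to match the default described above; adjust it to the local access policy.

```
# Serve the web interface on port 2812, bound to localhost only.
set httpd port 2812 and
    use address localhost
    allow localhost
```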
Configuration
Install monit:
RHEL
$ sudo yum install epel-release
$ sudo yum install monit
Ubuntu
$ sudo apt-get install monit
Depending on the system, the main configuration file is one of the two listed below. The default settings can be used as-is. Configuration keywords are not case sensitive.
/etc/monitrc
/etc/monit.conf
The include directory for other configuration files will be one of these two:
/etc/monit.d/
/etc/monit/conf.d/
[1]
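For illustration, the main configuration file usually pulls in the include directory with an INCLUDE statement. Which of the two lines applies depends on the distribution, so treat the paths here as assumptions to verify locally.

```
# RHEL-style layout
include /etc/monit.d/*

# Debian/Ubuntu-style layout (use instead of the line above)
# include /etc/monit/conf.d/*
```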
Global configuration options:
SET DAEMON <SECONDS>
Specify the cycle time in seconds. After this many seconds, each service in the configuration will be polled.
SET ALERT <EMAIL>@<ADDRESS>
Send all alerts to this e-mail address.
SET MAILSERVER <HOSTNAME> [PORT <PORT>] [USERNAME <USERNAME>] [PASSWORD <PASSWORD>]
Define the mail server to use. This is typically localhost.
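A minimal sketch of these global options together. The 60 second cycle, the e-mail address, and port 25 are placeholder values, not recommendations.

```
# Poll every service once per 60 seconds.
set daemon 60

# Send all alerts to this (hypothetical) address.
set alert admin@example.com

# Relay alert e-mail through the local mail server.
set mailserver localhost port 25
```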
An example of a basic Nginx template is provided below. If the process recorded in the PID file is not found, Monit keeps attempting to start it until a new process is spawned.
Example:
check process nginx with pidfile /var/run/nginx.pid
start program = "/bin/systemctl start nginx"
stop program = "/bin/systemctl stop nginx"
check system <NAME>
Check system resources.
Checks:
IF <RESOURCE> <OPERATOR> <VALUE> THEN <ACTION>
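A system check might be sketched as follows. The service name "example-host" and all thresholds are arbitrary assumptions for illustration.

```
check system example-host
    # Alert when the 1 minute load average exceeds 4.
    if loadavg (1min) > 4 then alert
    # Alert when overall memory usage exceeds 75%.
    if memory usage > 75% then alert
    # Alert when user-space CPU usage exceeds 70%.
    if cpu usage (user) > 70% then alert
```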
check process <SERVICE_NAME> with pidfile <PATH_TO_PIDFILE>
Verify that the PID is running.
Checks:
IF CHANGED PID THEN <ACTION>
IF UPTIME <OPERATOR> <VALUE> <TIME_UNIT> THEN <ACTION>
IF <RESOURCE> <OPERATOR> <VALUE> THEN <ACTION>
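Extending the Nginx template with a few of these checks could look like the sketch below. The thresholds are assumptions and should be tuned for the actual workload.

```
check process nginx with pidfile /var/run/nginx.pid
    start program = "/bin/systemctl start nginx"
    stop program = "/bin/systemctl stop nginx"
    # Alert if the process was restarted outside of Monit.
    if changed pid then alert
    # Restart a process that has been running for more than 3 days.
    if uptime > 3 days then restart
    # Resource checks on the process itself.
    if cpu > 80% then alert
    if children > 250 then restart
```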
check file <SERVICE_NAME> with path <PATH_TO_FILE>
Verify a file exists with specific attributes. The “check directory” should be used instead if verifying a directory state.
Checks:
IF FAILED UID <UID_OR_USERNAME> THEN <ACTION>
IF FAILED GID <GID_OR_GROUPNAME> THEN <ACTION>
IF FAILED PERMISSION <OCTAL_PERMISSION> THEN <ACTION>
IF SIZE <OPERATOR> <NUMBER> <SIZE_UNIT> THEN <ACTION>
IF CHANGED SIZE THEN <ACTION>
IF CHANGED [MD5|SHA1] CHECKSUM THEN <ACTION>
IF FAILED [MD5|SHA1] CHECKSUM [EXPECT <CHECKSUM>] THEN <ACTION>
IF TIMESTAMP <OPERATOR> <TIME_VALUE> <TIME_UNIT> THEN <ACTION>
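As an illustration, a file check on a configuration file might be written like this. The service name "nginx_conf", the file path, and the expected owner, permissions, and size limit are assumptions.

```
check file nginx_conf with path /etc/nginx/nginx.conf
    # Ownership and permissions should stay root:root, 0644.
    if failed uid root then alert
    if failed gid root then alert
    if failed permission 644 then alert
    # Alert whenever the contents change.
    if changed checksum then alert
    # Alert if the file grows unexpectedly large.
    if size > 1 MB then alert
```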
check program <SERVICE_NAME> with path <PATH_TO_SCRIPT>
Execute a script and verify its exit code.
Checks:
IF STATUS <OPERATOR> <EXIT_CODE> THEN <ACTION>
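A program check could be sketched as below. The script path is hypothetical; any executable that signals failure through a non-zero exit code would work the same way.

```
check program backup_check with path "/usr/local/bin/check_backup.sh"
    # Any non-zero exit code is treated as a failure.
    if status != 0 then alert
```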
check host <HOSTNAME> WITH ADDRESS <IP_ADDRESS>
Verify that the remote host is accessible.
Checks:
IF FAILED PING[4|6] [COUNT <NUMBER_VALUE>] [SIZE <MTU_SIZE>] [TIMEOUT <NUMBER_VALUE> <TIME_UNIT>] [ADDRESS <IP_ADDRESS>] THEN <ACTION>
IF FAILED PORT <PORT_NUMBER> [TYPE [TCP|UDP]] [PROTOCOL <PROTOCOL>] THEN <ACTION>
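A remote host check might look like the following sketch. The host name "web01" and the address (taken from the 192.0.2.0/24 documentation range) are placeholders.

```
check host web01 with address 192.0.2.10
    # Basic reachability test.
    if failed ping then alert
    # Protocol-aware tests against specific ports.
    if failed port 80 protocol http then alert
    if failed port 25 type tcp protocol smtp then alert
```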
check network <NETWORK_NAME> WITH INTERFACE <INTERFACE>
Verify that an IP address exists on the local machine. This is useful for failover-type load balancers.
Checks:
IF FAILED LINK THEN <ACTION>
IF SATURATION <OPERATOR> <PERCENT> THEN <ACTION>
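For example, a network check on one interface could be sketched like this. The service name "public" and the interface name "eth0" are assumptions; substitute the interface present on the machine.

```
check network public with interface eth0
    # Alert when the link goes down.
    if failed link then alert
    # Alert when the link is more than 90% saturated.
    if saturation > 90% then alert
```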
check filesystem <FILE_SYSTEM_NAME> with path <PATH_TO_DEVICE>
Verify statistics about a file system. <PATH_TO_DEVICE> can be a block device, mount, or directory.
Checks:
IF SPACE USAGE <OPERATOR> <SIZE_VALUE> <SIZE_UNIT> THEN <ACTION>
IF SPACE FREE <OPERATOR> <SIZE_VALUE> <SIZE_UNIT> THEN <ACTION>
IF INODE USAGE <OPERATOR> <SIZE_VALUE> <SIZE_UNIT> THEN <ACTION>
IF INODE FREE <OPERATOR> <SIZE_VALUE> <SIZE_UNIT> THEN <ACTION>
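A file system check on the root mount might be sketched as follows. The service name "rootfs" and the thresholds are illustrative assumptions.

```
check filesystem rootfs with path /
    # Alert when more than 80% of space or inodes are used.
    if space usage > 80% then alert
    if inode usage > 80% then alert
    # Alert when less than 1 GB remains free.
    if space free < 1 GB then alert
```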
Valid operators:
“<”, “lt”, or “less”
“>”, “gt”, or “greater”
“==”, “eq”, or “equal”
“!=”, “ne”, or “notequal”
Valid size units:
“B”, or “byte”
“KB”, or “kilobyte”
“MB”, or “megabyte”
“GB”, or “gigabyte”
“%”, or “percent”
Valid time units:
“SECOND”, or “SECONDS”
“MINUTE”, or “MINUTES”
“HOUR”, or “HOURS”
“DAY”, or “DAYS”
Valid resources:
CPU([user|system|wait])
THREADS
CHILDREN
TOTAL MEMORY <SIZE_UNIT>
The memory usage of the main process and all of its children.
MEMORY <SIZE_UNIT>
The memory usage of just the main process. In a “check system” block, this instead monitors the memory usage of the entire server.
SWAP <SIZE_UNIT>
LOADAVG([1min|5min|15min])
Valid protocols:
dns
http
https
mysql
smtp
Valid actions:
“ALERT”
Send an e-mail alert.
“RESTART”
Run the restart function (or the stop and then start functions if the restart command is not specified). This will also send an e-mail alert.
“START”
Run the start service function.
“STOP”
Run the stop service function.
“EXEC”
Execute a specified script.
“UNMONITOR”
Stop monitoring the service.
[2]
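To show how actions other than ALERT and RESTART attach to a check, here is a hedged sketch. The sshd PID file path, the notification script path, and the thresholds are all hypothetical.

```
check process sshd with pidfile /var/run/sshd.pid
    start program = "/bin/systemctl start sshd"
    stop program = "/bin/systemctl stop sshd"
    # EXEC: run a custom script instead of sending the default alert.
    if cpu > 90% then exec "/usr/local/bin/notify.sh"
    # UNMONITOR: stop watching the service once this condition is met.
    if children > 100 then unmonitor
```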
Event Types:
1=checksum
2=resource
4=timeout
8=timestamp
16=size
32=connection
64=permission
128=UID
256=GID
512=nonexist
1024=invalid
2048=data
4096=exec
8192=fsflags
16384=icmp
32768=content
65536=instance
131072=action
262144=PID
524288=PPID
1048576=heartbeat
2097152=status
4194304=uptime [3]
History
Latest [https://github.com/LukeShortCloud/rootpages/commits/main/src/observation/monitoring.rst]
< 2020.10.01 [https://github.com/LukeShortCloud/rootpages/commits/main/src/administration/monitoring.rst]
< 2019.01.01 [https://github.com/LukeShortCloud/rootpages/commits/main/src/monitoring.rst]
< 2018.01.01 [https://github.com/LukeShortCloud/rootpages/commits/main/markdown/monitoring.md]
Bibliography
“Installing Monit for Server Monitoring.” Linode. October 15, 2015. Accessed November 22, 2016. https://www.linode.com/docs/uptime/monitoring/monitoring-servers-with-monit
“Monit Documentation.” Accessed September 30, 2016. https://mmonit.com/monit/documentation/monit.html
“Monit Events.” Accessed September 30, 2016. https://mmonit.com/documentation/http-api/Methods/Events