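Today MySQL on the Zabbix server suddenly consumed a large amount of memory, which triggered an alert that memory usage on the Zabbix server host had exceeded 95%. On the principle that a reboot fixes everything, I rebooted the machine. After the reboot, the zabbix server service would not start.
Symptoms
1. Check whether the service is running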
[root@shengbao.org ~]# ps aux|grep zabbix
zabbix 1063 0.0 0.0 78800 1300 ? S 14:00 0:00 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
zabbix 1085 0.0 0.0 78800 1412 ? S 14:00 0:00 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
zabbix 1086 0.0 0.0 78800 2136 ? S 14:00 0:00 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
zabbix 1087 0.0 0.0 78800 2136 ? S 14:00 0:00 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
zabbix 1088 0.0 0.0 78800 2404 ? S 14:00 0:00 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
zabbix 1089 0.0 0.0 78800 2208 ? S 14:00 0:00 /usr/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
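Every process in the listing above is zabbix_agentd; no zabbix_server process appears, which is easy to overlook when grepping for "zabbix". A minimal sketch of a stricter check follows. The helper name has_zabbix_server is hypothetical, and it assumes the `ps aux` layout in which the command starts in column 11.

```shell
# Reads `ps aux` output on stdin and succeeds only if a zabbix_server
# process is listed. Matching on the command column (field 11) means
# zabbix_agentd lines and the grep process itself are not counted.
has_zabbix_server() {
    awk '$11 ~ /(^|\/)zabbix_server(:|$)/ { found = 1 } END { exit !found }'
}
```

Possible usage: `ps aux | has_zabbix_server || echo "zabbix_server is not running"`.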
2. Restart the zabbix-server service
[root@shengbao.org ~]# systemctl restart zabbix-server
3. Check the service status after the restart
[root@shengbao.org ~]# systemctl status zabbix-server
● zabbix-server.service - Zabbix Server
   Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2020-11-30 14:06:18 CST; 12s ago
 Main PID: 11458 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/zabbix-server.service

Nov 30 14:06:18 shengbao.org systemd[1]: zabbix-server.service: control process exited, code=exited status=1
Nov 30 14:06:18 shengbao.org systemd[1]: Unit zabbix-server.service entered failed state.
Nov 30 14:06:18 shengbao.org systemd[1]: zabbix-server.service failed.
4. Since the start failed, check the system log in /var/log/messages
[root@shengbao.org ~]# cat /var/log/messages
Nov 30 14:01:47 shengbao.org systemd: zabbix-server.service holdoff time over, scheduling restart.
Nov 30 14:01:47 shengbao.org systemd: Starting Zabbix Server...
Nov 30 14:01:47 shengbao.org systemd: PID file /run/zabbix/zabbix_server.pid not readable (yet?) after start.
Nov 30 14:01:47 shengbao.org systemd: Started Zabbix Server.
Nov 30 14:01:50 shengbao.org kill: Usage:
Nov 30 14:01:50 shengbao.org kill: kill [options] <pid|name> [...]
Nov 30 14:01:50 shengbao.org kill: Options:
Nov 30 14:01:50 shengbao.org kill: -a, --all do not restrict the name-to-pid conversion to processes
Nov 30 14:01:50 shengbao.org kill: with the same uid as the present process
Nov 30 14:01:50 shengbao.org kill: -s, --signal <sig> send specified signal
Nov 30 14:01:50 shengbao.org kill: -q, --queue <sig> use sigqueue(2) rather than kill(2)
Nov 30 14:01:50 shengbao.org kill: -p, --pid print pids without signaling them
Nov 30 14:01:50 shengbao.org kill: -l, --list [=<signal>] list signal names, or convert one to a name
Nov 30 14:01:50 shengbao.org kill: -L, --table list signal names and numbers
Nov 30 14:01:50 shengbao.org kill: -h, --help display this help and exit
Nov 30 14:01:50 shengbao.org kill: -V, --version output version information and exit
Nov 30 14:01:50 shengbao.org kill: For more details see kill(1).
Nov 30 14:01:50 shengbao.org systemd: zabbix-server.service: control process exited, code=exited status=1
Nov 30 14:01:50 shengbao.org systemd: Unit zabbix-server.service entered failed state.
Nov 30 14:01:50 shengbao.org systemd: zabbix-server.service failed.
5. The system log only shows that the service failed to start; it does not show the cause directly. Check the Zabbix application log next.
[root@shengbao.org ~]# cat /var/log/zabbix/zabbix-server.log
3520:20201130:144346.442 [Z3001] connection to database 'zabbix' failed: [1040] Too many connections
3520:20201130:144346.442 Cannot connect to the database. Exiting...
3521:20201130:144346.444 [Z3001] connection to database 'zabbix' failed: [1040] Too many connections
3521:20201130:144346.444 Cannot connect to the database. Exiting...
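Because systemd keeps restarting the service, the same error is written to the log again and again, and `cat` on a long log buries it. The sketch below collapses the repeated database errors into one line per distinct message with a count. The function name summarize_db_errors is hypothetical, and it assumes every line starts with a PID:date:time prefix followed by a space, as in the excerpt above.

```shell
# Prints each distinct [Z3001] database connection error found in a
# Zabbix server log, prefixed with how many times it occurred.
summarize_db_errors() {
    log_file=$1
    grep -F '[Z3001]' "$log_file" |
        sed 's/^[^ ]* //' |   # drop the leading PID:YYYYMMDD:HHMMSS.mmm field
        sort | uniq -c | sort -rn
}
```

Possible usage: `summarize_db_errors /var/log/zabbix/zabbix-server.log`.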
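6. The Zabbix application log shows that the database is rejecting connections with "Too many connections". Raising the database connection limit resolves the problem.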
[root@shengbao.org ~]# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 1247
Server version: 5.5.65-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

# Check the current maximum number of connections.
MariaDB [(none)]> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 151   |
+-----------------+-------+
1 row in set (0.00 sec)

# Set the global maximum number of connections to 1000.
MariaDB [(none)]> set global max_connections=1000;
Query OK, 0 rows affected (0.00 sec)

# Verify that the setting took effect.
MariaDB [(none)]> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 1000  |
+-----------------+-------+
1 row in set (0.00 sec)

MariaDB [(none)]> exit
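A value set with `set global` lasts only until the database server restarts, after which max_connections returns to the configured value (151 here). To keep the new limit, it also needs to go into the server configuration under the [mysqld] section. The following is a minimal sketch that writes a drop-in file; the function name, the file name zz-max-connections.cnf, and the use of /etc/my.cnf.d as the include directory are assumptions, so check which directory your my.cnf actually includes before using it.

```shell
# Writes a drop-in config file that sets max_connections under [mysqld].
# conf_dir and the file name are assumptions, not taken from the original
# setup; point conf_dir at the directory named by !includedir in my.cnf.
write_max_connections_conf() {
    conf_dir=$1
    limit=$2
    printf '[mysqld]\nmax_connections = %s\n' "$limit" \
        > "$conf_dir/zz-max-connections.cnf"
}
```

Possible usage: `write_max_connections_conf /etc/my.cnf.d 1000`. The file is read at the next database restart; until then the `set global` value stays in effect. Running `show global status like 'Max_used_connections';` shows the peak number of simultaneous connections since startup, which helps in choosing a limit that is not larger than needed.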
7. Restart the service; the fault is resolved
# systemctl restart zabbix-server
# systemctl status zabbix-server