Systemd service was terminated abnormally during reboot

  • A+
Categories:Linux

To start/stop all the instances on a server, I created a script named dbss (database start/stop) and it worked well on AIX, RHEL5/6 and SLES11, while I found it could not work on RHEL7/SLES12 with systemd service.

When the server was started, the instances would be started without any issue, but when I rebooted the system, the instances were stopped abnormally. Such issue maybe is not a big problem for general applications, but for database I should fix it.

Last year I tried to fix this issue while failed and I just found it maybe was related with cgroup, but I could not figure out the root cause.

I thought one day I would learn RHEL7 from the beginning and fix it, and one year past I still did not handle it.

Last night my bro told me when he deployed dbss on SLES12, he got one issue that the started listener could not be accessed, and I believed this issues should be similar with the above one, so this time I had to face them directly.

I could create a systemd service with the user oracle and it worked well.

[root@olinux73 ~]# cat /etc/systemd/system/oradb.service
[Unit]
Description=Oracle Instance
After=remote-fs.target
Before=shutdown.target reboot.target halt.target

[Service]
Environment='ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1'
Type=forking
ExecStart=/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbstart orcl
ExecStop=/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbshut orcl
User=oracle
Group=dba

[Install]
WantedBy=multi-user.target
[root@olinux73 ~]# systemctl status oradb
● oradb.service - Oracle Instance
   Loaded: loaded (/etc/systemd/system/oradb.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2018-06-09 16:38:38 NZST; 24s ago
  Process: 7756 ExecStart=/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbstart orcl (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/oradb.service
           ├─7854 ora_pmon_orcl
           ├─7856 ora_psp0_orcl
           ├─7858 ora_vktm_orcl
           ├─7862 ora_gen0_orcl

While if I created a systemd service with dbss script, I got different result:

[root@olinux73 ~]# cat /etc/systemd/system/dbss.service
[Unit]
Description=Oracle Instance and Listener
After=remote-fs.target systemd-logind.service local-fs.target remote-fs.target
Before=shutdown.target reboot.target halt.target

[Service]
Type=forking
RemainAfterExit=yes
ExecStart=/etc/init.d/dbss start
ExecReload="/etc/init.d/dbss stop; /etc/init.d/dbss start"
ExecStop=/etc/init.d/dbss stop
SendSIGKILL=no
TimeoutStopSec=0
KillMode=none

[Install]
WantedBy=multi-user.target
[root@olinux73 ~]# systemctl status dbss
● dbss.service - Oracle Instance and Listener
   Loaded: loaded (/etc/systemd/system/dbss.service; enabled; vendor preset: disabled)
   Active: active (exited) since Sat 2018-06-09 16:42:11 NZST; 33s ago
  Process: 8336 ExecStart=/etc/init.d/dbss start (code=exited, status=0/SUCCESS)
 Main PID: 6319 (code=exited, status=0/SUCCESS)

If I added the pidfile to the config file, I would get below status output:

[root@olinux73 ~]# systemctl status dbss
● dbss.service - Oracle Instance and Listener
   Loaded: loaded (/etc/systemd/system/dbss.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2018-06-09 16:29:40 NZST; 1min 10s ago
  Process: 6154 ExecStart=/etc/init.d/dbss start (code=exited, status=0/SUCCESS)
 Main PID: 6319 (ora_pmon_orcl)
   CGroup: /system.slice/dbss.service
           ‣ 6319 ora_pmon_orcl

And I found the instance was terminated:

2018-06-09 01:52:35.780000 +12:00
Performing implicit shutdown abort due to dead PMON
Shutting down instance (abort)
License high water mark = 12
USER (ospid: 21060): terminating the instance
Instance terminated by USER, pid = 21060
Instance shutdown complete
ORA-1092 : opitsk aborting process
2018-06-09 01:58:19.646000 +12:00
Starting ORACLE instance (normal) (OS id: 993)
CLI notifier numLatches:3 maxDescs:519

I tested lots of options of systemd service but most of them failed until I found this webpage:

SAP Instances failed stop on shutdown (PACEMAKER, SYSTEMD, SAP)

The key point was this part:

In a Linux with systemd any application call with
    su - <somenameofsomeuser> 
will result in a move into a user slice. This is especially true in case of SAP Instances and Databases handled by the pacemaker cluster service.

I already found all the processes were under the user.slice when I started the systemd dbss service:

user.slice
  user-1000.slice
   session-c6.scope
    3084 ora_pmon_orcl
    3086 ora_psp0_orcl
    3088 ora_vktm_orcl
    3092 ora_gen0_orcl
    3096 ora_mman_orcl
    3098 ora_diag_orcl
    3100 ora_dbrm_orcl
    3102 ora_vkrm_orcl
    3104 ora_dia0_orcl
    3106 ora_dbw0_orcl
    3108 ora_lgwr_orcl

While for systemd oradb service, they were under system.slice:

[root@olinux73 ~]# systemd-cgls
1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
system.slice
 dbus.service
  671 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
 firewalld.service
  698 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
 lvm2-lvmetad.service
  516 /usr/sbin/lvmetad -f
 postfix.service
  1785 /usr/libexec/postfix/master -w
  1794 pickup -l -t unix -u
  1795 qmgr -l -t unix -u
 crond.service
  700 /usr/sbin/crond -n
 oradb.service
   955 ora_pmon_orcl
   957 ora_psp0_orcl
   959 ora_vktm_orcl
   963 ora_gen0_orcl
   965 ora_mman_orcl
   969 ora_diag_orcl
   971 ora_dbrm_orcl
   973 ora_vkrm_orcl
   975 ora_dia0_orcl
   977 ora_dbw0_orcl

To start all the instances, the dbss has to be run as root user, and it will su to other users to start the instances, so the systemd dbss service could not be run with a special user like oracle.

Following the above webpage I modified the file /etc/pam.d/system-auth-ac:

[root@olinux73 ~]# cat /etc/pam.d/system-auth
#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 1000 quiet_success
auth        required      pam_deny.so

account     required      pam_unix.so
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 1000 quiet
account     required      pam_permit.so

password    requisite     pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type=
password    sufficient    pam_unix.so sha512 shadow nullok try_first_pass use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
session     [success=1 new_authtok_reqd=ok default=ignore] pam_listfile.so item=user sense=allow file=/etc/orausers  --->add this line just before below line.
-session     optional      pam_systemd.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so

And created a file /etc/orausers:

[root@olinux73 ~]# cat /etc/orausers
oracle

Then I restart the systemd dbss service and all the processes began to run under system.slice:

[root@olinux73 ~]# systemd-cgls
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
├─system.slice
│ ├─dbus.service
│ │ └─675 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
│ ├─firewalld.service
│ │ └─695 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
│ ├─lvm2-lvmetad.service
│ │ └─515 /usr/sbin/lvmetad -f
│ ├─postfix.service
│ │ ├─2259 /usr/libexec/postfix/master -w
│ │ ├─2267 pickup -l -t unix -u
│ │ └─2268 qmgr -l -t unix -u
│ ├─crond.service
│ │ └─700 /usr/sbin/crond -n
│ ├─systemd-journald.service
│ │ └─496 /usr/lib/systemd/systemd-journald
│ ├─auditd.service
│ │ └─644 /sbin/auditd -n
│ ├─gssproxy.service
│ │ └─673 /usr/sbin/gssproxy -D
│ ├─dbss.service
│ │ ├─ 759 /u01/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit
│ │ ├─1021 ora_pmon_orcl
│ │ ├─1023 ora_psp0_orcl
│ │ ├─1065 ora_vktm_orcl
│ │ ├─1069 ora_gen0_orcl
│ │ ├─1071 ora_mman_orcl
│ │ ├─1075 ora_diag_orcl
[root@olinux73 ~]# systemctl status dbss
● dbss.service - Oracle Instance and Listener
   Loaded: loaded (/etc/systemd/system/dbss.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2018-06-10 00:07:38 NZST; 45min ago
  Process: 693 ExecStart=/etc/init.d/dbss start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/dbss.service
           ├─ 759 /u01/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit
           ├─1021 ora_pmon_orcl
           ├─1023 ora_psp0_orcl
           ├─1065 ora_vktm_orcl
           ├─1069 ora_gen0_orcl
           ├─1071 ora_mman_orcl
           ├─1075 ora_diag_orcl

Then I rebooted the system and checked the alert log of the instance:

2018-06-10 00:06:40.594000 +12:00
Shutting down instance (immediate)
Stopping background process SMCO
2018-06-10 00:06:41.600000 +12:00
Shutting down instance: further logons disabled
Stopping background process CJQ0
Stopping background process MMNL
2018-06-10 00:06:42.652000 +12:00
Stopping background process MMON
License high water mark = 2
2018-06-10 00:06:44.923000 +12:00
Waiting for dispatcher 'D000' to shutdown
2018-06-10 00:06:48.213000 +12:00
All dispatchers and shared servers shutdown
ALTER DATABASE CLOSE NORMAL
SMON: disabling tx recovery
Stopping Emon pool
Stopping Emon pool
SMON: disabling cache recovery
Shutting down archive processes
Archiving is disabled
Thread 1 closed at log sequence 33
Successful close of redo thread 1
Completed: ALTER DATABASE CLOSE NORMAL

I got big help also from below command:

[root@olinux73 ~]# journalctl -b-1 -u dbss
..............................
Jun 10 00:06:57 olinux73.dbcloudsvc.com su[12076]: pam_unix(su:session): session opened for user oracle by (uid=0)
Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12100]: (to oracle) root on none
Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12100]: pam_unix(su-l:session): session opened for user oracle by (uid=0)
Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12117]: (to oracle) root on none
Jun 10 00:06:59 olinux73.dbcloudsvc.com su[12117]: pam_unix(su:session): session opened for user oracle by (uid=0)
Jun 10 00:07:00 olinux73.dbcloudsvc.com su[12117]: pam_unix(su:session): session closed for user oracle
Jun 10 00:07:00 olinux73.dbcloudsvc.com dbss[11130]: Stopping database orcl..........Done!
Jun 10 00:07:00 olinux73.dbcloudsvc.com dbss[11130]: Databae orcl (/u01/app/oracle/product/12.1.0/dbhome_1) has been stopped successfully!
Jun 10 00:07:00 olinux73.dbcloudsvc.com systemd[1]: Stopped Oracle Instance and Listener.

I found the issue was caused by pam_systemd when I checked the log at the beginning.

Then I updated the dbss script to do such changes automatically. :)

Great day!

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: