Several days ago I helped to fix one ORA-12520 issue in RAC database, and from this issue I learned that special care should be taken when the listener.ora is modified in RAC database.
The issue was turned out to be not so complicated finally so I will not mention too many details of it.
The error messages were like this:
SQL> conn test/test1234ch;
SQL> conn test/test1234ch@prd01;
ORA-12520: TNS:listener could not find available handler for requested type of server
And it seemed the listener did not support any service:
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
Version TNSLSNR for Linux: Version 184.108.40.206.0 - Production
Start Date 25-JAN-2018 15:17:30
Uptime 0 days 0 hr. 1 min. 49 sec
Trace Level off
Security ON: Local OS Authentication
Listener Parameter File /u01/app/220.127.116.11/grid/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/prd01a/listener/alert/log.xml
Listening Endpoints Summary...
The listener supports no services
The command completed successfully
And we found the the status of the listener was not online:
ONLINE INTERMEDIATE prd01a Not All Endpoints Registered
ONLINE INTERMEDIATE prd01b Not All Endpoints Registered
And my friend told me the RAC database worked well in the past and they did not change anything of the database before and after the server reboot.
Yes, for some reason the server crashed and was restarted automatically but after that the database could not be connected.
We checked the alert file of the listener, and only got some information as below while not other clues:
25-JAN-2018 15:17:30 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=prd01a)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=186647552)) * status * 0
25-JAN-2018 15:17:30 * version * 0
25-JAN-2018 15:17:30 * service_register_NSGR * 1194
TNS-01194: The listener command did not arrive in a secure transport
From the 1194 error, I guessed the TCPS maybe was used for the connections so I asked to get the content of the listener.ora file:
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
SECURE_REGISTER_LISTENER = (TCP)
And I was told the file worked for a long time.
From the Oracle support, I found several possible reasons like wrong permissions on the /tmp/.oracle or /var/tmp/.oracle, infiniband compatibility issue, more than one started listener or wrong configurations of the parameter local_listener. We checked all of them and they were not matched.
I wanted to figure out the detail purpose of the option SECURE_REGISTER_LISTENER, and got below two notes:
One key point was: In 11.2 RAC the grid agent uses the IPC protocol to create and manage scan listeners so both IPC and TCPS must be enabled.
And the setting of this option: SECURE_REGISTER_LISTENER_SCAN1 = (IPC,TCPS)
So it was clear the IPC must be included.
My friend has a good habit and he showed his previous backup of this file, while I found the last line was not there!
We commented it and restarted the listener, then it became online and could be connected.
Then he recalled such change was made long time ago to fix a security issue and same change was implemented on both UAT and PROD environments and they did not get any issue ... until the reboot.
Why? They just restarted the listener during the changes while listener restarting will not affect all the existing sessions, so the RAC database worked well until this abnormal reboot, and no one would know this change caused this incident.
So when we change the listener.ora file in the RAC database, we'd better restart the cluster service one node by one node, right?