One ORA-12520 issue in Oracle RAC


Several days ago I helped fix an ORA-12520 issue in a RAC database, and I learned from it that special care should be taken when listener.ora is modified in a RAC environment.

The issue turned out not to be very complicated in the end, so I will not go into too much detail.

The error messages were like this:

And it seemed the listener did not support any service:

We also found that the status of the listener was not online:

My friend told me the RAC database had worked well in the past and that they had not changed anything in the database before or after the server reboot.

Yes, for some reason the server had crashed and restarted automatically, and after that nobody could connect to the database.

We checked the listener's alert log and found only the information below, with no other clues:

From the 1194 error, I guessed that TCPS might be in use for the connections, so I asked for the content of the listener.ora file:

And I was told the file had been working for a long time.

On Oracle Support I found several possible causes: wrong permissions on /tmp/.oracle or /var/tmp/.oracle, an InfiniBand compatibility issue, more than one listener started, or a wrong setting of the local_listener parameter. We checked all of them and none matched.

I wanted to understand the exact purpose of the SECURE_REGISTER_LISTENER option, and found the two notes below:

Scan Listener TCPS Service Handlers are Blocked after Implementing COST on an SSL Cluster (Doc ID 1537743.1)

Using Class of Secure Transport (COST) to Restrict Instance Registration in Oracle RAC (Doc ID 1340831.1)

One key point was: in 11.2 RAC the grid agent uses the IPC protocol to create and manage SCAN listeners, so both IPC and TCPS must be enabled.

And the setting of this option: SECURE_REGISTER_LISTENER_SCAN1 = (IPC,TCPS)

So it was clear that IPC must be included.
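A check like this can be automated. Below is a minimal Python sketch, not an Oracle tool, that scans the text of a listener.ora file for SECURE_REGISTER_* entries and reports those whose protocol list lacks IPC. The function name `listeners_missing_ipc` and the sample entries are made up for illustration (only the SCAN1 line comes from the notes above), and the sketch assumes each entry sits on a single line.

```python
import re

# Matches uncommented lines such as: SECURE_REGISTER_LISTENER_SCAN1 = (IPC,TCPS)
# Assumption: each entry fits on one line; lines starting with '#' are comments.
_SECURE_REGISTER = re.compile(
    r"^[ \t]*SECURE_REGISTER_(\w+)[ \t]*=[ \t]*\(([^)]*)\)",
    re.IGNORECASE | re.MULTILINE,
)


def listeners_missing_ipc(listener_ora_text):
    """Return the listener names whose SECURE_REGISTER_* list lacks IPC."""
    missing = []
    for name, protocols in _SECURE_REGISTER.findall(listener_ora_text):
        allowed = {p.strip().upper() for p in protocols.split(",") if p.strip()}
        if "IPC" not in allowed:
            missing.append(name)
    return missing


# Hypothetical sample content, for illustration only.
sample = """
SECURE_REGISTER_LISTENER_SCAN1 = (IPC,TCPS)
SECURE_REGISTER_LISTENER = (TCPS)
# SECURE_REGISTER_LISTENER_SCAN2 = (TCPS)
"""

print(listeners_missing_ipc(sample))
```

Running such a check against listener.ora right after editing it would flag a TCPS-only entry long before the next reboot does.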

My friend has a good habit of keeping backups, and he showed me his previous backup of this file. I found that the last line of the current file was not in it!

We commented it out and restarted the listener; it then came online and accepted connections.

Then he recalled that this change had been made a long time ago to fix a security issue, and that the same change had been implemented in both the UAT and PROD environments without any problem ... until the reboot.

Why? They had only restarted the listener when making the change, and restarting a listener does not affect existing sessions, so the RAC database kept working until this abnormal reboot, and nobody would have known that the change caused the incident.

So when we change the listener.ora file in a RAC database, we had better restart the cluster services one node at a time, right?
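As a rough illustration of that idea, the Python sketch below only builds and prints an ordered plan; it does not run anything. The node names and the `rolling_restart_plan` helper are hypothetical. The sketch assumes the commands are run locally on each node (the `crsctl` ones as root), and that the operator waits for the stack and the listener to come back before moving to the next node.

```python
def rolling_restart_plan(nodes):
    """Build an ordered list of (node, command) steps, one node at a time.

    Assumption: the operator runs each command locally on the named node
    and verifies its result before continuing with the next step.
    """
    plan = []
    for node in nodes:
        plan.append((node, "crsctl stop crs"))   # stop the clusterware stack
        plan.append((node, "crsctl start crs"))  # start it again
        plan.append((node, "crsctl check crs"))  # confirm the stack is healthy
        plan.append((node, "lsnrctl status"))    # confirm the listener shows services
    return plan


# Hypothetical node names, for illustration only.
for node, command in rolling_restart_plan(["node1", "node2"]):
    print(f"[{node}] {command}")
```

Restarting the whole stack on one node, rather than just the listener, exercises the same startup path a server reboot would, so a bad listener.ora change shows up while the other node is still serving connections.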
