NFS Lock issue in NetApp NAS Server

Categories: Linux, Oracle, Storage

On Jan 16, one of our Linux servers crashed, and after the reboot the Oracle database could not be opened: the error said the file was in use by another process.
This was surprising, because we restarted the server twice and still got the same error. That meant the files were locked on the NAS server, not in the OS itself. To get the business application running again I had to restore the full database, but I still wanted to know what had locked these datafiles.

I wrote a document explaining this issue, so I have copied it below.
First, the issue can be reproduced; second, it can be fixed in a few minutes.
The following NetApp note was very useful for reproducing and fixing it:
How to clear NFS locks during network crash or outage for Oracle datafiles
Below I list the detailed steps to reproduce the issue, followed by the methods to resolve it.
1. When does this issue happen?
A crash is the most common trigger. If a server crashes, its NFS locks are left behind on the NAS server. Some servers release these stale locks when they are powered on again, but others never clear them by themselves, and then Oracle cannot start up.
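To make the crash-versus-reboot behaviour concrete, here is a toy Python model of how an NFSv3 server tracks client locks. Under NFSv3, locks go through the NLM protocol. Reboot recovery goes through the Network Status Monitor (NSM): after a reboot, the client's statd announces its name, and the server drops the locks it recorded for that client. The class `ToyLockServer` and the file path are hypothetical. The rule that locks are freed only when the announced name exactly matches the recorded one is a simplifying assumption offered as one plausible reading of this case. I have not confirmed that this is how ONTAP matches names.

```python
from collections import defaultdict


class ToyLockServer:
    """Toy model of an NFSv3 server's lock table with NSM-style reboot handling.

    Simplifying assumption (not ONTAP's real logic): each lock is recorded
    under the client name registered when it was granted, and a reboot
    notification frees only the locks recorded under exactly that name.
    """

    def __init__(self):
        self.locks = defaultdict(set)  # client name -> paths it holds locks on

    def grant(self, client_name, path):
        """Record a lock granted to a client (e.g. when Oracle opens a datafile)."""
        self.locks[client_name].add(path)

    def release(self, client_name, path):
        """Normal unlock, e.g. on a clean database shutdown."""
        self.locks[client_name].discard(path)

    def notify_reboot(self, announced_name):
        """Client reports a reboot; free and return the locks held under that name."""
        return self.locks.pop(announced_name, set())

    def held(self):
        """All paths that are still locked, regardless of owner."""
        return {path for paths in self.locks.values() for path in paths}


if __name__ == "__main__":
    server = ToyLockServer()
    datafile = "/vol/oradata/system01.dbf"  # hypothetical path
    server.grant("dbhost", datafile)  # lock recorded under one name
    # The client crashes here, so release() is never called.
    print(server.notify_reboot("dbhost.example.com"))  # name mismatch: nothing freed
    print(server.held())  # the stale lock is still there
    print(server.notify_reboot("dbhost"))  # matching name frees it
```

A clean shutdown corresponds to `release()`. A crash skips it, so the lock survives until a matching reboot notification arrives.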
2. What is the difference between the two kinds of servers?
To reproduce the issue, I downloaded the NetApp Simulator from the NetApp website and built a test environment. However, the locks were freed every time the server booted, until I focused on the part of the note above that deals with the client hostname.

So I changed the hostname of my test server from localhost.localdomain to localhost, and the issue was finally reproduced.
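Since the behaviour depends on whether the hostname is fully qualified, a quick check on the database host can help. This is a minimal sketch. `looks_like_fqdn` is a hypothetical helper and only a heuristic: it tests for at least two non-empty dot-separated labels and does not consult DNS.

```python
import socket


def looks_like_fqdn(name: str) -> bool:
    """Heuristic: at least two non-empty dot-separated labels (trailing dot ignored)."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


if __name__ == "__main__":
    host = socket.gethostname()  # the kernel hostname of this server
    verdict = "looks like an FQDN" if looks_like_fqdn(host) else "is NOT an FQDN"
    print(f"{host} {verdict}")
```

By this heuristic, localhost.localdomain counts as fully qualified and localhost does not. That matches the two cases above.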
The detailed steps are:
*Make sure the hostname of the test server is not an FQDN.
*Mount the filesystem from the NAS server.
*Start up the Oracle database; the locks are placed on the NAS server.
*Shut down the Oracle database; the locks are released.
*Start up the Oracle database again and check the lock status once more.
*Do not shut down the Oracle database; halt the system directly instead. The locks are freed in this case as well.
*Start up the server and the Oracle database again, and check the lock status.
*Power off the test server directly; this is almost the same as a crash.
*Check the lock status after several minutes: the locks have not been cleared.
*Start up the OS and check the lock status again (if the hostname is an FQDN, the locks are freed automatically at this point).
*Try to start up the Oracle database; it fails with errors.
*Check the Oracle alert log.
The issue is now reproduced.
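The "file is in use" behaviour can be illustrated locally with ordinary POSIX record locks. This sketch assumes that Oracle holds fcntl-style locks on its open datafiles and that the mount is NFSv3. On such a mount, the Linux client forwards these locks to the NAS server through NLM, which is why they show up in the NAS lock status. The demo uses a local temporary file, not the NAS, and runs only on Linux/Unix. The helper names are hypothetical.

```python
import fcntl
import os
import tempfile


def locked_by_another_process(path):
    """Return True if a separate process cannot take an exclusive lock on path."""
    pid = os.fork()
    if pid == 0:  # child: try a non-blocking exclusive lock, report via exit code
        code = 2
        try:
            fd = os.open(path, os.O_RDWR)
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                code = 0  # lock acquired: nobody else holds it
            except OSError:
                code = 1  # EACCES/EAGAIN: another process holds a conflicting lock
        finally:
            os._exit(code)  # exiting also drops any lock the child took
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1


def demo():
    """Mimic 'database open' (lock held) and 'clean shutdown' (lock released)."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        fcntl.lockf(fd, fcntl.LOCK_EX)  # like an instance opening a datafile
        while_open = locked_by_another_process(tmp.name)
        fcntl.lockf(fd, fcntl.LOCK_UN)  # like a clean shutdown
        after_shutdown = locked_by_another_process(tmp.name)
        os.close(fd)
    return while_open, after_shutdown


if __name__ == "__main__":
    print(demo())
```

On a local disk, even a killed process loses its locks immediately. On NFS, the server-side copy of the lock outlives a crashed client until the server learns of the reboot, which is the situation reproduced above.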

How do we fix this issue?
1. Do not crash the server!
2. Use an FQDN as the server's hostname.
3. Clear the locks from the NAS server.
The NetApp note mentioned earlier gives two commands for clearing the locks.
Once the locks are cleared, the Oracle database can be mounted and opened.

Other useful documents:
How to determine filenames from lock_dump output
What are the details of Network File System (NFS) Lock Recovery and Network Status Monitor?
