2015-06-03 09:56 VMware Guest OS Fails to Power On - Lock

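  A VMware guest OS would not power on: the power-on task hung at 95% and never moved. Its VMDKs could not be mounted by any other guest OS either, and attaching them even brought that other guest OS down. From the first Power On attempt until the error "Error message from xxx_IP: Cannot open the disk xx.vmdk or one of the snapshot disks it depends on" finally appeared, more than four hours had passed.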

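Q: When a system is misbehaving, what happens if you force the operating system off and on several times? I think most people would worry that this could leave the system truly unbootable. So what about VMware ESXi? If a guest OS cannot power on normally, could repeated forced power-on attempts corrupt the VMDK so that it really never starts again?
A: In the middle of the emergency, nobody could answer that question for me. If you ask me now, I still cannot answer it. All I can say is that luckily my system came through fine XD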

------------------------------------------------------------------------------------------------------------
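This post cannot tell you how to fix this problem if you run into it. All I can do is record what happened at the time, in the hope of avoiding the same problem in the future.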
------------------------------------------------------------------------------------------------------------

## Error messages: vmware.log ###
Worker#0| I120: DISK: OPEN scsi0:0 '/vmfs/volumes/eggroller/vmhost01/vmhost01-000001.vmdk' persistent R[]
Worker#1| I120: DISK: OPEN scsi0:1 '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-000001.vmdk' persistent R[]
Worker#1| I120: DISKLIB-VMFS : "/vmfs/volumes/eggroller/vmhost01/vmhost01_1-000001-delta.vmdk" : open successful (10) size = 1670075084800, hd = 698978. Type 8
Worker#1| I120: DISKLIB-DSCPTR: Opened [0]: "vmhost01_1-000001-delta.vmdk" (0xa)
Worker#1| I120: DISKLIB-LINK : Opened '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-000001.vmdk' (0xa): vmfsSparse, 3984588800 sectors / 1.9 TB.
Worker#1| I120: AIOGNRC: Failed to open '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk' : Failed to lock the file (40003) (0x2011).
Worker#1| I120: AIOMGR: AIOMgr_OpenWithRetry: Descriptor file '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk' locked (try 0)
Worker#1| I120: AIOGNRC: Failed to open '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk' : Failed to lock the file (40003) (0x2011).
Worker#1| I120: AIOMGR: AIOMgr_OpenWithRetry: Descriptor file '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk' locked (try 1)
Worker#0| I120: DISKLIB-VMFS : "/vmfs/volumes/eggroller/vmhost01/vmhost01-000001-delta.vmdk" : open successful (10) size = 1609693986816, hd = 1288801. Type 8
Worker#0| I120: DISKLIB-DSCPTR: Opened [0]: "vmhost01-000001-delta.vmdk" (0xa)
Worker#0| I120: DISKLIB-LINK : Opened '/vmfs/volumes/eggroller/vmhost01/vmhost01-000001.vmdk' (0xa): vmfsSparse, 4089446400 sectors / 1.9 TB.
Worker#1| I120: AIOGNRC: Failed to open '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk' : Failed to lock the file (40003) (0x2011).
Worker#1| I120: AIOMGR: AIOMgr_OpenWithRetry: Descriptor file '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk' locked (try 2)
(omitted) (try 0)
(omitted) (try 3)
(omitted) (try 1)
(omitted) (try 4)
Worker#0| I120: AIOGNRC: Failed to open '/vmfs/volumes/eggroller/vmhost01/vmhost01-flat.vmdk' : Failed to lock the file (40003) (0x2011).
Worker#0| I120: AIOMGR: AIOMgr_OpenWithRetry: Descriptor file '/vmfs/volumes/eggroller/vmhost01/vmhost01-flat.vmdk' locked (try 2)
Worker#1| I120: AIOGNRC: Failed to open '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk' : Failed to lock the file (40003) (0x2011).
Worker#1| I120: OBJLIB-FILEBE : FileBEOpen: can't open '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk' : Failed to lock the file (262146).
Worker#1| I120: DISKLIB-VMFS : "/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk" : failed to open (Failed to lock the file): ObjLib_Open failed. Type 3
Worker#1| I120: DISKLIB-LINK : "/vmfs/volumes/eggroller/vmhost01/vmhost01_1.vmdk" : failed to open (Failed to lock the file).
Worker#1| I120: DISKLIB-CHAIN :"/vmfs/volumes/eggroller/vmhost01/vmhost01_1-000001.vmdk": Failed to open parent "/vmfs/volumes/eggroller/vmhost01/vmhost01_1.vmdk": Failed to lock the file.
Worker#1| I120: DISKLIB-CHAIN : "/vmfs/volumes/eggroller/vmhost01/vmhost01_1.vmdk" : failed to open (The parent of this virtual disk could not be opened).
Worker#1| I120: DISKLIB-VMFS : "/vmfs/volumes/eggroller/vmhost01/vmhost01_1-000001-delta.vmdk" : closed.
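
With two disks retrying in parallel, the log above is hard to read by eye. As a rough aid, the following Python sketch counts the "Failed to lock the file" failures per file. The function name `locked_files` and the regular expression are my own assumptions, written only against the AIOGNRC lines shown above; other ESXi versions may word these lines differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   AIOGNRC: Failed to open '<path>' : Failed to lock the file (40003) (0x2011).
# The pattern is an assumption derived from the excerpt above.
LOCK_FAILURE = re.compile(r"Failed to open '([^']+)' : Failed to lock the file")


def locked_files(log_text):
    """Return a Counter mapping each file path to its number of lock failures."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOCK_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the vmware.log excerpt above.
SAMPLE = """\
Worker#1| I120: AIOGNRC: Failed to open '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk' : Failed to lock the file (40003) (0x2011).
Worker#1| I120: AIOGNRC: Failed to open '/vmfs/volumes/eggroller/vmhost01/vmhost01_1-flat.vmdk' : Failed to lock the file (40003) (0x2011).
Worker#0| I120: AIOGNRC: Failed to open '/vmfs/volumes/eggroller/vmhost01/vmhost01-flat.vmdk' : Failed to lock the file (40003) (0x2011).
"""

for path, count in locked_files(SAMPLE).items():
    print(count, path)
```

In this log, the output would point at the two -flat.vmdk base disks rather than the snapshot delta files, which opened successfully.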


## Solutions ###
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004232
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2001005
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10051
......

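Plenty of solutions can be found online, but which one actually works? Which one will really bring the system back? I don't know.
There are many methods... do you dare to try every one of them?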

---------------------------------------------------------------------------------------------------

When a VMDK is locked, two methods turn up right away online:
Method 1: Delete the .Lck files or the Lck_xxxx directories
Method 2: Add the line disk.locking = "FALSE" to xxx.vmx

Well... neither of these methods worked for me XD
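
Before deleting anything for Method 1, it may help to see what lock-related entries exist at all. The Python sketch below only lists candidates and deletes nothing. The function name `find_lock_candidates` and the rule "the name contains lck" are my own assumptions based on the .Lck / Lck_xxxx naming mentioned above; actual lock file names vary by product and datastore type.

```python
import os


def find_lock_candidates(vm_dir):
    """Return sorted paths under vm_dir whose name contains 'lck'.

    Listing only: nothing is deleted. The name-matching rule is an
    assumption, not something taken from VMware documentation.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(vm_dir):
        for name in dirnames + filenames:
            if "lck" in name.lower():
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


# Example call (the path comes from the log above):
# for path in find_lock_candidates("/vmfs/volumes/eggroller/vmhost01"):
#     print(path)
```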

---------------------------------------------------------------------------------------------------

What actually solved my problem was "Investigating virtual machine file locks on ESXi/ESX (10051)".
The KB lists the following symptoms for this problem:

 

  • A virtual machine cannot power on. (I had this)
  • Powering on a virtual machine fails. (I had this)
  • Unable to power on a virtual machine. (I had this)
  • Adding an existing virtual machine disk (VMDK) to a virtual machine that is already powered on fails with the error: 

    Failed to add disk scsi0:1. Failed to power on scsi0:1

  • When powering on the virtual machine, you see one of these errors:
    • Unable to open Swap File
    • Unable to access a file since it is locked
    • Unable to access a file <filename> since it is locked (I had this)
    • Unable to access Virtual machine configuration

  • In the /var/log/vmkernel log file, you see entries similar to:

    WARNING: World: VM xxxxxxx: Failed to open swap file <path>: Lock was not free (I had this)
    WARNING: World: VM xxxxxxx: Failed to initialize swap file <path>


  • When opening a console to the virtual machine, you may receive the error:

    Error connecting to <path><virtual machine>.vmx because the VMX is not started (I had this)

  • Powering on the virtual machine results in the power on task remaining at 95% indefinitely. (I had this)
  • Cannot power on the virtual machine after deploying it from a template.
  • The virtual machine reports conflicting power states between vCenter Server and the ESXi/ESX host console.
  • Attempting to view or open the .vmx file using a text editor (for example, cat or vi), reports an error similar to:

    cat: can't open '[name of vm].vmx': Invalid argument

My situation matched more of the symptoms listed in this VMware KB than anything else I had found, so I tried the method in its quick test:
Initial quick test: To get your critical virtual machine running:
1. Migrate the virtual machine to the host it was last known to be running on and attempt to power on.
2. If unsuccessful, continue to attempt a power on of the virtual machine on other hosts in the cluster. When you hit the host holding the file locks, the virtual machine should power on as the file locks in place are valid.
3. If you still cannot power on the virtual machine continue with the steps below to investigate in more detail.
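
The three quick-test steps amount to a loop over the hosts in the cluster. The Python sketch below shows only that control flow. `migrate` and `power_on` are hypothetical callables supplied by the caller, not real vSphere API calls, and the host names in the simulation are made up.

```python
def power_on_with_host_search(hosts, migrate, power_on):
    """Try each host in order and return the first one where power-on works.

    hosts    -- host names, with the last-known-running host first (step 1)
    migrate  -- hypothetical callable: migrate(host) moves the VM to host
    power_on -- hypothetical callable: returns True if the VM powered on
    Returns None if no host worked (step 3: investigate in more detail).
    """
    for host in hosts:
        migrate(host)
        if power_on():
            return host
    return None


# Tiny simulation: pretend esxi-b is the host holding the file locks.
state = {"current": None}

def fake_migrate(host):
    state["current"] = host

def fake_power_on():
    return state["current"] == "esxi-b"

print(power_on_with_host_search(["esxi-a", "esxi-b", "esxi-c"],
                                fake_migrate, fake_power_on))
```

In the simulation the loop stops at esxi-b, mirroring the KB's statement that the virtual machine should power on once it reaches the host holding the locks.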

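I used "Migrate the virtual machine to the host it was last known to be running on and attempt to power on." My assumption was that the locked VMDK files were being held by another ESXi host in the same cluster (that is, the guest showed up on ESXi A, but its files were locked by ESXi B in the same cluster). When I migrated it with vMotion from A to B, the red X on the guest OS disappeared, and the move to ESXi B took only 0.x seconds.
That was far from normal, since even a normal migration takes a few seconds. So I migrated it again from ESXi B to ESXi C, and this time the duration felt more normal. I then moved it to ESXi A and back to ESXi B before testing Power On. The first power-on also hung at 9x% for quite a while, but fortunately it returned to normal after that.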

--------

The KB "Investigating virtual machine file locks on ESXi/ESX (10051)" offers many more methods, but once the system was back to normal I did not test them; I just hurried to copy the data out.

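--- Afterword: ---
This guest OS uses two 1.8TB VMDKs, and a snapshot was taken sometime in April, so altogether it occupied nearly 7TB. Just copying the main data files, the Flat.vmdk and Delta.vmdk files, took me more than a day, which kept the system down for more than an extra day. I have started to be afraid of ESXi snapshots. Earlier, I had a test system with a thin-provisioned VMDK sized at 6TB but with only 200GB actually in use. It was being backed up with VDP (snapshot + copy) when it suddenly died and would not even power on. I don't know whether Veeam or other backup software would have the same problem.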
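
The "nearly 7TB" figure can be cross-checked against the sizes printed in the vmware.log excerpt. The Python sketch below adds up the two base disks (from the sector counts) and the two delta files (from the byte sizes). It assumes 512-byte sectors and that both base disks are fully allocated, neither of which is stated in the log. The 24-hour copy time is also only an assumption standing in for "more than a day".

```python
SECTOR_BYTES = 512          # assumption: 512-byte sectors
TIB = 2 ** 40

# Values copied from the DISKLIB-LINK and DISKLIB-VMFS lines in vmware.log.
flat_sectors = [3984588800, 4089446400]      # vmhost01_1, vmhost01
delta_bytes = [1670075084800, 1609693986816] # vmhost01_1, vmhost01


def total_bytes(flat_sectors, delta_bytes, sector_bytes=SECTOR_BYTES):
    """Sum base-disk capacity (sectors) and snapshot delta sizes (bytes)."""
    return sum(s * sector_bytes for s in flat_sectors) + sum(delta_bytes)


total = total_bytes(flat_sectors, delta_bytes)
print(f"total: {total / TIB:.2f} TiB")

# Assumption: the copy took exactly 24 hours.
copy_seconds = 24 * 60 * 60
print(f"average rate: {total / copy_seconds / 1e6:.0f} MB/s")
```

Under those assumptions the total comes to roughly 6.7 TiB, which is consistent with the "nearly 7TB" above.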

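VMware is supposed to get more enjoyable the more you use it, but the more I use it, the more afraid I get. When everything is normal there is no trouble, but once something breaks there is nowhere to turn for help...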

~End

Other error screens: I cannot explain their cause, so please take them as reference.
Another task is already in progress. An error was received from the ESX host while powering on VM

A general system error occurred: The System returned an error. Communication with the virtual machine might have been interrupted.


 
