Forum Discussion

Bill_Splittgerb
5 years ago

VMWARE: SNAPSHOT LOCKING - CONSOLIDATION FAILURE

VMware ESXi can take snapshots of virtual machines, which freeze a virtual disk at a point in time. A new disk (a delta disk) is then created to record the I/O changes made since the snapshot was taken. When a snapshot is deleted, its changes are written down into the earlier disk. If you keep deleting snapshots until none remain, all the changes are finally written down into the base vmdk disk. The delta disks are removed after this process, which is called "consolidation" of the delta disks.

When files are locked on the ESXi host, consolidation either does not start or is interrupted. The process aborts, the changes are backed out or never committed to the earlier disk, and the VM continues to use the delta disk. The result is a snapshot chain that keeps growing. In our situation, we use a backup solution that relies on snapshot technology to freeze (quiesce) the operating system and take a recoverable backup. Backups are taken every 15 minutes.
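The snapshot chain and consolidation described above can be pictured with a toy model. This is only a conceptual sketch, not how ESXi stores data on disk: each disk is assumed to be a Python dict mapping a block number to its contents, and the function names `read_block` and `consolidate` are made up for illustration.

```python
def read_block(chain, block):
    """Read a block by searching from the newest delta back to the base disk."""
    for layer in reversed(chain):
        if block in layer:
            return layer[block]
    return None  # block was never written


def consolidate(chain):
    """Write every delta down into the base disk, oldest first.

    Returns a new chain that contains only the merged base disk.
    """
    base = dict(chain[0])
    for delta in chain[1:]:
        base.update(delta)  # newer writes replace older ones
    return [base]


# Hypothetical chain: base disk plus two delta disks.
chain = [
    {0: "A", 1: "B"},  # base vmdk
    {1: "B2"},         # changes after the first snapshot
    {2: "C"},          # changes after the second snapshot
]

merged = consolidate(chain)
```

In this model every read may have to walk the whole chain, which is why a long chain costs more per I/O than a consolidated disk. If consolidation never completes, `chain` just keeps gaining entries.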

In this scenario we have reached a point where the disk chain is 255 delta disks long. The GUI shows no snapshots (LogicMonitor can see GUI snapshots), but LogicMonitor cannot see the delta disks. In theory, the first snapshot creates a delta disk named "vmdiskname-000001.vmdk", and a second snapshot creates a delta disk named "vmdiskname-000002.vmdk". When you delete the snapshots and the process completes, the delta disks no longer exist.
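Given that naming pattern, counting delta disks from a datastore file listing might look like the sketch below. It assumes you already have the file names as a list of strings (how you obtain them is not covered here), and the function name `count_delta_disks` is hypothetical. It counts only the numbered descriptor files, so companion files such as "-delta.vmdk" or "-flat.vmdk" (names not mentioned in this post) are deliberately ignored.

```python
import re
from collections import defaultdict

# Matches "vmdiskname-000001.vmdk" but not "vmdiskname-000001-delta.vmdk".
DELTA_DESCRIPTOR = re.compile(r"^(?P<base>.+)-(?P<seq>\d{6})\.vmdk$")


def count_delta_disks(filenames):
    """Return {base disk name: number of numbered delta descriptors}."""
    counts = defaultdict(int)
    for name in filenames:
        match = DELTA_DESCRIPTOR.match(name)
        if match:
            counts[match.group("base")] += 1
    return dict(counts)


# Hypothetical listing of a VM folder after two snapshots.
listing = [
    "vmdiskname.vmdk",
    "vmdiskname-flat.vmdk",
    "vmdiskname-000001.vmdk",
    "vmdiskname-000001-delta.vmdk",
    "vmdiskname-000002.vmdk",
    "vmdiskname-000002-delta.vmdk",
]

delta_counts = count_delta_disks(listing)
```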

We need to check whether the number of delta disks exceeds the number of snapshots shown in the GUI. If it does, we have to repair the VM manually.
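That check can be sketched as a small comparison. This assumes the per-disk delta counts and the GUI snapshot count have already been collected by some other means; the function name `disks_needing_repair` and the sample numbers are illustrative only. It also assumes each snapshot accounts for at most one delta disk per virtual disk.

```python
def disks_needing_repair(delta_counts, snapshot_count):
    """Return {disk: excess delta disks} for disks with more deltas than snapshots."""
    return {
        disk: deltas - snapshot_count
        for disk, deltas in delta_counts.items()
        if deltas > snapshot_count
    }


# Hypothetical input: the GUI shows 0 snapshots, but one disk has a long chain.
delta_counts = {"vmdiskname": 255, "datadisk": 0}
flagged = disks_needing_repair(delta_counts, snapshot_count=0)

if flagged:
    print("Manual repair needed:", flagged)
```

A monitoring check built on this idea would alert whenever the returned dict is not empty.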

This situation is very dangerous for an ESX host, because with a disk chain of around 255 delta disks you start to see very strange delay behavior. The host has to track all the data through the chain. Symptoms include the VM freezing, then releasing, then running faster than normal until all the pending data has been processed, and then freezing again. The more delta disks there are, the worse the performance decay on the ESX host.