[clue] [tech] Filesystem corruption with VMWare iSCSI initiator and block device translation

Chris Fedde chris at fedde.us
Tue Nov 20 22:58:57 MST 2012


I've used a raw mapped LUN with a Windows Server guest on ESX in the past,
but from a different SAN (NetApp via Fibre Channel), with no problems.

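I wonder if somehow the same LUN got mapped to two different drive letters
or to two different systems.  NTFS is not a cluster-aware filesystem, so two
initiators writing to the same volume without coordination will corrupt it.
That's about the only way I can think of that this setup would cause
corruption.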
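
If you want to check for that, something like the rough sketch below would
flag any LUN that is presented to more than one initiator.  It assumes you
collect the (initiator, LUN) pairs by hand from the OpenFiler target config
and from each host; the function name, host names, and LUN names are made up
for illustration and do not come from any VMware or OpenFiler tool.

```python
from collections import defaultdict


def find_shared_luns(mappings):
    """Return the LUNs that are presented to more than one initiator.

    mappings: iterable of (initiator_name, lun_id) pairs.
    Result:   dict of lun_id -> sorted list of initiators sharing it.
    """
    initiators_by_lun = defaultdict(set)
    for initiator, lun_id in mappings:
        initiators_by_lun[lun_id].add(initiator)
    return {
        lun_id: sorted(initiators)
        for lun_id, initiators in initiators_by_lun.items()
        if len(initiators) > 1
    }


# Hypothetical inventory, gathered by hand (names are assumptions).
mappings = [
    ("esxi-host-1", "lun-data"),
    ("windows-guest", "lun-data"),
    ("esxi-host-1", "lun-vmfs"),
]

print(find_shared_luns(mappings))
# prints {'lun-data': ['esxi-host-1', 'windows-guest']}
```

Any LUN that shows up in the result is reachable from two places at once,
which is the situation I would rule out first.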


On Tue, Nov 20, 2012 at 12:34 PM, Jim Ockers <ockers at ockers.net> wrote:

> Hi CLUEbies,
>
> We had a major filesystem corruption event and I was wondering if anyone
> else had experienced something like this or if there is some good/obvious
> reason why it happened.
>
> We have a Windows 2003 (NTFS5) data volume (not the OS volume) on an iSCSI
> target on a Linux OpenFiler, with Windows running under VMware ESXi 5.
> There are 3 ways to give the Windows VM access to the iSCSI target volume:
>
>    1. Boot the OS in the usual way for its VM, and use the Microsoft
>    iSCSI initiator to access the target.  The OS, via its own initiator,
>    finds an NTFS5 filesystem and assigns it a drive letter as usual.
>    2. Configure VMware to access the target using its iSCSI initiator,
>    and then configure the VM with the *raw mapped LUN* as another disk
>    drive.  The OS finds a VMware virtual disk, and finds an NTFS5 filesystem
>    on the disk.  VMware handles the block device translation between a
>    virtual disk and an iSCSI target, and the OS has no knowledge that the
>    actual block device is an iSCSI target.
>    3. Configure VMware to access the target using its iSCSI initiator,
>    and mount the target as a VMware datastore using the VMFS5 filesystem.
>    In the datastore there would be a VMware VMDK virtual disk, and the VM
>    has this VMDK as one of its disk drives.  The OS would then see a normal
>    VMware virtual disk and has no knowledge of the VMFS5 datastore or iSCSI.
>
>
> We first tried a raw mapped LUN, and things were fine for 2 or 3 days.
> Then we started getting massive NTFS data corruption, with no indication
> other than NTFS errors in the Windows Event Viewer.  Because the system
> didn't crash, it ran for over a day like this, and the backups got
> corrupted too.  CHKDSK made matters worse.  We wound up having to merge two
> backups together because there were inconsistencies that required manual
> resolution.  What a pain.
>
> We switched to using the Microsoft iSCSI initiator to access the volume,
> and it's been fine for a few days now with no NTFS errors or corruption or
> data loss that we know of.
>
> The VMDK on VMFS5 datastore on iSCSI is also problem-free as far as we can
> tell.
>
> I was wondering if anyone on this list had any ideas or wild speculation
> about why using the VMWare iSCSI initiator and giving the iSCSI target to
> the OS as a raw mapped LUN would cause filesystem corruption, whereas the
> other 2 options are both trouble-free?  Is there some good reason why the
> raw mapped LUN approach is not recommended?  Is it only bad for iSCSI, or
> is it also bad for Fibre Channel, etc.?
>
> Obviously we won't be doing this again but I wish I had some good reasons
> for why it was so problematic.
>
> Thanks,
> Jim
>
> --
> Jim Ockers, P.E., P.Eng. (ockers at ockers.net)
> Contact info: http://www.ockers.net/