[clue] [tech] Filesystem corruption with VMWare iSCSI initiator and block device translation

Jim Ockers ockers at ockers.net
Wed Nov 21 11:22:20 MST 2012


Hi Chris,

Chris Fedde wrote:
> I've used a raw mapped LUN with a Windows Server guest on ESX in the
> past, but from a different SAN (NetApp via Fibre Channel), with no
> problems.
>
> I wonder if somehow the same LUN got mapped to two different drive
> letters or to two different systems.  That's about the only way I can
> think of that there would be corruption because of this.
Well, if you think that, you're in very good company.  All the support
forums and Google results indicated that the only time anyone else has
ever had NTFS corruption on a raw mapped LUN was when it was mapped to
two systems, both of which were writing to it concurrently.  Since NTFS
is not a cluster-aware (shared-disk) filesystem, this causes corruption.
I didn't see any indication that anyone else's NTFS filesystem on iSCSI
got corrupted for any other reason.
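
To make the two-writer failure mode concrete, here is a toy sketch in
Python.  It is only an illustration under simplifying assumptions, not
how NTFS or any iSCSI stack actually works: the SharedDisk and Host
classes and the demo() function are hypothetical names invented for this
example.  Each "host" caches the allocation bitmap when it mounts the
disk and never re-reads it, which is roughly what a non-cluster
filesystem driver assumes it is allowed to do.

```python
class SharedDisk:
    """Toy block device: an allocation bitmap plus data clusters."""

    def __init__(self, n_clusters):
        self.bitmap = [False] * n_clusters    # True = cluster in use
        self.clusters = [None] * n_clusters   # cluster contents


class Host:
    """Toy initiator that caches the bitmap at mount time (hypothetical)."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        # Private copy of the bitmap; never refreshed, because the host
        # assumes it is the only writer.
        self.cached_bitmap = list(disk.bitmap)
        self.files = {}                       # filename -> cluster index

    def write_file(self, filename, data):
        # Pick the first cluster that is free according to OUR cache.
        cluster = self.cached_bitmap.index(False)
        self.cached_bitmap[cluster] = True
        self.disk.bitmap[cluster] = True
        self.disk.clusters[cluster] = data
        self.files[filename] = cluster

    def read_file(self, filename):
        return self.disk.clusters[self.files[filename]]


def demo():
    disk = SharedDisk(4)
    a = Host("A", disk)
    b = Host("B", disk)                       # mounts before A writes anything
    a.write_file("a.txt", "data from A")
    b.write_file("b.txt", "data from B")      # B's cache still says cluster 0 is free
    return a.read_file("a.txt"), b.read_file("b.txt")


if __name__ == "__main__":
    print(demo())
```

Both hosts allocate cluster 0, so host A's file silently ends up holding
host B's data and neither side gets an error.  With a single host the
same code behaves correctly, which is why the dual-mapping theory only
applies if a second writer really exists.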

We use CHAP authentication on each iSCSI target, and each target has 
only one LUN mapped to it, and furthermore we have only one Windows 
server, so there is no way that any Windows system(s) other than our 
Windows 2003 server had the iSCSI target open for writing.

-- 
Jim Ockers, P.E., P.Eng. (ockers at ockers.net)
Contact info: http://www.ockers.net/



>
>
> On Tue, Nov 20, 2012 at 12:34 PM, Jim Ockers <ockers at ockers.net> wrote:
>
>     Hi CLUEbies,
>
>     We had a major filesystem corruption event and I was wondering if
>     anyone else had experienced something like this or if there is
>     some good/obvious reason why it happened.
>
>     We have a Windows 2003 (NTFS5) data volume (not the OS volume) on
>     an iSCSI target on a Linux OpenFiler, with Windows running under
>     VMWare ESXi5.  In order to give the Windows VM access to the iSCSI
>     target volume there are 3 ways to do it:
>
>        1. Boot the OS in the usual way for its VM, and use the
>           Microsoft iSCSI initiator to access the target.  The OS via
>           its own initiator finds a NTFS5 filesystem and assigns it a
>           drive letter as usual.
>        2. Configure VMWare to access the target using its iSCSI
>           initiator, and then configure the VM with the raw mapped
>           LUN as another disk drive.  The OS finds a VMWare virtual
>           disk, and finds a NTFS5 filesystem on the disk.  VMWare
>           handles the block device translation between a virtual disk
>           and an iSCSI target, and the OS has no knowledge that the
>           actual block device is an iSCSI target.
>        3. Configure VMWare to access the target using its iSCSI
>           initiator, and mount the target as a VMWare datastore using
>           VMFS5 filesystem.  In the datastore there would be a VMWare
>           VMDK virtual disk, and the VM has this VMDK as one of its
>           disk drives.  The OS would then see a normal VMWare virtual
>           disk and has no knowledge of VMFS5 datastore or iSCSI.
>
>
>     We first tried a raw mapped LUN, and things were fine for 2 or 3
>     days.  Then we started getting massive NTFS data corruption, with
>     no indication other than NTFS errors in the Windows Event Viewer.
>     Because the system didn't crash, it ran for over a day like this,
>     and the backups got corrupted too.  CHKDSK made matters worse.  We
>     wound up having to merge two backups together because there were
>     inconsistencies that required manual resolution.  What a pain.
>
>     We switched to using the Microsoft iSCSI initiator to access the
>     volume, and it's been fine for a few days now with no NTFS errors
>     or corruption or data loss that we know of.
>
>     The VMDK on VMFS5 datastore on iSCSI is also problem-free as far
>     as we can tell.
>
>     I was wondering if anyone on this list had any ideas or wild
>     speculation about why using the VMWare iSCSI initiator and giving
>     the iSCSI target to the OS as a raw mapped LUN would cause
>     filesystem corruption, whereas the other 2 options are both
>     trouble-free?  Is there some good reason why the raw mapped LUN
>     approach is not recommended?  Is it only bad for iSCSI, or is it
>     also bad for Fibre Channel, etc.?
>
>     Obviously we won't be doing this again but I wish I had some good
>     reasons for why it was so problematic.
>
>     Thanks,
>     Jim
>
>     -- 
>     Jim Ockers, P.E., P.Eng. (ockers at ockers.net)
>     Contact info: http://www.ockers.net/
>
>     _______________________________________________
>     clue mailing list: clue at cluedenver.org
>     For information, account preferences, or to unsubscribe see:
>     http://cluedenver.org/mailman/listinfo/clue
>
>
