[ale] Bad SATA interactions

Sun Nov 4 13:47:18 EST 2012

On 11/04/2012 12:38 PM, Michael Trausch wrote:
> So I had an interesting few days... Aside from the fact that I have been
> sick, it turns out I have had an interesting problem appear.
> 
> I changed motherboards recently, to test UEFI and so forth out. When I did
> so I started having some problems that traditionally scream "memory
> errors", except my RAM was just fine.
> 
> I hadn't immediately thought to check the drive's SMART log because I am
> used to distributions signaling via the UI when such events happen. Well,
> it turns out that Fedora doesn't do smart monitoring by default!
> 
> I had an apparently bad SATA cable (am running tests now to see if the new
> cable is actually the solution here). The symptom was UDMA CRC error counts
> through the roof, which the drive detected and then aborted the
> corresponding command.
> 
> I mention this as we recently had a thread on silent corruption.
> 
> So, to the question part: even with smartctl and friends not installed and
> running, shouldn't modern file systems be storing checksums to catch this
> sort of thing without obscure errors? I thought that ext4 had such support,
> but I would appear to be incorrect there.

Btrfs has content checksums.  Ext4 has experimental journal checksums,
but they have been the subject of recent bugs, and are not yet
recommended for production.

The key issue is that many of the efficiency gains in modern I/O systems
are based on buffering / implicit write-back caching, where multiple
small writes to the same sector of a file coalesce into a single, later,
actual write.  Since many applications depend on this for performance,
it cannot be disabled by default.  Filesystems that attempt to generate
checksums between those writes must either discard the checksum and
recompute it when a subsequent write comes, or keep multiple versions of
the sector in memory.  Either way, the checksum must then be written to
the inodes in a way that synchronizes with the actual sector write
itself.
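
To make the hazard concrete, here is a rough Python sketch of a checksum
going stale when a second write coalesces into the same cached sector.
The "page" buffer and the three steps are a toy model I made up for
illustration, not any real kernel or filesystem interface, and CRC-32
from zlib just stands in for whatever checksum a filesystem might use.

```python
import zlib

SECTOR_SIZE = 512

# Hypothetical page-cache entry for one sector (toy model, not a kernel API).
page = bytearray(SECTOR_SIZE)
page[0:5] = b"hello"

# Step 1: the filesystem checksums the cached sector while preparing the write.
stored_checksum = zlib.crc32(page)

# Step 2: before the sector reaches the disk, the application writes again,
# and the change coalesces into the same cached page.
page[0:1] = b"j"

# Step 3: the device finally receives the page as it looks now.
on_disk = bytes(page)

# The checksum recorded in step 1 no longer describes the data on disk.
checksum_matches = zlib.crc32(on_disk) == stored_checksum
print("checksum matches on-disk data:", checksum_matches)
```

Avoiding that mismatch means either recomputing the checksum after step 2
or keeping a private copy of the page from step 1, which are the two
options described above.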

Btrfs can maintain synchronization because it doesn't rewrite in
place--it always allocates new space for rewritten sectors, eventually
garbage-collecting the superseded ones.  For filesystems that rewrite
file contents in place, I haven't yet seen a solution.  I'm not entirely
sure there is one, at a usable level of performance.
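
The copy-on-write idea can be sketched in a few lines of Python.  This is
only a toy under my own assumptions, not how btrfs is actually
implemented: the CowStore class, its "extent" map, and live_blocks() are
hypothetical names, and CRC-32 stands in for the real checksum.  The
point is that each rewrite goes to fresh space, and the pointer and the
checksum are published together, so they cannot disagree.

```python
import zlib


class CowStore:
    """Toy copy-on-write block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = []   # append-only "disk" blocks
        self.extent = {}   # logical sector -> (block index, checksum)

    def write(self, sector, data):
        data = bytes(data)        # private snapshot of the caller's buffer
        self.blocks.append(data)  # new space; the old block is left untouched
        # Pointer and checksum are published in a single metadata update.
        self.extent[sector] = (len(self.blocks) - 1, zlib.crc32(data))

    def read(self, sector):
        index, checksum = self.extent[sector]
        data = self.blocks[index]
        if zlib.crc32(data) != checksum:
            raise OSError(f"checksum mismatch in sector {sector}")
        return data

    def live_blocks(self):
        # Blocks still referenced; everything else is garbage to collect.
        return {index for index, _ in self.extent.values()}


store = CowStore()
store.write(0, b"first version")
store.write(0, b"second version")
print(store.read(0), "live blocks:", store.live_blocks())
```

After the second write, block 0 is still sitting there with the old
contents but nothing points at it any more, which is the superseded
space a garbage collector would eventually reclaim.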

Note that btrfs has a mount option, "nodatacow", to disable data
copy-on-write for performance reasons on certain applications, like
large database files.  This also disables checksums, as the FS can no
longer ensure synchronization.

HTH,

Phil