[ale] fsck opinions

Michael H. Warfield mhw at WittsEnd.com
Thu Feb 24 11:36:15 EST 2011


On Thu, 2011-02-24 at 10:11 -0500, Lightner, Jeff wrote: 
> Since ext3 is a journaled filesystem is there any reason to continue
> doing automatic fscks of filesystems on boot?  If so why?

Yes.  Invariably some corruption creeps in, and a periodic fsck cleans
it up.  Even a file system as simple as the old FAT file system can have
that happen.

> RHEL by default does fsck if rebooted if one hasn't been done in some
> number of days (this is configurable).   However, if it does this on a
> system with lots of large filesystems it can take hours to boot.   This
> causes complaints in the event of an unexpected server boot because we
> let it run.

That's a tunable parameter in the ext2/ext3 superblock.

A little excerpt from "dumpe2fs -h {dev}":

Filesystem created:       Thu Dec 17 11:52:58 2009
Last mount time:          Wed Feb 23 18:32:02 2011
Last write time:          Wed Feb 23 18:32:02 2011
Mount count:              367
Maximum mount count:      -1
Last checked:             Thu Dec 17 11:52:58 2009
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)

I think Check interval is what you are interested in (on this system it
is all turned off), and you use tune2fs to change it (-i for the check
interval, -c for the maximum mount count).
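
For anyone wanting to audit a batch of servers, here is a minimal Python
sketch that reads those fields out of "dumpe2fs -h" output and reports
whether the periodic checks are enabled.  It works on the excerpt above
as a string; the function names (parse_header, periodic_checks) are made
up for illustration and are not part of e2fsprogs.

```python
# Sketch only: parse "dumpe2fs -h" style header text and report whether
# the mount-count and time-based periodic checks are enabled.
# parse_header / periodic_checks are hypothetical helper names.

SAMPLE = """\
Filesystem created:       Thu Dec 17 11:52:58 2009
Last mount time:          Wed Feb 23 18:32:02 2011
Last write time:          Wed Feb 23 18:32:02 2011
Mount count:              367
Maximum mount count:      -1
Last checked:             Thu Dec 17 11:52:58 2009
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
"""


def parse_header(text):
    """Split each 'Key:   value' line on the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def periodic_checks(fields):
    """Return which periodic fsck triggers are active.

    A maximum mount count of 0 or -1 disables the mount-count trigger,
    and a check interval of 0 disables the time-based trigger.
    """
    max_mounts = int(fields["Maximum mount count"])
    interval = int(fields["Check interval"].split()[0])
    return {
        "mount_count_check": max_mounts > 0,
        "time_check": interval > 0,
    }


if __name__ == "__main__":
    print(periodic_checks(parse_header(SAMPLE)))
```

On a live system you would feed it the real output of dumpe2fs -h for
each device instead of the SAMPLE string.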

> Last night I saw an issue with "sleeping on disk" on a process so of
> course it can't be killed and the filesystems can't be unmounted.   I
> think (but am not sure) that this is the only server where we disabled
> the fsck.   I'm going to check into that but before I push back on
> whether fsck is needed I'm just wondering what others think.

Yeah, you'd better run it.  If you shut that file system down uncleanly,
it's going to have to happen anyway.  With ext3, if it can recover the
journal, that can be real quick even with large file systems.

There was a system people were playing with a few years back that took
advantage of LVM snapshots to do fsck.  On the running system, you would
take an LVM snapshot and fsck the snapshot.  If it was clean, you
updated the superblock with a new "Last checked" time and continued on.
If it wasn't clean, you notified the operator.  Either way, you released
the snapshot and continued on.  You just had to leave enough space to
accumulate all the changes while the fsck was running.  I think Ted Ts'o
was involved in some of that but I haven't seen any discussion of it in
ages.
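
The snapshot procedure described above can be sketched roughly as
follows in Python.  This is an illustration under assumptions, not the
original tool: the volume group and volume names, the 1G snapshot size,
the snapshot name, and both function names are hypothetical.  The
commands themselves (lvcreate -s, e2fsck -f -n, tune2fs -T now,
lvremove -f) are standard LVM and e2fsprogs invocations.  Nothing is
executed unless you call run_snapshot_check yourself, as root.

```python
# Sketch only: fsck an LVM snapshot of a mounted ext3 file system and,
# if it is clean, stamp a new "Last checked" time on the origin volume.
import subprocess


def snapshot_check_plan(vg, lv, snap_size="1G", snap_name=None):
    """Build the command lines; names and sizes are caller-supplied."""
    snap_name = snap_name or f"{lv}-fsck-snap"
    origin = f"/dev/{vg}/{lv}"
    snap = f"/dev/{vg}/{snap_name}"
    return {
        # snap_size must be big enough to hold all writes made to the
        # origin while the check runs
        "create": ["lvcreate", "-s", "-L", snap_size, "-n", snap_name, origin],
        # -f forces a full check, -n opens read-only and answers "no"
        "check": ["e2fsck", "-f", "-n", snap],
        # update "Last checked" on the real file system
        "mark_clean": ["tune2fs", "-T", "now", origin],
        "release": ["lvremove", "-f", snap],
    }


def run_snapshot_check(plan, run=subprocess.call, notify=print):
    """Run the plan; return True if the snapshot checked clean."""
    if run(plan["create"]) != 0:
        notify("could not create snapshot; skipping check")
        return False
    try:
        clean = run(plan["check"]) == 0  # e2fsck exits 0 when no errors
        if clean:
            run(plan["mark_clean"])
        else:
            notify("fsck of snapshot found problems; schedule a real fsck")
        return clean
    finally:
        # either way, release the snapshot
        run(plan["release"])
```

Something like run_snapshot_check(snapshot_check_plan("vg0", "home"))
from cron would be the usage, with "vg0" and "home" replaced by your own
names.  The run and notify parameters are there so the logic can be
exercised without touching real volumes.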

> My co-worker says he was told in training he took that it isn't
> necessary.   Long ago I was told similar information for the Veritas
> (VxFS) journaled filesystem used for HP-UX and Solaris 

Your co-worker may find himself disastrously wrong some day.  That's
something I would not want to discover the hard way.  I have had ext3
file systems turn up with problems including some that required manual
fsck.  It does happen.  Not often.  Damn rare, in fact.  But it does
happen.

> 
> 
> ________________________________________________________________________
> __________________
> 
> Jeff Lightner | UNIX/Linux Administrator | DS Waters of America, Inc |
> 5660 New Northside Drive, Ste 250 | Atlanta, GA 30328 
> *: (Direct Dial) 678-486-3516 |*: (Cell) 678-772-0018 |
> *:jlightner at water.com
>  
> Proud partner. Susan G. Komen for the Cure.

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!