[ale] dd & unexpected soft update inconsistency, HUH ?

Fri Apr 15 14:37:21 EDT 2005

On Fri, 2005-04-15 at 11:47, Greg Freemyer wrote:
> On 4/15/05, Courtney Thomas <ccthomas at joimail.com> wrote:
> > On Fri, 2005-04-15 at 08:49, Stephan Uphoff wrote:
> > > On Fri, 2005-04-15 at 09:25, Courtney Thomas wrote:
> > > > I've been using dd for years to mirror my server HD without problems.
> > > >
> > > > The command used is:        dd if=< original HD > of=< mirror HD >
> > >
> > > I don't think dd is a good idea on a live file system ....
> > > ...but you probably know this so let's ignore this for now.
> > >
> > 
> > Stephan,
> > 
> > Thank you for your help  :-)
> > 
> > I'm not doing this on a live file system. I'm booting up with a CD and
> > doing this on unmounted filesystems.
> > 
> > > > However, yesterday, for the first time, when I do this, I get the error:
> > > >
> > > >     UNEXPECTED SOFT UPDATE INCONSISTENCY
> > > >
> > > > This happens to mirror HD /usr only.
> > > > Mirror HD / and /var do not exhibit this problem.
> > > >
> > > > If I fsck the mirror HD, all is OK except mirror HD /usr in which I get
> > > > a plethora of errors complaining about soft update inconsistencies.
> > > >
> > > > If I attempt to examine mirror HD /usr with ls -l, I see there's also a
> > > > raft of BAD FILE DESCRIPTORs.
> > > >
> > > > Further, the dd completion screen shows... an error coming from the
> > > > original HD, i.e. not the mirrored HD, drive. It only reports "[original
> > > > HD] input error". But when I fsck all partitions of that original HD,
> > > > all is reported as satisfactory.
> > >
> > > Looks like you have a bad sector somewhere on the disk.
> > 
> > I agree it looks like it, maybe. But why doesn't fsck find this ?
> > 
> > > Is there something in the log file?
> > 
> > Which log file ?
> > 
> > > The /usr partition is probably only partially copied.
> > 
> > You are correct on this too.
> > >
> > > You can try the following:
> > > 1) tar up /usr so that all used data blocks will be read.
> > >    This may indicate an unreadable file .. or you may be lucky and the
> > > bad sector is in unused space.
> > 
> > tar needs a mounted filesystem right ? [I don't use tar.]

Yes

> > 
> > > 2) Locate the defective sector (dd to /dev/null with offset,counts...)
> > 
> > > 3) Write zeroes to the defective sector to "repair" it and fsck..
> > 
> > I understand the writing zeroes to the bad sector using an offset, but
> > how do I exactly determine how many zeroes to write ?

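Mhhh... doesn't dd tell you how many bytes it copied before the error?
Starting near that offset, with a block size of 512 bytes and a block
count of one, you can use iseek (same as skip) to try to read sector by
sector. The sector that fails is the one to overwrite, so you write
exactly one 512-byte block of zeroes.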
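
Something along these lines is what I have in mind for steps 2) and 3).
It is only a sketch: the function names and SECTOR_SIZE are mine, it
assumes 512-byte sectors, and zero_sector destroys whatever the sector
held, so try it on a scratch image before pointing it at a real device.

```shell
#!/bin/sh
# Sketch only: SECTOR_SIZE and both function names are assumptions.
SECTOR_SIZE=512

# Print the number of every sector in [first, first+count) that dd cannot read.
scan_sectors() {
    src=$1; first=$2; count=$3
    n=$first
    last=$((first + count))
    while [ "$n" -lt "$last" ]; do
        # skip= is the portable spelling; BSD dd also accepts iseek=.
        if ! dd if="$src" of=/dev/null bs="$SECTOR_SIZE" skip="$n" count=1 2>/dev/null; then
            echo "$n"
        fi
        n=$((n + 1))
    done
}

# Overwrite exactly one sector with zeroes. DESTROYS the data in that sector.
zero_sector() {
    dst=$1; sector=$2
    # conv=notrunc only matters for a regular image file: it keeps dd
    # from truncating the file at the end of the written block.
    dd if=/dev/zero of="$dst" bs="$SECTOR_SIZE" seek="$sector" count=1 conv=notrunc 2>/dev/null
}
```

Usage would be something like "scan_sectors <original HD> 1000000 2048"
to walk 2048 sectors starting at sector 1000000, then zero_sector on
each number it prints. Pick the starting sector from what dd reported
before the input error.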

> > Also, I assume you recall that fsck gives no error message now.

fsck only reads metadata - it does not try to read the actual file
data. If you are really, really lucky the bad sector is even unallocated.
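
Since fsck never touches file contents, the point of step 1) is just to
force a read of every file. If you would rather not use tar, a rough
equivalent is below. read_all_files is a name I made up, and it assumes
the filesystem is mounted (read-only is fine).

```shell
#!/bin/sh
# Sketch only: read_all_files is a hypothetical helper name.
# Read every regular file under a directory and print the paths that
# could not be read completely.
read_all_files() {
    # -xdev keeps find on a single filesystem, e.g. only /usr.
    find "$1" -xdev -type f -exec sh -c '
        for f in "$@"; do
            cat "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}
```

If you do use tar, pipe the archive through something (tar cf - /usr |
cat > /dev/null) rather than writing straight to /dev/null, because GNU
tar skips reading file data when it sees the archive is /dev/null.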

> > > 4) Restore the file that was not readable in 1)
> > >
> > > I believe that there are disk repair tools in the ports tree but never
> > > had the need to try them.
> > >
> > > > What's goin' on here and how can I remedy it. This is my gateway server
> > > > and I urgently need to resolve this.
> > > >
> > > > Appreciatively,
> > > >
> > > > Courtney
> > > >
> 
> Basically it sounds like your disk crapped out.
> 
> Trying to salvage the old disk itself is likely a waste of time and
> will lead to future problems.

Yes and no.
A single read error does not mean that a disk is crapping out.
On most IDE disks the specs rate one unrecoverable bit error in 10^14
bits read. (SCSI is usually 10^15.)
A power failure while writing a sector will also destroy a sector.
I believe 20% of disks returned to some manufacturers are just sent out
again after testing without any repair being done.
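
To put those spec numbers in perspective, here is the back-of-the-envelope
conversion from "one error in 10^N bits" to terabytes read. The function
name is mine, and it treats the spec as a plain average, which is an
assumption.

```shell
#!/bin/sh
# Sketch only: tb_per_bit_error is a hypothetical helper name.
# Convert "one bit error per 10^N bits read" into terabytes (10^12 bytes)
# read per expected error.
tb_per_bit_error() {
    awk -v e="$1" 'BEGIN { printf "%.1f\n", 10^e / 8 / 1e12 }'
}

tb_per_bit_error 14   # IDE figure quoted above
tb_per_bit_error 15   # SCSI figure quoted above
```

So on average that is one bad read per 12.5 TB on IDE and per 125 TB on
SCSI, which is why a single read error by itself does not condemn the
disk.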

> I know you were trying to make a backup, but do you have a recent one
> you could use?
> 
> If you do have to do disk recovery of the data, I would reperform your dd as:
> 
> dd if=/dev/orig of=/dev/target conv=noerror,sync
> 
> Then run fsck etc. on the new disk.  Or better yet on a cc copy of the
> new disk, i.e. save away the dd image for repeated recovery attempts.

I totally agree.

> In the above, noerror says to continue copying even in the presence of
> errors.  sync says to pad each failed or short read block with zeros.
> (Without sync, a failed block is simply left out of the output.  Very
> bad if you need to reconstruct the filesystem because the offsets will
> be wrong.)
> 
> You can also try dd_rescue (I think that is the name).  I have not used
> it, but it is designed to get data off of a failing disk.
> 
> Greg
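
The padding that conv=noerror,sync does can be seen safely on a scratch
file. This is only a sketch: padded_copy_size is a made-up helper, and
it shows the padding of a short final block, not a real read error,
which I cannot simulate here.

```shell
#!/bin/sh
# Sketch only: padded_copy_size is a hypothetical helper name.
# Copy a file with conv=noerror,sync and print the size of the copy in bytes.
padded_copy_size() {
    src=$1; bs=$2
    out=$(mktemp)
    # sync pads every input block up to the block size with NUL bytes,
    # so block offsets in the output stay aligned with the input.
    dd if="$src" of="$out" bs="$bs" conv=noerror,sync 2>/dev/null
    wc -c < "$out" | tr -d ' '
    rm -f "$out"
}
```

A 700-byte file copied with bs=512 comes out as 1024 bytes: one full
block plus one short block padded to 512. As far as I know the same
padding replaces a block that fails to read, so a small block size such
as bs=512 keeps the zeroed area close to one sector per error.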