[ale] dd & unexpected soft update inconsistency, HUH ?

Fri Apr 15 19:09:50 EDT 2005

Stephan,

I'm now running the mirror after using the conv=noerror,sync option of
dd. Fsck was fine on the mirror.

But I don't know what I've lost.

I guess I'd need to run dd again and save the onscreen results to a file
to be able to address the bad sectors. But even knowing which sectors,
how do I know which 'files' ? I've never had to dig in the bowels to
this extent. Interesting but a bit hairy in that my main server is at
risk.
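
For pinning down the bad sectors, something like the following might do as a rough sketch. It has not been run against a failing drive. The function name scan_sectors and the log name bad_sectors.log are made up for illustration, and it assumes 512-byte sectors and that dd (run without conv=noerror) exits non-zero when a read fails.

```shell
#!/bin/sh
# scan_sectors SRC START COUNT [LOG]   (hypothetical helper, not a standard tool)
# Read COUNT 512-byte sectors of SRC one at a time, beginning at sector
# START, and append the number of every sector that dd cannot read to LOG.
scan_sectors() {
    src=$1
    start=$2
    count=$3
    log=${4:-bad_sectors.log}

    sector=$start
    end=$((start + count))
    while [ "$sector" -lt "$end" ]; do
        # skip= counts input blocks of size bs, so with bs=512 and count=1
        # this attempts to read exactly one sector.
        if ! dd if="$src" of=/dev/null bs=512 skip="$sector" count=1 2>/dev/null; then
            echo "$sector" >> "$log"
        fi
        sector=$((sector + 1))
    done
}
```

Usage would be along the lines of: scan_sectors /dev/orig 1000000 2048 usr_bad.log, where /dev/orig is a placeholder and the start sector is taken from the record count dd printed when it hit the input error. Going one sector at a time is slow, so it only makes sense over a narrow range around the failure.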

One thing I did notice is that the mirror is UDMA 100 and the original
is UDMA33. Another possibility is improper cables to the controller, I
guess.

Cordially,

Courtney

On Fri, 2005-04-15 at 13:20, Stephan Uphoff wrote:
> On Fri, 2005-04-15 at 11:47, Greg Freemyer wrote:
> > On 4/15/05, Courtney Thomas <ccthomas at joimail.com> wrote:
> > > On Fri, 2005-04-15 at 08:49, Stephan Uphoff wrote:
> > > > On Fri, 2005-04-15 at 09:25, Courtney Thomas wrote:
> > > > > I've been using dd for years to mirror my server HD without problems.
> > > > >
> > > > > The command used is:        dd if=< original HD > of=< mirror HD >
> > > >
> > > > I don't think dd is a good idea on a live file system ....
> > > > ...but you probably know this so let's ignore this for now.
> > > >
> > > 
> > > Stephan,
> > > 
> > > Thank you for your help  :-)
> > > 
> > > I'm not doing this on a live file system. I'm booting up with a CD and
> > > doing this on unmounted filesystems.
> > > 
> > > > > However, yesterday, for the first time, when I do this, I get the error:
> > > > >
> > > > >     UNEXPECTED SOFT UPDATE INCONSISTENCY
> > > > >
> > > > > This happens to mirror HD /usr only.
> > > > > Mirror HD / and /var do not exhibit this problem.
> > > > >
> > > > > If I fsck the mirror HD, all is OK except mirror HD /usr in which I get
> > > > > a plethora of errors complaining about soft update inconsistencies.
> > > > >
> > > > > If I attempt to examine mirror HD /usr with ls -l, I see there's also a
> > > > > raft of BAD FILE DESCRIPTORs.
> > > > >
> > > > > Further, the dd completion screen shows... an error coming from the
> > > > > original HD, i.e. not the mirrored HD, drive. It only reports "[original
> > > > > HD] input error". But when I fsck all partitions of that original HD,
> > > > > all is reported as satisfactory.
> > > >
> > > > Looks like you have a bad sector somewhere on the disk.
> > > 
> > > I agree it looks like it, maybe. But why doesn't fsck find this ?
> > > 
> > > > Is there something in the log file?
> > > 
> > > Which log file ?
> > > 
> > > > The /usr partition is probably only partially copied.
> > > 
> > > You are correct on this too.
> > > >
> > > > You can try the following:
> > > > 1) tar up /usr so that all used data blocks will be read.
> > > >    This may indicate an unreadable file .. or you may be lucky and the
> > > > bad sector is in unused space.
> > > 
> > > tar needs a mounted filesystem right ? [I don't use tar.]
> 
> Yes
> 
> > > 
> > > > 2) Locate the defective sector (dd to /dev/null with offset, counts...)
> > > 
> > > > 3) Write zeroes to the defective sector to "repair" it and fsck..
> > > 
> > > I understand the writing zeroes to the bad sector using an offset, but
> > > how do I exactly determine how many zeroes to write ?
> 
> Mhhh... doesn't dd tell you how many bytes it copied?
> Then with a block size of 512 bytes and a block count of one you can use
> iseek to try to copy sector by sector.
> 
>  
> > > Also, I assume you recall that fsck gives no error message now.
> 
> fsck only reads meta data - it does not try to read the actual file
> data. If you are really, really lucky the bad sector is even unallocated.
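
Since fsck never reads file contents, one way to learn which file sits on the bad sector is the same idea as the tar in step 1: read every file on the mounted filesystem and note the ones that fail. A minimal sketch, assuming the filesystem is mounted (read-only is enough); find_unreadable and the default log name are invented for this example.

```shell
#!/bin/sh
# find_unreadable DIR [LOG]   (hypothetical helper)
# Read every regular file under DIR from start to finish and append the
# path of each file that cannot be read completely to LOG.
find_unreadable() {
    dir=$1
    log=${2:-unreadable_files.log}

    # -xdev keeps find on the one filesystem; cat exits non-zero if any
    # read fails, which is what an unreadable sector inside a file causes.
    find "$dir" -xdev -type f -exec sh -c '
        for f do
            cat "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} + >> "$log"
}
```

For example: find_unreadable /usr /tmp/usr_unreadable.log. Keep the log on a different filesystem from the one being checked. If the log stays empty, the bad sector is probably in metadata or in unallocated space.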
> 
> > > > 4) Restore the file that was not readable in 1)
> > > >
> > > > I believe that there are disk repair tools in the ports tree but never
> > > > had the need to try them.
> > > >
> > > > > What's goin' on here and how can I remedy it. This is my gateway server
> > > > > and I urgently need to resolve this.
> > > > >
> > > > > Appreciatively,
> > > > >
> > > > > Courtney
> > > > >
> > 
> > Basically it sounds like your disk crapped out.
> > 
> > Trying to salvage the old disk itself is likely a waste of time and
> > will lead to future problems.
> 
> Yes and no.
> A single read error does not mean that a disk is crapping out.
> On most IDE disks the specs rate one bit error in 10^14 bits. (SCSI is
> usually 10^15)
> A power failure while writing a sector will also destroy a sector.
> I believe 20% of disks returned to some manufacturers are just sent out
> again after testing without any repair being done.
> 
> 
> > I know you were trying to make a backup, but do have a recent one you
> > could use?
> > 
> > If you do have to do disk recovery of the data, I would reperform your dd as:
> > 
> > dd if=/dev/orig of=/dev/target conv=noerror,sync
> > 
> > Then run fsck etc. on the new disk.  Or better yet on a cc copy of the
> > new disk.  ie. Save away the dd image for repeated recovery attempts.
> 
> I totally agree.
> 
> > In the above, noerror says to continue copying even in the presence of
> > errors.  sync says to fill the failed read blocks with zeros. 
> > (Default behavior is to skip the block.  Very bad if you need to
> > reconstruct the filesystem because the offsets will be wrong.)
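
The padding that sync does can be seen without a failing disk. A real read error cannot be reproduced on demand, so this small sketch uses a short read instead, which dd pads the same way: every input block is filled out with zero bytes to the full block size, so later data keeps its offset. padded_size is just an illustrative name.

```shell
#!/bin/sh
# padded_size BS   (hypothetical helper)
# Feed a 3-byte input through dd with conv=noerror,sync and print how many
# bytes come out. With sync, the short block is padded with NULs up to BS.
padded_size() {
    printf 'abc' | dd bs="$1" conv=noerror,sync 2>/dev/null | wc -c | tr -d ' '
}

padded_size 512   # prints 512: 3 bytes of data plus 509 bytes of zero padding
```

Without sync the same input would come out as 3 bytes, and everything after it would land at the wrong offset in the copy.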
> > 
> > You can also try dd_repair (I think).  I have not used it, but it is
> > designed to get data off of a failing disk.
> > 
> > Greg
> 
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://www.ale.org/mailman/listinfo/ale