[ale] Re: Hardware diagnostics

Greg Freemyer greg.freemyer at gmail.com
Fri Jan 7 11:59:08 EST 2005


memtest did show that one bit was occasionally being set high when it
should not be.
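
That matches the cmp output quoted below. As a quick sanity check, here is
a minimal Python sketch that XORs each original/copy pair and lists the bit
positions that differ. It assumes the values really are octal, as guessed in
the original message, and the helper name flipped_bits is just something
made up for illustration.

```python
def flipped_bits(original_octal: str, copy_octal: str) -> list:
    """Return the bit positions (0 = least significant) that differ."""
    # Parse both values as octal and XOR them; set bits mark the differences.
    diff = int(original_octal, 8) ^ int(copy_octal, 8)
    return [bit for bit in range(8) if diff & (1 << bit)]


# The two byte pairs from the quoted message, as octal strings.
pairs = [("20", "220"), ("0", "200")]
for original, copy in pairs:
    print(original, "->", copy, "flipped bits:", flipped_bits(original, copy))
```

Both pairs come back as [7], i.e. the same high bit (octal 200, hex 80) was
set in each corrupted byte, which is what you would expect from one stuck
or flaky bit somewhere in the data path.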

Unfortunately, swapping out ram did not help the problem.

I've decided to replace the motherboard/CPU (an older P3/700) with newer hardware.

Greg


On Thu, 6 Jan 2005 13:58:49 -0500, Greg Freemyer
<greg.freemyer at gmail.com> wrote:
> I have a machine that is introducing occasional data corruption.
> 
> For instance, I just copied 200 GB between two drives (one 3ware
> RAID 0, one 250 GB PATA).
> 
> I then verified the data was the same using cmp --verbose.
> 
> It found 2 one-byte differences, so I'm having 1 byte/100 GB of corruption.
> 
> Assuming 'cmp --verbose' outputs octal, both bytes had a single bit
> set in the copy that was not set in the original.
> 
> i.e. 20 --> 220   and   0 --> 200
> 
> I know I need to run a memchk, but are there any other diagnostics I
> could run to try to figure out what hardware is bad?
> 
> I'm also wondering if I need to bite the bullet and replace the
> motherboard with one that has ECC RAM.
> 
> FYI: I'm pretty sure it is not the 3ware card (I've had corruption
> when none of the data was on disks controlled by that card.)  I've
> also already changed out the ATA controller.
> 
> Thanks
> Greg
> --
> Greg Freemyer
>


