[ale] final plea for help (more kernel panic info)

James P. Kinney III jkinney at localnetsolutions.com
Sat Jan 4 12:29:59 EST 2003


Hmm. This may take a while. :)

Chips are made by stacking layers of type A onto layers of type B. Then
a mask is applied and the exposed areas of the top layer are etched off.
Then another layer of type C is applied to fill in the etched areas.
Finally, the layer C material sitting on top of layer B, above the fill
level, is etched off. Sometimes a top inert layer is added. Somewhere in
this process, a micro wire is bonded to the appropriate layer at the
appropriate point. This bonding is VERY fragile (I've done them on some
custom chips). The dangling end of the wire is bonded to the chip
carrier pin.

Now the etchings are not perfect. Ideally they have vertical walls. In
reality, they have sloped walls. Sometimes, an area of the etching
doesn't proceed at the same rate as other areas. This area has a
deficiency of the fill layer C material. This results in a deviation
from the designed electrical characteristics.  If the C material is
conductive, the bad area is not conductive enough so localized heating
occurs. Heat causes an increase in resistance for most materials (it
tends to scatter the electron flow instead of allowing a linear march).
Resistance causes heat.  If the type C material is resistive, it would
normally be acting as a buffer material. If the buffer material is weak,
it allows an excess of current to flow. This causes heat in adjacent
areas, namely the receiving bucket for the current. As the bucket heats,
its resistance increases thus raising the heat, ad infinitum. The thin
buffer keeps dumping more current than designed into the bucket.
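
To put rough numbers on that heat/resistance feedback loop, here is a
minimal Python sketch. It is a toy model, not a description of any real
chip: it assumes a fixed current through one resistive spot, a
resistance that rises linearly with temperature, and cooling
proportional to the temperature rise. Every name and number in it
(steady_state_rise, simulate, alpha_per_k, and so on) is a made-up
illustrative value.

```python
# Toy model of the heat -> resistance -> heat loop. Illustrative numbers only.

def steady_state_rise(current_a, r0_ohm=1.0, alpha_per_k=0.004,
                      cooling_w_per_k=0.02):
    """Equilibrium temperature rise in kelvin, or None if there is none.

    Heating is I^2 * R0 * (1 + alpha * rise); cooling is k * rise.
    Setting them equal gives rise = I^2 * R0 / (k - I^2 * R0 * alpha).
    """
    feedback = current_a ** 2 * r0_ohm * alpha_per_k
    if feedback >= cooling_w_per_k:
        return None  # heating always outruns cooling: thermal runaway
    return current_a ** 2 * r0_ohm / (cooling_w_per_k - feedback)


def simulate(current_a, r0_ohm=1.0, alpha_per_k=0.004, cooling_w_per_k=0.02,
             heat_capacity_j_per_k=0.5, ambient_c=25.0, fail_c=150.0,
             dt_s=0.01, steps=100_000):
    """Step the same model forward in time; return (final_temp_c, failed)."""
    temp_c = ambient_c
    for _ in range(steps):
        rise = temp_c - ambient_c
        resistance = r0_ohm * (1.0 + alpha_per_k * rise)  # hotter -> more resistive
        heating_w = current_a ** 2 * resistance           # I^2 R dissipation
        cooling_w = cooling_w_per_k * rise                # heat carried away
        temp_c += (heating_w - cooling_w) * dt_s / heat_capacity_j_per_k
        if temp_c >= fail_c:
            return temp_c, True
    return temp_c, False


if __name__ == "__main__":
    for amps in (1.0, 2.5):
        print(amps, steady_state_rise(amps), simulate(amps))
```

With these assumed values the spot settles at a warm but stable
temperature as long as the feedback term I^2 * R0 * alpha stays below
the cooling coefficient. Once it reaches it, no equilibrium exists and
the temperature climbs until the part fails.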

Another effect that is known to occur is current-induced aging of chip
etchings. Just as a river current can make the banks of the river erode
over time, an electrical current can erode the walls of the etched areas
over time. Atoms of the wall material can become detached from the wall
and can migrate within the type C material of the fill. They can then
reattach to other areas "downstream" and cause electrical changes in the
device. This "peninsula" now acts as a "magnet" for other migratory
atoms of wall material. Heat advances this process by lowering the
migration threshold energy of the wall material.
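
The temperature dependence can be sketched with Black's equation, a
standard empirical model for electromigration lifetime that is not
from this post: MTTF = A * J^-n * exp(Ea / (k * T)). The Python below
only compares lifetimes, so the constant A drops out. The activation
energy (0.7 eV) and exponent (n = 2) are assumed ballpark values, and
relative_lifetime is a hypothetical helper name.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def relative_lifetime(temp_c, ref_temp_c, j_ratio=1.0,
                      activation_ev=0.7, n=2.0):
    """MTTF at temp_c divided by MTTF at ref_temp_c, per Black's equation.

    j_ratio is current density relative to the reference condition.
    activation_ev and n are assumed values, not measured ones.
    """
    t_k = temp_c + 273.15
    ref_k = ref_temp_c + 273.15
    thermal = math.exp(
        activation_ev / BOLTZMANN_EV_PER_K * (1.0 / t_k - 1.0 / ref_k)
    )
    return j_ratio ** -n * thermal


if __name__ == "__main__":
    # Same current, 30 C hotter: the expected lifetime shrinks several-fold.
    print(relative_lifetime(90.0, 60.0))
    # Same temperature, double the current density.
    print(relative_lifetime(60.0, 60.0, j_ratio=2.0))
```

Under these assumptions, a modest temperature rise costs far more
lifetime than the same percentage increase in current, which fits the
point that heat is what pushes this process along.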

Conversely, the fill material C can "include" into the wall, or form
pockets. Think teeth and cavities here. 

All of this depends on the material involved, the temperatures of all
the components, the quality of the device manufacturing, etc.

As CPUs and RAM get faster, the board chipsets must get faster as
well. I have seen cooling devices on motherboard chipsets recently. Some
of the SGI hardware I have worked on had such devices as long ago as
1995. Heat is the principal destroyer of modern electronics.

Cosmic rays can also destroy devices. If they disintegrate on or near
the device, the output particles can physically dislocate material from
chips, rapidly overheat components to destruction, or generate a current
rush that destroys components. That's why the case mods that cut a large
hole that gets filled with plastic are a bad idea.

Those holes also ruin the FCC class-B status for electromagnetic
radiation output. That metal case serves to shield the world outside the
box from the frequencies inside the box. A 2.2 GHz CPU outputs microwave
radiation. 
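
As a quick sanity check on the "microwave" claim, wavelength is just
the speed of light divided by frequency. The sketch below treats the
clock frequency as the emitted frequency, which is a simplification
(real emissions also include harmonics and bus frequencies), and uses
the conventional 300 MHz to 300 GHz definition of the microwave band.
wavelength_m and is_microwave are made-up helper names.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def wavelength_m(frequency_hz):
    """Free-space wavelength in metres for a given frequency in hertz."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz


def is_microwave(frequency_hz):
    """True if the frequency is in the conventional 300 MHz to 300 GHz band."""
    return 300e6 <= frequency_hz <= 300e9


if __name__ == "__main__":
    cpu_hz = 2.2e9
    print(f"{wavelength_m(cpu_hz) * 100:.1f} cm, microwave: {is_microwave(cpu_hz)}")
```

For 2.2 GHz this works out to a wavelength of roughly 13.6 cm, well
inside the microwave band.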

Sometime I want to see if I can detect radiation-induced interference by
placing two CPUs of different frequencies in close proximity. It may
prove difficult as the heat sink will tend to block all but the edges.


On Sat, 2003-01-04 at 11:56, John Wells wrote:
> So you're implying that her chipset is overheating somehow?  What could
> fail in a chip to cause it to overheat in this manner?  Broken paths that
> lead to a build up of electrons (like a clog in a pipe)?  I'm not an EE,
> so humor me ;-).
> 
> Thanks,
> 
> John
> 
> James P. Kinney III said:
> > Also, chips are temperature sensitive. As the temp rises, the number of
> > electrons available to do the designed task increases until the average
> > electron is above the barrier energy and, WHOOSH, a cascade of current
> > happens that makes no sense to the receiving buckets expecting only a
> > few million per second. So it sends a WTF?! and that gets amplified
> > until the only data on the output pin is a very loud WTF?!?!?! And the
> > system panics and dies.
> >
> > Spot cooling on a chip that is near the temperature failure point will
> > avoid that problem. Likewise, a localized heat source (tube within a
> > tube pumping near boiling water) is a great way to tip a chip over the
> > edge while keeping the rest of the board at operating temps.
> >
> >
> > On Sat, 2003-01-04 at 11:17, Doug McNash wrote:
> >> It's a hardware thing.  Heat up the components and they
> >> expand, cool down and they contract (ever so little) but
> >> enough to short or open a circuit on a marginal solder
> >> joint or internal chip connection.  The problem statement
> >> describes 10-30 min of running time before failure.  So
> >> some part of the system failing when it gets warm.  The
> >> hair dryer just speeds the process and lets one isolate
> >> the component.
> >>
> >> >Where the hell do you come up with these ideas?  Is there some sort of
> >> >"Home Remedies for the PC" book I've overlooked?  ;-p
> >> >
> >> >John
> >> >
> >> >> Final test, get a hair dryer and a can of compressed air. With the
> >> >> box running, warm the board from the back until it dies, reboot,
> >> >> use the air cans to cool the chipset and test again. If cooling the
> >> >> chipset with a warm board otherwise runs well, the chipset is bad.
> >>
> >> --
> >> Doug McNash
> >> dmcnash at smyrnacable.net
> >> _______________________________________________
> >> Ale mailing list
> >> Ale at ale.org
> >> http://www.ale.org/mailman/listinfo/ale
> > --
> > James P. Kinney III   \Changing the mobile computing world/
> > President and CEO      \          one Linux user         /
> > Local Net Solutions,LLC \           at a time.          /
> > 770-493-8244             \.___________________________./
> >
> > GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics)
> > <jkinney at localnetsolutions.com> Fingerprint = 3C9E 6366 54FC A3FE BA4D
> > 0659 6190 ADC3 829C 6CA7
-- 
James P. Kinney III   \Changing the mobile computing world/
President and CEO      \          one Linux user         /
Local Net Solutions,LLC \           at a time.          /
770-493-8244             \.___________________________./

GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics) <jkinney at localnetsolutions.com>
Fingerprint = 3C9E 6366 54FC A3FE BA4D 0659 6190 ADC3 829C 6CA7 


