[ale] Network issue that makes absolutely NO sense

James P. Kinney III jkinney at localnetsolutions.com
Fri May 12 08:07:17 EDT 2006


Try setting the 100Mbps NIC to FD and all the others to HD. That will at
least let the 100Mbps stuff run at full speed, while HD on the gigabit
NICs will slow them down enough that they likely won't overrun the Oracle box.
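
To put a rough number on the overrun: when the NAS streams at 1000 Mbps to
the 100Mbps Oracle port, the switch has to queue the surplus 900 Mbps, and
any buffer fills quickly. Below is a back-of-the-envelope Python sketch. The
1 MB buffer is a made-up figure for illustration, not the switch's real
buffer size, and the sketch ignores flow control and TCP back-off, which
would throttle the sender in practice.

```python
def seconds_until_overflow(buffer_bytes: int, ingress_mbps: float, egress_mbps: float) -> float:
    """Seconds until an output buffer fills when frames arrive faster than
    the port can send them. Ignores flow control and TCP back-off."""
    surplus_bps = (ingress_mbps - egress_mbps) * 1_000_000  # bits/s piling up
    if surplus_bps <= 0:
        return float("inf")  # the egress port keeps up; the buffer never fills
    return buffer_bytes * 8 / surplus_bps


# Hypothetical 1 MB buffer, NAS sending at 1000 Mb/s to the 100 Mb/s Oracle port.
print(f"{seconds_until_overflow(1_000_000, 1000, 100) * 1000:.1f} ms")
```

Under those assumptions the buffer is full in about 9 ms, so unless
something holds the faster sender back, drops start almost immediately.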

The switch _should_ support store-and-forward so it can operate in mixed
100/1000 mode. It sounds like it doesn't. :\  Or it may be a switch config
issue. If the switch can be configured port by port, set the Oracle box's
port to 100Mbps-FD. This will allow the switch to send "slow down" signals
to the other ports.
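
Whichever end you configure, the NIC and its switch port have to agree.
Below is a simplified Python model of how each end settles on speed and
duplex. It covers 10/100 links only (1000BASE-T relies on
auto-negotiation), and the class and function names are hypothetical. The
one piece of real-world behavior it encodes is parallel detection: an
auto-negotiating port facing a forced peer can sense the peer's speed but
not its duplex, so it falls back to half duplex.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """One end of a link: auto-negotiating, or forced to a speed/duplex."""
    autoneg: bool
    speed: int = 0        # Mb/s; used only when autoneg is False
    duplex: str = ""      # "full" or "half"; used only when autoneg is False
    max_speed: int = 100  # best speed the hardware supports, in Mb/s


def effective(me: PortConfig, peer: PortConfig) -> tuple[int, str]:
    """Speed/duplex this end ends up using (simplified 10/100 Mb/s model)."""
    if not me.autoneg:
        return me.speed, me.duplex
    if peer.autoneg:
        # Both ends advertise; they settle on the best common mode
        # (assumes both support full duplex).
        return min(me.max_speed, peer.max_speed), "full"
    # Forced peer sends no advertisement: parallel detection senses its
    # speed but not its duplex, so this end falls back to half duplex.
    return peer.speed, "half"


def check_link(nic: PortConfig, switch_port: PortConfig) -> list[str]:
    """List the ways a NIC and its switch port disagree (empty if in sync)."""
    nic_speed, nic_duplex = effective(nic, switch_port)
    sw_speed, sw_duplex = effective(switch_port, nic)
    problems = []
    if nic.autoneg != switch_port.autoneg:
        problems.append("autoneg on one end only")
    if nic_speed != sw_speed:
        problems.append(f"speed mismatch: NIC {nic_speed}, switch {sw_speed}")
    if nic_duplex != sw_duplex:
        problems.append(f"duplex mismatch: NIC {nic_duplex}, switch {sw_duplex}")
    return problems


oracle_nic = PortConfig(autoneg=False, speed=100, duplex="full")
print(check_link(oracle_nic, PortConfig(autoneg=True)))
print(check_link(oracle_nic, PortConfig(autoneg=False, speed=100, duplex="full")))
```

With the Oracle NIC forced to 100FD, the first call reports both the
auto-negotiation and the duplex mismatch; the second, with the switch port
also forced to 100FD, prints an empty list.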

Another thing to check is the MTU of the Gbit NICs. Many are set to a much
larger value than 1500 for better LAN throughput. If they are too high for
the Oracle box, you may have a massive fragmentation problem that is hosing
the backup process.
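
If any one fragment is lost, the receiver cannot reassemble the datagram and
the whole thing has to be sent again, so the number of fragments per
datagram matters. A short Python sketch of the IPv4 arithmetic: each
fragment carries at most (MTU - 20) bytes of payload, rounded down to a
multiple of 8, assuming a 20-byte header with no options. The 8 KB UDP
datagram in the example is hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes; assumes no IP options


def ipv4_fragments(ip_payload_bytes: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to carry a payload over a link MTU."""
    # Every fragment but the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return max(1, math.ceil(ip_payload_bytes / per_fragment))


# Hypothetical 8 KB UDP datagram: 8192 bytes of data + 8-byte UDP header.
for mtu in (9000, 1500):
    print(mtu, ipv4_fragments(8192 + 8, mtu))
```

At an MTU of 9000 the datagram goes out in one piece; at 1500 it needs six
fragments.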

If possible, get a third box with a Gbit NIC and run tcpdump on the line
and see what's happening. A Gbit _hub_ would be good but...

If you are careful, a Cat5e socket can be inserted in a patch cable (think
cat5 vampire tap :). By slitting the sheath, exposing the wires, and then
CAREFULLY using a punchdown tool with no cutter, you can add a "sniffer"
socket to the patch cable. Set up the sniffer NIC on the same network and
run tcpdump.
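
Once you have a capture, one quick check of the fragmentation theory is to
count how many IPv4 packets are fragments. A minimal Python sketch,
assuming the capture was written with something like
`tcpdump -i eth0 -w backup.pcap host <NAS address>` (interface and file
names are placeholders) in classic libpcap format with untagged Ethernet
frames. The function name is hypothetical, and pcapng or
nanosecond-resolution files are not handled.

```python
from __future__ import annotations

import struct

ETH_HEADER = 14        # untagged Ethernet II header, bytes
ETHERTYPE_IPV4 = 0x0800


def count_ipv4_fragments(path: str) -> tuple[int, int]:
    """Return (ipv4_packets, fragments) from a classic libpcap file with
    Ethernet link type. A packet counts as a fragment if its More Fragments
    flag is set or its fragment offset is non-zero."""
    with open(path, "rb") as f:
        header = f.read(24)
        if len(header) < 24:
            raise ValueError("file too short for a pcap header")
        (magic,) = struct.unpack("<I", header[:4])
        if magic == 0xA1B2C3D4:
            order = "<"   # file was written little-endian
        elif magic == 0xD4C3B2A1:
            order = ">"   # file was written big-endian
        else:
            raise ValueError("not a classic (microsecond) pcap file")
        (linktype,) = struct.unpack(order + "I", header[20:24])
        if linktype != 1:
            raise ValueError("expected Ethernet link type (1)")

        ipv4 = fragments = 0
        while True:
            record = f.read(16)
            if len(record) < 16:
                break  # end of file
            _, _, incl_len, _ = struct.unpack(order + "IIII", record)
            frame = f.read(incl_len)
            if len(frame) < ETH_HEADER + 20:
                continue  # too short to hold an IPv4 header
            (ethertype,) = struct.unpack("!H", frame[12:14])
            if ethertype != ETHERTYPE_IPV4:
                continue
            ipv4 += 1
            # Flags + fragment offset live in bytes 6-7 of the IPv4 header.
            (flags_offset,) = struct.unpack("!H", frame[ETH_HEADER + 6:ETH_HEADER + 8])
            more_fragments = flags_offset & 0x2000
            offset = flags_offset & 0x1FFF
            if more_fragments or offset:
                fragments += 1
    return ipv4, fragments
```

Calling `count_ipv4_fragments("backup.pcap")` returns a pair (IPv4 packets,
fragments); a large fragment share during the backup window would point
toward the MTU theory.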



> The first thing I did when I got the switch was update the firmware to
> 4.0.3.15.  However, it seems this is still an issue within the firmware.
>
> In this case all three servers are in the same VLAN.  Two have gigabit
> NICs and one (the Oracle server) does not.  Setting everything to
> 100BaseTx-FD and forcing the ports on the switch to 100FD has allowed the
> jobs to run simultaneously; however, I keep losing connectivity to the NAS
> from both boxes that are writing to it at this time.  Since the NFS file
> systems are hard mounted, they keep going once the NAS can be contacted
> again, so at least I am not losing data; it is just painfully slow to
> back up.
>
> -Ryan
>
>
> -----Original Message-----
> From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of Jerry Yu
> Sent: Thursday, May 11, 2006 11:35 PM
> To: Atlanta Linux Enthusiasts
> Subject: Re: [ale] Network issue that makes absolutely NO sense
>
> I wonder what version of firmware you have on that switch.  One of the
> bugs fixed in newer firmware indicates that the initial version, 4.0.3.7,
> doesn't handle it well when port speeds differ across the switch.
>
> Info can be found at http://kbserver.netgear.com/products/GSM7248.asp
>
> Firmware Version 4.0.3.15       Published Feb. 10, 2006
> Fixes
> 1. Fixed: Switch may crash and reboot if packets are forwarded from
> lower 24 ports (1 to 24) to higher 24 ports (25-48) where the port
> speeds are different, e.g., if one port is using 10M half duplex, and
> the other is using 1000M bps full duplex.
>
>
>
> On 5/11/06, Jerry Yu <jjj863 at gmail.com> wrote:
>>
>> I am not familiar with that model. A rule of thumb is to set both the
>> switch and the NIC to auto-negotiation so that the best common
>> speed/duplex combination is negotiated automatically when the link comes
>> up between the NIC and the switch port. Short of that, one would force
>> the same speed/duplex combination on both ends. The bottom line is that
>> both ends should be in sync in terms of auto-negotiation (or
>> manual/forced), speed (1000, 100, or 10), and duplex (full or half).
>>
>>
>>
>> On 5/11/06, Ryan Fish <FishR at bellsouth.net> wrote:
>> >
>> > The switch is a Netgear 7248.  It shows me that everything is in FD on
>> > all used ports (although I had to force some of them to FD even though
>> > the NICs are set at FD).
>> >
>> > My next step is going to be matching the NICs and switch ports at 100FD,
>> > because the NIC in the Oracle box can only go that fast.  I just have to
>> > wait for a job to complete.
>> >
>> >
>> >
>> > Thank you.
>> >
>> > -Ryan
>> >
>> >
>> >
>> >   ________________________________
>> >
>> > From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of Jerry Yu
>> >  Sent: Thursday, May 11, 2006 9:31 PM
>> >  To: Atlanta Linux Enthusiasts
>> >  Subject: Re: [ale] Network issue that makes absolutely NO sense
>> >
>> >
>> >
>> >
>> > To have smooth communication:
>> >  1) the NIC and the switch port it connects to should have matching
>> >     speed/duplex/auto_neg|manual settings;
>> >  2) the two endpoints of a switched connection should have matching
>> >     speed and duplex.
>> >
>> >  What's the model of the new switch?  You may find
>> >  speed/duplex/auto_neg settings per port on the switch itself.
>> >
>> >
>> >
>> >
>> > On 5/11/06, Ryan Fish <FishR at bellsouth.net> wrote:
>> >
>> >
>> >
>> >
>> > A bit more info that may be helpful:
>> >
>> >
>> >
>> > - The Oracle server only fails because it is unable to read from the
>> >   NAS.  This pushes IOWait on the processors into the high 90% range,
>> >   where it stays until the box is eventually too busy to respond to
>> >   requests from the application that uses it.
>> >
>> > Is there some way to test whether a switch is truly using Full Duplex
>> > on a port?
>> >
>> > Does it make any difference if the NIC in the Oracle server is set to
>> > 100FD (the highest it can go) and the NIC on the server running the
>> > other backup scripts is set to 1000FD?  The NAS is set to 1000FD.  Is
>> > there something in the way 100FD and 1000FD work that keeps them from
>> > working together properly?
>> >
>> >
>> >
>> > Thank you again.
>> >
>> > -Ryan
>> >
>> >
>> >
>> >   ________________________________
>> >
>> > From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of Ryan Fish
>> >  Sent: Thursday, May 11, 2006 8:53 PM
>> >  To: 'Atlanta Linux Enthusiasts'
>> >  Subject: [ale] Network issue that makes absolutely NO sense
>> >
>> >
>> >
>> >
>> > I have found the following issue with two different backup processes
>> > after putting a new switch in place within the network:
>> >
>> >
>> >
>> > 1) RHEL3 AS/Oracle 9i server using RMAN and Export for backups.
>> >
>> >     - As long as the NIC on the NAS device, to which all backup
>> >       information is written, is set to 100FD, the backup processes run
>> >       as normal and all is well.  Once the NIC on the NAS is set to
>> >       1000FD, the backups fail because the Oracle server is unable to
>> >       connect to the NAS device over the NFS mount.
>> >
>> >
>> >
>> > 2) RHEL3 ES server running multiple bash scripts to back up portions of
>> > almost every other box in the same network.  The backup scripts run fine
>> > when the NIC on the NAS is set to 1000FD but fail when I set it to 100FD.
>> >
>> >
>> >
>> > Prior to replacing the failed switch this was never an issue: all
>> > backups ran fine every night, except one that ran fine most of the
>> > time.  Only after the switch was swapped out did this network
>> > strangeness occur.
>> >
>> >
>> >
>> > What could/would cause this?
>> >
>> > Why would it matter what speed the NIC on the NAS is set to for
>> > particular backup processes to function properly?
>> >
>> > Is there anywhere within the RMAN and/or Export processes that the NIC
>> > speed on the receiving end could or would be hard-coded to only accept
>> > 100FD?  If so, why?
>> >
>> >
>> >
>> > I am at a complete loss here and have been fighting this for two weeks
>> > already, so any help will be greatly appreciated.
>> >
>> >
>> >
>> > Thank you.
>> >
>> > -Ryan
>> >
>> >
>> >  _______________________________________________
>> >  Ale mailing list
>> >  Ale at ale.org
>> >  http://www.ale.org/mailman/listinfo/ale
>> >
>> >
>> >
>> >
>> >
>>
>>
>
>
>



