[ale] very intermittent weird SCP failure?

Neal Rhodes neal at mnopltd.com
Fri Aug 11 14:04:58 EDT 2017


Thanks for the reply.   No, no chance of running out of space. 

Yes, if I could find the original scp invocation, I could save the
return code, assuming scp returns a different code when it doesn't
succeed.  And as you note, I could run a sum or md5sum and send that
along.  I've got ssh host equivalence, so I could do some comparison.
Although, as Jim noted, you can only chase squirrels so hard and so
long.
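
For what it's worth, here is a minimal sketch of the return-code and
md5sum check discussed in this thread. It assumes Python 3 on the
sending server and md5sum on the backend. The function names, the retry
count and the pause are illustrative, not anything from the existing
cron job. The scp man page documents an exit status of 0 on success and
>0 on error; whether the garbled transfers would actually have returned
nonzero is unknown, which is why the checksum comparison is there too.

```python
import hashlib
import shlex
import subprocess
import time


def local_md5(path):
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, host, dest, attempts=5, pause=60, run=subprocess.run):
    """Copy src to host:dest with scp, then compare MD5 sums on both ends.

    Returns True as soon as one attempt verifies, False if every attempt
    fails.  `run` is injectable so the logic can be tried without a real
    backend host.  Assumes key-based (passwordless) ssh is already set up.
    """
    expected = local_md5(src)
    for attempt in range(1, attempts + 1):
        # scp exits 0 on success and >0 on error, so check the return code.
        copied = run(["scp", "-q", src, f"{host}:{dest}"])
        if copied.returncode == 0:
            # md5sum prints "<digest>  <filename>"; keep only the digest.
            remote = run(["ssh", host, "md5sum " + shlex.quote(dest)],
                         capture_output=True, text=True)
            fields = remote.stdout.split()
            if remote.returncode == 0 and fields and fields[0] == expected:
                return True
        if attempt < attempts:
            time.sleep(pause)  # wait before resending
    return False
```

A cron wrapper could call copy_and_verify() with the real source path,
backend host name and destination path, and send mail when it returns
False.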

On Mon, 2017-08-07 at 16:48 +0000, Lightner, Jeffrey wrote:
> Any chance the filesystem on the backend server has gone temporarily
> full around the time you do the scp’s that have the issue?  You’d see
> it in /var/log/messages if it happened.   
>
> One thing you could do to avoid finding this later and manually
> resending is put your scp inside a script.  Before the scp, run
> md5sum on the file being sent; after the scp, run md5sum (via ssh)
> on the backend server and compare the values.  If they’re not the
> same, have the script resend and check the md5sum again.  You could
> try it multiple times (e.g. 5, with appropriate pauses between
> attempts), then have it send email to you on the last failed attempt.
>
> We use sftp/scp fairly heavily here on RHEL5/RHEL6 and I’ll have to
> say I’ve never run into much trouble with the actual file transfers.
> Having said that, I usually prefer sftp, and it may have its own
> built-in retransmit of packets like the old ftp did.  I’m not sure
> scp does that.
>
> From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of
> Neal Rhodes
> Sent: Monday, August 07, 2017 12:08 PM
> To: Atlanta Linux Enthusiasts
> Subject: [ale] very intermittent weird SCP failure?
>
> We have a client running a ----------------------------- application
> on three linux servers running 
> 
> Linux HDISATBE3 2.6.32-696.1.1.el6.x86_64 #1 SMP Tue Apr 11 17:13:24
> UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> 
> The two primary event servers accumulate a file of
> connection/heartbeat activity, and once a week, a crontab job does an
> SCP of this file to the backend server, which imports these two files
> to calculate uptime.   The respective user has ssh host equivalence,
> so this proceeds without password challenge. 
> 
> This has worked for about 10 years. 
> 
> Very occasionally, like, once in 3 months, we will find the copied
> file on the backend server is garbled, to wit: 
> 
> - total size is correct; matches source
> - the first XXX bytes of the file are NUL characters.
> 
> Which hoses up everything.   I usually figure out which file is
> boogered, re-do the scp by hand, and re-do the import.  Then all is
> well. 
> 
> I am just bumfuzzled as to what would cause this.   It's always on the
> front of the file. 
> 
> I should check and see exactly how many NULs, but usually when it
> happens my hair is on fire.   I'm guessing about 512 or 1K.
> 
> Neal Rhodes
> MNOP Ltd
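
Since the exact number of leading NULs was never measured, a small
helper could record it the next time a garbled file shows up, before the
file is overwritten by the manual re-send. This is only a sketch,
assuming Python 3 is available on the backend server; the function name
is made up for illustration.

```python
def count_leading_nuls(path, chunk_size=65536):
    """Return how many consecutive NUL (0x00) bytes start the file."""
    count = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return count  # reached end of file (or the file is empty)
            stripped = chunk.lstrip(b"\x00")
            count += len(chunk) - len(stripped)
            if stripped:
                return count  # hit the first non-NUL byte
```

Logging this count for each bad file would show whether the guess of
about 512 or 1K holds and whether the number is the same every time.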
> 
> 
> 
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo

