[ale] mass file modification
Mike Harrison
meuon at geeklabs.com
Mon Mar 31 07:53:17 EDT 2008
On Sun, 30 Mar 2008, Jim Kinney wrote:
> So "cleaning up bad MS-HTML" did not include "unlink $crapfile". Your
> patience and tolerance is astounding :-)
Well, I kinda cut and pasted that together.
It's missing about 30 regexes and more.
And I like manually comparing a few before I delete stuff.
Even when I have a backup.
The missing regexes include my favorite:
$page =~ s/class=MsoNormal//g ; # strips Word's MsoNormal class attribute
But I often clean up bad HTML in PHP, and love the strip_tags function,
which strips out -all- HTML/XML except the tags you specify.
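<?php
// Read the messy Word-exported HTML from the file "bad".
$bad = file_get_contents("bad") ;
// Keep only the listed tags; every other tag is removed.
$good = strip_tags($bad,"<b><em><p><br>");
// Write the cleaned markup to the file "good".
$fileout = fopen("good","w") ;
fputs($fileout,$good) ;
fclose($fileout) ;
?>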
Most of my variants of the above stuff the file in MySQL when done.
I like keeping users in a web interface and without file system access
of any kind.
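For anyone wanting to try the clean-then-store flow, here is a minimal sketch, not the exact code I run. One thing to know: strip_tags leaves the attributes on the tags it keeps, so a second pass is still needed for things like class=MsoNormal. The function names clean_html and store_page, the table pages, and its columns name and body are all made up for illustration, and the attribute regex assumes no attribute value contains a literal ">".

```php
<?php
// Sketch only: the table `pages` and its columns `name` and `body`
// are hypothetical, not taken from the original post.

// Keep a small allow-list of tags, then drop every attribute
// (class=MsoNormal, style=..., lang=...) from the tags that survive.
function clean_html(string $bad): string
{
    $kept = strip_tags($bad, "<b><em><p><br>");
    // Assumes attribute values never contain a literal ">".
    return preg_replace('/<(\w+)\s[^>]*>/', '<$1>', $kept) ?? $kept;
}

// Store the cleaned page through a prepared statement, so the HTML
// is never interpolated into the SQL string.
function store_page(PDO $db, string $name, string $html): void
{
    $stmt = $db->prepare('INSERT INTO pages (name, body) VALUES (?, ?)');
    $stmt->execute([$name, $html]);
}

// Hypothetical sample of Word-style markup.
$sample = '<p class=MsoNormal style="margin:0in"><span lang=EN-US>Hello <b>world</b></span></p>';
echo clean_html($sample), "\n";   // <p>Hello <b>world</b></p>
```

With a MySQL connection opened through PDO (DSN and credentials are yours to fill in), the last step would be something like store_page($db, "good", clean_html($bad)).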