[ale] mass file modification

Mike Harrison meuon at geeklabs.com
Mon Mar 31 07:53:17 EDT 2008


On Sun, 30 Mar 2008, Jim Kinney wrote:

> So "cleaning up bad MS-HTML" did not include "unlink $crapfile". Your
> patience and tolerance is astounding :-)

Well, I kinda cut and pasted that together.
It's missing about 30 regexes and more.
And I like manually comparing a few files before I delete anything,
even when I have a backup.

Including my favorite:

   $page =~ s/class=MsoNormal//g ;   # strips MSO
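The same idea carries over to PHP with preg_replace. What follows is only a rough sketch, not the script from this thread: the function name strip_mso_classes is made up for illustration, and it assumes Word writes the attribute as class=MsoSomething, with or without quotes around a single class name.

```php
<?php
// Hypothetical helper (not from the original script): remove Word's
// class=Mso... attributes, whether unquoted, double-quoted or single-quoted.
function strip_mso_classes(string $html): string
{
    // \s+ also eats the whitespace before the attribute, so
    // "<p class=MsoNormal>" becomes "<p>" rather than "<p >".
    // (["']?) captures an optional opening quote; \1 requires the same
    // quote (or nothing) at the end.
    $clean = preg_replace('/\s+class=(["\']?)Mso\w+\1/i', '', $html);

    // preg_replace returns null on a regex error; fall back to the input.
    return $clean ?? $html;
}

echo strip_mso_classes('<p class=MsoNormal>Hi</p><p class="MsoBodyText">there</p>'), "\n";
// prints: <p>Hi</p><p>there</p>
```

An attribute holding several class names, such as class="MsoNormal foo", is left alone by this pattern, so it is worth eyeballing a few files before trusting it.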

but I often clean up bad HTML in PHP, and love the strip_tags function,
which strips out -all- HTML/XML tags except the ones you specify.

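<?php
// Read the messy HTML from the file "bad".
$bad = file_get_contents("bad");
// Keep only <b>, <em>, <p> and <br>; every other tag is removed.
$good = strip_tags($bad, "<b><em><p><br>");
// Write the cleaned markup to the file "good".
$fileout = fopen("good", "w");
fputs($fileout, $good);
fclose($fileout);
?>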
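One thing to watch: strip_tags removes disallowed tags but leaves the attributes on allowed tags untouched. A small sketch with a made-up input string (the sample markup is an assumption, just typical-looking Word output) shows what survives:

```php
<?php
// Made-up sample of Word-style markup, for illustration only.
$bad = '<p class=MsoNormal><span style="color:red">Hello</span> '
     . '<b>world</b><font face="Arial"></font></p>';

// <span> and <font> are not in the allow list, so they are dropped
// (their text content is kept). <p> and <b> stay, attributes and all.
$good = strip_tags($bad, '<b><em><p><br>');

echo $good, "\n";
// prints: <p class=MsoNormal>Hello <b>world</b></p>
```

So the class=MsoNormal on the surviving <p> is still there afterwards, which is why a regex pass like the one above is still useful alongside strip_tags.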


Most of my variants of the above store the result in MySQL when done.
I like keeping users in a web interface, without file system access
of any kind.
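Storing the cleaned page in MySQL could look roughly like the sketch below. Everything specific here is an assumption for illustration: the pages table, its name and body columns, the save_page function, and the connection details are all hypothetical, and PDO is just one way to talk to MySQL from PHP.

```php
<?php
// Hypothetical schema, not from the original post:
//   CREATE TABLE pages (name VARCHAR(255) PRIMARY KEY, body MEDIUMTEXT);

// Hypothetical helper: insert the cleaned HTML, or overwrite the row
// if a page with the same name already exists.
function save_page(PDO $db, string $name, string $html): void
{
    // A prepared statement keeps stray quotes in the HTML from
    // breaking (or injecting into) the SQL.
    $stmt = $db->prepare('REPLACE INTO pages (name, body) VALUES (?, ?)');
    $stmt->execute([$name, $html]);
}

// Usage, with placeholder host, database and credentials:
// $db = new PDO('mysql:host=localhost;dbname=site;charset=utf8mb4',
//               'user', 'secret',
//               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// save_page($db, 'good', strip_tags(file_get_contents('bad'), '<b><em><p><br>'));
```

With the content in a table, the web interface only needs database credentials and never a writable directory.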



