To: Jerry Yu... Re: [ale] 2 Perl questions ?

Benjamin Scherrey scherrey at proteus-tech.com
Wed Dec 15 13:44:41 EST 2004


It's been a long time since I used Perl extensively enough to need sorted container types, so you'll 
have to translate from my C++ terminology...

A couple of issues to keep in mind. It's important to measure file sizes in blocks (which depends on 
the block size of your destination media, which may differ from the block size of your source 
media), or else you'll find your 650MB chunk actually takes up 687MB (or some such) when you 
write it out.
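For a concrete example (illustrative only - 2048 bytes is the ISO 9660 sector size, but your 
destination media may use something different), round each file up to whole blocks before doing 
any of the packing math:

typedef unsigned long long block_count;

// Round a byte count up to whole destination-media blocks.
block_count blocks_used( block_count bytes, block_count block_size )
{
    return ( bytes + block_size - 1 ) / block_size; // integer round-up
}

// e.g. blocks_used( 1, 2048 ) == 1 - even a 1-byte file consumes a whole
// block, which is how a "650MB" pile of files swells to 687MB on disc.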

I would store the file list in a tree structure that supports duplicate keys, keyed by size in blocks - a 
multimap< unsigned long long, filedesc, greater< unsigned long long > >, for example. This puts the 
largest chunks in front. Write out the biggest chunk you can, then best-fit the remaining pieces. 
Here's some pseudo-code to give you an idea...

typedef unsigned long long block_count;
typedef std::multimap< block_count, filedesc, std::greater< block_count > > filetree;

filetree source; // your populated list of files, largest first.

block_count remaining = BLANK_MEDIA_SIZE;

file = source.begin(); // pseudo syntax - really an iterator - but you understand

while( source.size() )  // keep going while there are still items in our list.
{
    if( file.size() > remaining ) // is the file too big to fit on a single (or the remaining) media?
    {
        // Deal with breaking the file apart immediately so the backup will be contiguous.
        // Write out full-media chunks until what's left is smaller than BLANK_MEDIA_SIZE.
        do
        {
            destination.write( file.get_chunk( remaining ) );
            change_media();
            remaining = BLANK_MEDIA_SIZE;
        }
        while( file.size_left_unwritten() >= remaining ); // still enough left to fill the current media
    }

    if( file.size_left_unwritten() > 0 )
    {
        remaining -= file.size_left_unwritten(); // account for the size before the write zeroes it
        destination.write( file );               // write out whatever remains of the file
    }

    source.erase( file );       // again - pseudo syntax here, but take it out of the list.

    if( source.empty() )        // all done; don't go looking past the end.
        break;

    // Find the next file with a size closest to, but no greater than, the remaining space.
    // With std::greater<> as the comparator, lower_bound() gives exactly that.
    file = source.lower_bound( remaining );

    if( file == source.end() )
    {
        // If we didn't find a small enough file, just grab the biggest one from the front.
        file = source.begin();
    }

    if( file.size() > remaining && source.sum_of_sizes() <= BLANK_MEDIA_SIZE )
    {
        // Our next file is too big for the remaining media space, but our entire
        // total remaining is small enough to fit on a single new media.
        change_media();
        remaining = BLANK_MEDIA_SIZE;
    }
}
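If it helps to see that loop in a form that actually compiles, here's a minimal, self-contained sketch. 
It packs plain sizes instead of real file descriptors; the file names, the 650-unit media size, and the 
disc counter standing in for change_media() are all made up for illustration:

#include <functional>
#include <iostream>
#include <map>
#include <string>

typedef unsigned long long block_count;
typedef std::multimap< block_count, std::string,
                       std::greater< block_count > > filetree;

int main()
{
    const block_count MEDIA_SIZE = 650; // think "650MB"; units are arbitrary

    filetree source;
    source.insert( filetree::value_type( 700, "big.iso" ) ); // must be split
    source.insert( filetree::value_type( 400, "video.avi" ) );
    source.insert( filetree::value_type( 240, "photos.tar" ) );
    source.insert( filetree::value_type( 10,  "notes.txt" ) );

    block_count remaining = MEDIA_SIZE;
    int disc = 1;

    filetree::iterator file = source.begin();
    while( !source.empty() )
    {
        block_count left = file->first;

        // Split anything larger than the space left on the current disc.
        while( left > remaining )
        {
            std::cout << "disc " << disc << ": " << file->second
                      << " (chunk of " << remaining << ")\n";
            left -= remaining;
            ++disc;                 // change_media()
            remaining = MEDIA_SIZE;
        }

        std::cout << "disc " << disc << ": " << file->second
                  << " (" << left << ")\n";
        remaining -= left;

        source.erase( file );
        if( source.empty() )
            break;

        if( remaining == 0 )        // filled the disc exactly
        {
            ++disc;
            remaining = MEDIA_SIZE;
        }

        // Best fit: with std::greater<> ordering, lower_bound() returns the
        // largest file whose size is <= the remaining space.
        file = source.lower_bound( remaining );
        if( file == source.end() )
            file = source.begin();  // nothing fits; take the biggest and split it
    }
    return 0;
}

This version always splits when nothing fits, which is the maximum-media-use behaviour; the 
whole-remainder-fits-on-one-blank shortcut from the pseudo-code above is left out to keep it short.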

    That's off the top of my head, so hopefully it's not too far off from correct. This algorithm makes 
maximum use of the backup media at the expense of breaking files into chunks at the boundaries, 
so every bit of media is used. If your preference is to keep files intact at the expense of some 
wasted space on your backup media, you could set a MINIMUM_CHUNK_SIZE limit and, when a 
file smaller than that limit won't fit in the remaining space, change media early instead of splitting 
it. I'd put that test immediately after the first while statement. Also, this algorithm pays no attention 
to directory structure grouping. If you wanted that, you'd make the whole thing a tree of trees 
(matching your directory structure), recurse it depth-first, and then apply the above algorithm to 
the individual nodes.
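As a sketch of that early-change test (same pseudo-code conventions as above; 
MINIMUM_CHUNK_SIZE is an assumed tunable, and "never split a file smaller than this" is my 
reading of the idea):

const block_count MINIMUM_CHUNK_SIZE = 16384; // in blocks - pick to taste

while( source.size() )
{
    // Keep small files intact: rather than strand a tiny tail chunk on this
    // media, change early and write the whole file to the next one.
    if( file.size() > remaining && file.size() < MINIMUM_CHUNK_SIZE )
    {
        change_media();
        remaining = BLANK_MEDIA_SIZE;
    }

    // ... rest of the loop as above ...
}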

	Good luck,

		Ben Scherrey

12/15/2004 12:02:22 PM, Aditya Srinivasan <sriad at uab.edu> wrote:

>The other aspect of doing it on your own is :
>
>The algorithm to separate files into groups <= 650MB sounds non-trivial.
>
>1.List files and sizes
>2.Sort
>3.Pick appropriate combinations from the list (The hard part ??)
>
>Is anyone aware of an algorithm which analyzes this sort of problem and 
>solves it ?
>Sounds like :
>
>Given a set of n objects of different sizes, separate them into K
>groups such that the sum of the objects in each group is less than M.
>Optimize to minimize K.
> 
>
>Thanks,
>sriad





