[ale] ext2fs and non-fragmentation

Chris Ricker kaboom at gatech.edu
Thu May 30 10:34:55 EDT 2002


On 30 May 2002, Danny Cox wrote:

> Steve,
> 
> On Thu, 2002-05-30 at 07:03, sangell at nan.net wrote:
> > Anyone know of a good site that explains in detail how ext2fs avoids
> > fragmenting disks, (or maybe you can explain it yourself). I am trying to
> > replace some more MS servers and was asked the question in a meeting
> > yesterday, "How does linux avoid fragmenting drives?" and to be quite
> > honest I couldn't answer, and to some "It just doesn't!" is not a
> > sufficient answer. I tried a few searches on google but the previous answer
> > was about all I could find.
> 
> 	One main concept works on the idea of allocating inodes/data within
> "cylinder groups", keeping the data and meta data together.  When
> growing a file, if it can find the necessary room in the current
> cylinder group, it uses that.  Only when it's full does it change to
> another, which becomes the "current" cylinder group for the next
> allocation.
> 
> 	It *does* eventually fragment badly, but only when the FS is 95% full (or
> some magic percentage), and then it *really* slows down.  That's why 10%
> (or 5%) is reserved for root only.  Normal users can't use that last
> little bit to really slow the system down.
> 
> 	So, the answer is: certainly files are fragmented, but usually within
> one cylinder group, so next-block-lookup is still fast, and doesn't move
> the head assembly too much.
> 
> 	As to where I saw this, it was long ago, in a collection of papers on
> BSD.  The paper was entitled something like "Implementation of the (a)
> Fast File System" or "The Berkeley Fast File System".  So, looking on the
> various BSD sites may get you further.

 
All this is true for the FFS / UFS file system, and, like you say, it's
documented in Kirk McKusick's papers ("A Fast File System for UNIX", etc.).

ext2 is conceptually similar, but the terminology's different.  Check out
/usr/src/linux/fs/ext2/ialloc.c to see how fragmentation of directories and
inodes is handled, and /usr/src/linux/fs/ext2/balloc.c to see how
fragmentation of data blocks is handled.

The basic structure of ext2 is that the fs is divided into block groups
(these are basically the same as McKusick's cylinder groups, with the
difference primarily being that cylinder groups are based on real or, these
days, imagined disk geometry, while block groups don't even pretend to
correspond to the underlying physical structure).  Each block group contains
a bitmap of its blocks and a bitmap of its inodes.  When a new normal
(non-directory) inode is allocated, ext2 just grabs a free inode from the
inode bitmap for the block group of that new file's parent directory (ensuring
that directories and their contents are co-located on disk, so directory
lookups will be quick).  When a new directory inode is allocated, ext2
searches for the nearest block group which has both lots of free data blocks
(so that the directory can grow in the future w/o fragmenting) and which has
a low number of existing directories (giving each directory local room to
grow, so that normal inode allocation can be done w/in the same group).
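
To make the two inode policies above concrete, here's a rough Python sketch.
It's a toy model of the description in this mail, not a transcription of
ialloc.c: the BlockGroup fields, the function names, and the "at least average
free blocks" cutoff for "lots of free data blocks" are all assumptions made
up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    free_inodes: int
    free_blocks: int
    dirs: int  # directories already living in this group


def group_for_file(groups, parent):
    """Regular file: use the parent directory's group if it has a free
    inode, otherwise the next group (scanning forward) that does."""
    n = len(groups)
    for step in range(n):
        g = (parent + step) % n
        if groups[g].free_inodes > 0:
            return g
    return None  # no free inodes anywhere


def group_for_dir(groups, parent):
    """Directory: among groups with a free inode and plenty of free blocks,
    pick the one with the fewest directories, nearest the parent on ties."""
    usable = [g for g, bg in enumerate(groups) if bg.free_inodes > 0]
    if not usable:
        return None
    # Assumed cutoff: "lots of free blocks" means at least the average.
    avg_free = sum(bg.free_blocks for bg in groups) / len(groups)
    roomy = [g for g in usable if groups[g].free_blocks >= avg_free] or usable
    return min(roomy, key=lambda g: (groups[g].dirs, abs(g - parent)))
```

The point of the asymmetry: files stick to their directory's group, while
new directories get spread out to groups that still have room to grow.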

When allocating data blocks, ext2 behaves similarly.  If it's growing an
existing file, it looks for adjacent blocks (which were pre-allocated when
the file was created; see next sentence).  If it's a new file, it looks for
a large contiguous group of free blocks w/in the file's inode's block group,
and then creates the file there, allocating the needed blocks and
pre-allocating the adjacent blocks so the file can later grow locally.
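
A similarly rough Python sketch of the data block side, again modelling only
the description above rather than balloc.c.  The bitmap-as-list
representation, the File class, the helper names, and the PREALLOC value are
all hypothetical.

```python
FREE, USED, RESERVED = 0, 1, 2
PREALLOC = 3  # hypothetical number of extra blocks set aside per allocation


class File:
    def __init__(self):
        self.blocks = []    # block numbers holding data, in file order
        self.reserved = []  # preallocated blocks not yet written


def run_free(bitmap, start, length):
    """True if `length` blocks starting at `start` are all free."""
    end = start + length
    return end <= len(bitmap) and all(s == FREE for s in bitmap[start:end])


def find_run(bitmap, length):
    """Index of the first run of `length` free blocks, or None."""
    run = 0
    for i, state in enumerate(bitmap):
        run = run + 1 if state == FREE else 0
        if run == length:
            return i - length + 1
    return None


def append_block(bitmap, f):
    """Give file `f` one more block, preferring its preallocated ones."""
    if not f.reserved:
        want = 1 + PREALLOC
        goal = f.blocks[-1] + 1 if f.blocks else 0
        # Prefer the run right after the file's last block, then any run.
        start = goal if run_free(bitmap, goal, want) else find_run(bitmap, want)
        if start is None:
            # Nearly full group: take any single block (this fragments).
            start, want = find_run(bitmap, 1), 1
            if start is None:
                raise OSError("block group is full")
        f.reserved = list(range(start, start + want))
        for b in f.reserved:
            bitmap[b] = RESERVED
    block = f.reserved.pop(0)
    bitmap[block] = USED
    f.blocks.append(block)
    return block
```

With a 16-block bitmap and two files appended to alternately, the first file
ends up in blocks 0-3 and the second in 4-7 instead of the two being
interleaved, which is the whole point of the pre-allocation.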

This doesn't give you 100% non-fragmented file systems, and as Danny
mentioned, fragmentation does increase as the file system fills, since
ext2 can no longer cluster inode and block allocations well enough to keep
files from fragmenting.  (Contrary to popular opinion, the 5% reserved for
root is there for performance reasons, not for security reasons.)  In
practice, though, it's Good Enough.  There are ext2 defragmentation tools
kicking around, but no one uses them because the problem's never that bad.

> 	If you can get him to respond, contact Ted Tso (see the MAINTAINERS
> file in /usr/src/linux), and he may point you to some useful
> information.  Then again, he may not. ;-)

Ted Ts'o also maintains a web page about ext2.  I don't have the URL handy, 
but I'm sure Google does ;-).  I think it had a couple of white papers about 
ext2 on it....

later,
chris


---
This message has been sent through the ALE general discussion list.
See http://www.ale.org/mailing-lists.shtml for more info. Problems should be 
sent to listmaster at ale dot org.