31 Mar, 2011

1 commit


06 May, 2010

5 commits

  • When we allocate some bits from the reservation, we always
    allocate from r_start (see ocfs2_resmap_resv_bits).
    So there should be no reason to check between r_start
    and start. And I don't think we will change this behaviour
    later by allocating from some bits after r_start. Why not make
    ocfs2_adjust_resv_from_alloc simple for now?

    The only chance we have to adjust the reservation is when we haven't
    reached the end. With this patch, the function is more readable.

    Note:
    This patch also fixes a pre-existing bug in the function
    that I had not noticed before:
        if (end < ocfs2_resv_end(resv))
                rhs = end - ocfs2_resv_end(resv);
    This code is of course buggy. ;)

    Signed-off-by: Tao Ma
    Acked-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Tao Ma
     
  • The default behavior for directory reservations stays the same, but we add a
    mount option so people can tweak the size of directory reservations
    according to their workloads.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Mark Fasheh
     
  • The default reservation size of 4 (32-bit windows) is a bit too ambitious.
    Scale it back to 16 bits (resv_level=2). I have been testing various sizes
    on a 4-node cluster which runs a mixed workload that is heavily threaded.
    With a 256MB local alloc, I get *roughly* the following levels of average file
    fragmentation:

    resv_level=0 70%
    resv_level=1 21%
    resv_level=2 23%
    resv_level=3 24%
    resv_level=4 60%
    resv_level=5 did not test
    resv_level=6 60%

    resv_level=2 seemed like a good compromise: windows are not too small,
    but not so big that heavier workloads will immediately suffer
    without tuning.

    This patch also changes the behavior of directory reservations - they now
    track file reservations. The previous compromise of giving directory
    windows only 8 bits wound up fragmenting more at some window sizes because
    file allocations had smaller unused windows to poach from.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Mark Fasheh
     
  • Use the reservations system for unindexed dir tree allocations. We don't
    bother with the indexed tree as reads from it are mostly random anyway.
    Directory reservations are marked separately, to allow the reservations code
    a chance to optimize their window sizes. This patch allocates only 8 bits
    for directory windows as they generally are not expected to grow as quickly
    as file data. Future improvements to dir window sizing can trivially be
    made.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • This patch improves Ocfs2 allocation policy by allowing an inode to
    reserve a portion of the local alloc bitmap for itself. The reserved
    portion (allocation window) is advisory in that other allocation
    windows might steal it if the local alloc bitmap becomes
    full. Otherwise, the reservations are honored and guaranteed to be
    free. When the local alloc window is moved to a different portion of
    the bitmap, existing reservations are discarded.

    Reservation windows are represented internally by a red-black
    tree. Within that tree, each node represents the reservation window of
    one inode. An LRU of active reservations is also maintained. When new
    data is written, we allocate it from the inode's window. When all bits
    in a window are exhausted, we allocate a new one as close to the
    previous one as possible. Should we not find free space, an existing
    reservation is pulled off the LRU and cannibalized.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
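
The window behaviour described in these commits (bits are always taken from the front of a reservation, and a window is cannibalized from the LRU when no free space is found) can be illustrated with a minimal sketch. This is not the ocfs2 code: every name here (resv_window, resv_map, resv_take_from_front, resv_pick_victim, MAX_WINDOWS) is hypothetical, and a flat array with a last-used counter stands in for the red-black tree and LRU list that the real implementation uses.

```c
#include <assert.h>

#define MAX_WINDOWS 4   /* hypothetical fixed capacity, for illustration only */

/* Hypothetical reservation window covering bits [start, start + len - 1]. */
struct resv_window {
    unsigned int start;
    unsigned int len;
};

/* Flat-array stand-in for the per-bitmap reservation map. */
struct resv_map {
    struct resv_window win[MAX_WINDOWS];
    unsigned long last_used[MAX_WINDOWS];   /* smaller value = older use */
};

/* Last bit covered by a non-empty window. */
static unsigned int resv_end(const struct resv_window *w)
{
    assert(w->len > 0);
    return w->start + w->len - 1;
}

/*
 * Take up to `want` bits from the front of the window. Because allocation
 * always begins at `start`, the only adjustment needed afterwards is to
 * advance `start` and shrink `len`. Returns the number of bits taken and
 * stores the first allocated bit in *first_bit. A window left with
 * len == 0 is fully used and can be discarded by the caller.
 */
static unsigned int resv_take_from_front(struct resv_window *w,
                                         unsigned int want,
                                         unsigned int *first_bit)
{
    unsigned int got = want < w->len ? want : w->len;

    *first_bit = w->start;
    w->start += got;
    w->len -= got;
    return got;
}

/*
 * Pick the least recently used non-empty window as the one to cannibalize.
 * Returns its index, or -1 if every window is empty.
 */
static int resv_pick_victim(const struct resv_map *m)
{
    int victim = -1;
    int i;

    for (i = 0; i < MAX_WINDOWS; i++) {
        if (m->win[i].len == 0)
            continue;
        if (victim < 0 || m->last_used[i] < m->last_used[victim])
            victim = i;
    }
    return victim;
}
```

A caller would first try resv_take_from_front on the inode's own window and fall back to resv_pick_victim only when no free space is left for a new window. Keeping the adjustment to "advance start, shrink len" mirrors the simplification argued for in the ocfs2_adjust_resv_from_alloc commit above.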