04 Apr, 2009

1 commit


06 Jan, 2009

2 commits

  • The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
    commit triggers and allow us to compute metadata ecc right before the
    buffers are written out. This commit provides ecc for inodes, extent
    blocks, group descriptors, and quota blocks. It is not safe to use
    extened attributes and metaecc at the same time yet.

    The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
    the type of block at their root. Before, it didn't matter, but now the
    root block must use the appropriate ocfs2_journal_access_*() function.
    To keep this abstract, the structures now have a pointer to the matching
    journal_access function and a wrapper call to call it.

    A few places use naked ocfs2_write_block() calls instead of adding the
    blocks to the journal. We make sure to calculate their checksum and ecc
    before the write.

    Since we pass around the journal_access functions. Let's typedef them
    in ocfs2.h.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The ocfs2 code currently reads inodes off disk with a simple
    ocfs2_read_block() call. Each place that does this has a different set
    of sanity checks it performs. Some check only the signature. A couple
    validate the block number (the block read vs di->i_blkno). A couple
    others check for VALID_FL. Only one place validates i_fs_generation. A
    couple check nothing. Even when an error is found, they don't all do
    the same thing.

    We wrap inode reading into ocfs2_read_inode_block(). This will validate
    all the above fields, going readonly if they are invalid (they never
    should be). ocfs2_read_inode_block_full() is provided for the places
    that want to pass read_block flags. Every caller is passing a struct
    inode with a valid ip_blkno, so we don't need a separate blkno argument
    either.

    We will remove the validation checks from the rest of the code in a
    later commit, as they are no longer necessary.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     

15 Oct, 2008

3 commits

  • ocfs2_read_blocks() currently requires the CACHED flag for cached I/O.
    However, that's the common case. Let's flip it around and provide an
    IGNORE_CACHE flag for the special users. This has the added benefit of
    cleaning up the code some (ignore_cache takes on its special meaning
    earlier in the loop).

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
    Only six pass a different flag set. Rather than have every caller care,
    let's make ocfs2_read_block() take no flags and always do a cached read.
    The remaining six places can call ocfs2_read_blocks() directly.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Now that synchronous readers are using ocfs2_read_blocks_sync(), all
    callers of ocfs2_read_blocks() are passing an inode. Use it
    unconditionally. Since it's there, we don't need to pass the
    ocfs2_super either.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     

14 Oct, 2008

6 commits

  • This is pointless as brelse() already does the check.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • It can also be moved into ocfs2_la_debug_read().

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • ocfs2 inode numbers are block numbers. For any filesystem with less
    than 2^32 blocks, this is not a problem. However, when ocfs2 starts
    using JDB2, it will be able to support filesystems with more than 2^32
    blocks. This would result in inode numbers higher than 2^32.

    The problem is that stat(2) can't handle those numbers on 32bit
    machines. The simple solution is to have ocfs2 allocate all inodes
    below that boundary.

    The suballoc code is changed to honor an optional block limit. Only the
    inode suballocator sets that limit - all other allocations stay unlimited.

    The biggest trick is to grow the inode suballocator beneath that limit.
    There's no point in allocating block groups that are above the limit,
    then rejecting their elements later on. We want to prevent the inode
    allocator from ever having block groups above the limit. This involves
    a little gyration with the local alloc code. If the local alloc window
    is above the limit, it signals the caller to try the global bitmap but
    does not disable the local alloc file (which can be used for other
    allocations).

    [ Minor cleanup - removed an ML_NOTICE comment. --Mark ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • A per-mount debugfs file, "local_alloc" is created which when read will
    expose live state of the nodes local alloc file. Performance impact is
    minimal, only a bit of memory overhead per mount point. Still, the code is
    hidden behind CONFIG_OCFS2_FS_STATS. This feature will help us debug
    local alloc performance problems on a live system.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Ocfs2's local allocator disables itself for the duration of a mount point
    when it has trouble allocating a large enough area from the primary bitmap.
    That can cause performance problems, especially for disks which were only
    temporarily full or fragmented. This patch allows for the allocator to
    shrink it's window first, before being disabled. Later, it can also be
    re-enabled so that any performance drop is minimized.

    To do this, we allow the value of osb->local_alloc_bits to be shrunk when
    needed. The default value is recorded in a mostly read-only variable so that
    we can re-initialize when required.

    Locking had to be updated so that we could protect changes to
    local_alloc_bits. Mostly this involves protecting various local alloc values
    with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which
    is used when the local allocator is has shrunk, but is not disabled. If the
    available space dips below 1 megabyte, the local alloc file is disabled. In
    either case, local alloc is re-enabled 30 seconds after the event, or when
    an appropriate amount of bits is seen in the primary bitmap.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Do this instead of tracking absolute local alloc size. This avoids
    needless re-calculatiion of bits from bytes in localalloc.c. Additionally,
    the value is now in a more natural unit for internal file system bitmap
    work.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

15 Jul, 2008

1 commit


01 May, 2008

1 commit

  • kmalloc() during a localalloc window move can trigger the mm to prune
    the dcache which inturn can trigger the fs to delete an inode causing
    it start a recursive transaction.

    The fix also makes the change in kmalloc during localalloc shutdown
    just to be safe.

    Fixes oss bugzilla#901
    http://oss.oracle.com/bugzilla/show_bug.cgi?id=901

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     

18 Apr, 2008

2 commits

  • Inode allocation is modified to look in other nodes allocators during
    extreme out of space situations. We retry our own slot when space is freed
    back to the global bitmap, or whenever we've allocated more than 1024 inodes
    from another slot.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • In inode stealing, we no longer restrict the allocation to
    happen in the local node. So it is neccessary for us to add
    a new member in ocfs2_alloc_context to indicate which slot
    we are using for allocation. We also modify the process of
    local alloc so that this member can be used there also.

    Signed-off-by: Tao Ma
    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Tao Ma
     

04 Mar, 2008

2 commits

  • replace all:
    little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
    expression_in_cpu_byteorder);
    with:
    leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
    generated with semantic patch

    Signed-off-by: Marcin Slusarz
    Signed-off-by: Mark Fasheh

    Marcin Slusarz
     
  • Commit 2fbe8d1ebe004425b4f7b8bba345623d2280be82 disabled localalloc
    for local mounts. This caused issues as ocfs2 uses localalloc to
    provide write locality. This patch enables localalloc for local mounts.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     

26 Jan, 2008

2 commits

  • Local alloc is a performance optimization in ocfs2 in which a node
    takes a window of bits from the global bitmap and then uses that for
    all small local allocations. This window size is fixed to 8MB currently.
    This patch allows users to specify the window size in MB including
    disabling it by passing in 0. If the number specified is too large,
    the fs will use the default value of 8MB.

    mount -o localalloc=X /dev/sdX /mntpoint

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     
  • Call this the "inode_lock" now, since it covers both data and meta data.
    This patch makes no functional changes.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

28 Nov, 2007

1 commit

  • Enable expensive bitmap scanning only if DEBUG option is enabled.
    The bitmap scanning quite loads the CPU and on my machine the write
    throughput of dd if=/dev/zero of=/ocfs2/file bs=1M count=500 conv=sync
    improves from 37 MB/s to 45.4 MB/s in local mode...

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     

04 Oct, 2007

1 commit


21 Sep, 2007

1 commit

  • The ocfs2 write code loops through a page much like the block code, except
    that ocfs2 allocation units can be any size, including larger than page
    size. Typically it's equal to or larger than page size - most kernels run 4k
    pages, the minimum ocfs2 allocation (cluster) size.

    Some changes introduced during 2.6.23 changed the way writes to pages are
    handled, and inadvertantly broke support for > 4k page size. Instead of just
    writing one cluster at a time, we now handle the whole page in one pass.

    This means that multiple (small) seperate allocations might happen in the
    same pass. The allocation code howver typically optimizes by getting the
    maximum which was reserved. This triggered a BUG_ON in the extend code where
    it'd ask for a single bit (for one part of a > 4k page) and get back more
    than it asked for.

    Fix this by providing a variant of the high level allocation function which
    allows the caller to specify a maximum. The traditional function remains and
    just calls the new one with a maximum determined from the initial
    reservation.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

26 May, 2007

1 commit

  • We weren't cleaning up our inode reference on error in
    ocfs2_reserve_local_alloc_bits(). Add a check for error return and iput() if
    need be. Move the code to set the alloc context inode info to the end of the
    function so we don't have any possibility of passing back a bad pointer.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

14 Dec, 2006

1 commit

  • All kcalloc() calls of the form "kcalloc(1,...)" are converted to the
    equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect
    ordering of the first two arguments are fixed.

    Signed-off-by: Robert P. J. Day
    Cc: Jeff Garzik
    Cc: Alan Cox
    Cc: Dominik Brodowski
    Cc: Adam Belay
    Cc: James Bottomley
    Cc: Greg KH
    Cc: Mark Fasheh
    Cc: Trond Myklebust
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

02 Dec, 2006

7 commits


08 Aug, 2006

1 commit

  • Record the most recently used allocation group on the allocation context, so
    that subsequent allocations can attempt to optimize for contiguousness.
    Local alloc especially should benefit from this as the current chain search
    tends to let it spew across the disk.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

25 Mar, 2006

1 commit


10 Jan, 2006

1 commit


04 Jan, 2006

1 commit