19 May, 2010

2 commits

  • Joel Becker
     
  • Truncate is just a special case of punching holes (from the new i_size to
    the end), so we can take advantage of the existing
    ocfs2_remove_btree_range() to reduce the complexity and redundancy in
    alloc.c. The goal here is to make truncate more generic and
    straightforward.

    Several functions only used by ocfs2_commit_truncate() will simply be
    removed.

    ocfs2_remove_btree_range() was originally used by the hole punching
    code, which didn't take refcount trees into account (definitely a bug).
    We therefore need to change that func a bit to handle refcount trees.
    It must take the refcount lock, calculate and reserve blocks for
    refcount tree changes, and decrease refcounts at the end. We replace
    ocfs2_lock_allocators() here by adding a new func
    ocfs2_reserve_blocks_for_rec_trunc() which accepts some extra blocks to
    reserve. This will not hurt any other code using
    ocfs2_remove_btree_range() (such as dir truncate and hole punching).

    I merged the following steps into one patch since they may be
    logically doing one thing, though I know it looks a little bit fat
    to review.

    1). Remove redundant code used by ocfs2_commit_truncate(), since we're
    moving to ocfs2_remove_btree_range anyway.

    2). Add a new func ocfs2_reserve_blocks_for_rec_trunc() for the purpose
    of accepting some extra blocks to reserve.

    3). Change ocfs2_prepare_refcount_change_for_del() a bit to fit our
    needs. It's safe to do this since it's only being called by
    truncate.

    4). Change ocfs2_remove_btree_range() a bit to take refcount case into
    account.

    5). Finally, we change ocfs2_commit_truncate() to call
    ocfs2_remove_btree_range() in a proper way.

    The patch has passed normal sanity testing; stress tests with
    heavier workloads are still expected to follow.

    Based on this patch, fixing the punching holes bug will be fairly easy.

    Signed-off-by: Tristan Ye
    Acked-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Tristan Ye
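
The idea in the commit above, that truncate is range removal from the new size to the end, can be illustrated with a toy model. This is only a sketch under loud assumptions: `toy_file`, `toy_remove_range()`, `toy_punch_hole()` and `toy_truncate()` are hypothetical names, the file is a flat array of per-cluster flags rather than an extent btree, and none of the refcount or reservation handling from the patch is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CLUSTERS 16

/* Hypothetical toy file: one flag per cluster, not an ocfs2 structure. */
struct toy_file {
	size_t size;                      /* file size, in clusters */
	bool allocated[TOY_MAX_CLUSTERS]; /* true if the cluster has storage */
};

/* Generic range removal: free every allocated cluster in [start, start + len). */
static size_t toy_remove_range(struct toy_file *f, size_t start, size_t len)
{
	size_t freed = 0;

	assert(start <= TOY_MAX_CLUSTERS && len <= TOY_MAX_CLUSTERS - start);
	for (size_t i = start; i < start + len; i++) {
		if (f->allocated[i]) {
			f->allocated[i] = false;
			freed++;
		}
	}
	return freed;
}

/* Hole punching removes a range and leaves the file size alone. */
static size_t toy_punch_hole(struct toy_file *f, size_t start, size_t len)
{
	return toy_remove_range(f, start, len);
}

/* Truncate removes the range [new_size, old size) and then shrinks the size. */
static size_t toy_truncate(struct toy_file *f, size_t new_size)
{
	size_t freed;

	if (new_size >= f->size)
		return 0; /* this sketch does not model extending a file */

	freed = toy_remove_range(f, new_size, f->size - new_size);
	f->size = new_size;
	return freed;
}
```

Both entry points share one removal routine, so any fix made there (the refcount handling in the real patch) would benefit both callers.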
     

06 May, 2010

4 commits

  • The default behavior for directory reservations stays the same, but we add a
    mount option so people can tweak the size of directory reservations
    according to their workloads.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Mark Fasheh
     
  • Use the reservations system for unindexed dir tree allocations. We don't
    bother with the indexed tree as reads from it are mostly random anyway.
    Directory reservations are marked separately, to allow the reservations code
    a chance to optimize their window sizes. This patch allocates only 8 bits
    for directory windows as they generally are not expected to grow as quickly
    as file data. Future improvements to dir window sizing can trivially be
    made.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0
    since before the kernel moved to git. There is no point in checking
    this error.

    ocfs2_journal_dirty() has been faithfully returning the status since the
    beginning. All over ocfs2, we have blocks of code checking this can't-
    fail status. In the past few years, we've tried to avoid adding these
    checks, because they are pointless. But anyone who looks at our code
    assumes they are needed.

    Finally, ocfs2_journal_dirty() is made a void function. All error
    checking is removed from other files. We'll BUG_ON() the status of
    jbd2_journal_dirty_metadata() just in case they change it someday. They
    won't.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • They all take an ocfs2_alloc_context, which has the allocation inode.

    Signed-off-by: Joel Becker
    Signed-off-by: Tao Ma

    Joel Becker
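
The ocfs2_journal_dirty() change in this group follows a common pattern: when a lower-level call is known to always return 0, the wrapper becomes void and asserts on the status instead of passing it up. A minimal user-space sketch of that pattern follows. The `toy_*` names are hypothetical, and `assert()` stands in for the kernel's BUG_ON(); this is not the ocfs2 code.

```c
#include <assert.h>

/* Hypothetical stand-in for a journaled buffer. */
struct toy_buffer {
	int dirty;
};

/* Models a lower layer whose "dirty" call has only ever returned 0. */
static int toy_backend_dirty_metadata(struct toy_buffer *bh)
{
	bh->dirty = 1;
	return 0;
}

/*
 * Void wrapper: callers have no status to check. If the lower layer
 * ever starts failing, the assertion trips instead of the error being
 * silently dropped.
 */
static void toy_journal_dirty(struct toy_buffer *bh)
{
	int status = toy_backend_dirty_metadata(bh);

	assert(status == 0); /* stands in for BUG_ON(status) */
	(void)status;        /* keeps the build quiet when NDEBUG is defined */
}
```

With this shape, call sites shrink to a single `toy_journal_dirty(&bh);` statement with no error branch to maintain.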
     

26 Mar, 2010

1 commit


22 Mar, 2010

1 commit


06 Mar, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
    quota: stop using QUOTA_OK / NO_QUOTA
    dquot: cleanup dquot initialize routine
    dquot: move dquot initialization responsibility into the filesystem
    dquot: cleanup dquot drop routine
    dquot: move dquot drop responsibility into the filesystem
    dquot: cleanup dquot transfer routine
    dquot: move dquot transfer responsibility into the filesystem
    dquot: cleanup inode allocation / freeing routines
    dquot: cleanup space allocation / freeing routines
    ext3: add writepage sanity checks
    ext3: Truncate allocated blocks if direct IO write fails to update i_size
    quota: Properly invalidate caches even for filesystems with blocksize < pagesize
    quota: generalize quota transfer interface
    quota: sb_quota state flags cleanup
    jbd: Delay discarding buffers in journal_unmap_buffer
    ext3: quota_write cross block boundary behaviour
    quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
    quota: split out compat_sys_quotactl support from quota.c
    quota: split out netlink notification support from quota.c
    quota: remove invalid optimization from quota_sync_all
    ...

    Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

    Linus Torvalds
     

05 Mar, 2010

1 commit

  • Get rid of the alloc_space, free_space, reserve_space, claim_space and
    release_rsv dquot operations - they are always called from the filesystem
    and if a filesystem really needs their own (which none currently does)
    it can just call into its own routine directly.

    Move shared logic into the common __dquot_alloc_space,
    dquot_claim_space_nodirty and __dquot_free_space low-level methods,
    and rationalize the wrappers around it to move as much code as
    possible into the common block for CONFIG_QUOTA vs not. Also rename
    all these helpers to be named dquot_* instead of vfs_dq_*.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

27 Feb, 2010

1 commit

  • This patch adds an extent block (metadata) stealing mechanism for
    extent allocation. This mechanism is the same as inode stealing:
    if there is no room in the slot-specific extent_alloc, we try to
    allocate an extent block from the next slot.

    Signed-off-by: Tiger Yang
    Acked-by: Tao Ma
    Signed-off-by: Joel Becker

    Tiger Yang
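
The stealing behaviour described above can be sketched as a fallback search over per-slot free counts. Everything here is an assumption made for illustration: `toy_alloc_extent_block()` is a hypothetical name, each slot's allocator is reduced to a counter, and the loop keeps walking through all remaining slots, whereas the commit message only says the next slot is tried.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Try to take one block from my_slot; if it is empty, steal from the
 * following slots in order (wrapping around). Returns the slot that
 * supplied the block, or -1 if every slot is empty.
 */
static int toy_alloc_extent_block(unsigned int *free_blocks, size_t nslots,
				  size_t my_slot)
{
	assert(nslots > 0 && my_slot < nslots);

	for (size_t i = 0; i < nslots; i++) {
		size_t slot = (my_slot + i) % nslots;

		if (free_blocks[slot] > 0) {
			free_blocks[slot]--;
			return (int)slot;
		}
	}
	return -1;
}
```

The first loop iteration is always the caller's own slot, so stealing only happens once local space is exhausted.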
     

05 Sep, 2009

6 commits


04 Jun, 2009

1 commit


22 Apr, 2009

1 commit

  • fs/ocfs2/dir.c: In function ‘ocfs2_extend_dir’:
    fs/ocfs2/dir.c:2700: warning: ‘ret’ may be used uninitialized in this function

    fs/ocfs2/suballoc.c: In function ‘ocfs2_get_suballoc_slot_bit’:
    fs/ocfs2/suballoc.c:2216: warning: comparison is always true due to limited range of data type

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     

08 Apr, 2009

1 commit


04 Apr, 2009

6 commits

  • ocfs2_dx_dir_rebalance() is passed the block offset of a dx leaf which needs
    rebalancing. Since we rebalance an entire cluster at a time however, this
    function needs to calculate the beginning of that cluster, in blocks. The
    calculation was wrong, which would result in a read of non-leaf blocks. Fix
    the calculation by adding ocfs2_block_to_cluster_start() which is a more
    straightforward way of determining this.

    Reported-by: Tristan Ye
    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • This little bit of extra accounting speeds up ocfs2_empty_dir()
    dramatically by allowing us to short-circuit the full directory scan.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The only operation which doesn't get faster with directory indexing is
    insert, which still has to walk the entire unindexed directory portion to
    find a free block. This patch provides an improvement in directory insert
    performance by maintaining a singly linked list of directory leaf blocks
    which have space for additional dirents.

    Signed-off-by: Mark Fasheh
    Acked-by: Joel Becker

    Mark Fasheh
     
  • Allow us to store a small number of directory index records in the
    ocfs2_dx_root_block. This saves us a disk read on small to medium sized
    directories (less than about 250 entries). The inline root is automatically
    turned into a root block with extents if the directory size increases beyond
    its capacity.

    Signed-off-by: Mark Fasheh
    Acked-by: Joel Becker

    Mark Fasheh
     
  • This patch makes use of Ocfs2's flexible btree code to add an additional
    tree to directory inodes. The new tree stores an array of small,
    fixed-length records in each leaf block. Each record stores a hash value,
    and pointer to a block in the traditional (unindexed) directory tree where a
    dirent with the given name hash resides. Lookup exclusively uses this tree
    to find dirents, thus providing us with constant time name lookups.

    Some of the hashing code was copied from ext3. Unfortunately, it has lots of
    unfixed checkpatch errors. I left that as-is so that tracking changes would
    be easier.

    Signed-off-by: Mark Fasheh
    Acked-by: Joel Becker

    Mark Fasheh
     
  • Many directory manipulation calls pass around a tuple of a dirent and its
    containing buffer_head. Dir indexing has a bit more state, but instead of
    adding yet more arguments to functions, we introduce 'struct
    ocfs2_dir_lookup_result'. In this patch, it simply holds the same tuple, but
    future patches will add more state.

    Signed-off-by: Mark Fasheh
    Acked-by: Joel Becker

    Mark Fasheh
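
The ocfs2_dx_dir_rebalance() fix in this group comes down to rounding a block offset down to the first block of its cluster. The sketch below shows one way to do that with shifts. It assumes block and cluster sizes are powers of two expressed as bit counts; `toy_block_to_cluster_start()` and its parameters are hypothetical and are not the signature of the kernel helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round blkno down to the first block of the cluster that contains it.
 * blocksize_bits and clustersize_bits are log2 of the sizes in bytes.
 */
static uint64_t toy_block_to_cluster_start(uint64_t blkno,
					   unsigned int blocksize_bits,
					   unsigned int clustersize_bits)
{
	unsigned int bits;

	assert(clustersize_bits >= blocksize_bits);
	bits = clustersize_bits - blocksize_bits; /* log2(blocks per cluster) */

	/* Convert to a cluster number, then back to a block number. */
	return (blkno >> bits) << bits;
}
```

For example, with 4 KB blocks (12 bits) and 32 KB clusters (15 bits) there are 8 blocks per cluster, so block 21 maps to block 16.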
     

06 Jan, 2009

8 commits

  • Use the db_check field of ocfs2_dir_block_trailer to crc/ecc the
    dirblocks.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Future ocfs2 features metaecc and indexed directories need to store a
    little bit of data in each dirblock. For compatibility, we place this
    in a trailer at the end of the dirblock. The trailer presents itself as an
    empty dirent, so that if the features are turned off, it can be reused
    without requiring a tunefs scan.

    This code adds the trailer and validates it when the block is read in.

    [ Mark is the original author, but I reinserted this code before his
    dir index work. -- Joel ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
    commit triggers and allow us to compute metadata ecc right before the
    buffers are written out. This commit provides ecc for inodes, extent
    blocks, group descriptors, and quota blocks. It is not safe to use
    extended attributes and metaecc at the same time yet.

    The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
    the type of block at their root. Before, it didn't matter, but now the
    root block must use the appropriate ocfs2_journal_access_*() function.
    To keep this abstract, the structures now have a pointer to the matching
    journal_access function and a wrapper call to call it.

    A few places use naked ocfs2_write_block() calls instead of adding the
    blocks to the journal. We make sure to calculate their checksum and ecc
    before the write.

    Since we pass around the journal_access functions, let's typedef them
    in ocfs2.h.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Add quota calls for allocation and freeing of inodes and space, also update
    estimates on number of needed credits for a transaction. Move out inode
    allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called
    outside of a transaction.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Now that we've centralized the ocfs2_read_virt_blocks() code, let's use
    it in ocfs2_read_dir_block().

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Add an optional validation hook to ocfs2_read_blocks(). Now the
    validation function is only called when a block was actually read off of
    disk. It is not called when the buffer was in cache.

    We add a buffer state bit BH_NeedsValidate to flag these buffers. It
    must always be one higher than the last JBD2 buffer state bit.

    The dinode, dirblock, extent_block, and xattr_block validators are
    lifted to this scheme directly. The group_descriptor validator needs to
    be split into two pieces. The first part only needs the gd buffer and
    is passed to ocfs2_read_block(). The second part requires the dinode as
    well, and is called every time. It's only 3 compares, so it's tiny.
    This also allows us to clean up the non-fatal gd check used by resize.c.
    It now has no magic argument.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • We have ocfs2_bread() as a vestige of the original ext-based dir code.
    It's only used by directories, though. Turn it into
    ocfs2_read_dir_block(), with a prototype matching the other metadata
    read functions. It's set up to validate dirblocks when the time comes.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The ocfs2 code currently reads inodes off disk with a simple
    ocfs2_read_block() call. Each place that does this has a different set
    of sanity checks it performs. Some check only the signature. A couple
    validate the block number (the block read vs di->i_blkno). A couple
    others check for VALID_FL. Only one place validates i_fs_generation. A
    couple check nothing. Even when an error is found, they don't all do
    the same thing.

    We wrap inode reading into ocfs2_read_inode_block(). This will validate
    all the above fields, going readonly if they are invalid (they never
    should be). ocfs2_read_inode_block_full() is provided for the places
    that want to pass read_block flags. Every caller is passing a struct
    inode with a valid ip_blkno, so we don't need a separate blkno argument
    either.

    We will remove the validation checks from the rest of the code in a
    later commit, as they are no longer necessary.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
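
The optional validation hook added to ocfs2_read_blocks() in this group runs the validator only when a block actually comes off disk, not on a cache hit. The toy model below sketches that behaviour. It is heavily simplified and every name is hypothetical: a `cached` flag replaces the buffer cache and the BH_NeedsValidate state bit, and the magic-number check is an invented example validator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 4

/* Hypothetical block: "cached" models whether it is already in memory. */
struct toy_block {
	bool cached;
	unsigned int magic;
};

typedef int (*toy_validate_fn)(const struct toy_block *blk);

/* Counts validator invocations so the skip-on-cache-hit is observable. */
static unsigned int toy_validate_calls;

/* Example validator: accept only blocks carrying the expected magic. */
static int toy_validate_magic(const struct toy_block *blk)
{
	toy_validate_calls++;
	return blk->magic == 0x0CF5 ? 0 : -1;
}

/* Read one block; run the optional validator only on a real "disk" read. */
static int toy_read_block(struct toy_block *disk, size_t blkno,
			  toy_validate_fn validate)
{
	struct toy_block *blk;

	assert(blkno < TOY_NBLOCKS);
	blk = &disk[blkno];

	if (blk->cached)
		return 0; /* cache hit: already validated when first read */

	if (validate) {
		int status = validate(blk);

		if (status)
			return status; /* stays uncached, so it is rechecked later */
	}

	blk->cached = true;
	return 0;
}
```

Passing NULL as the validator gives the old unvalidated behaviour, which is what makes the hook optional.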
     

15 Oct, 2008

5 commits

  • ocfs2_read_blocks() currently requires the CACHED flag for cached I/O.
    However, that's the common case. Let's flip it around and provide an
    IGNORE_CACHE flag for the special users. This has the added benefit of
    cleaning up the code some (ignore_cache takes on its special meaning
    earlier in the loop).

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • ocfs2's cached buffer I/O goes through ocfs2_read_block(s)(). dir.c had
    a naked wait_on_buffer() to wait for some readahead, but it should
    use ocfs2_read_block() instead.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • dir.c is the only place using ocfs2_bread(), so let's make it static to
    that file.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
    Only six pass a different flag set. Rather than have every caller care,
    let's make ocfs2_read_block() take no flags and always do a cached read.
    The remaining six places can call ocfs2_read_blocks() directly.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Now that synchronous readers are using ocfs2_read_blocks_sync(), all
    callers of ocfs2_read_blocks() are passing an inode. Use it
    unconditionally. Since it's there, we don't need to pass the
    ocfs2_super either.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     

14 Oct, 2008

1 commit