17 Nov, 2011

1 commit

  • Three error paths were found in which a journal transaction is neither
    committed nor aborted. Handle these cases by committing the transaction.
    Otherwise a journal handle is left open, and the next
    ocfs2_start_trans() in the same process context gets the wrong credits.

    Signed-off-by: Wengang Wang
    Signed-off-by: Joel Becker

    Wengang Wang
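
A minimal user-space sketch of the failure mode described above. It assumes a nested start in the same context reuses the handle that is still open and ignores the newly requested credits. All jh_* names are hypothetical stand-ins, not ocfs2 or jbd2 symbols.

```c
#include <stddef.h>

/* Toy journal handle; not the real handle_t. */
struct jh_handle {
    int credits;   /* credits requested when the handle was first started */
    int refs;      /* nesting depth */
};

/* Stand-in for the per-process "currently open handle" pointer. */
static struct jh_handle *jh_current;
static struct jh_handle jh_storage;

/* Start a transaction. If a handle is still open in this context it is
 * reused, and the credits asked for here are silently ignored. */
struct jh_handle *jh_start_trans(int credits)
{
    if (jh_current) {
        jh_current->refs++;
        return jh_current;
    }
    jh_storage.credits = credits;
    jh_storage.refs = 1;
    jh_current = &jh_storage;
    return jh_current;
}

/* Drop one reference; returns the number of references still held. */
int jh_commit_trans(struct jh_handle *h)
{
    if (--h->refs == 0)
        jh_current = NULL;
    return h->refs;
}

/* An operation that may fail after starting a transaction.
 * commit_on_error selects the fixed (1) or the buggy (0) error path. */
int jh_op(int fail, int commit_on_error)
{
    struct jh_handle *h = jh_start_trans(2);
    int status = fail ? -5 : 0;   /* -5 plays the role of -EIO */

    if (status < 0 && !commit_on_error)
        return status;            /* bug: the handle stays open */

    jh_commit_trans(h);
    return status;
}
```

With the buggy error path, a later jh_start_trans(10) hands back the stale handle that only carries 2 credits. With the fixed path it gets a fresh handle with 10.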
     

24 May, 2011

2 commits


31 Mar, 2011

1 commit


07 Mar, 2011

1 commit

  • mlog_exit is used to record the exit status of a function.
    But because it is added in so many functions, if we enable it,
    the system logs fill up quickly and cause too much I/O.
    So in practice no one can enable it on a production system, or even
    for a test.

    This patch removes it or replaces it:
    1. If all the error paths already use mlog_errno, it is simply removed.
    Otherwise, it is replaced by mlog_errno.
    2. If it is used to print a return value, it is replaced with
    mlog(0,...).
    mlog_exit_ptr is changed to mlog(0,...).
    All those mlog(0,...) calls will be replaced with trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

22 Feb, 2011

2 commits

  • Since all 4 files that use DISK_ALLOC (localalloc.c, suballoc.c, alloc.c
    and resize.c) have been converted to trace events, remove the
    DISK_ALLOC masklog entirely.

    Signed-off-by: Tao Ma

    Tao Ma
     
  • This is the first attempt at replacing debug mlog(0,...) with
    trace events. Wengang did some work on this in his original
    patch
    http://oss.oracle.com/pipermail/ocfs2-devel/2009-November/005513.html
    but he didn't finish it.

    So this patch removes all mlog(0,...) from alloc.c and adds
    the corresponding trace events. Different mlogs are handled in
    different ways.
    1. Some are replaced with a trace event directly.
    2. Some are replaced and some new parameters are added, since
    I think we need to know the btree owner in that case.
    3. Some are combined into one trace event.
    4. Some redundant mlogs are removed.
    What's more, it defines some event classes so that we can use
    them later.

    Cc: Wengang Wang
    Signed-off-by: Tao Ma

    Tao Ma
     

21 Feb, 2011

1 commit

  • ENTRY is used to record the entry of a function.
    But because it is added in so many functions, if we enable it,
    the system logs fill up quickly and cause too much I/O.
    So in practice no one can enable it on a production system, or even
    for a test.

    So mlog_entry_void is simply removed.
    mlog_entry(...) is replaced with mlog(0,...), which
    will be replaced by trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

08 Jan, 2011

1 commit


16 Dec, 2010

1 commit

  • Recently, one of our colleagues hit a problem: if we repeatedly
    write and delete a 32MB file, we eventually get an ENOSPC.
    The corresponding bug is 1288.
    http://oss.oracle.com/bugzilla/show_bug.cgi?id=1288

    The real problem is that although we have freed the clusters,
    they sit in the truncate log, where they are accumulated so that
    they can all be freed at once.

    This patch tries to resolve it. If we see -ENOSPC
    in ocfs2_write_begin_no_lock, we check whether the truncate
    log has enough clusters for our needs. If so, we
    flush the truncate log at that point and try again. This method
    was inspired by Mark Fasheh. Thanks.

    Cc: Mark Fasheh
    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
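
The retry described above can be sketched in user space roughly as follows. This is a simplified model, assuming the free-cluster count and the truncate log are plain counters. The tl_* names are hypothetical and do not correspond to ocfs2 functions.

```c
#include <errno.h>

/* Toy filesystem state. */
struct tl_fs {
    unsigned int free_clusters;   /* allocatable right now */
    unsigned int tl_clusters;     /* already freed, but parked in the truncate log */
};

/* Allocate `want` clusters or fail with -ENOSPC. */
static int tl_alloc(struct tl_fs *fs, unsigned int want)
{
    if (fs->free_clusters < want)
        return -ENOSPC;
    fs->free_clusters -= want;
    return 0;
}

/* Return everything held in the truncate log to the free pool. */
static void tl_flush_truncate_log(struct tl_fs *fs)
{
    fs->free_clusters += fs->tl_clusters;
    fs->tl_clusters = 0;
}

/* On -ENOSPC, flush the truncate log and retry once, but only if the
 * log holds enough clusters to satisfy the request. */
int tl_write_begin(struct tl_fs *fs, unsigned int want)
{
    int ret = tl_alloc(fs, want);

    if (ret == -ENOSPC && fs->tl_clusters >= want) {
        tl_flush_truncate_log(fs);
        ret = tl_alloc(fs, want);
    }
    return ret;
}
```

Retrying only once keeps the sketch from looping when the flush cannot help, in which case the caller still sees -ENOSPC.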
     

08 Sep, 2010

1 commit

  • We cannot call grab_cache_page() when holding filesystem locks or with
    a transaction started as grab_cache_page() calls page allocation with
    GFP_KERNEL flag and thus page reclaim can recurse back into the filesystem
    causing deadlocks or various assertion failures. We have to use
    find_or_create_page() instead and pass it GFP_NOFS as we do with other
    allocations.

    Acked-by: Mark Fasheh
    Signed-off-by: Jan Kara
    Signed-off-by: Tao Ma

    Jan Kara
     

19 May, 2010

3 commits

  • Joel Becker
     
  • The original reason for pulling ocfs2_find_cpos_for_left_leaf() out of
    alloc.c was to benefit the punching-holes optimization patch. It can,
    however, also be used by other functions in the future that want to do
    the same job.

    Signed-off-by: Tristan Ye
    Acked-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Tristan Ye
     
  • Truncate is just a special case of punching holes (from the new i_size
    to the end), so we can take advantage of the existing
    ocfs2_remove_btree_range() to reduce the complexity and redundancy in
    alloc.c. The goal here is to make truncate more generic and
    straightforward.

    Several functions only used by ocfs2_commit_truncate() will simply be
    removed.

    ocfs2_remove_btree_range() was originally used by the hole punching
    code, which didn't take refcount trees into account (definitely a bug).
    We therefore need to change that func a bit to handle refcount trees.
    It must take the refcount lock, calculate and reserve blocks for
    refcount tree changes, and decrease refcounts at the end. We replace
    ocfs2_lock_allocators() here by adding a new func
    ocfs2_reserve_blocks_for_rec_trunc() which accepts some extra blocks to
    reserve. This will not hurt any other code using
    ocfs2_remove_btree_range() (such as dir truncate and hole punching).

    I merged the following steps into one patch since they may be
    logically doing one thing, though I know it looks a little bit fat
    to review.

    1). Remove redundant code used by ocfs2_commit_truncate(), since we're
    moving to ocfs2_remove_btree_range anyway.

    2). Add a new func ocfs2_reserve_blocks_for_rec_trunc() for purpose of
    accepting some extra blocks to reserve.

    3). Change ocfs2_prepare_refcount_change_for_del() a bit to fit our
    needs. It's safe to do this since it's only being called by
    truncate.

    4). Change ocfs2_remove_btree_range() a bit to take refcount case into
    account.

    5). Finally, we change ocfs2_commit_truncate() to call
    ocfs2_remove_btree_range() in a proper way.

    The patch has passed normal sanity-check testing; stress tests
    with heavier workloads are still to come.

    Based on this patch, fixing the punching holes bug will be fairly easy.

    Signed-off-by: Tristan Ye
    Acked-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Tristan Ye
     

06 May, 2010

4 commits

  • In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's
    blocks. But if jbd2_journal_extend() fails, it will only restart
    with the new number of blocks. This tends to be awkward since
    in most cases we want additional reserved blocks. It makes our code
    harder to maintain since the caller can't be sure all the original
    blocks will not be accessed and dirtied again. There are 15 callers
    of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add
    h_buffer_credits before they call ocfs2_extend_trans(). This patch makes
    ocfs2_extend_trans() really extend atop the original block count.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
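
A small sketch of the "extend atop the original block count" behaviour, under the assumption that an in-place extend either succeeds or forces a restart. The xt_* helpers are toy stand-ins and do not mirror the real jbd2 signatures.

```c
/* Toy handle: only the credit count matters here (h_buffer_credits analogue). */
struct xt_handle {
    int credits;
};

/* Stand-in for an in-place extend: succeeds (returns 0) only while the
 * running transaction still has `room` for the extra blocks. */
static int xt_journal_extend(struct xt_handle *h, int nblocks, int room)
{
    if (nblocks > room)
        return 1;
    h->credits += nblocks;
    return 0;
}

/* Stand-in for a restart: the handle comes back with exactly `nblocks`. */
static void xt_journal_restart(struct xt_handle *h, int nblocks)
{
    h->credits = nblocks;
}

/* Extend atop the current credits, whichever path is taken. */
int xt_extend_trans(struct xt_handle *h, int nblocks, int room)
{
    int old_nblocks = h->credits;

    if (xt_journal_extend(h, nblocks, room) == 0)
        return 0;

    /* Restart with the old count plus the extension, so the caller no
     * longer has to add the current credits by hand. */
    xt_journal_restart(h, old_nblocks + nblocks);
    return 0;
}

/* Convenience for trying it out: credits after extending by nblocks. */
int xt_credits_after(int start, int nblocks, int room)
{
    struct xt_handle h = { start };

    xt_extend_trans(&h, nblocks, room);
    return h.credits;
}
```

In this model a handle holding 3 credits that is extended by 5 ends up with 8 credits on both the extend path and the restart path.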
     
  • Add a per-inode reservations structure and pass it through to the
    reservations code.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0
    since before the kernel moved to git. There is no point in checking
    this error.

    ocfs2_journal_dirty() has been faithfully returning the status since the
    beginning. All over ocfs2, we have blocks of code checking this
    can't-fail status. In the past few years, we've tried to avoid adding these
    checks, because they are pointless. But anyone who looks at our code
    assumes they are needed.

    Finally, ocfs2_journal_dirty() is made a void function. All error
    checking is removed from other files. We'll BUG_ON() the status of
    jbd2_journal_dirty_metadata() just in case they change it someday. They
    won't.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • They all take an ocfs2_alloc_context, which has the allocation inode.

    Signed-off-by: Joel Becker
    Signed-off-by: Tao Ma

    Joel Becker
     

26 Mar, 2010

1 commit


22 Mar, 2010

1 commit


06 Mar, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
    quota: stop using QUOTA_OK / NO_QUOTA
    dquot: cleanup dquot initialize routine
    dquot: move dquot initialization responsibility into the filesystem
    dquot: cleanup dquot drop routine
    dquot: move dquot drop responsibility into the filesystem
    dquot: cleanup dquot transfer routine
    dquot: move dquot transfer responsibility into the filesystem
    dquot: cleanup inode allocation / freeing routines
    dquot: cleanup space allocation / freeing routines
    ext3: add writepage sanity checks
    ext3: Truncate allocated blocks if direct IO write fails to update i_size
    quota: Properly invalidate caches even for filesystems with blocksize < pagesize
    quota: generalize quota transfer interface
    quota: sb_quota state flags cleanup
    jbd: Delay discarding buffers in journal_unmap_buffer
    ext3: quota_write cross block boundary behaviour
    quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
    quota: split out compat_sys_quotactl support from quota.c
    quota: split out netlink notification support from quota.c
    quota: remove invalid optimization from quota_sync_all
    ...

    Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

    Linus Torvalds
     

05 Mar, 2010

1 commit

  • Get rid of the alloc_space, free_space, reserve_space, claim_space and
    release_rsv dquot operations - they are always called from the filesystem
    and if a filesystem really needs its own (which none currently does)
    it can just call into its own routine directly.

    Move shared logic into the common __dquot_alloc_space,
    dquot_claim_space_nodirty and __dquot_free_space low-level methods,
    and rationalize the wrappers around them to move as much code as
    possible into the common block for CONFIG_QUOTA vs not. Also rename
    all these helpers to be named dquot_* instead of vfs_dq_*.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

27 Feb, 2010

1 commit

  • This patch adds an extent block (metadata) stealing mechanism for
    extent allocation. The mechanism is the same as inode stealing:
    if there is no room in the slot-specific extent_alloc, we try to
    allocate an extent block from the next slot.

    Signed-off-by: Tiger Yang
    Acked-by: Tao Ma
    Signed-off-by: Joel Becker

    Tiger Yang
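
The stealing idea can be sketched as a walk over the slots that starts with our own. This is a toy model in which each slot's extent_alloc is reduced to a free-block counter. The st_* names and the wrap-around search order are assumptions made for illustration.

```c
#include <errno.h>

#define ST_MAX_SLOTS 4

/* Free extent blocks left in each slot's allocator (toy model). */
struct st_fs {
    int nslots;
    int free_blocks[ST_MAX_SLOTS];
};

/* Take one extent block, preferring our own slot and stealing from the
 * following slots when it is empty. Returns the slot the block came
 * from, or -ENOSPC when every slot is exhausted. */
int st_alloc_extent_block(struct st_fs *fs, int my_slot)
{
    int i;

    for (i = 0; i < fs->nslots; i++) {
        int slot = (my_slot + i) % fs->nslots;   /* i == 0 is our own slot */

        if (fs->free_blocks[slot] > 0) {
            fs->free_blocks[slot]--;
            return slot;
        }
    }
    return -ENOSPC;
}
```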
     

24 Dec, 2009

1 commit


17 Dec, 2009

1 commit


04 Dec, 2009

1 commit

  • That is "success", "unknown", "through", "performance", "[re|un]mapping"
    , "access", "default", "reasonable", "[con]currently", "temperature"
    , "channel", "[un]used", "application", "example","hierarchy", "therefore"
    , "[over|under]flow", "contiguous", "threshold", "enough" and others.

    Signed-off-by: André Goddard Rosa
    Signed-off-by: Jiri Kosina

    André Goddard Rosa
     

03 Dec, 2009

1 commit

  • ocfs2 refcount tree is stored as an extent tree while
    the leaf ocfs2_refcount_rec points to a refcount block.

    The following steps can trigger a kernel panic.
    mkfs.ocfs2 -b 512 -C 1M --fs-features=refcount $DEVICE
    mount -t ocfs2 $DEVICE $MNT_DIR
    FILE_NAME=$RANDOM
    FILE_NAME_1=$RANDOM
    FILE_REF="${FILE_NAME}_ref"
    FILE_REF_1="${FILE_NAME}_ref_1"
    for((i=0;i<[...]>> $MNT_DIR/$FILE_NAME
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
    done
    for((i=0;i<[...]>> $MNT_DIR/$FILE_NAME
    done

    for((i=0;i<[...]>> $MNT_DIR/$FILE_NAME
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
    done

    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME

    for((i=0;i<[...]>> $MNT_DIR/$FILE_NAME
    cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
    done
    reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF
    # write_f is a program which will write some bytes to a file at offset.
    # write_f -f file_name -l offset -w write_bytes.
    ./write_f -f $MNT_DIR/$FILE_REF -l $[310*1048576] -w 4096
    ./write_f -f $MNT_DIR/$FILE_REF -l $[306*1048576] -w 4096
    ./write_f -f $MNT_DIR/$FILE_REF -l $[311*1048576] -w 4096
    ./write_f -f $MNT_DIR/$FILE_NAME -l $[310*1048576] -w 4096
    ./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096
    reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF_1
    ./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096
    #kernel panic here.

    The reason is that if the ocfs2_extent_rec is the last record
    in a leaf extent block, the old solution fails to find a
    suitable end cpos. So this patch walks through the b-tree,
    finds the next sub root and gets the cpos the next sub-tree starts
    from.

    By the way, I have run Tristan's test case against the patched kernel
    for several days and this type of kernel panic has not happened again.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
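
The walk described above can be sketched as follows, assuming the path from the root to the leaf is available as an array of levels. The nx_* structure is a hypothetical simplification, not ocfs2_path.

```c
#include <stdint.h>

#define NX_MAX_RECS 4

/* One level of a root-to-leaf path: the starting cpos of each record in
 * that block, and the index of the record the path goes through. */
struct nx_level {
    int count;
    uint32_t cpos[NX_MAX_RECS];
    int idx;
};

/* levels[0] is the root and levels[depth - 1] is the leaf block.
 * Returns the cpos at which the next record or sub-tree starts, or
 * UINT32_MAX when the path already follows the rightmost edge. */
uint32_t nx_next_subtree_cpos(const struct nx_level *levels, int depth)
{
    int i;

    /* Climb from the leaf until some block has a record to the right. */
    for (i = depth - 1; i >= 0; i--) {
        const struct nx_level *l = &levels[i];

        if (l->idx + 1 < l->count)
            return l->cpos[l->idx + 1];
    }
    return UINT32_MAX;
}
```

When the record is not the last one in its leaf, the loop returns on its first iteration, so the extra climbing only happens in the case that the commit message describes.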
     

23 Sep, 2009

10 commits

  • In ocfs2_extend_rotate_transaction, op_credits is the original
    credits in the handle and we only want to extend the credits
    for the rotation, but the old solution always doubles them. This
    is harmless for some minor operations, but for actions like
    reflink we may rotate the tree many times and cause the credits
    to increase dramatically. So this patch increases the credits only
    by the desired amount.

    Signed-off-by: Tao Ma

    Tao Ma
     
  • During CoW, if the old extent record is refcounted, we allocate
    some new clusters and do CoW. We can improve on this.
    If the old extent has refcount=1, it is now used only
    by this file. So we don't need to allocate new clusters; we just
    remove the refcounted flag. We also have to remove it
    from the refcount tree without deleting the extent itself.

    Signed-off-by: Tao Ma

    Tao Ma
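
A toy version of that decision, assuming an extent carries a refcounted flag plus the count that the refcount tree holds for it. The cw_* names and the flag value are made up for the sketch.

```c
#define CW_EXT_REFCOUNTED 0x1u

/* Toy extent: flags, the count held in the refcount tree, and its length. */
struct cw_extent {
    unsigned int flags;
    unsigned int refcount;
    unsigned int clusters;
};

/* Prepare an extent for writing. Returns the number of new clusters that
 * have to be allocated for a copy (0 means write in place). */
unsigned int cw_prepare_write(struct cw_extent *e)
{
    if (!(e->flags & CW_EXT_REFCOUNTED))
        return 0;                        /* never shared */

    if (e->refcount == 1) {
        /* Sole remaining user: clear the flag and drop the refcount
         * entry, but keep the clusters where they are. */
        e->flags &= ~CW_EXT_REFCOUNTED;
        e->refcount = 0;
        return 0;
    }

    /* Still shared: give up one reference and copy into new clusters. */
    e->refcount--;
    return e->clusters;
}
```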
     
  • This patch adds CoW support for a refcounted record.

    The whole process is:
    1. Calculate how many clusters we need to CoW and where we start.
    Extents that are not completely encompassed by the write will
    be broken on 1MB boundaries.
    2. Do CoW for the clusters with the help of the page cache.
    3. Change the b-tree structure with the newly allocated clusters.

    Signed-off-by: Tao Ma

    Tao Ma
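
Step 1 can be illustrated with a simplified calculation. It is only one plausible reading of "broken on 1MB boundaries": the write range is rounded outwards to 1MB boundaries and then clamped to the extent. The cr_* names are hypothetical, and the sketch assumes the write lies inside the extent and that clusters are no larger than 1MB.

```c
#include <stdint.h>

/* A range of clusters: first cluster and length. */
struct cr_range {
    uint32_t start;
    uint32_t len;
};

/* Work out which clusters to CoW for a write of w_len clusters at
 * w_start inside the extent [ext_start, ext_start + ext_len).
 * cluster_bits is log2 of the cluster size in bytes (assumed <= 20). */
struct cr_range cr_cow_range(uint32_t ext_start, uint32_t ext_len,
                             uint32_t w_start, uint32_t w_len,
                             unsigned int cluster_bits)
{
    uint32_t per_mb = 1u << (20 - cluster_bits);          /* clusters per 1MB */
    uint32_t ext_end = ext_start + ext_len;
    uint32_t start = w_start & ~(per_mb - 1);              /* round down */
    uint32_t end = (w_start + w_len + per_mb - 1) & ~(per_mb - 1); /* round up */
    struct cr_range r;

    /* Never reach outside the extent itself. */
    if (start < ext_start)
        start = ext_start;
    if (end > ext_end)
        end = ext_end;

    r.start = start;
    r.len = end - start;
    return r;
}
```

With 4KB clusters (cluster_bits of 12) there are 256 clusters per 1MB, so a 10-cluster write at cluster 300 in a 1000-cluster extent yields a CoW range of 256 clusters starting at cluster 256.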
     
  • Add 'Decrement refcount for delete' to the normal truncate
    process. For a refcounted extent record, decrement the refcount
    of the record instead of freeing the clusters.

    Signed-off-by: Tao Ma

    Tao Ma
     
  • Add function ocfs2_mark_extent_refcounted which can mark
    an extent refcounted.

    Signed-off-by: Tao Ma

    Tao Ma
     
  • Given a physical cpos and length, decrement the refcount
    in the tree. If the refcount for any portion of the extent goes
    to zero, that portion is queued for freeing.

    Signed-off-by: Tao Ma

    Tao Ma
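
A simplified sketch of the operation, assuming refcounts are kept per cluster in a flat array rather than as records in a tree. The rc_* names are hypothetical.

```c
#include <stddef.h>

/* Decrement the refcount of clusters [cpos, cpos + len). Every cluster
 * whose count reaches zero is marked in `queued` so that it can be freed
 * later. Returns the number of clusters queued by this call. */
size_t rc_decrease(unsigned int *refcount, size_t nclusters,
                   size_t cpos, size_t len, unsigned char *queued)
{
    size_t i, freed = 0;

    for (i = cpos; i < cpos + len && i < nclusters; i++) {
        if (refcount[i] == 0)
            continue;               /* nothing left to drop */
        if (--refcount[i] == 0) {
            queued[i] = 1;          /* this portion is now unused */
            freed++;
        }
    }
    return freed;
}
```

Only the portion of the range whose count reaches zero is queued. Clusters that are still referenced elsewhere stay allocated.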
     
  • fs/ocfs2/alloc.c now has more than 7000 lines. It contains our
    basic b-tree operations. Although we have already made our b-tree
    operations generic, the basic structure ocfs2_path, which is used
    to iterate over one b-tree branch, is still static and can only be
    used in alloc.c. As the refcount tree needs them and I don't want to
    add any more b-tree-unrelated code to alloc.c, export them.

    Signed-off-by: Tao Ma

    Tao Ma
     
  • Add the refcount b-tree as a new extent tree so that it can
    use the b-tree to store and manipulate ocfs2_refcount_rec.

    Signed-off-by: Tao Ma

    Tao Ma
     
  • ocfs2_mark_extent_written actually does the following things:
    1. check the parameters.
    2. initialize the left_path and split_rec.
    3. call __ocfs2_mark_extent_written, which will:
       1) check the flags of unwritten
       2) do the real split work.
    The whole process is rather tightly packed. So this patch
    abstracts 2 different functions so that future b-tree
    operations can work with them.

    1. __ocfs2_split_extent will accept path and split_rec and do
    the real split work.
    2. ocfs2_change_extent_flag will accept a new flag and initialize
    path and split_rec.

    So now ocfs2_mark_extent_written will do:
    1. check the parameters.
    2. call ocfs2_change_extent_flag, which will:
       1) initialize the left_path and split_rec.
       2) check whether the new flags conflict with the old one.
       3) call __ocfs2_split_extent to do the split.

    Signed-off-by: Tao Ma

    Tao Ma
     
  • Add a new operation eo_ocfs2_extent_contig to the extent tree's
    operations vector. We want this so that the new refcount tree
    can always return CONTIG_NONE and prevent extent merging.

    Signed-off-by: Tao Ma

    Tao Ma
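
The effect of such an optional operation can be sketched like this. It is a toy model that only compares logical offsets. The ct_* types, the default contiguity test and the two ops instances are assumptions for illustration, not the kernel's definitions.

```c
#include <stddef.h>

enum ct_contig { CT_CONTIG_NONE = 0, CT_CONTIG_LEFT, CT_CONTIG_RIGHT };

/* Toy extent record: starting cluster offset and length. */
struct ct_rec {
    unsigned int cpos;
    unsigned int clusters;
};

struct ct_tree_ops {
    /* Optional hook: when set, it replaces the default contiguity test. */
    enum ct_contig (*contig)(const struct ct_rec *ext,
                             const struct ct_rec *ins);
};

/* Default test: are the two records adjacent in cluster offset? */
static enum ct_contig ct_default_contig(const struct ct_rec *ext,
                                        const struct ct_rec *ins)
{
    if (ext->cpos + ext->clusters == ins->cpos)
        return CT_CONTIG_RIGHT;     /* ins directly follows ext */
    if (ins->cpos + ins->clusters == ext->cpos)
        return CT_CONTIG_LEFT;      /* ins directly precedes ext */
    return CT_CONTIG_NONE;
}

/* A refcount-style tree never reports contiguity, so nothing is merged. */
static enum ct_contig ct_refcount_contig(const struct ct_rec *ext,
                                         const struct ct_rec *ins)
{
    (void)ext;
    (void)ins;
    return CT_CONTIG_NONE;
}

const struct ct_tree_ops ct_file_ops = { NULL };
const struct ct_tree_ops ct_refcount_ops = { ct_refcount_contig };

/* Dispatch through the ops vector, falling back to the default test. */
enum ct_contig ct_extent_contig(const struct ct_tree_ops *ops,
                                const struct ct_rec *ext,
                                const struct ct_rec *ins)
{
    if (ops->contig)
        return ops->contig(ext, ins);
    return ct_default_contig(ext, ins);
}
```

Keeping the hook optional means trees that are happy with the default test do not need to provide anything.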
     

05 Sep, 2009

3 commits