16 Apr, 2015

1 commit


20 Nov, 2014

1 commit


04 Nov, 2014

1 commit


04 Apr, 2014

1 commit

  • The following patches are reverted in this patch because these patches
    caused performance regression in the remote unlink() calls.

    ea455f8ab683 - ocfs2: Push out dropping of dentry lock to ocfs2_wq
    f7b1aa69be13 - ocfs2: Fix deadlock on umount
    5fd131893793 - ocfs2: Don't oops in ocfs2_kill_sb on a failed mount

    Previous patches in this series removed the possible deadlocks from
    downconvert thread so the above patches shouldn't be needed anymore.

    The regression is caused because these patches delay the iput() in case
    of dentry unlocks. This also delays the unlocking of the open lockres.
    The open lockresource is required to test if the inode can be wiped from
    disk or not. When the deleting node does not get the open lock, it
    marks it as orphan (even though it is not in use by another
    node/process) and causes a journal checkpoint. This delays operations
    following the inode eviction. This also moves the inode to the orphaned
    inode which further causes more I/O and a lot of unnecessary orphans.

    The following script can be used to generate the load causing issues:

    declare -a create
    declare -a remove
    declare -a iterations=(1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384)
    unique="`mktemp -u XXXXX`"
    script="/tmp/idontknow-${unique}.sh"
    cat <<EOF > "${script}"
    for n in {1..8}; do mkdir -p test/dir\${n}
    eval touch test/dir\${n}/foo{1.."\$1"}
    done
    EOF
    chmod 700 "${script}"

    function fcreate ()
    {
    exec 2>&1 /usr/bin/time --format=%E "${script}" "$1"
    }

    function fremove ()
    {
    exec 2>&1 /usr/bin/time --format=%E ssh node2 "cd `pwd`; rm -Rf test*"
    }

    function fcp ()
    {
    exec 2>&1 /usr/bin/time --format=%E ssh node3 "cd `pwd`; cp -R test test.new"
    }

    echo -------------------------------------------------
    echo "| # files | create #s | copy #s | remove #s |"
    echo -------------------------------------------------
    for ((x=0; x < ${#iterations[*]} ; x++)) do
    create[$x]="`fcreate ${iterations[$x]}`"
    copy[$x]="`fcp ${iterations[$x]}`"
    remove[$x]="`fremove`"
    printf "| %8d | %9s | %9s | %9s |\n" ${iterations[$x]} ${create[$x]} ${copy[$x]} ${remove[$x]}
    done
    rm "${script}"
    echo "------------------------"

    Signed-off-by: Srinivas Eeda
    Signed-off-by: Goldwyn Rodrigues
    Signed-off-by: Jan Kara
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     

30 Sep, 2013

1 commit


28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.
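
    To make the difference concrete, here is a minimal userspace sketch of
    the two iterator shapes. It is not the code from linux/list.h: the
    structures are simplified (the real hlist_node also carries a pprev
    pointer), __typeof__ assumes GCC or Clang, and struct item,
    hlist_for_each_entry_old, sum_old and sum_new are hypothetical names
    made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the kernel's hlist_node also has a pprev field. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Old shape: the caller must supply a separate hlist_node cursor 'pos'. */
#define hlist_for_each_entry_old(tpos, pos, head, member)                     \
	for (pos = (head)->first;                                             \
	     pos && (tpos = container_of(pos, __typeof__(*tpos), member), 1); \
	     pos = pos->next)

#define hlist_entry_safe(ptr, type, member) \
	((ptr) ? container_of(ptr, type, member) : NULL)

/* New shape: same argument list as list_for_each_entry(pos, head, member). */
#define hlist_for_each_entry(pos, head, member)                                \
	for (pos = hlist_entry_safe((head)->first, __typeof__(*(pos)), member); \
	     pos;                                                               \
	     pos = hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

/* Hypothetical element type used only for this sketch. */
struct item {
	int value;
	struct hlist_node node;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	assert(n != NULL && h != NULL);
	n->next = h->first;
	h->first = n;
}

static int sum_old(struct hlist_head *head)
{
	struct item *it;
	struct hlist_node *pos; /* the extra cursor the old iterator wanted */
	int sum = 0;

	hlist_for_each_entry_old(it, pos, head, node)
		sum += it->value;
	return sum;
}

static int sum_new(struct hlist_head *head)
{
	struct item *it; /* the entry pointer is the only cursor */
	int sum = 0;

	hlist_for_each_entry(it, head, node)
		sum += it->value;
	return sum;
}
```

    Both helpers walk the same list and compute the same sum; the new
    shape simply no longer needs the second local variable.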

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

14 Jul, 2012

3 commits


10 Mar, 2011

1 commit


07 Mar, 2011

1 commit

  • mlog_exit is used to record the exit status of a function.
    But because it is added in so many functions, if we enable it,
    the system logs get filled up quickly and cause too much I/O.
    So in practice no one can enable it on a production system, or even
    for a test.

    This patch removes it or changes it as follows:
    1. If all the error paths already use mlog_errno, it is simply removed.
    Otherwise, it is replaced by mlog_errno.
    2. If it is used to print some return value, it is replaced with
    mlog(0,...).
    mlog_exit_ptr is changed to mlog(0,...).
    All those mlog(0,...) calls will be replaced with trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

23 Feb, 2011

1 commit


21 Feb, 2011

1 commit

  • ENTRY is used to record the entry of a function.
    But because it is added in so many functions, if we enable it,
    the system logs get filled up quickly and cause too much I/O.
    So in practice no one can enable it on a production system, or even
    for a test.

    So for mlog_entry_void, we just remove it.
    For mlog_entry(...), we replace it with mlog(0,...), and those
    will be replaced by trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

07 Jan, 2011

6 commits

  • dcache_inode_lock can be replaced with per-inode locking. Use existing
    inode->i_lock for this. This is slightly non-trivial because we sometimes
    need to find the inode from the dentry, which requires d_inode to be
    stabilised (either with refcount or d_lock).

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Require filesystems be aware of .d_revalidate being called in rcu-walk
    mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
    -ECHILD from all implementations.
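
    As a rough illustration of the push down, the userspace model below
    shows the shape of such a check. The struct definitions, the
    LOOKUP_RCU value, and the names slow_revalidate and
    example_d_revalidate are assumptions made for this sketch, not kernel
    definitions; only the flag test and the -ECHILD return come from the
    description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder flag value for this sketch; not taken from kernel headers. */
#define LOOKUP_RCU 0x0040

/* Minimal stand-ins for the kernel structures involved. */
struct nameidata { unsigned int flags; };
struct dentry { int valid; };

/* Hypothetical slow path: may block, so it must not run in rcu-walk. */
static int slow_revalidate(const struct dentry *dentry)
{
	return dentry->valid;
}

static int example_d_revalidate(struct dentry *dentry, struct nameidata *nd)
{
	assert(dentry != NULL && nd != NULL);

	/* In rcu-walk mode, bail out so the caller can retry in ref-walk. */
	if (nd->flags & LOOKUP_RCU)
		return -ECHILD;

	return slow_revalidate(dentry);
}
```

    A filesystem that later learns to revalidate without blocking could
    replace the early return with real rcu-walk handling.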

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dget_locked was a shortcut to avoid the lazy lru manipulation when we already
    held dcache_lock (lru manipulation was relatively cheap at that point).
    However, now that the lru lock is an innermost one, we never hold it at any
    caller, so the lock cost can now be avoided. We already have well working lazy
    dcache LRU, so it should be fine to defer LRU manipulations to scan time.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dcache_lock no longer protects anything. Remove it.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
    from concurrent modification. d_alias is also protected by d_lock.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Protect d_unhashed(dentry) condition with d_lock. This means keeping
    DCACHE_UNHASHED bit in sync with hash manipulations.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

19 Nov, 2010

1 commit

  • I hit this problem during 2.6.37-rc1 regression testing. It was
    introduced by commit '5e98d492406818e6a94c0ba54c61f59d40cefa4a' (Track
    negative entries v3). The following scenario reproduces the issue easily:

    Node A Node B
    ================ ============
    $touch testfile
    $ls testfile
    $rm -rf testfile
    $touch testfile
    $ls testfile
    ls: cannot access testfile: No such file or directory

    This patch stops tracking a dentry that became negative through an inode
    deletion, so as to force revalidation on the next lookup in case the file is
    touched again on the same node.

    It does not hurt the performance of repeated lookups for non-existent files,
    though the first lookup after a file deletion regresses a bit.

    Signed-off-by: Tristan Ye
    Signed-off-by: Joel Becker

    Tristan Ye
     

11 Sep, 2010

1 commit

  • Track negative dentries by recording the generation number of the parent
    directory in d_fsdata. The generation number for the parent directory is
    recorded in the inode_info, which increments every time the lock on the
    directory is dropped.

    If the generation number of the parent directory and the negative dentry
    matches, there is no need to perform the revalidate, else a revalidate
    is forced. This improves performance in situations where nodes look for
    the same non-existent file multiple times.
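
    The comparison described above can be modelled in a few lines of
    userspace C. This is only a sketch: struct dir_info, struct
    neg_dentry and the three helper names are hypothetical stand-ins for
    the generation kept in the inode_info and the value stored in
    d_fsdata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the per-directory generation kept in the inode_info. */
struct dir_info { unsigned long lock_gen; };

/* Stand-in for the generation a negative dentry records in d_fsdata. */
struct neg_dentry { unsigned long recorded_gen; };

/* Called whenever the cluster lock on the directory is dropped. */
static void dir_lock_dropped(struct dir_info *dir)
{
	assert(dir != NULL);
	dir->lock_gen++;
}

/* Remember which directory generation the failed lookup was made under. */
static void record_negative(struct neg_dentry *d, const struct dir_info *dir)
{
	assert(d != NULL && dir != NULL);
	d->recorded_gen = dir->lock_gen;
}

/* A mismatch means the directory may have changed: force a revalidate. */
static bool needs_revalidate(const struct neg_dentry *d,
			     const struct dir_info *dir)
{
	return d->recorded_gen != dir->lock_gen;
}
```

    As long as the directory lock is held continuously, repeated lookups
    of the same missing name see matching generations and skip the
    revalidate.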

    Thanks Mark for explaining the DLM sequence.

    Signed-off-by: Goldwyn Rodrigues
    Signed-off-by: Joel Becker

    Goldwyn Rodrigues
     

28 Aug, 2009

1 commit

  • In commit a5a0a630922a2f6a774b6dac19f70cb5abd86bb0, when
    ocfs2_dentry_attach_lock fails, we call an extra iput and reset
    dentry->d_fsdata to NULL. This resolves a bug, but the fix is
    incomplete: the dentry is still there. When we want to use
    it again, ocfs2_dentry_revalidate doesn't catch it and returns
    true. That makes a later ocfs2_dentry_lock panic.
    One such bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162.

    The resolution is to add a check for dentry->d_fsdata in the
    revalidate process and return false if dentry->d_fsdata is NULL,
    so that a new ocfs2_lookup will be called again.
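
    In outline, the added check looks like the following userspace
    sketch. The one-field struct dentry and the function name
    sketch_dentry_revalidate are simplifications made up for this
    example; the real revalidate routine performs other checks that are
    not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in: only the field relevant to the check is modelled. */
struct dentry { void *d_fsdata; };

/* Returns 1 if the dentry may be reused, 0 to force a fresh lookup. */
static int sketch_dentry_revalidate(const struct dentry *dentry)
{
	assert(dentry != NULL);

	/* No lock data attached: an earlier attach failed, so do not trust it. */
	if (dentry->d_fsdata == NULL)
		return 0;

	return 1;
}
```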

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     

22 Jul, 2009

1 commit

  • In commit ea455f8ab68338ba69f5d3362b342c115bea8e13, we moved the dentry lock
    put process into ocfs2_wq. This causes problems during umount because ocfs2_wq
    can drop references to inodes while they are being invalidated by
    invalidate_inodes() causing all sorts of nasty things (invalidate_inodes()
    ending in an infinite loop, "Busy inodes after umount" messages etc.).

    We fix the problem by stopping ocfs2_wq from doing any further releasing of
    inode references on the superblock being unmounted, waiting until it finishes
    the current round of releasing, and finally cleaning up all the references in
    dentry_lock_list from ocfs2_put_super().

    The issue was tracked down by Tao Ma.

    Signed-off-by: Jan Kara
    Signed-off-by: Joel Becker

    Jan Kara
     

24 Apr, 2009

1 commit

  • In ocfs2_dentry_attach_lock(), if unable to get the dentry lock, we need to
    call iput(inode) because a failure here means no d_instantiate(), which means
    the normally matching iput() will not be called during dput(dentry).
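
    The reference balance argued above can be pictured with a small
    userspace model. Everything in it is a simplified stand-in: i_count
    is a plain integer, lock_ok is a hypothetical switch simulating
    whether the dentry lock was obtained, -EIO is an arbitrary error
    code, and sketch_attach_lock and sketch_create are made-up names, not
    the ocfs2 functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct inode { int i_count; };
struct dentry { struct inode *d_inode; };

static void iput(struct inode *inode)
{
	assert(inode != NULL && inode->i_count > 0);
	inode->i_count--;
}

/* Stand-in for d_instantiate(): the dentry takes over the caller's reference. */
static void d_instantiate(struct dentry *dentry, struct inode *inode)
{
	dentry->d_inode = inode;
}

/* Stand-in for the final dput(): drops the inode reference the dentry owns. */
static void dput(struct dentry *dentry)
{
	if (dentry->d_inode) {
		iput(dentry->d_inode);
		dentry->d_inode = NULL;
	}
}

static int sketch_attach_lock(struct inode *inode, int lock_ok)
{
	if (!lock_ok) {
		/* No d_instantiate() will follow, so dput() cannot drop this. */
		iput(inode);
		return -EIO; /* arbitrary error code for the sketch */
	}
	return 0;
}

static int sketch_create(struct dentry *dentry, struct inode *inode, int lock_ok)
{
	int ret;

	inode->i_count = 1; /* the reference held by the creating path */
	ret = sketch_attach_lock(inode, lock_ok);
	if (ret)
		return ret; /* failure: the dentry never sees the inode */

	d_instantiate(dentry, inode);
	return 0;
}
```

    In both outcomes the count returns to zero: on failure through the
    explicit iput(), on success through the dput() of the instantiated
    dentry.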

    This patch fixes the oops that accompanies the following message:
    (3996,1):dlm_empty_lockres:2708 ERROR: lockres W00000000000000000a1046b06a4382 still has local locks!
    kernel BUG in dlm_empty_lockres at /rpmbuild/smushran/BUILD/ocfs2-1.4.2/fs/ocfs2/dlm/dlmmaster.c:2709!

    Signed-off-by: Sunil Mushran
    Signed-off-by: Joel Becker

    Sunil Mushran
     

28 Mar, 2009

1 commit


03 Feb, 2009

1 commit

  • Dropping of last reference to dentry lock is a complicated operation involving
    dropping of reference to inode. This can get complicated and quota code in
    particular needs to obtain some quota locks which leads to potential deadlock.
    Thus we defer dropping of inode reference to ocfs2_wq.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     

26 Jan, 2008

1 commit

  • The node maps that are set/unset by these votes are no longer relevant, thus
    we can remove the mount and umount votes. Since those are the last two
    remaining votes, we can also remove the entire vote infrastructure.

    The vote thread has been renamed to the downconvert thread, and the small
    amount of functionality related to managing it has been moved into
    fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

28 Nov, 2007

1 commit

  • The existing bug statement didn't take into account unhashed dentries which
    might not have a cluster lock on them. This could happen if a node exporting
    the file system via NFS is rebooted, re-exported to NFS clients and then
    unmounted. It's fine in this case to not have a dentry cluster lock.

    Just remove the bug statement and replace it with an error print, which
    does the proper checks. Though we want to know if something has happened
    which might have prevented a cluster lock from being created, it's
    definitely not necessary to panic the machine for this.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

07 Nov, 2007

1 commit

  • Do this to avoid a theoretical (I haven't seen this in practice) race where
    the downconvert thread might drop the dentry lock, allowing a remote unlink
    to proceed before dropping the inode locks. This could bounce access to the
    orphan dir between nodes.

    There doesn't seem to be a need to do the same in ocfs2_dentry_iput() as
    that's never called for the last ref drop from the downconvert thread.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

20 Oct, 2007

1 commit


25 Sep, 2006

4 commits

  • We can't use LKM_LOCAL for new dentry locks because an unlink and subsequent
    re-create of a name/inode pair may result in the lock still being mastered
    somewhere in the cluster.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Make use of FS_RENAME_DOES_D_MOVE to avoid a race condition that can occur
    during ->rename() if we d_move() outside of the parent directory cluster
    locks, and another node discovers the new name (created during the rename)
    and unlinks it. d_move() will unconditionally rehash a dentry - which will
    leave stale data in the system.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Replace the dentry vote mechanism with a cluster lock which covers a set
    of dentries. This allows us to force d_delete() only on nodes which actually
    care about an unlink.

    Every node that does a ->lookup() gets a read only lock on the dentry, until
    an unlink, during which the unlinking node will request an exclusive lock,
    forcing the other nodes that care about that dentry to d_delete() it. The
    effect is that we retain a very lightweight ->d_revalidate(), and at the
    same time get to make large improvements to the average case performance of
    the ocfs2 unlink and rename operations.

    This patch adds the higher level API and the dentry manipulation code.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Replace the dentry vote mechanism with a cluster lock which covers a set
    of dentries. This allows us to force d_delete() only on nodes which actually
    care about an unlink.

    Every node that does a ->lookup() gets a read only lock on the dentry, until
    an unlink, during which the unlinking node will request an exclusive lock,
    forcing the other nodes that care about that dentry to d_delete() it. The
    effect is that we retain a very lightweight ->d_revalidate(), and at the
    same time get to make large improvements to the average case performance of
    the ocfs2 unlink and rename operations.

    This patch adds the cluster lock type which OCFS2 can attach to
    dentries. A small number of fs/ocfs2/dcache.c functions are stubbed
    out so that this change can compile.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

25 Mar, 2006

1 commit


04 Jan, 2006

1 commit