15 Jul, 2011

1 commit


08 Jun, 2011

1 commit


25 May, 2011

2 commits

  • Change each shrinker's API by consolidating the existing parameters into a
    shrink_control struct. This will simplify adding further features without
    touching every file that implements a shrinker.
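
    A minimal userspace sketch of the consolidated callback shape follows. The
    gfp_t typedef, demo_shrink() and demo_cache_objects are illustrative
    stand-ins, and the convention that nr_to_scan == 0 is a size query is an
    assumption carried over from the shrinker API of that era.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int gfp_t;          /* stand-in for the kernel type */

/* All per-call parameters travel in one struct, so new fields can be
 * added later without changing any callback prototype. */
struct shrink_control {
    gfp_t gfp_mask;
    unsigned long nr_to_scan;
};

struct shrinker {
    int (*shrink)(struct shrinker *s, struct shrink_control *sc);
};

static int demo_cache_objects = 100; /* hypothetical cache size */

/* Example callback: frees up to nr_to_scan objects and returns how many
 * remain; nr_to_scan == 0 therefore acts as a pure size query. */
static int demo_shrink(struct shrinker *s, struct shrink_control *sc)
{
    assert(s != NULL && sc != NULL);
    if (sc->nr_to_scan > (unsigned long)demo_cache_objects)
        demo_cache_objects = 0;
    else
        demo_cache_objects -= (int)sc->nr_to_scan;
    return demo_cache_objects;
}
```

    Adding a field to struct shrink_control later would leave demo_shrink()
    and every other callback prototype unchanged.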

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: fix warning]
    [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
    [akpm@linux-foundation.org: fix xfs warning]
    [akpm@linux-foundation.org: update gfs2]
    Signed-off-by: Ying Han
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Acked-by: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
     
  • This patch fixes a race in the GFS2 glock state machine that may
    result in lockups. The symptom is that all nodes but one will
    hang, waiting for a particular glock. All the holder records
    will have the "W" (Waiting) bit set. The other node will
    typically have the glock stuck in Exclusive mode (EX) with no
    holder records, but the dinode will be cached. In other words,
    an entry with "I:" will appear in the glock dump for that glock,
    but nothing else.

    The race has to do with the glock "Pending Demote" bit, which
    can be set, then immediately reset, thus losing the fact that
    another node needs the glock. The sequence of events is:

    1. Something schedules the glock workqueue (e.g. glock request from fs)
    2. The glock workqueue gets to the point between the test of the reply pending
    bit and the spin lock:

    if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
            finish_xmote(gl, gl->gl_reply);
            drop_ref = 1;
    }
    down_read(&gfs2_umount_flush_sem);
    spin_lock(&gl->gl_spin);

    3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
    (b) the demote request which sets GLF_PENDING_DEMOTE

    4. The following test is executed:

    if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
    gl->gl_state != LM_ST_UNLOCKED &&
    gl->gl_demote_state != LM_ST_EXCLUSIVE) {

    This clears the pending demote flag, and gl->gl_demote_state is not equal to
    exclusive. However, because the reply from the DLM arrived after we checked
    the GLF_REPLY_PENDING flag, gl->gl_state is still unlocked, so the test
    fails: GLF_PENDING_DEMOTE has been cleared, but GLF_DEMOTE is never set and
    GLF_PENDING_DEMOTE is not reinstated.

    The patch closes the timing window by only transitioning the
    "Pending demote" bit to the "demote" flag once we know the
    other conditions (not unlocked and not exclusive) are met.
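
    The difference between the two orderings can be sketched in userspace as
    follows. This is a single-threaded illustration only: the struct, the flag
    values and both function names are hypothetical, and the real code uses
    atomic bit operations under gl_spin rather than plain bit masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's glock flags and lock states. */
enum { GLF_PENDING_DEMOTE = 1u << 0, GLF_DEMOTE = 1u << 1 };
enum lock_state { LM_ST_UNLOCKED, LM_ST_EXCLUSIVE, LM_ST_SHARED };

struct glock_sketch {
    unsigned int flags;
    enum lock_state state;        /* current state */
    enum lock_state demote_state; /* state the remote node asked for */
};

/* Old ordering: the bit is cleared before the other conditions are known. */
static bool promote_demote_racy(struct glock_sketch *gl)
{
    bool was_pending;

    assert(gl != NULL);
    was_pending = (gl->flags & GLF_PENDING_DEMOTE) != 0;
    gl->flags &= ~GLF_PENDING_DEMOTE;   /* request is forgotten from here on */
    if (was_pending && gl->state != LM_ST_UNLOCKED &&
        gl->demote_state != LM_ST_EXCLUSIVE) {
        gl->flags |= GLF_DEMOTE;
        return true;
    }
    return false;
}

/* Fixed ordering: test only, then clear and set together. */
static bool promote_demote_fixed(struct glock_sketch *gl)
{
    assert(gl != NULL);
    if ((gl->flags & GLF_PENDING_DEMOTE) &&
        gl->state != LM_ST_UNLOCKED &&
        gl->demote_state != LM_ST_EXCLUSIVE) {
        gl->flags &= ~GLF_PENDING_DEMOTE;
        gl->flags |= GLF_DEMOTE;
        return true;
    }
    return false;   /* pending bit survives for the next pass */
}
```

    With the glock still unlocked, the racy variant returns with neither bit
    set, while the fixed variant leaves GLF_PENDING_DEMOTE in place so a later
    pass can act on it.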

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

21 May, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (32 commits)
    GFS2: Move all locking inside the inode creation function
    GFS2: Clean up symlink creation
    GFS2: Clean up mkdir
    GFS2: Use UUID field in generic superblock
    GFS2: Rename ops_inode.c to inode.c
    GFS2: Inode.c is empty now, remove it
    GFS2: Move final part of inode.c into super.c
    GFS2: Move most of the remaining inode.c into ops_inode.c
    GFS2: Move gfs2_refresh_inode() and friends into glops.c
    GFS2: Remove gfs2_dinode_print() function
    GFS2: When adding a new dir entry, inc link count if it is a subdir
    GFS2: Make gfs2_dir_del update link count when required
    GFS2: Don't use gfs2_change_nlink in link syscall
    GFS2: Don't use a try lock when promoting to a higher mode
    GFS2: Double check link count under glock
    GFS2: Improve bug trap code in ->releasepage()
    GFS2: Fix ail list traversal
    GFS2: make sure fallocate bytes is a multiple of blksize
    GFS2: Add an AIL writeback tracepoint
    GFS2: Make writeback more responsive to system conditions
    ...

    Linus Torvalds
     

05 May, 2011

1 commit

  • Previously we marked all locks being promoted to a higher mode
    with the try flag to avoid any potential deadlock issues. The
    DLM is able to detect these and report them in a way that GFS2 can
    deal with correctly. So we can just request the required mode
    and wait for a response without needing to perform this check.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

26 Apr, 2011

1 commit

  • Now that the whole dcache_hash_bucket crap is gone, go all the way and
    also remove the weird locking layering violations for locking the hash
    buckets. Add hlist_bl_lock/unlock helpers to move the locking into the
    list abstraction instead of requiring each caller to open code it.
    After all allowing for the bit locks is the whole point of these helpers
    over the plain hlist variant.
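
    The idea of keeping the bucket lock inside the list head can be sketched
    in userspace with C11 atomics. The bl_* names and the singly linked node
    are simplifications for illustration; they are not the kernel's
    hlist_bl definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct bl_node { struct bl_node *next; };

/* Bit 0 of the head word is the lock; the rest is the first-node pointer. */
struct bl_head { _Atomic uintptr_t first; };

#define BL_LOCK_BIT ((uintptr_t)1)

static void bl_lock(struct bl_head *h)
{
    /* Spin until we are the one who flipped bit 0 from 0 to 1. */
    while (atomic_fetch_or_explicit(&h->first, BL_LOCK_BIT,
                                    memory_order_acquire) & BL_LOCK_BIT)
        ;
}

static void bl_unlock(struct bl_head *h)
{
    atomic_fetch_and_explicit(&h->first, ~BL_LOCK_BIT, memory_order_release);
}

static bool bl_is_locked(struct bl_head *h)
{
    return (atomic_load(&h->first) & BL_LOCK_BIT) != 0;
}

/* Readers mask the lock bit off to recover the real pointer. */
static struct bl_node *bl_first(struct bl_head *h)
{
    return (struct bl_node *)(atomic_load(&h->first) & ~BL_LOCK_BIT);
}

/* Caller must hold the bucket lock. */
static void bl_add_head(struct bl_node *n, struct bl_head *h)
{
    assert(bl_is_locked(h));
    assert(((uintptr_t)n & BL_LOCK_BIT) == 0);   /* nodes must be aligned */
    n->next = bl_first(h);
    atomic_store(&h->first, (uintptr_t)n | BL_LOCK_BIT);
}
```

    Callers only ever see bl_lock() and bl_unlock(); the fact that the lock
    lives in the low bit of the pointer stays inside the list abstraction.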

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

20 Apr, 2011

4 commits

  • This patch passes a writeback_control structure to the code that
    writes back the AIL list. This means that we can then take advantage
    of the information we get in ->write_inode() in order to set off
    some pre-emptive writeback.

    In addition, the AIL code is cleaned up a bit to make it
    a bit simpler to understand.

    There is still more which can usefully be done in this area,
    but this is a good start at least.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The GLF_LRU flag introduced in the previous patch can be
    used to check if a glock is on the lru list when a new
    holder is queued and if so remove it, without having first
    to get the lru_lock.

    The main purpose of this patch however is to optimise the
    glocks left over when an inode at end of life is being
    evicted. Previously such glocks were left with the GLF_LFLUSH
    flag set, so that when reclaimed, each one required a log flush.
    This patch resets the GLF_LFLUSH flag when there is nothing
    left to flush thus preventing later log flushes as glocks are
    reused or demoted.

    In order to do this, we need to keep track of the number of
    revokes which are outstanding, and also to clear the GLF_LFLUSH
    bit after a log commit when only revokes have been processed.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This adds support for two new flags. One keeps track of whether
    the glock is on the LRU list or not. The other isn't really a
    flag as such, but an indication of whether the glock has an
    attached object or not. This indication is reported without
    any locking, which is ok since we do not dereference the object
    pointer but merely report whether it is NULL or not.

    Also, this fixes one place where a tracepoint was missing, which
    was at the point we remove deallocated blocks from the journal.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Rather than allowing the glocks to be scheduled for possible
    reclaim as soon as they have exited the journal, this patch
    delays their entry to the list until the glocks in question
    are no longer in use.

    This means that we will rely on the vm for writeback of all
    dirty data and metadata from now on. When glocks are added
    to the lru list they should be freeable much faster since all
    the I/O required to free them should have already been completed.

    This should lead to much better I/O patterns under low memory
    conditions.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

31 Mar, 2011

1 commit


16 Mar, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
    GFS2: Don't use _raw version of RCU dereference
    GFS2: Adding missing unlock_page()
    GFS2: Update to AIL list locking
    GFS2: introduce AIL lock
    GFS2: fix block allocation check for fallocate
    GFS2: Optimize glock multiple-dequeue code
    GFS2: Remove potential race in flock code
    GFS2: Fix glock deallocation race
    GFS2: quota allows exceeding hard limit
    GFS2: deallocation performance patch
    GFS2: panics on quotacheck update
    GFS2: Improve cluster mmap scalability
    GFS2: Fix glock queue trace point
    GFS2: Post-VFS scale update for RCU path walk
    GFS2: Use RCU for glock hash table

    Linus Torvalds
     

15 Mar, 2011

1 commit


11 Mar, 2011

1 commit

  • This is a small patch that optimizes multiple glock dequeue
    operations. It changes the unlock order to be more efficient
    and makes it easier for lock debugging tools to unravel. It
    also eliminates the need for the temp variable x, although
    that would likely be optimized out.
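
    One way to picture the change is a loop that releases the holders in the
    reverse of the order in which they were acquired, using the count itself
    as the index. The reverse order is an assumption here, since the text
    above only says the order changed, and struct holder, holder_dq() and
    release_log are illustrative stand-ins.

```c
#include <assert.h>
#include <stddef.h>

struct holder { int id; };

static int release_log[8];      /* records the order of releases */
static size_t release_count;

/* Stand-in for dequeuing a single holder. */
static void holder_dq(struct holder *gh)
{
    assert(release_count < 8);
    release_log[release_count++] = gh->id;
}

/* Release holders last-acquired-first; num_gh doubles as the index,
 * so no separate loop variable is needed. */
static void holder_dq_m(unsigned int num_gh, struct holder *ghs)
{
    while (num_gh--)
        holder_dq(&ghs[num_gh]);
}
```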

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

09 Mar, 2011

1 commit

  • This patch fixes a race in deallocating glocks which was introduced
    in the RCU glock patch. We need to ensure that the glock count is
    kept correct even in the case that there is a race to add a new
    glock into the hash table. Also, to avoid having to wait for an
    RCU grace period, the glock counter can be decremented before
    call_rcu() is called.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

17 Feb, 2011

1 commit

  • There are two spellings in use for 'freeze' + 'able' - 'freezable' and
    'freezeable'. The former is the more prominent one. The latter is
    mostly used by workqueue and in a few other odd places. Unify the
    spelling to 'freezable'.

    Signed-off-by: Tejun Heo
    Reported-by: Alan Stern
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Greg Kroah-Hartman
    Acked-by: Dmitry Torokhov
    Cc: David Woodhouse
    Cc: Alex Dubov
    Cc: "David S. Miller"
    Cc: Steven Whitehouse

    Tejun Heo
     

31 Jan, 2011

1 commit


21 Jan, 2011

1 commit

  • Using RCU for the glock hash table has a number of advantages:

    - Reduces contention on the hash table lock
    - Makes the code smaller and simpler
    - Should speed up glock dumps when under load
    - Removes ref count changing in examine_bucket
    - No longer need hash chain lock in glock_put() in common case

    There are some further changes which this enables and which
    we may do in the future. One is to look at using SLAB_RCU,
    and another is to look at using a per-cpu counter for the
    per-sb glock counter, since that is touched twice in the
    lifetime of each glock (but only used at umount time).

    Signed-off-by: Steven Whitehouse
    Cc: Paul E. McKenney

    Steven Whitehouse
     

30 Nov, 2010

5 commits


15 Nov, 2010

1 commit

  • This area of the code has always been a bit delicate due to the
    subtleties of lock ordering. The problem is that for "normal"
    alloc/dealloc, we always grab the inode locks first and the rgrp lock
    later.

    In order to ensure no races in looking up the unlinked, but still
    allocated inodes, we need to hold the rgrp lock when we do the lookup,
    which means that we can't take the inode glock.

    The solution is to borrow the technique already used by NFS to solve
    what is essentially the same problem (given an inode number, look up
    the inode carefully, checking that it really is in the expected
    state).

    We cannot do that directly from the allocation code (lock ordering
    again) so we give the job to the pre-existing delete workqueue and
    carry on with the allocation as normal.

    If we find there is no space, we do a journal flush (required anyway
    if space from a deallocation is to be released) which should block
    against the pending deallocations, so we should always get the space
    back.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

29 Sep, 2010

1 commit

  • The tests further down the recovery function relating to
    unlocking the journal need to be updated to match the
    initial test. Also, a test in the umount code which was
    surplus to requirements has been removed. Umounting
    spectator mounts now works correctly, as expected.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Sep, 2010

2 commits

  • The recovery workqueue can be freezable since
    we want it to finish what it is doing if the system is to
    be frozen (although why you'd want to freeze a cluster node
    is beyond me since it will result in it being ejected from
    the cluster). It does still make sense for single node
    GFS2 filesystems though.

    The glock workqueue will benefit from being able to run more
    work items concurrently. A test running postmark shows
    improved performance and multi-threaded workloads are likely
    to benefit even more. It needs to be high priority because
    the latency directly affects the latency of filesystem glock
    operations.

    The delete workqueue is similar to the recovery workqueue in
    that it must not get blocked by memory allocations, and may
    run for a long time.

    Potentially other GFS2 threads might also be converted to
    workqueues, but I'll leave that for a later patch.

    Signed-off-by: Steven Whitehouse
    Acked-by: Tejun Heo

    Steven Whitehouse
     
  • Due to the design of the VFS, it is quite usual for operations on GFS2
    to consist of a lookup (requiring a shared lock) followed by an
    operation requiring an exclusive lock. If a remote node has cached an
    exclusive lock, then it will receive two demote events in rapid succession,
    first to shared and then to unlocked. The existing min hold time
    code was triggering in this case, even if the node was otherwise idle,
    since the state change time was being updated by the initial demote.

    This patch introduces logic to skip the min hold timer in the case that
    a "double demote" of this kind has occurred. The min hold timer will
    still be used in all other cases.

    A new glock flag is introduced which is used to keep track of whether
    there have been any newly queued holders since the last glock state
    change. The min hold time is only applied if the flag is set.
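
    The decision described above can be sketched as a small pure function.
    The field name queued_since_change is a hypothetical label for the new
    flag, times are plain tick counts, and counter wraparound is ignored for
    brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct glock_sketch {
    bool queued_since_change;   /* hypothetical name for the new flag */
    unsigned long state_change; /* time of last state change, in ticks */
    unsigned long min_hold;     /* minimum hold time, in ticks */
};

/* Returns how long a demote should be delayed at time `now`. */
static unsigned long demote_delay(const struct glock_sketch *gl,
                                  unsigned long now)
{
    unsigned long hold_until;

    assert(gl != NULL);
    if (!gl->queued_since_change)
        return 0;   /* "double demote": no new holder since the last change */
    hold_until = gl->state_change + gl->min_hold;
    return now < hold_until ? hold_until - now : 0;
}
```

    With the flag clear, the second demote of a "double demote" proceeds
    immediately even though the state change time was just updated.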

    Signed-off-by: Steven Whitehouse
    Tested-by: Abhijith Das

    Steven Whitehouse
     

02 Aug, 2010

1 commit


29 Jul, 2010

2 commits

  • This reverts commit b7dc2df5725fe7355fd76000ead7e39728e1b8a9.

    The initial patch didn't quite work since it doesn't cover all
    the possible routes by which the GLF_FROZEN flag might be set.
    A revised fix is coming up in the next patch.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This looks like a big change, but in reality it's only a single line of actual
    code change; the rest is just moving a function to before its new caller.
    The "try" flag for glocks is a rather subtle and delicate setting since it
    requires that the state machine tries just hard enough to ensure that it has
    a good chance of getting the requested lock, but not so hard that the
    request can land up blocked behind another.

    The patch adds in an additional check which will fail any queued try
    locks if there is another request blocking the try lock request which
    is not granted and compatible, nor in progress already. The check is made
    only after all pending locks which may be granted have been granted.

    I've checked this with the reproducer for the reported flock bug which
    this is intended to fix, and it now passes.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

19 Jul, 2010

1 commit

  • The current shrinker implementation requires the registered callback
    to have global state to work from. This makes it difficult to shrink
    caches that are not global (e.g. per-filesystem caches). Pass the shrinker
    structure to the callback so that users can embed the shrinker structure
    in the context the shrinker needs to operate on and get back to it in the
    callback via container_of().
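
    A userspace sketch of the embedding pattern follows. The container_of()
    macro is a simplified stand-in for the kernel's, struct demo_fs is a
    hypothetical per-filesystem context, and the callback signature is reduced
    to the shrinker pointer plus a scan count.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct shrinker {
    int (*shrink)(struct shrinker *s, int nr_to_scan);
};

/* Hypothetical per-filesystem context with its own cache and shrinker. */
struct demo_fs {
    int cached_objects;
    struct shrinker shrinker;
};

/* The callback recovers its own context from the embedded shrinker,
 * so no global state is needed. */
static int demo_fs_shrink(struct shrinker *s, int nr_to_scan)
{
    struct demo_fs *fs = container_of(s, struct demo_fs, shrinker);

    assert(nr_to_scan >= 0);
    if (nr_to_scan > fs->cached_objects)
        nr_to_scan = fs->cached_objects;
    fs->cached_objects -= nr_to_scan;
    return fs->cached_objects;
}
```

    Two demo_fs instances can share demo_fs_shrink() and each call still
    touches only the cache of the instance whose shrinker was passed in.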

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

15 Jul, 2010

1 commit

  • This patch fixes bugzilla bug #590878: GFS2: recovery stuck on
    transaction lock. We set the frozen flag on the glock when we receive
    a completion that cannot be delivered due to blocked locks. At that
    point we check to see whether the first waiting holder has the noexp
    flag set. If the noexp lock is queued later, then we need to unfreeze
    the glock at that point in time, namely, in the glock work function.

    This patch was originally written by Steve Whitehouse, but since
    he's on holiday, I'm submitting it. It's been well tested with a
    complex recovery test called revolver.

    Signed-off-by: Steve Whitehouse
    Signed-off-by: Bob Peterson

    Bob Peterson
     

14 Apr, 2010

1 commit

  • This patch fixes several GFS2 problems with the reclaiming of
    unlinked dinodes. First, there were a couple of livelocks where
    everything would come to a halt waiting for a glock that was
    seemingly held by a process that no longer existed. In fact, the
    process did exist, it just had the wrong pid number in the holder
    information. Second, there was a lock ordering problem between
    inode locking and glock locking. Third, glock/inode contention
    could sometimes cause inodes to be improperly marked invalid by
    iget_failed.

    Signed-off-by: Bob Peterson

    Bob Peterson
     

01 Mar, 2010

3 commits

  • This patch changes glock numbers from printing in decimal to hex.
    Since DLM prints corresponding resource IDs in hex, it makes debugging
    easier.
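
    As a small illustration, the formatting change amounts to switching the
    conversion specifier for the lock number. The "type/number" layout and
    the format_glock_name() helper are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a lock name as "type/number" with the number in hex. */
static int format_glock_name(char *buf, size_t len, unsigned int type,
                             unsigned long long number)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%u/%llx", type, number);
}
```

    A lock number of 66235 then prints as 102bb, which can be matched
    directly against a resource ID that is also shown in hex.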

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • As a consequence of the previous patch, we can now remove the
    loop which used to be required due to the circular dependency
    between the inodes and glocks. Instead we can just invalidate
    the inodes, and then clear up any glocks which are left.

    Also we no longer need the rwsem since there is no longer any
    danger of the inode invalidation calling back into the glock
    code (and from there back into the inode code).

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Since the start of GFS2, an "extra" inode has been used to store
    the metadata belonging to each inode. The only reason for using
    this inode was to have an extra address space, the other fields
    were unused. This means that the memory usage was rather inefficient.

    The reason for keeping each inode's metadata in a separate address
    space is that when glocks are requested on remote nodes, we need to
    be able to efficiently locate the data and metadata relating
    to that glock (inode) in order to sync or sync and invalidate it
    (depending on the remotely requested lock mode).

    This patch adds a new type of glock which, in addition to
    its normal fields, has an address space. This applies to all
    inode and rgrp glocks (but to no other glock types, which remain
    as before). As a result, we no longer need to have the second
    inode.

    This results in three major improvements:
    1. A saving of approx 25% of memory used in caching inodes
    2. A removal of the circular dependency between inodes and glocks
    3. No confusion between "normal" and "metadata" inodes in super.c

    Although the first of these is the most immediately apparent, the
    second is just as important, as it now enables a number of
    cleanups at umount time. Those will be the subject of future patches.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 Feb, 2010

1 commit

  • Although all glocks are, by the time of the umount glock wait,
    scheduled for demotion, some of them haven't made it far
    enough through the process for the original set of waiting
    code to wait for them.

    This extends the ref count to the whole glock lifetime in order
    to ensure that the waiting does catch all glocks. It does make
    it a bit more invasive, but it seems the only sensible solution
    at the moment.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 Dec, 2009

2 commits

  • This patch fixes some ref counting issues. Firstly, by moving
    the point at which we drop the ref count after a dlm lock
    operation has completed, we ensure that we never call
    gfs2_glock_hold() on a lock with a zero ref count.

    Secondly, by using atomic_dec_and_lock() in gfs2_glock_put(),
    we ensure that at no time will a glock with zero ref count
    appear on the lru_list. That means that we can remove the
    check for this in our shrinker (which was racy).
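
    The semantics relied on here can be sketched in userspace with C11
    atomics. The dec_and_lock() function and the tiny spinlock are
    illustrative stand-ins, not the kernel's implementation: the point is
    that the final decrement to zero only ever happens with the lock held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct spin { atomic_flag held; };

static void spin_lock(struct spin *l)
{
    while (atomic_flag_test_and_set(&l->held))
        ;   /* busy-wait until the flag was previously clear */
}

static void spin_unlock(struct spin *l)
{
    atomic_flag_clear(&l->held);
}

/* Decrement *cnt. Returns true, with the lock held, only if the
 * count reached zero; otherwise returns false without the lock. */
static bool dec_and_lock(atomic_int *cnt, struct spin *l)
{
    int old = atomic_load(cnt);

    assert(old > 0);
    /* Fast path: not the last reference, so no lock is needed. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return false;
    }
    /* Slow path: possibly the last reference, so decide under the lock. */
    spin_lock(l);
    if (atomic_fetch_sub(cnt, 1) == 1)
        return true;
    spin_unlock(l);
    return false;
}
```

    Because the count can only hit zero while the lock is held, anything
    scanning a list under that same lock never observes an entry whose
    count is already zero.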

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • We need to be careful of the ordering between clearing the
    GLF_LOCK bit and scheduling the workqueue.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse