26 Aug, 2017

1 commit

  • This patch cleans up various pieces of GFS2 to avoid sparse errors.
    This doesn't fix them all, but it fixes several. The first error,
    in the function glock_hash_walk, was a genuine bug where the rhashtable
    walk could be started and never stopped.
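
    A minimal sketch of the balanced walk pattern the fix moves to (not the
    exact GFS2 code; the examiner callback is illustrative):

        struct rhashtable_iter iter;
        struct gfs2_glock *gl;

        rhashtable_walk_enter(&gl_hash_table, &iter);
        do {
                rhashtable_walk_start(&iter);

                while ((gl = rhashtable_walk_next(&iter)) && !IS_ERR(gl))
                        examiner(gl);                   /* per-glock callback */

                rhashtable_walk_stop(&iter);            /* pairs with _start() */
        } while (cond_resched(), gl == ERR_PTR(-EAGAIN));
        rhashtable_walk_exit(&iter);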

    Signed-off-by: Bob Peterson

    Bob Peterson
     

16 Aug, 2017

1 commit

  • When using cman-3.0.12.1 and gfs2-utils-3.0.12.1, mounting and
    unmounting a GFS2 file system would cause the kernel to hang. The slab
    allocator's diagnostics suggest that it is likely a double free memory
    corruption. The issue traces back to v3.9-rc6, where a patch was
    submitted to use kzalloc() for storing a bitmap instead of using a
    local variable. The intention was to allocate the memory during mount
    and to free it during unmount. The original patch missed a code path
    which had already freed the memory, causing the corruption. This patch
    sets the memory pointer to NULL after the memory is freed, so that a
    double free cannot happen.

    gdlm_mount()
    '-- set_recover_size(), which uses kzalloc()
    '-- if dlm does not support ops callbacks then
        '-- free_recover_size(), which uses kfree()

    gdlm_unmount()
    '-- free_recover_size(), which uses kfree()

    The patch which introduced the double free issue is
    commit 57c7310b8eb9 ("GFS2: use kmalloc for lvb bitmap").
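
    The fix, roughly (treat this as a sketch of the resulting helper rather
    than the exact patch):

        static void free_recover_size(struct lm_lockstruct *ls)
        {
                kfree(ls->ls_recover_submit);
                kfree(ls->ls_recover_result);
                ls->ls_recover_submit = NULL;   /* a second call is now a no-op */
                ls->ls_recover_result = NULL;
                ls->ls_recover_size = 0;
        }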

    Signed-off-by: Thomas Tai
    Signed-off-by: Bob Peterson
    Reviewed-by: Liam R. Howlett

    Thomas Tai
     

02 Mar, 2017

1 commit


10 Nov, 2015

1 commit

  • Switch everything to the new and more capable implementation of abs().
    Mainly to give the new abs() a bit of a workout.

    Cc: Michal Nazarewicz
    Cc: John Stultz
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

04 Sep, 2015

2 commits


28 Jul, 2014

1 commit


18 Jul, 2014

1 commit


16 Jul, 2014

1 commit

  • The current "wait_on_bit" interface requires an 'action'
    function to be provided which does the actual waiting.
    There are over 20 such functions, many of them identical.
    Most cases can be satisfied by one of just two functions, one
    which uses io_schedule() and one which just uses schedule().

    So:
    Rename wait_on_bit and wait_on_bit_lock to
    wait_on_bit_action and wait_on_bit_lock_action
    to make it explicit that they need an action function.

    Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
    which are *not* given an action function but implicitly use
    a standard one.
    The decision to error-out if a signal is pending is now made
    based on the 'mode' argument rather than being encoded in the action
    function.

    All instances of the old wait_on_bit and wait_on_bit_lock which
    can use the new version have been changed accordingly and their
    action functions have been discarded.
    wait_on_bit{_lock} does not return any specific error code in the
    event of a signal so the caller must check for non-zero and
    interpolate their own error code as appropriate.
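
    Roughly, at a call site (the bit name, flags word and action function here
    are illustrative):

        /* old: the caller supplies the function that does the waiting */
        wait_on_bit(&flags, MY_BIT, my_bit_wait, TASK_UNINTERRUPTIBLE);

        /* new: the standard schedule()/io_schedule() wait is implied */
        wait_on_bit(&flags, MY_BIT, TASK_UNINTERRUPTIBLE);
        wait_on_bit_io(&flags, MY_BIT, TASK_UNINTERRUPTIBLE);

        /* signal handling now follows from the mode; map non-zero yourself */
        if (wait_on_bit(&flags, MY_BIT, TASK_INTERRUPTIBLE))
                return -EINTR;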

    The wait_on_bit() call in __fscache_wait_on_invalidate() was
    ambiguous as it specified TASK_UNINTERRUPTIBLE but used
    fscache_wait_bit_interruptible as an action function.
    David Howells confirms this should be uniformly
    "uninterruptible".

    The main remaining user of wait_on_bit{,_lock}_action is NFS
    which needs to use a freezer-aware schedule() call.

    A comment in fs/gfs2/glock.c notes that having multiple 'action'
    functions is useful as they display differently in the 'wchan'
    field of 'ps' (and in /proc/$PID/wchan).
    As the new bit_wait{,_io} functions are tagged "__sched", they
    will not show up at all; instead something higher in the call stack
    will. So the distinction will still be visible, only with different
    function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
    gfs2/glock.c case).

    Since the first version of this patch (against 3.15), two new action
    functions have appeared, one in NFS and one in CIFS. CIFS also now
    uses an action function that makes the same freezer-aware
    schedule call as NFS.

    Signed-off-by: NeilBrown
    Acked-by: David Howells (fscache, keys)
    Acked-by: Steven Whitehouse (gfs2)
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steve French
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
    Signed-off-by: Ingo Molnar

    NeilBrown
     

18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.
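
    At a call site the conversion is just a rename, e.g. (illustrative
    GFS2-style snippet, not a specific hunk from the patch):

        /* before */
        clear_bit(GLF_LOCK, &gl->gl_flags);
        smp_mb__after_clear_bit();

        /* after */
        clear_bit(GLF_LOCK, &gl->gl_flags);
        smp_mb__after_atomic();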

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Mar, 2014

2 commits

  • Add pr_fmt, remove embedded "GFS2: " prefixes.
    This now consistently emits lower case "gfs2: " for each message.
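
    The usual pattern, sketched below (the real patch may spell the prefix via
    KBUILD_MODNAME; the message text is illustrative):

        /* first thing in the .c file, before the #includes */
        #define pr_fmt(fmt) "gfs2: " fmt

        /* before: each message carried its own hand-written prefix */
        pr_err("GFS2: fsid=%s: error recovering journal\n", sdp->sd_fsname);

        /* after: pr_fmt() supplies a consistent lower-case "gfs2: " prefix */
        pr_err("fsid=%s: error recovering journal\n", sdp->sd_fsname);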

    Other miscellanea around these changes:

    o Add missing newlines
    o Coalesce formats
    o Realign arguments

    Signed-off-by: Joe Perches
    Signed-off-by: Steven Whitehouse

    Joe Perches
     
  • - All printk(KERN_foo ...) calls converted to pr_foo().
    - Messages updated to fit in 80 columns.
    - fs_macros converted as well.
    - fs_printk removed.
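
    An illustrative conversion (not a specific hunk from the patch):

        /* before */
        printk(KERN_WARNING "GFS2: fsid=%s: withdrawn\n", sdp->sd_fsname);

        /* after */
        pr_warn("GFS2: fsid=%s: withdrawn\n", sdp->sd_fsname);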

    Signed-off-by: Fabian Frederick
    Signed-off-by: Steven Whitehouse

    Fabian Frederick
     

17 Feb, 2014

1 commit


16 Nov, 2013

1 commit


04 Apr, 2013

2 commits


28 Jan, 2013

1 commit


02 Jan, 2013

1 commit

  • When generating the DLM lock name, a value of 0 would skip
    the loop and leave the string unchanged. This left locks with
    a value of 0 unlabeled. Initializing the string to '0' fixes this.
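
    The helper builds the hex digits from the least significant nibble upward,
    so a value of 0 never enters the loop; seeding the buffer with a single '0'
    covers that case. A sketch of the resulting helper:

        static void gfs2_reverse_hex(char *c, u64 value)
        {
                *c = '0';               /* the fix: handle value == 0 */
                while (value) {
                        *c-- = hex_asc[value & 0x0f];
                        value >>= 4;
                }
        }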

    Signed-off-by: Nathan Straz
    Signed-off-by: Steven Whitehouse

    Nathan Straz
     

15 Nov, 2012

2 commits


14 Nov, 2012

1 commit

  • When unmounting, gfs2 does a full dlm_unlock operation on every
    cached lock. This can create a very large amount of work and can
    take a long time to complete. However, the vast majority of these
    dlm unlock operations are unnecessary because after all the unlocks
    are done, gfs2 leaves the dlm lockspace, which automatically clears
    the locks of the leaving node, without unlocking each one individually.
    So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
    remove the locks implicitly. The one exception is when the lock's lvb is
    being used. In this case, dlm_unlock is called because it may update the
    lvb of the resource.
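
    In gdlm_put_lock() this amounts to something like the following (simplified
    sketch; the exact condition in the patch may differ):

        if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) &&
            !gl->gl_lksb.sb_lvbptr) {
                gfs2_glock_free(gl);    /* leaving the lockspace drops it */
                return;
        }

        error = dlm_unlock(ls->ls_dlm, gl->gl_lksb.sb_lkid, DLM_LKF_VALBLK,
                           NULL, gl);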

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

21 Aug, 2012

1 commit

  • flush[_delayed]_work_sync() are now spurious. Mark them deprecated
    and convert all users to flush[_delayed]_work().

    If you're cc'd and wondering what's going on: Now all workqueues are
    non-reentrant and the regular flushes guarantee that the work item is
    not pending or running on any CPU on return, so there's no reason to
    use the sync flushes at all and they're going away.

    This patch doesn't make any functional difference.
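
    The conversion itself is mechanical (the work item names here are made up):

        /* before */
        flush_work_sync(&my_work);
        flush_delayed_work_sync(&my_dwork);

        /* after: the plain flushes now give the same guarantee */
        flush_work(&my_work);
        flush_delayed_work(&my_dwork);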

    Signed-off-by: Tejun Heo
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Ian Campbell
    Cc: Jens Axboe
    Cc: Mattia Dongili
    Cc: Kent Yoder
    Cc: David Airlie
    Cc: Jiri Kosina
    Cc: Karsten Keil
    Cc: Bryan Wu
    Cc: Benjamin Herrenschmidt
    Cc: Alasdair Kergon
    Cc: Mauro Carvalho Chehab
    Cc: Florian Tobias Schandinat
    Cc: David Woodhouse
    Cc: "David S. Miller"
    Cc: linux-wireless@vger.kernel.org
    Cc: Anton Vorontsov
    Cc: Sangbeom Kim
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: Eric Van Hensbergen
    Cc: Takashi Iwai
    Cc: Steven Whitehouse
    Cc: Petr Vandrovec
    Cc: Mark Fasheh
    Cc: Christoph Hellwig
    Cc: Avi Kivity

    Tejun Heo
     

03 May, 2012

1 commit

  • The "nodir" mode (statically assign master nodes instead
    of using the resource directory) has always been highly
    experimental, and never seriously used. This commit
    fixes a number of problems, making nodir much more usable.

    - Major change to recovery: recover all locks and restart
    all in-progress operations after recovery. In some
    cases it's not possible to know which in-progress locks
    to recover, so recover all. (Most require recovery
    in nodir mode anyway since rehashing changes most
    master nodes.)

    - Change the way nodir mode is enabled, from a command
    line mount arg passed through gfs2, into a sysfs
    file managed by dlm_controld, consistent with the
    other config settings.

    - Allow recovering MSTCPY locks on an rsb that has not
    yet been turned into a master copy.

    - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
    from a previous, aborted recovery cycle. Base this
    on the local recovery status not being in the state
    where any nodes should be sending LOCK messages for the
    current recovery cycle.

    - Hold rsb lock around dlm_purge_mstcpy_locks() because it
    may run concurrently with dlm_recover_master_copy().

    - Maintain highbast on process-copy lkb's (in addition to
    the master as is usual), because the lkb can switch
    back and forth between being a master and being a
    process copy as the master node changes in recovery.

    - When recovering MSTCPY locks, flag rsb's that have
    non-empty convert or waiting queues for granting
    at the end of recovery. (Rename the flag from LOCKS_PURGED
    to RECOVER_GRANT, and similarly for the recovery function,
    because it's not only resources with purged locks
    that need a grant attempt.)

    - Replace a couple of unnecessary assertion panics with
    error messages.

    Signed-off-by: David Teigland

    David Teigland
     

24 Apr, 2012

1 commit

  • This patch instructs DLM to prevent an "in place" conversion, where the
    lock just stays on the granted queue, and instead forces the conversion to
    the back of the convert queue. This is done on upward conversions only.

    This is useful in cases where, for example, a lock is frequently needed in
    PR on one node, but another node needs it temporarily in EX to update it.
    This may happen, for example, when the rindex is being updated by gfs2_grow:
    gfs2_grow needs to have the lock in EX, but the other nodes need to
    re-read it to retrieve the updates. The glock is already granted in PR on
    the non-growing nodes, so this prevents them from continually re-granting
    the lock in PR, and forces the EX from gfs2_grow to go through.
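
    On the GFS2 side this comes down to adding DLM_LKF_QUECVT to the request
    flags when converting to a more restrictive mode, roughly (the helper that
    detects an upward conversion is hypothetical):

        /* in make_flags(), when an existing dlm lock is being converted */
        if (gl->gl_lksb.sb_lkid != 0) {
                lkf |= DLM_LKF_CONVERT;
                if (mode_stronger(req, gl->gl_state))   /* hypothetical check */
                        lkf |= DLM_LKF_QUECVT;
        }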

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

29 Feb, 2012

1 commit

  • The stats are divided into two sets: those relating to the
    super block and those relating to an individual glock. The
    super block stats are done on a per cpu basis in order to
    try and reduce the overhead of gathering them. They are also
    further divided by glock type.

    In the case of both the super block and glock statistics,
    the same information is gathered in each case. The super
    block statistics are used to provide default values for
    most of the glock statistics, so that newly created glocks
    should have, as far as possible, a sensible starting point.

    The statistics are divided into three pairs of mean and
    variance, plus two counters. The mean/variance pairs are
    smoothed exponential estimates and the algorithm used is
    one which will be very familiar to those used to the calculation
    of round trip times in network code.

    The three pairs of mean/variance measure the following
    things:

    1. DLM lock time (non-blocking requests)
    2. DLM lock time (blocking requests)
    3. Inter-request time (again to the DLM)

    A non-blocking request is one which will complete right
    away, whatever the state of the DLM lock in question. That
    currently means any requests when (a) the current state of
    the lock is exclusive (b) the requested state is either null
    or unlocked or (c) the "try lock" flag is set. A blocking
    request covers all the other lock requests.

    There are two counters. The first is there primarily to show
    how many lock requests have been made, and thus how much data
    has gone into the mean/variance calculations. The other counter
    counts the queueing of holders at the top layer of the glock
    code. Hopefully that number will be a lot larger than the number
    of dlm lock requests issued.

    So why gather these statistics? There are several reasons
    we'd like to get a better idea of these timings:

    1. To be able to better set the glock "min hold time"
    2. To spot performance issues more easily
    3. To improve the algorithm for selecting resource groups for
    allocation (to base it on lock wait time, rather than blindly
    using a "try lock")
    Due to the smoothing action of the updates, a step change in
    some input quantity being sampled will only fully be taken
    into account after 8 samples (or 4 for the variance) and this
    needs to be carefully considered when interpreting the
    results.
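
    In code terms the update is the classic smoothed round-trip estimator,
    with gains of 1/8 for the mean and 1/4 for the variance (a sketch; the
    real code operates on the per-glock and per-sb stats arrays):

        static inline void update_stat(s64 *mean, s64 *var, s64 sample)
        {
                s64 delta = sample - *mean;
                s64 adelta = delta < 0 ? -delta : delta;

                *mean += delta >> 3;            /* mean += (sample - mean) / 8 */
                *var  += (adelta - *var) >> 2;  /* var  += (|delta| - var) / 4 */
        }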

    Knowing both the time it takes a lock request to complete and
    the average time between lock requests for a glock means we
    can compute the total percentage of the time for which the
    node is able to use a glock vs. time that the rest of the
    cluster has its share. That will be very useful when setting
    the lock min hold time.

    The other point to remember is that all times are in
    nanoseconds. Great care has been taken to ensure that we
    measure exactly the quantities that we want, as accurately
    as possible. There are always inaccuracies in any
    measuring system, but I hope this is as accurate as we
    can reasonably make it.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

11 Jan, 2012

1 commit

  • This new method of managing recovery is an alternative to
    the previous approach of using the userland gfs_controld.

    - use dlm slot numbers to assign journal id's
    - use dlm recovery callbacks to initiate journal recovery
    - use a dlm lock to determine the first node to mount fs
    - use a dlm lock to track journals that need recovery

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

04 Jan, 2012

1 commit

  • These new callbacks notify the dlm user about lock recovery.
    GFS2, and possibly others, need to be aware of when the dlm
    will be doing lock recovery for a failed lockspace member.

    In the past, this coordination has been done between dlm and
    file system daemons in userspace, which then direct their
    kernel counterparts. These callbacks allow the same
    coordination directly, and more simply.
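
    The callbacks are handed to the dlm when the lockspace is created,
    alongside an ops_arg cookie passed to dlm_new_lockspace(). The ops
    structure is roughly:

        struct dlm_lockspace_ops {
                void (*recover_prep)(void *ops_arg);
                void (*recover_slot)(void *ops_arg, struct dlm_slot *slot);
                void (*recover_done)(void *ops_arg, struct dlm_slot *slots,
                                     int num_slots, int our_slot,
                                     uint32_t generation);
        };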

    Signed-off-by: David Teigland

    David Teigland
     

09 Mar, 2011

1 commit

  • This patch fixes a race in deallocating glocks which was introduced
    in the RCU glock patch. We need to ensure that the glock count is
    kept correct even in the case that there is a race to add a new
    glock into the hash table. Also, to avoid having to wait for an
    RCU grace period, the glock counter can be decremented before
    call_rcu() is called.
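
    A sketch of the resulting free path (simplified, using the field names of
    the time):

        void gfs2_glock_free(struct gfs2_glock *gl)
        {
                struct gfs2_sbd *sdp = gl->gl_sbd;

                call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
                /* drop the per-sb count now rather than in the RCU callback,
                   so waiters (e.g. umount) need not sit out a grace period */
                if (atomic_dec_and_test(&sdp->sd_glock_disposal))
                        wake_up(&sdp->sd_glock_wait);
        }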

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

21 Jan, 2011

1 commit

  • This has a number of advantages:

    - Reduces contention on the hash table lock
    - Makes the code smaller and simpler
    - Should speed up glock dumps when under load
    - Removes ref count changing in examine_bucket
    - No longer need hash chain lock in glock_put() in common case

    There are some further changes which this enables and which
    we may do in the future. One is to look at using SLAB_RCU,
    and another is to look at using a per-cpu counter for the
    per-sb glock counter, since that is touched twice in the
    lifetime of each glock (but only used at umount time).

    Signed-off-by: Steven Whitehouse
    Cc: Paul E. McKenney

    Steven Whitehouse
     

30 Nov, 2010

2 commits

  • We can only merge the fields into a bitfield if the locking
    rules for them are the same. In this case gl_spin covers all
    of the fields (write side) but a couple of them are used
    with GLF_LOCK as the read side lock, which should be ok
    since we know that the field in question won't be changing
    at the time.

    The gl_req setting has to be done earlier (in glock.c) in order
    to place it under gl_spin. The gl_reply setting also has to be
    brought under gl_spin in order to comply with the new rules.

    This saves 4*sizeof(unsigned int) per glock.
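
    The merged fields end up packed roughly like this (field widths are
    illustrative), all written under gl_spin:

        unsigned int gl_state:2,        /* current lock mode               */
                     gl_target:2,       /* mode we are trying to get to    */
                     gl_demote_state:2, /* mode requested by a remote node */
                     gl_req:2,          /* mode last requested of the dlm  */
                     gl_reply:8;        /* last reply from the dlm         */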

    Signed-off-by: Steven Whitehouse
    Cc: Bob Peterson

    Steven Whitehouse
     
  • The DLM never returns -EAGAIN in response to dlm_lock(), and even
    if it did, the test in gdlm_lock() was wrong anyway. Once that
    test is removed, it is possible to greatly simplify this code
    by simply using a "normal" error return code (0 for success).

    We then no longer need the LM_OUT_ASYNC return code which can
    be removed.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Sep, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

01 Mar, 2010

1 commit

  • Since the start of GFS2, an "extra" inode has been used to store
    the metadata belonging to each inode. The only reason for using
    this inode was to have an extra address space, the other fields
    were unused. This means that the memory usage was rather inefficient.

    The reason for keeping each inode's metadata in a separate address
    space is that when glocks are requested on remote nodes, we need to
    be able to efficiently locate the data and metadata which relating
    to that glock (inode) in order to sync or sync and invalidate it
    (depending on the remotely requested lock mode).

    This patch adds a new type of glock, which has, in addition to
    its normal fields, an address space. This applies to all
    inode and rgrp glocks (but to no other glock types which remain
    as before). As a result, we no longer need to have the second
    inode.

    This results in three major improvements:
    1. A saving of approx 25% of memory used in caching inodes
    2. A removal of the circular dependency between inodes and glocks
    3. No confusion between "normal" and "metadata" inodes in super.c

    Although the first of these is the more immediately apparent, the
    second is just as important as it now enables a number of clean
    ups at umount time. Those will be the subject of future patches.
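
    Internally the address space is simply placed directly after the glock for
    the glock types that want one, roughly:

        static inline struct address_space *gfs2_glock2aspace(struct gfs2_glock *gl)
        {
                if (gl->gl_ops->go_flags & GLOF_ASPACE)
                        return (struct address_space *)(gl + 1);
                return NULL;
        }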

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 Feb, 2010

2 commits

  • Although all glocks are, by the time of the umount glock wait,
    scheduled for demotion, some of them haven't made it far
    enough through the process for the original set of waiting
    code to wait for them.

    This extends the ref count to the whole glock lifetime in order
    to ensure that the waiting does catch all glocks. It does make
    it a bit more invasive, but it seems the only sensible solution
    at the moment.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch adds a wait on umount between the point at which we
    dispose of all glocks and the point at which we unmount the
    lock protocol. This ensures that we've received all the replies
    to our unlock requests before we stop the locking.

    Signed-off-by: Steven Whitehouse
    Reported-by: Fabio M. Di Nitto

    Steven Whitehouse
     

24 Mar, 2009

2 commits

  • After calling out to the dlm, GFS2 sets the new state of a glock to
    gl_target in gdlm_ast(). However, gl_target is not always the lock
    state that was requested. If a conversion from shared to exclusive
    fails, finish_xmote() will call do_xmote() with LM_ST_UNLOCKED, instead
    of gl->gl_target, so that it can reacquire the lock in exclusive the next
    time around. In this case, setting the lock to gl_target in gdlm_ast()
    will make GFS2 think that it has the glock in exclusive mode, when
    really, it doesn't have the glock locked at all. This patch adds a new
    field to the gfs2_glock structure, gl_req, to track the mode that was
    requested.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplification of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, that is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem without requiring the DLM.

    This patch needs a lot of testing, hence my keeping it since I restarted
    my -git tree after the last merge window. That way, this has the maximum
    exposure before it is merged. This is (modulo a few minor bug fixes) the
    same patch that I've been posting on and off for the last three months,
    and it has passed a number of different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse