09 Dec, 2011

1 commit

  • The delaylog mode has been the default for a long time, and the nodelaylog
    option has been scheduled for removal in Linux 3.3. Remove it, and the
    code only used by it, now that the 3.3 window has opened.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

07 Dec, 2011

1 commit

  • Apply the scheme used in log_regrant_write_log_space, where any other
    threads waiting for log space are woken before the newly added one, to
    log_grant_log_space as well, and factor the code into readable helpers.
    For each of the queues we add two helpers:

    - one to try to wake up all waiting threads. This helper will also be
    usable by xfs_log_move_tail once we remove the current opportunistic
    wakeups in it.
    - one to sleep on t_wait until enough log space is available, loosely
    modelled after Linux waitqueues.

    And use them to reimplement the guts of log_grant_log_space and
    log_regrant_write_log_space. These two functions now use one and the same
    algorithm for waiting on log space instead of the subtly different ones
    they used before, with an option to completely unify them in the near
    future.

    Also move the filesystem shutdown handling to the common caller given
    that we had to touch it anyway.
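
    A rough userspace analogue of the two helpers described above (the names
    and the pthread machinery are illustrative stand-ins for the kernel's
    ticket queues and t_wait, not the actual XFS code):

        #include <pthread.h>

        struct grant_queue {
            pthread_mutex_t lock;
            pthread_cond_t  wait;       /* stands in for t_wait */
            long            free_space; /* log space currently available */
        };

        /* Wake all waiting threads; each one re-checks its own condition. */
        static void grant_queue_wake_all(struct grant_queue *q)
        {
            pthread_mutex_lock(&q->lock);
            pthread_cond_broadcast(&q->wait);
            pthread_mutex_unlock(&q->lock);
        }

        /* Sleep until enough log space is available, loosely modelled after
         * Linux waitqueues: re-test the condition after every wakeup. */
        static void grant_queue_wait(struct grant_queue *q, long need)
        {
            pthread_mutex_lock(&q->lock);
            while (q->free_space < need)
                pthread_cond_wait(&q->wait, &q->lock);
            q->free_space -= need;
            pthread_mutex_unlock(&q->lock);
        }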

    Based on hard debugging and an earlier patch from Chandra Seetharaman.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Chandra Seetharaman
    Tested-by: Chandra Seetharaman
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

09 Nov, 2011

1 commit

  • The log item ops aren't necessarily the biggest exploit vector, but
    marking them const is easy enough. Also remove the unused xfs_item_ops_t
    typedef while we're at it.
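
    The hardening amounts to putting the ops vector in read-only data. A
    minimal illustration (the struct and functions are invented for the
    example, not the XFS definitions):

        struct item_ops {
            unsigned int (*iop_size)(void *item);
            void         (*iop_format)(void *item, char *buf);
        };

        static unsigned int example_size(void *item) { return 0; }
        static void example_format(void *item, char *buf) { }

        /* 'const' places the table in rodata, so the function pointers
         * cannot be overwritten at runtime to hijack control flow. */
        static const struct item_ops example_item_ops = {
            .iop_size   = example_size,
            .iop_format = example_format,
        };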

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Alex Elder

    Christoph Hellwig
     

12 Oct, 2011

3 commits

  • Instead of passing the block number and mount structure explicitly, get
    them off the bp and make the argument order more natural.

    Also move it to xfs_buf.c and stop printing the device name, given that
    we already get the fs name as part of xfs_alert and we know which device
    it operates on from the caller that gets printed. Finally, rename it to
    xfs_buf_ioerror_alert and pass __func__ as an argument where it makes
    sense.
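
    After the change, a call site only needs the buffer and its own name;
    roughly (the surrounding error check is a sketch):

        if (bp->b_error)
            xfs_buf_ioerror_alert(bp, __func__);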

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Change _xfs_buf_initialize to allocate the buffer directly and rename it
    to xfs_buf_alloc now that it is the only buffer allocation routine. Also
    remove the xfs_buf_deallocate wrapper around the kmem_zone_free calls for
    buffers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

26 Jul, 2011

4 commits


13 Jul, 2011

3 commits


08 Jul, 2011

5 commits

  • There is no need for a pre-flush when writing the second part of a split
    log buffer, and if we are using an external log there is no need to do a
    full cache flush of the log device at all, given that all writes to it
    use the FUA flag.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Remove the unused and misnamed _XBF_RUN_QUEUES flag, rename XBF_LOG_BUFFER
    to the more fitting XBF_SYNCIO, and split XBF_ORDERED into XBF_FUA and
    XBF_FLUSH to allow more fine-grained control over the bio flags. Also
    clean up the processing of the flags in _xfs_buf_ioapply so it makes more
    sense, and renumber the sparse flag number space to group flags by
    purpose.
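
    Grouping a sparse flag space by purpose might look like the sketch below
    (the values are illustrative, not the ones the patch actually picked):

        /* buffer I/O types */
        #define XBF_READ        (1 << 0)
        #define XBF_WRITE       (1 << 1)

        /* I/O hints, translated to bio flags in _xfs_buf_ioapply */
        #define XBF_SYNCIO      (1 << 10) /* treat as synchronous I/O */
        #define XBF_FUA         (1 << 11) /* force-unit-access write */
        #define XBF_FLUSH       (1 << 12) /* flush disk cache before write */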

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • All other xfs_buf_get/read-like helpers return the buffer locked; make
    sure xfs_buf_get_uncached isn't different for no reason. Half of the
    callers already lock it directly afterwards, and the others probably
    should keep it locked as well, if only for consistency and being able to
    use xfs_buf_rele, but I'll leave that for later.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Rename xfs_buf_cond_lock and reverse its return value to fit most other
    trylock operations in the kernel and XFS (with the exception of
    down_trylock, after which xfs_buf_cond_lock was modelled), and replace
    xfs_buf_lock_val with an xfs_buf_islocked for use in asserts, or an
    open-coded variant in tracing. Remove the XFS_BUF_* wrappers for all the
    locking helpers.
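
    The new name in the tree is xfs_buf_trylock; the change in call
    convention, sketched:

        /* old, modelled on down_trylock(): zero means the lock was taken */
        if (xfs_buf_cond_lock(bp) == 0) {
            /* got the lock */
        }

        /* new, like spin_trylock(): non-zero means the lock was taken */
        if (xfs_buf_trylock(bp)) {
            /* got the lock */
        }

        ASSERT(xfs_buf_islocked(bp)); /* replaces xfs_buf_lock_val() */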

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Micro-optimize various comparisons by always byteswapping the constant
    instead of the variable, which allows the swap to be done at compile time
    instead of at runtime.
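
    The trick is visible with any endianness helper: converting the variable
    costs a swap on every call on a little-endian host, while converting a
    constant is normally folded at compile time. A standalone illustration
    (0x58465342 is the "XFSB" superblock magic):

        #include <arpa/inet.h>
        #include <stdint.h>

        int is_sb_magic(uint32_t be_magic)  /* big-endian value from disk */
        {
            /* runtime swap of the variable:
             *   return ntohl(be_magic) == 0x58465342;
             * compile-time swap of the constant instead: */
            return be_magic == htonl(0x58465342);
        }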

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

16 Jun, 2011

1 commit

  • There's no reason not to support cache flushing on external log devices.
    The only thing this really requires is flushing the data device first,
    both in fsync and log commits. A side effect is that we also have to
    remove the barrier write test during mount, which has been superfluous
    since the new FLUSH+FUA code went in anyway. Also use the chance to flush
    the RT subvolume write cache before the fsync commit, which is required
    for correct semantics.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

20 May, 2011

1 commit

  • When we free a vmapped buffer, we need to ensure the vmap address and
    length we free are the same as when it was allocated. In various places
    in the log code we change the memory the buffer is pointing to before
    issuing IO, but we never reset the buffer to point back to its original
    memory (or no memory, if that is the case for the buffer).

    As a result, when we free the buffer, it points to memory that is owned
    by something else, and we attempt to unmap and free that memory. Because
    the range does not match any known mapped range, it can trigger BUG_ON()
    traps in the vmap code, and potentially corrupt the vmap area tracking.

    Fix this by always resetting these buffers to their original state
    before freeing them.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     

29 Apr, 2011

1 commit

  • Update the extent tree in case we have to reuse a busy extent, so that it
    is always kept up to date. This is done by replacing the busy list
    searches with a new xfs_alloc_busy_reuse helper, which updates the busy
    extent tree in case of a reuse. This allows us to reuse metadata extents
    unconditionally, and thus avoid log forces, especially for allocation
    btree blocks.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

08 Apr, 2011

2 commits

  • On the Power platform, the log tail debug checks fire excessively
    causing the system to panic early in testing. The debug checks are
    known to be racy, though on x86_64 there is no evidence that they
    trigger at all.

    We want to keep the checks active on debug systems to alert us to
    problems with log space accounting, but we need to reduce the impact
    of a racy check on testing on the Power platform.

    As a result, convert the ASSERT conditions to warnings, and allow them to
    fire only once per filesystem mount. This will prevent false positives
    from interfering with testing, whilst still providing us with an
    indication that there may be a problem with log space accounting should
    that occur.
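
    One way to express "warn at most once per mount" (a sketch with invented
    names, not the XFS implementation):

        #include <stdatomic.h>
        #include <stdio.h>

        struct mount_info {
            atomic_flag tail_warned; /* initialise with ATOMIC_FLAG_INIT */
        };

        static void warn_log_tail_once(struct mount_info *mp, const char *msg)
        {
            /* only the first test-and-set per mount sees "was clear" */
            if (!atomic_flag_test_and_set(&mp->tail_warned))
                fprintf(stderr, "XFS: possible log tail problem: %s\n", msg);
        }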

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Alex Elder

    Dave Chinner
     
  • When we are short on memory, we want to expedite the cleaning of
    dirty objects. Hence when we run short on memory, we need to kick
    the AIL flushing into action to clean as many dirty objects as
    quickly as possible. To implement this, sample the lsn of the log
    item at the head of the AIL and use that as the push target for the
    AIL flush.

    Further, the AIL holds dirty items that are not tracked in any other way,
    so objects can sit in the AIL without being written back until the AIL is
    pushed. Hence to get the filesystem to the idle state, we might need to
    push the AIL to flush out any remaining dirty objects sitting in it. This
    requires the same push mechanism as the reclaim push.

    This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
    match the new xfs_ail_max_lsn() function introduced in this patch.
    Similarly for xfs_trans_ail_push -> xfs_ail_push.

    Signed-off-by: Dave Chinner
    Reviewed-by: Alex Elder

    Dave Chinner
     

07 Mar, 2011

1 commit

  • Convert the xfs log operations to use the new error logging interfaces.
    This removes the xlog_{warn,panic} wrappers and makes almost all errors
    emit the device they belong to instead of just referring to "XFS".

    Signed-off-by: Dave Chinner
    Reviewed-by: Alex Elder
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

12 Jan, 2011

1 commit

  • We currently have a global error message buffer in cmn_err that is
    protected by a spin lock that disables interrupts. Recently there
    have been reports of NMI timeouts occurring when the console is
    being flooded by SCSI error reports due to cmn_err() getting stuck
    trying to print to the console while holding this lock (i.e. with
    interrupts disabled). The NMI watchdog is seeing this CPU as
    non-responding and so is triggering a panic. While the trigger for
    the reported case is SCSI errors, pretty much anything that spams
    the kernel log could cause this to occur.

    Realistically the only reason that we have the intermediate message
    buffer is to prepend the correct kernel log level prefix to the log
    message. The only reason we have the lock is to protect the global
    message buffer, and the only reason the message buffer is global is to
    keep it off the stack. Hence if we can avoid needing a global message
    buffer we avoid needing the lock, and we can do this with a small amount
    of cleanup and some preprocessor tricks:

    1. clean up xfs_cmn_err() panic mask functionality to avoid
    needing debug code in xfs_cmn_err()
    2. remove the couple of "!" message prefixes that still exist that
    the existing cmn_err() code steps over.
    3. redefine CE_* levels directly to KERN_*
    4. redefine cmn_err() and friends to use printk() directly
    via variable argument length macros.

    By doing this, we can completely remove the cmn_err() code and the
    lock that is causing the problems, and rely solely on printk()
    serialisation to ensure that we don't get garbled messages.

    A series of followup patches is really needed to clean up all the
    cmn_err() calls and related messages properly, but that results in a
    series that is not easily back portable to enterprise kernels. Hence
    this initial fix is only to address the direct problem in the lowest
    impact way possible.
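
    Steps 3 and 4 boil down to a preprocessor mapping along these lines (a
    simplified sketch of the idea, not the exact patch):

        /* 3. the CE_* levels become the kernel log level prefixes */
        #define CE_DEBUG        KERN_DEBUG
        #define CE_NOTE         KERN_NOTICE
        #define CE_WARN         KERN_WARNING
        #define CE_ALERT        KERN_ALERT
        #define CE_PANIC        KERN_EMERG

        /* 4. cmn_err() forwards straight to printk(); the level and the
         * format are adjacent string literals, so no intermediate buffer
         * (and hence no lock) is needed */
        #define cmn_err(level, fmt, ...) \
                printk(level fmt "\n", ##__VA_ARGS__)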

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     

21 Dec, 2010

9 commits

  • The only thing that the grant lock remains to protect is the grant head
    manipulations when adding or removing space from the log. These
    calculations are already based on atomic variables, so we can already
    update them safely without locks. However, the grant head manipulations
    require atomic multi-step calculations to be executed, which the
    algorithms currently don't allow.

    To make these multi-step calculations atomic, convert the algorithms to
    compare-and-exchange loops on the atomic variables. That is, we sample
    the old value, perform the calculation and use atomic64_cmpxchg() to
    attempt to update the head with the new value. If the head has not
    changed since we sampled it, the update will succeed and we are done.
    Otherwise, we rerun the calculation from a new sample of the head.

    This allows us to remove the grant lock from around all the grant head
    space manipulations, which effectively removes the grant lock from the
    log completely.
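
    The loop structure described above, in portable C11 form (the cycle/space
    packing inside the 64-bit head is omitted here; see the grant head
    entries further down):

        #include <stdatomic.h>
        #include <stdint.h>

        /* Add 'bytes' of reservation to the grant head without any lock. */
        static void grant_head_add(_Atomic int64_t *head, int64_t bytes)
        {
            int64_t old = atomic_load(head);
            int64_t new;

            do {
                /* the real calculation also handles log cycle wrap */
                new = old + bytes;
            } while (!atomic_compare_exchange_weak(head, &old, new));
            /* on failure 'old' is reloaded with the current head value,
             * so the calculation reruns from a fresh sample */
        }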

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • The log grant ticket wait queues are currently protected by the log
    grant lock. However, the queues are functionally independent from
    each other, and operations on them only require serialisation
    against other queue operations now that all of the other log
    variables they use are atomic values.

    Hence, we can make them independent of the grant lock by introducing new
    locks just to protect the list operations. Because the lists are
    independent, we can use a lock per list and ensure that reserve and write
    head queuing do not contend.

    To ensure forced shutdowns work correctly in conjunction with the
    new fast paths, ensure that we check whether the log has been shut
    down in the grant functions once we hold the relevant spin locks but
    before we go to sleep. This is needed to co-ordinate correctly with
    the wakeups that are issued on the ticket queues so we don't leave
    any processes sleeping on the queues during a shutdown.
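
    The ordering matters: the shutdown state must be re-checked after the
    queue lock is taken, so that a shutdown wakeup issued under the same lock
    cannot slip past and leave a ticket asleep forever. Sketched, with
    hypothetical helper names:

        spin_lock(&head->lock);
        if (xlog_is_shutdown(log)) {    /* re-check under the queue lock */
            spin_unlock(&head->lock);
            return -EIO;                /* never queue on a dead log */
        }
        list_add_tail(&tic->t_queue, &head->waiters);
        /* ...sleep; the shutdown path takes head->lock before waking... */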

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • log->l_tail_lsn is currently protected by the log grant lock. The
    lock is only needed for serialising readers against writers, so we
    don't really need the lock if we make the l_tail_lsn variable an
    atomic. Converting the l_tail_lsn variable to an atomic64_t means we
    can start to peel back the grant lock from various operations.

    Also, provide functions to safely crack an atomic LSN variable into its
    component pieces and to recombine the components into an atomic variable.
    Use them where appropriate.

    This also removes the need for explicitly holding a spinlock to read
    the l_tail_lsn on 32 bit platforms.
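
    An LSN is a 32-bit cycle number and a 32-bit block number packed into one
    64-bit word, so cracking and recombining are just shifts and masks around
    a single atomic access (illustrative names):

        #include <stdatomic.h>
        #include <stdint.h>

        static void lsn_crack(_Atomic uint64_t *lsn,
                              uint32_t *cycle, uint32_t *block)
        {
            uint64_t val = atomic_load(lsn); /* one atomic read, no lock */

            *cycle = val >> 32;
            *block = val & 0xffffffffu;
        }

        static void lsn_assign(_Atomic uint64_t *lsn,
                               uint32_t cycle, uint32_t block)
        {
            atomic_store(lsn, ((uint64_t)cycle << 32) | block);
        }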

    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • xlog_grant_push_ail() currently takes the grant lock internally to sample
    the tail lsn, the last sync lsn and the reserve grant head. Most of the
    callers already hold the grant lock but have to drop it before calling
    xlog_grant_push_ail(). This is a leftover from when the AIL tail pushing
    was done inline and xlog_grant_push_ail hence had to drop the grant lock.
    AIL pushing is now done in another thread, so we can safely hold the
    grant lock over the entire xlog_grant_push_ail call.

    Push the grant lock outside of xlog_grant_push_ail() to simplify the locking
    and synchronisation needed for tail pushing. This will reduce traffic on the
    grant lock by itself, but this is only one step in preparing for the complete
    removal of the grant lock.

    While there, clean up the formatting of xlog_grant_push_ail() to match the
    rest of the XFS code.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • The log grant queues are one of the few places left using sv_t
    constructs for waiting. Given we are touching this code, we should
    convert them to plain wait queues. While there, convert all the
    other sv_t users in the log code as well.

    Seeing as this removes the last users of the sv_t type, remove the
    header file defining the wrapper and the fragments that still
    reference it.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • Prepare for switching the grant heads to atomic variables by
    combining the two 32 bit values that make up the grant head into a
    single 64 bit variable. Provide wrapper functions to combine and
    split the grant heads appropriately for calculations and use them as
    necessary.
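
    Same packing idea as for the LSN conversion above, except that the low
    half carries a byte offset into the log rather than a block number
    (sketch):

        #include <stdint.h>

        /* grant head: log cycle in the high 32 bits, bytes in the low 32 */
        static inline int64_t grant_head_combine(uint32_t cycle, uint32_t bytes)
        {
            return ((int64_t)cycle << 32) | bytes;
        }

        static inline void grant_head_split(int64_t head,
                                            uint32_t *cycle, uint32_t *bytes)
        {
            *cycle = head >> 32;
            *bytes = head & 0xffffffffu;
        }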

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • The log grant space calculations are repeated for both the write and
    reserve grant heads. To make it simpler to convert the calculations to a
    different algorithm, factor them so both grant heads use the same
    calculation functions. Once this is done we can drop the wrappers that
    are used in only a couple of places to update both grant heads at once,
    as they don't provide any particular value.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • Factor repeated debug code out of the grant head manipulation functions
    into a separate function. This removes "#ifdef DEBUG" spaghetti from the
    code and makes the code easier to follow.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • The grant write and reserve queues use a roll-your-own doubly linked
    list, so convert it to a standard list_head structure and convert all the
    list traversals to use list_for_each_entry(). We can also get rid of the
    XLOG_TIC_IN_Q flag, as the list_empty() check tells us whether the ticket
    is in a list or not.
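
    The conversion target is the stock kernel pattern: embed a list_head in
    the ticket, iterate with list_for_each_entry(), and let list_empty() on
    the ticket's own node answer "am I queued?". A kernel-style sketch
    (assumes <linux/list.h>; names simplified):

        struct xlog_ticket {
            struct list_head t_queue; /* links ticket onto a grant queue */
        };

        /* Replaces the XLOG_TIC_IN_Q flag.  Requires the node to start as
         * INIT_LIST_HEAD() and to be removed with list_del_init(). */
        static bool xlog_ticket_is_queued(struct xlog_ticket *tic)
        {
            return !list_empty(&tic->t_queue);
        }

        static void xlog_queue_wake_all(struct list_head *queue)
        {
            struct xlog_ticket *tic;

            list_for_each_entry(tic, queue, t_queue) {
                /* wake the waiter attached to this ticket */
            }
        }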

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

03 Dec, 2010

2 commits

  • Convert the log grant heads to atomic64_t types in preparation for
    converting the accounting algorithms to atomic operations. This patch
    just converts the variables; the algorithmic changes are in a separate
    patch for clarity.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • log->l_last_sync_lsn is updated in only one critical spot, log buffer IO
    completion, and is protected by the grant lock there. This requires the
    grant lock to be taken for every log buffer IO completion. Converting the
    l_last_sync_lsn variable to an atomic64_t means that we do not need to
    take the grant lock in log buffer IO completion to update it.

    This also removes the need for explicitly holding a spinlock to read
    the l_last_sync_lsn on 32 bit platforms.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

23 Oct, 2010

1 commit

  • * 'for-linus' of git://oss.sgi.com/xfs/xfs: (36 commits)
    xfs: semaphore cleanup
    xfs: Extend project quotas to support 32bit project ids
    xfs: remove xfs_buf wrappers
    xfs: remove xfs_cred.h
    xfs: remove xfs_globals.h
    xfs: remove xfs_version.h
    xfs: remove xfs_refcache.h
    xfs: fix the xfs_trans_committed
    xfs: remove unused t_callback field in struct xfs_trans
    xfs: fix bogus m_maxagi check in xfs_iget
    xfs: do not use xfs_mod_incore_sb_batch for per-cpu counters
    xfs: do not use xfs_mod_incore_sb for per-cpu counters
    xfs: remove XFS_MOUNT_NO_PERCPU_SB
    xfs: pack xfs_buf structure more tightly
    xfs: convert buffer cache hash to rbtree
    xfs: serialise inode reclaim within an AG
    xfs: batch inode reclaim lookup
    xfs: implement batched inode lookups for AG walking
    xfs: split out inode walk inode grabbing
    xfs: split inode AG walking into separate code for reclaim
    ...

    Linus Torvalds
     

19 Oct, 2010

3 commits

  • Conflicts:
    block/blk-core.c
    drivers/block/loop.c
    mm/swapfile.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Stop having two different names for many buffer functions and use
    the more descriptive xfs_buf_* names directly.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • xfs_buf_get_nodaddr() is really used to allocate a buffer that is
    uncached. While such a buffer is not directly assigned a disk address,
    the fact that it is not cached is the more important distinction. With
    the upcoming uncached buffer read primitive, we should be consistent with
    this distinction.

    While there, make page allocation in xfs_buf_get_nodaddr() safe
    against memory reclaim re-entrancy into the filesystem by allowing
    a flags parameter to be passed.
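
    The re-entrancy guard is the usual GFP discipline: a caller that may
    already hold filesystem locks passes a flag that selects GFP_NOFS, so
    direct reclaim cannot recurse back into XFS from inside the allocation.
    Roughly, with an illustrative flag name:

        /* choose the allocation context from the caller-supplied flags */
        gfp_t gfp_mask = (flags & XBF_DONT_BLOCK) ? GFP_NOFS : GFP_KERNEL;

        page = alloc_page(gfp_mask); /* GFP_NOFS keeps reclaim out of the fs */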

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Alex Elder

    Dave Chinner