12 Sep, 2013

40 commits

  • Now that sem, msgque and shm, through *_down(), all use the lockless
    variant of ipcctl_pre_down(), go ahead and delete it.

    [akpm@linux-foundation.org: fix function name in kerneldoc, cleanups]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Instead of holding the ipc lock for the entire function, use the
    ipcctl_pre_down_nolock and only acquire the lock for specific commands:
    RMID and SET.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
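    A minimal userspace sketch of the pattern above. It assumes a C11
    atomic_flag spinlock as a stand-in for kern_ipc_perm.lock, and it leaves
    out RCU and the permission checks done by ipcctl_pre_down_nolock(). The
    names ipc_obj, ctl_down() and the CMD_* values are hypothetical. The point
    is that the lock is taken only around the RMID and SET updates.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ctl commands. */
enum ipc_cmd { CMD_STAT, CMD_SET, CMD_RMID };

struct ipc_obj {
    atomic_flag lock;       /* stand-in for kern_ipc_perm.lock */
    int mode;
    int removed;
    int lock_acquisitions;  /* how many times the lock was taken */
};

static void obj_lock(struct ipc_obj *o)
{
    while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire))
        ;                   /* spin until the holder releases it */
    o->lock_acquisitions++;
}

static void obj_unlock(struct ipc_obj *o)
{
    atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

/* Model of a *_down() handler: the object has already been looked up
 * without the lock (the caller passes it in), and the spinlock is held
 * only around the two commands that modify the object. */
static int ctl_down(struct ipc_obj *o, enum ipc_cmd cmd, int new_mode)
{
    if (o == NULL)
        return -EINVAL;     /* lookup failed */

    switch (cmd) {
    case CMD_RMID:
        obj_lock(o);
        o->removed = 1;
        obj_unlock(o);
        return 0;
    case CMD_SET:
        obj_lock(o);
        o->mode = new_mode;
        obj_unlock(o);
        return 0;
    default:
        return -EINVAL;     /* not a *_down() command in this model */
    }
}
```

    In this model, a command that only reads the object never touches the
    spinlock.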
     
  • This is the third and final patchset that deals with reducing the amount
    of contention we impose on the ipc lock (kern_ipc_perm.lock). These
    changes mostly deal with shared memory; previous work has already been
    done for semaphores and message queues:

    http://lkml.org/lkml/2013/3/20/546 (sems)
    http://lkml.org/lkml/2013/5/15/584 (mqueues)

    With these patches applied, the execution time of a custom shm
    microbenchmark that stresses shmctl, doing IPC_STAT with 4 threads a
    million times, is reduced by 50%. A similar run, this time with IPC_SET,
    goes from 3 minutes 35 seconds to 27 seconds (215 s to 27 s, roughly an
    8x speedup).

    Patches 1-8: replace blindly taking the ipc lock with a smarter
    combination of rcu and ipc_obtain_object, acquiring the spinlock only
    when updating.

    Patch 9: renames the ids rw_mutex to rwsem, which is what it already was.

    Patch 10: is a trivial mqueue leftover cleanup

    Patch 11: adds a brief lock scheme description, requested by Andrew.

    This patch:

    Add shm_obtain_object() and shm_obtain_object_check(), which will allow us
    to get the ipc object without acquiring the lock. Just as with other
    forms of ipc, these functions are basically wrappers around
    ipc_obtain_object*().

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
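    A userspace sketch of the wrapper shape. It assumes a simple array in
    place of the ipc_ids idr and drops the namespace argument that the kernel
    versions take. ERR_PTR(), ERR_CAST(), IS_ERR() and container_of() are
    re-created here in simplified form. The lookup takes no lock; the caller
    decides whether to lock afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct kern_ipc_perm { int id; };

struct shmid_kernel {
    struct kern_ipc_perm shm_perm;   /* embedded generic IPC header */
    size_t shm_segsz;
};

#define NR_IDS 8
/* Hypothetical id table standing in for struct ipc_ids and its idr. */
static struct kern_ipc_perm *ids[NR_IDS];

/* Simplified model of ipc_obtain_object(): lookup only, no locking. */
static struct kern_ipc_perm *ipc_obtain_object(int id)
{
    if (id < 0 || id >= NR_IDS || ids[id] == NULL)
        return ERR_PTR(-EINVAL);
    return ids[id];
}

/* The wrapper: map the generic header back to the shm-specific object,
 * passing error pointers through unchanged. */
static struct shmid_kernel *shm_obtain_object(int id)
{
    struct kern_ipc_perm *ipcp = ipc_obtain_object(id);

    if (IS_ERR(ipcp))
        return ERR_CAST(ipcp);
    return container_of(ipcp, struct shmid_kernel, shm_perm);
}
```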
     
  • Allow the command line option rootfstype=ramfs to obtain the old
    initramfs behavior, and use ramfs instead of tmpfs for the stub when
    root= is defined (for cosmetic reasons).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
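    A hypothetical model of the rootfs backing decision described above. It
    assumes tmpfs is chosen only when tmpfs is configured, no root= was
    given, and rootfstype= is either unset or names tmpfs.
    rootfs_uses_tmpfs() and its parameters are illustrative, not kernel
    functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: should the initial rootfs be backed by tmpfs?
 *   tmpfs_configured - stands in for IS_ENABLED(CONFIG_TMPFS)
 *   root_arg         - value of root=, or NULL if not given
 *   rootfstype_arg   - value of rootfstype=, or NULL if not given */
static bool rootfs_uses_tmpfs(bool tmpfs_configured, const char *root_arg,
                              const char *rootfstype_arg)
{
    if (!tmpfs_configured)
        return false;               /* only ramfs is available */
    if (root_arg != NULL && root_arg[0] != '\0')
        return false;               /* rootfs is only a stub: use ramfs */
    if (rootfstype_arg != NULL && strstr(rootfstype_arg, "tmpfs") == NULL)
        return false;               /* e.g. rootfstype=ramfs: old behavior */
    return true;
}
```

    Under these assumptions, booting with rootfstype=ramfs and no root=
    selects ramfs, which is the old initramfs behavior.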
     
  • Conditionally call the appropriate fs_init and fill_super functions. Add
    a use-once guard to shmem_init() so that a second call simply succeeds.

    (Note that IS_ENABLED() is a compile time constant so dead code
    elimination removes unused function calls when CONFIG_TMPFS is disabled.)

    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
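    To illustrate the IS_ENABLED() remark, the sketch below re-creates in
    userspace the preprocessor trick modeled on the 2013-era
    include/linux/kconfig.h, with a pretend CONFIG_TMPFS defined by hand.
    shmem_init() and init_ramfs_fs() are stubs with counters, and
    init_rootfs_fs() is a hypothetical name.

```c
#include <assert.h>

/* Userspace re-creation of the IS_ENABLED() macro trick: a defined
 * option (value 1) turns the placeholder into "0," which shifts the
 * argument list so that 1 is selected; otherwise 0 is selected. */
#define __ARG_PLACEHOLDER_1 0,
#define config_enabled(cfg) _config_enabled(cfg)
#define _config_enabled(value) __config_enabled(__ARG_PLACEHOLDER_##value)
#define __config_enabled(arg1_or_junk) ___config_enabled(arg1_or_junk 1, 0)
#define ___config_enabled(__ignored, val, ...) val
#define IS_ENABLED(option) \
    (config_enabled(option) || config_enabled(option##_MODULE))

/* Pretend Kconfig output: tmpfs built in, nothing else set. */
#define CONFIG_TMPFS 1

static int tmpfs_inits, ramfs_inits;
static int shmem_init(void) { tmpfs_inits++; return 0; }
static int init_ramfs_fs(void) { ramfs_inits++; return 0; }

/* IS_ENABLED() folds to the constant 1 or 0, so the compiler can drop the
 * untaken branch, yet both calls are still parsed and type-checked. */
static int init_rootfs_fs(void)
{
    if (IS_ENABLED(CONFIG_TMPFS))
        return shmem_init();
    return init_ramfs_fs();
}
```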
     
  • When the rootfs code was a wrapper around ramfs, having them in the same
    file made sense. Now that it can wrap another filesystem type, move it in
    with the init code instead.

    This also allows a subsequent patch to access rootfstype= command line
    arg.

    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
     
  • Even though ramfs hasn't got a backing device, commit e0bf68ddec4f ("mm:
    bdi init hooks") added one anyway, and put the initialization in
    init_rootfs() since that's the first user, leaving it out of init_ramfs()
    to avoid duplication.

    But initmpfs uses init_tmpfs() instead, so move the init into the
    filesystem's init function, add a "once" guard to prevent duplicate
    initialization, and call the filesystem init from rootfs init.

    This goes part of the way to allowing ramfs to be built as a module.

    [akpm@linux-foundation.org: using bit 1 was odd]
    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
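    A userspace sketch of a "once" guard. The akpm note suggests the kernel
    version uses a bit flag; this version assumes a C11 atomic_flag instead,
    which gives the same first-caller-wins behaviour. fake_bdi_init() and
    init_ramfs_fs_model() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>

static int bdi_init_calls;   /* counts the one-time setup */

/* Hypothetical stand-in for initializing the ramfs backing_dev_info. */
static int fake_bdi_init(void)
{
    bdi_init_calls++;
    return 0;
}

/* An init function that may be reached from more than one caller
 * (e.g. rootfs setup and normal filesystem registration). */
static int init_ramfs_fs_model(void)
{
    static atomic_flag once = ATOMIC_FLAG_INIT;

    if (atomic_flag_test_and_set(&once))
        return 0;            /* already initialized: succeed quietly */
    return fake_bdi_init();
}
```

    As with any guard of this shape, a later caller succeeds even if the
    first initialization failed. That is usually acceptable for boot-time
    setup, but it is worth keeping in mind.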
     
  • Mounting rootfs with MS_NOUSER prevents --bind mounts from rootfs.
    Prevent new rootfs mounts with a different mechanism that doesn't affect
    bind mounts.

    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
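    The commit does not spell out the replacement mechanism. The sketch below
    assumes one plausible approach: a one-shot check inside the filesystem's
    mount callback. A bind mount reuses an existing superblock instead of
    calling that callback, so the check leaves --bind alone. All names here
    are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct superblock { int refs; };

static struct superblock rootfs_sb;   /* the one and only rootfs instance */

/* Model of the filesystem's mount callback: only the first call creates
 * rootfs; every later request for a *new* rootfs is refused. */
static struct superblock *rootfs_mount_model(int *err)
{
    static bool once;

    if (once) {
        *err = -ENODEV;
        return NULL;
    }
    once = true;
    rootfs_sb.refs = 1;
    *err = 0;
    return &rootfs_sb;
}

/* Model of a bind mount: it takes another reference on an existing
 * superblock and never calls the mount callback above. */
static struct superblock *bind_mount_model(struct superblock *existing)
{
    existing->refs++;
    return existing;
}
```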
     
  • With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
    one such possible user), the following race can happen:

    radix_tree_preload()
    ...
    radix_tree_insert()
      radix_tree_node_alloc()
        if (rtp->nr) {
          ret = rtp->nodes[rtp->nr - 1];
    <interrupt>
    ...
    radix_tree_preload()
    ...
    radix_tree_insert()
      radix_tree_node_alloc()
        if (rtp->nr) {
          ret = rtp->nodes[rtp->nr - 1];

    And we give out one radix tree node twice. That clearly results in radix
    tree corruption with different results (usually OOPS) depending on which
    two users of radix tree race.

    We fix the problem by making radix_tree_node_alloc() always allocate fresh
    radix tree nodes when in interrupt. Using preloading when in interrupt
    doesn't make sense since all the allocations have to be atomic anyway and
    we cannot steal nodes from process-context users because some users rely
    on radix_tree_insert() succeeding after radix_tree_preload().
    The in_interrupt() check is somewhat ugly, but we cannot simply key off
    the passed gfp_mask, as that is acquired from root_gfp_mask() and is thus
    the same for all preload users.

    Another part of the fix is to avoid node preallocation in
    radix_tree_preload() when the passed gfp_mask doesn't allow waiting.
    Again, preallocation in such a case doesn't make sense, and when
    preallocation would happen in interrupt we could possibly leak some
    allocated nodes. However, some users of radix_tree_preload() require the
    following radix_tree_insert() to succeed. To avoid unexpected effects for
    these users, radix_tree_preload() only warns if the passed gfp mask
    doesn't allow waiting, and we provide a new function
    radix_tree_maybe_preload() for those users which get different gfp masks
    from different call sites and which are prepared to handle
    radix_tree_insert() failure.

    Signed-off-by: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
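    A simplified model of both halves of the fix. It assumes a single pool
    (the kernel keeps one per CPU) and a made-up MODEL_GFP_WAIT bit in place
    of the real gfp flags, and it skips preemption handling entirely.
    node_alloc() never touches the pool in interrupt context, and
    maybe_preload() fills the pool only when the mask allows waiting. All
    names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define PRELOAD_SIZE 4
#define MODEL_GFP_WAIT 0x1u     /* hypothetical stand-in for a "may sleep" flag */

struct node { int dummy; };

/* Model of one CPU's preload pool. */
struct preload_pool {
    int nr;
    struct node *nodes[PRELOAD_SIZE];
};

/* Fill the pool to capacity; only sensible when the caller may sleep. */
static int preload(struct preload_pool *rtp)
{
    while (rtp->nr < PRELOAD_SIZE) {
        struct node *n = malloc(sizeof(*n));
        if (!n)
            return -ENOMEM;
        rtp->nodes[rtp->nr++] = n;
    }
    return 0;
}

/* Preallocate only if the mask allows waiting; otherwise succeed without
 * preloading and leave the caller to cope with a possible insert failure. */
static int maybe_preload(struct preload_pool *rtp, unsigned int gfp_mask)
{
    if (gfp_mask & MODEL_GFP_WAIT)
        return preload(rtp);
    return 0;
}

/* Only process context takes preloaded nodes; interrupt context always
 * allocates fresh, so it can never grab the node an interrupted caller
 * has already picked but not yet removed from the pool. */
static struct node *node_alloc(struct preload_pool *rtp, bool in_irq)
{
    if (!in_irq && rtp->nr) {
        struct node *ret = rtp->nodes[rtp->nr - 1];
        rtp->nodes[rtp->nr - 1] = NULL;
        rtp->nr--;
        return ret;
    }
    return malloc(sizeof(struct node));   /* may return NULL */
}
```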
     
  • The driver core clears the driver data to NULL after device_release or on
    probe failure. Thus, there is no need to manually clear the device driver
    data to NULL.

    Signed-off-by: Jingoo Han
    Cc: Evgeniy Polyakov
    Cc: Greg KH
    Acked-by: Shawn Guo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jingoo Han
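    A userspace model of why the manual clearing is redundant. It assumes
    stripped-down device and driver structures in which the core resets
    driver_data itself on probe failure and after remove(). core_bind(),
    core_unbind() and the demo driver are hypothetical;
    dev_set_drvdata() and dev_get_drvdata() mirror the kernel helpers by name
    only.

```c
#include <assert.h>
#include <stddef.h>

struct device { void *driver_data; };

static void dev_set_drvdata(struct device *dev, void *data) { dev->driver_data = data; }
static void *dev_get_drvdata(const struct device *dev) { return dev->driver_data; }

struct driver {
    int (*probe)(struct device *dev);
    void (*remove)(struct device *dev);
};

/* Simplified core bind path: on probe failure the core clears the data. */
static int core_bind(struct device *dev, const struct driver *drv)
{
    int ret = drv->probe(dev);

    if (ret)
        dev_set_drvdata(dev, NULL);
    return ret;
}

/* Simplified core unbind path: the core clears the data after remove(). */
static void core_unbind(struct device *dev, const struct driver *drv)
{
    if (drv->remove)
        drv->remove(dev);
    dev_set_drvdata(dev, NULL);
}

/* Hypothetical driver whose remove() no longer clears driver_data. */
static int priv_storage;
static int demo_probe(struct device *dev) { dev_set_drvdata(dev, &priv_storage); return 0; }
static void demo_remove(struct device *dev) { (void)dev; /* release resources only */ }
```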
     
  • The use of strict_strtol() is not preferred because strict_strtol() is
    obsolete; kstrtol() should be used instead.

    Signed-off-by: Jingoo Han
    Cc: Evgeniy Polyakov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jingoo Han
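    A rough userspace approximation of the kstrtol() contract, built on
    strtol(). The whole string must parse (one trailing newline is allowed),
    the result is returned through a pointer, and failures come back as
    -EINVAL or -ERANGE. kstrtol_like() is a hypothetical name, and edge cases
    may differ from the kernel implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse s as a long in the given base; return 0 and store the value in
 * *res on success, or a negative errno without touching *res. */
static int kstrtol_like(const char *s, unsigned int base, long *res)
{
    char *end;
    long val;

    if (*s == '\0' || isspace((unsigned char)*s))
        return -EINVAL;              /* empty or leading whitespace */
    errno = 0;
    val = strtol(s, &end, (int)base);
    if (end == s)
        return -EINVAL;              /* no digits at all */
    if (errno == ERANGE)
        return -ERANGE;              /* does not fit in a long */
    if (*end == '\n')
        end++;                       /* sysfs writes often end in '\n' */
    if (*end != '\0')
        return -EINVAL;              /* trailing garbage */
    *res = val;
    return 0;
}
```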
     
  • Based partially on MS standard spec quotes from Alex Dubov.

    As with any code that works with user data, this driver isn't
    recommended for writing to cards that contain valuable data.

    It does, however, try its best to avoid data corruption and possible
    damage to the card.

    Tested on MS DUO 64 MB card on Ricoh R592 card reader.

    Signed-off-by: Maxim Levitsky
    Cc: Valdis Kletnieks
    Cc: Jens Axboe
    Cc: Alex Dubov
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maxim Levitsky
     
  • The driver core clears the driver data to NULL after device_release or on
    probe failure. Thus, there is no need to manually clear the device driver
    data to NULL.

    Signed-off-by: Jingoo Han
    Cc: Maxim Levitsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release or on
    probe failure. Thus, there is no need to manually clear the device driver
    data to NULL.

    Signed-off-by: Jingoo Han
    Cc: Rodolfo Giometti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jingoo Han
     
  • Fix thinkos where pkt_ needs a valid pktcdvd_device * and the
    pointer is known to be NULL.

    Signed-off-by: Joe Perches
    Reported-by: Dan Carpenter (go smatch!)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Allow the device name to be emitted with pkt_err when logging the sense
    data.

    Signed-off-by: Joe Perches
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Add a new pkt_info macro to prefix the name to the logging output.

    Signed-off-by: Joe Perches
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Add a new pkt_notice macro to prefix the name to the logging output.

    Signed-off-by: Joe Perches
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Add a new pkt_err macro to prefix the name to the logging output. Convert
    pr_err calls to pkt_err where a non-NULL struct pktcdvd_device is
    available.

    Includes improvements from Andy Shevchenko.

    Signed-off-by: Joe Perches
    Cc: Andy Shevchenko
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
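    A userspace sketch of a name-prefixing logging macro. It assumes a
    pr_err() stand-in that formats into a buffer instead of the kernel log,
    and a pktcdvd_device structure trimmed to just a name field. The
    ##__VA_ARGS__ form is a GNU extension, as commonly used in kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down device structure; only the name matters. */
struct pktcdvd_device { char name[20]; };

static char log_buf[256];   /* captures the last message for inspection */

/* Userspace stand-in for pr_err(): format into log_buf. */
#define pr_err(fmt, ...) \
    snprintf(log_buf, sizeof(log_buf), fmt, ##__VA_ARGS__)

/* Every message is automatically prefixed with the device name, so call
 * sites cannot forget it. */
#define pkt_err(pd, fmt, ...) \
    pr_err("%s: " fmt, (pd)->name, ##__VA_ARGS__)
```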
     
  • Add pd->name to output for these debugging messages.

    Remove the pkt_dbg(2, ...) function entry tracing calls, which are
    normally compiled out, since such tracing is better done via the
    function tracer.

    Signed-off-by: Joe Perches
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Use the more common pkt_dbg(level, fmt, ...) form.

    These messages are emitted at KERN_NOTICE.

    Always emit function name with pkt_dbg(2, ...) uses and remove the
    sometimes abbreviated embedded function name.

    This form always verifies the format and arguments.

    Signed-off-by: Joe Perches
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
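    A sketch of a level-gated debug macro in the pkt_dbg(level, fmt, ...)
    form. It assumes a hypothetical PACKET_DEBUG level and a notice_printk()
    stand-in that records to a buffer. Because the gate is an ordinary if on a
    compile-time constant rather than #if, the compiler still type-checks
    every call, which is one way to get the "always verifies the format and
    arguments" property.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PACKET_DEBUG 1     /* hypothetical build-time debug level */

static char log_buf[256];  /* last emitted message */
static int log_count;      /* number of messages emitted */

/* Stand-in for a KERN_NOTICE printk that records into log_buf. */
#define notice_printk(fmt, ...)                                        \
    do {                                                               \
        log_count++;                                                   \
        snprintf(log_buf, sizeof(log_buf), fmt, ##__VA_ARGS__);        \
    } while (0)

/* Level 2 messages always carry the function name; level 1 do not. */
#define pkt_dbg(level, fmt, ...)                                       \
    do {                                                               \
        if ((level) == 2 && PACKET_DEBUG >= 2)                         \
            notice_printk("%s: " fmt, __func__, ##__VA_ARGS__);        \
        else if ((level) == 1 && PACKET_DEBUG >= 1)                    \
            notice_printk(fmt, ##__VA_ARGS__);                         \
    } while (0)
```

    With PACKET_DEBUG set to 1, the level 2 branch is dead code that the
    compiler can remove, but its format string is still checked.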
     
  • Use a more current logging style and add message levels to the logging
    messages.

    Simplify pkt_dump_sense by using %*ph and adding a simple function to emit
    the sense string.

    Includes improvements from Andy Shevchenko and Dan Carpenter.

    Signed-off-by: Joe Perches
    Cc: Andy Shevchenko
    Cc: Dan Carpenter
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
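    A userspace sketch of the two pieces. hex_bytes() mimics the output of
    printk's %*ph (lowercase hex bytes separated by spaces), and
    sense_key_string() maps a SCSI sense key to a short name. Both function
    names are hypothetical, and only the first nine sense keys are named.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format len bytes as "xx xx xx" into out (which must hold outsz bytes). */
static void hex_bytes(char *out, size_t outsz, const unsigned char *buf, size_t len)
{
    size_t pos = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';
    /* Each byte needs at most 3 chars plus the terminating NUL. */
    for (size_t i = 0; i < len && pos + 3 < outsz; i++)
        pos += (size_t)snprintf(out + pos, outsz - pos,
                                i ? " %02x" : "%02x", buf[i]);
}

/* Short names for the standard SCSI sense keys 0x0 through 0x8. */
static const char *sense_key_string(unsigned char key)
{
    static const char *const info[] = {
        "No sense", "Recovered error", "Not ready", "Medium error",
        "Hardware error", "Illegal request", "Unit attention",
        "Data protect", "Blank check",
    };

    return key < sizeof(info) / sizeof(info[0]) ? info[key] : "(other)";
}
```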
     
  • Macros should be converted to functions where feasible to
    verify arguments and the like.

    Signed-off-by: Joe Perches
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
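    An illustration using a hypothetical zone computation. struct pkt_dev,
    ZONE() and get_zone() are assumed names, not necessarily the ones the
    patch touched. The macro evaluates pd twice and accepts any argument
    types, while the static inline function evaluates each argument once and
    lets the compiler check types.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical trimmed-down device; size must be a power of two. */
struct pkt_dev { sector_t offset; unsigned int size; };

/* Macro form: no type checking, and pd is evaluated twice. */
#define ZONE(sector, pd) \
    (((sector) + (pd)->offset) & ~(sector_t)((pd)->size - 1))

/* Function form: same arithmetic, checked types, single evaluation. */
static inline sector_t get_zone(sector_t sector, const struct pkt_dev *pd)
{
    return (sector + pd->offset) & ~(sector_t)(pd->size - 1);
}
```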
     
  • Since the panic handlers may produce additional information (via printk)
    for the kernel log, that information should be included in the panic
    output saved by kmsg_dump(). Without this re-ordering, nothing that adds
    information to a panic will show up in pstore's view when kmsg_dump runs,
    and is therefore not visible to crash reporting tools that examine pstore
    output.

    Signed-off-by: Kees Cook
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Acked-by: Tony Luck
    Cc: Stephen Boyd
    Cc: Vikram Mulukutla
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
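    A toy model of the ordering argument. It assumes a flat string buffer for
    the kernel log and a copy function standing in for a kmsg_dump() consumer
    such as pstore. Because the hypothetical notifier runs before the dump,
    its output lands in the saved copy.

```c
#include <assert.h>
#include <string.h>

static char kernel_log[256];
static char pstore_copy[256];   /* what a kmsg_dump() consumer would save */

static void log_msg(const char *msg)
{
    strncat(kernel_log, msg, sizeof(kernel_log) - strlen(kernel_log) - 1);
}

/* Hypothetical panic notifier that adds diagnostic output. */
static void demo_panic_notifier(void) { log_msg("notifier: extra state\n"); }

/* Stand-in for kmsg_dump(): snapshot the log as it is right now. */
static void kmsg_dump_model(void) { strcpy(pstore_copy, kernel_log); }

/* Reordered panic path: run notifiers first, then dump. */
static void panic_model(const char *reason)
{
    log_msg("Kernel panic: ");
    log_msg(reason);
    log_msg("\n");
    demo_panic_notifier();
    kmsg_dump_model();
}
```

    Swapping the last two calls in panic_model() would reproduce the old
    behaviour, where the notifier output is missing from the saved copy.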
     
  • It seems pretty unlikely that AFFS supports files over 4GB, but we may as
    well use loff_t for cleanliness' sake instead of truncating it to 32
    bits.

    Signed-off-by: Dan Carpenter
    Cc: Marco Stornelli
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • When the example udev rules in the documentation are used without
    modification, warnings like the one shown below appear in the system logs:

    /var/log/messages:Aug 22 11:09:11 kung udevd[445]: NAME="%k" \
    is superfluous and breaks kernel supplied names, please remove \
    it from /etc/udev/rules.d/60-aoe.rules:26

    Removing the term does not cause any problems with the creation of the
    special character and block device nodes.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • If the system has trouble allocating memory for the creation of the aoe
    debugfs directory or of a file inside it, the debugfs member of an aoedev
    can be NULL.

    Do not treat a NULL debugfs pointer as a BUG on aoedev shutdown, avoiding
    the user impact of an unnecessary panic.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • This patch fixes the following compiler warnings:

    drivers/block/aoe/aoecmd.c: In function `aoecmd_ata_rw':
    drivers/block/aoe/aoecmd.c:383:17: warning: variable `t' set but not used [-Wunused-but-set-variable]
    struct aoetgt *t;
    ^
    drivers/block/aoe/aoecmd.c: In function `resend':
    drivers/block/aoe/aoecmd.c:488:21: warning: variable `ah' set but not used [-Wunused-but-set-variable]
    struct aoe_atahdr *ah;
    ^

    Signed-off-by: Andy Shevchenko
    Cc: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • The kernel has a helper that can be used here. This patch replaces the
    custom implementation with a call to that helper.

    Signed-off-by: Andy Shevchenko
    Cc: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • This information is presented in a compact format that has evolved for
    easy routine scanning by expert humans, mostly developers and support
    technicians helping to troubleshoot or test AoE-based systems.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • The placeholder in the file contents is filled out in the following
    patch.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • This series adds the debugging information that the coraid.com-distributed
    aoe driver exports via sysfs, but instead of sysfs, it uses debugfs.

    With these patches applied, even without AoE targets on the network, KEDR
    reports new possible memory leaks, but these are from callers outside the
    aoe driver that have used aoe_devnode to get the name of the character
    devices through the aoe_class->devnode callback, and I believe they're
    responsible for freeing that memory.

    This patch:

    Create and destroy the debugfs directory.

    Signed-off-by: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed Cashin
     
  • Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • There is no reason to require the rbtree test code to be a module; allow
    it to be built in (this streamlines my development process).

    Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Just check that we examine all nodes in the tree for the postorder
    iteration.

    Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Because deletion (of the entire tree) is a relatively common use of the
    rbtree_postorder iteration, and because doing it safely means fiddling
    with temporary storage, provide a helper to simplify postorder rbtree
    iteration.

    Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Postorder iteration yields all of a node's children prior to yielding the
    node itself, and this particular implementation also avoids examining the
    leaf links in a node after that node has been yielded.

    In what I expect will be its most common usage, postorder iteration allows
    the deletion of every node in an rbtree without modifying the rbtree nodes
    (no _requirement_ that they be nulled) while avoiding referencing child
    nodes after they have been "deleted" (most commonly, freed).

    I have only updated zswap to use this functionality at this point, but
    numerous bits of code (most notably in the filesystem drivers) use a
    hand-rolled postorder iteration that NULLs child links as it traverses
    the tree. Each of those instances could be replaced with this common
    implementation.

    1 & 2 add rbtree postorder iteration functions.
    3 adds testing of the iteration to the rbtree runtime tests
    4 allows building the rbtree runtime tests as builtins
    5 updates zswap.

    This patch:

    Add postorder iteration functions for rbtree. These are useful for safely
    freeing an entire rbtree without modifying the tree at all.

    Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
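    A userspace sketch of postorder iteration over a plain binary tree with
    parent pointers, modeled on the description above. first_postorder() and
    next_postorder() play the roles of rb_first_postorder() and
    rb_next_postorder(). The tree is not rebalanced and the node layout is
    simplified; both are assumptions of this sketch. free_all() shows the
    deletion use case: the successor is fetched before the current node is
    freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified node: the kernel's rb_node packs parent and color together. */
struct node {
    struct node *left, *right, *parent;
    int key;
};

/* First node in postorder below n: keep descending, preferring left. */
static struct node *left_deepest(struct node *n)
{
    for (;;) {
        if (n->left)
            n = n->left;
        else if (n->right)
            n = n->right;
        else
            return n;
    }
}

static struct node *first_postorder(struct node *root)
{
    return root ? left_deepest(root) : NULL;
}

/* Reads n->parent and the parent's child links, never n's own children. */
static struct node *next_postorder(const struct node *n)
{
    struct node *parent;

    if (!n)
        return NULL;
    parent = n->parent;
    /* Left child with a right sibling: that sibling's subtree comes next. */
    if (parent && n == parent->left && parent->right)
        return left_deepest(parent->right);
    /* Otherwise all of parent's children are done, so parent is next. */
    return parent;
}

/* Plain unbalanced BST insert, only used to build a test tree. */
static struct node *insert(struct node **root, int key)
{
    struct node **link = root, *parent = NULL;

    while (*link) {
        parent = *link;
        link = key < parent->key ? &parent->left : &parent->right;
    }
    struct node *n = calloc(1, sizeof(*n));
    if (!n)
        return NULL;
    n->key = key;
    n->parent = parent;
    *link = n;
    return n;
}

/* Free every node without modifying any links. */
static int free_all(struct node *root)
{
    int freed = 0;
    struct node *pos = first_postorder(root), *next;

    while (pos) {
        next = next_postorder(pos);   /* fetch before pos is freed */
        free(pos);
        freed++;
        pos = next;
    }
    return freed;
}
```

    For a tree built from the keys 4, 2, 6, 1, 3, 5, 7, the iteration visits
    1, 3, 2, 5, 7, 6, 4, so every parent comes after both of its children.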