02 Apr, 2014

2 commits

  • Pull block driver update from Jens Axboe:
    "On top of the core pull request, here's the pull request for the
    driver related changes for 3.15. It contains:

    - Improvements for msi-x registration for block drivers (mtip32xx,
    skd, cciss, nvme) from Alexander Gordeev.

    - A round of cleanups and improvements for drbd from Andreas
    Gruenbacher and Rashika Kheria.

    - A round of cleanups and improvements for bcache from Kent.

    - Removal of sleep_on() and friends in DAC960, ataflop, swim3 from
    Arnd Bergmann.

    - Bug fix for a bug in the mtip32xx async completion code from Sam
    Bradshaw.

    - Bug fix for accidentally bouncing IO on 32-bit platforms with
    mtip32xx from Felipe Franciosi"

    * 'for-3.15/drivers' of git://git.kernel.dk/linux-block: (103 commits)
    bcache: remove nested function usage
    bcache: Kill bucket->gc_gen
    bcache: Kill unused freelist
    bcache: Rework btree cache reserve handling
    bcache: Kill btree_io_wq
    bcache: btree locking rework
    bcache: Fix a race when freeing btree nodes
    bcache: Add a real GC_MARK_RECLAIMABLE
    bcache: Add bch_keylist_init_single()
    bcache: Improve priority_stats
    bcache: Better alloc tracepoints
    bcache: Kill dead cgroup code
    bcache: stop moving_gc marking buckets that can't be moved.
    bcache: Fix moving_pred()
    bcache: Fix moving_gc deadlocking with a foreground write
    bcache: Fix discard granularity
    bcache: Fix another bug recovering from unclean shutdown
    bcache: Fix a bug recovering from unclean shutdown
    bcache: Fix a journalling reclaim after recovery bug
    bcache: Fix a null ptr deref in journal replay
    ...

    Linus Torvalds
     
  • Pull core block layer updates from Jens Axboe:
    "This is the pull request for the core block IO bits for the 3.15
    kernel. It's a smaller round this time, it contains:

    - Various little blk-mq fixes and additions from Christoph and
    myself.

    - Cleanup of the IPI usage from the block layer, and associated
    helper code. From Frederic Weisbecker and Jan Kara.

    - Duplicate code cleanup in bio-integrity from Gu Zheng. This will
    give you a merge conflict, but that should be easy to resolve.

    - blk-mq notify spinlock fix for RT from Mike Galbraith.

    - A blktrace partial accounting bug fix from Roman Pen.

    - Missing REQ_SYNC detection fix for blk-mq from Shaohua Li"

    * 'for-3.15/core' of git://git.kernel.dk/linux-block: (25 commits)
    blk-mq: add REQ_SYNC early
    rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock
    blk-mq: support partial I/O completions
    blk-mq: merge blk_mq_insert_request and blk_mq_run_request
    blk-mq: remove blk_mq_alloc_rq
    blk-mq: don't dump CPU -> hw queue map on driver load
    blk-mq: fix wrong usage of hctx->state vs hctx->flags
    blk-mq: allow blk_mq_init_commands() to return failure
    block: remove old blk_iopoll_enabled variable
    blktrace: fix accounting of partially completed requests
    smp: Rename __smp_call_function_single() to smp_call_function_single_async()
    smp: Remove wait argument from __smp_call_function_single()
    watchdog: Simplify a little the IPI call
    smp: Move __smp_call_function_single() below its safe version
    smp: Consolidate the various smp_call_function_single() declensions
    smp: Teach __smp_call_function_single() to check for offline cpus
    smp: Remove unused list_head from csd
    smp: Iterate functions through llist_for_each_entry_safe()
    block: Stop abusing rq->csd.list in blk-softirq
    block: Remove useless IPI struct initialization
    ...

    Linus Torvalds
     

01 Apr, 2014

1 commit

  • Pull workqueue changes from Tejun Heo:
    "PREPARE_[DELAYED_]WORK() were used to change the work function of work
    items without fully reinitializing it; however, this makes workqueue
    consider the work item as a different one from before and allows the
    work item to start executing before the previous instance is finished
    which can lead to extremely subtle issues which are painful to debug.

    The interface has never been popular. This pull request contains
    patches to remove existing usages and kill the interface. As one of
    the changes was routed during the last devel cycle and another
    depended on a pending change in nvme, for-3.15 contains a couple merge
    commits.

    In addition, interfaces which were deprecated quite a while ago -
    __cancel_delayed_work() and WQ_NON_REENTRANT - are removed too"

    * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: remove deprecated WQ_NON_REENTRANT
    workqueue: Spelling s/instensive/intensive/
    workqueue: remove PREPARE_[DELAYED_]WORK()
    staging/fwserial: don't use PREPARE_WORK
    afs: don't use PREPARE_WORK
    nvme: don't use PREPARE_WORK
    usb: don't use PREPARE_DELAYED_WORK
    floppy: don't use PREPARE_[DELAYED_]WORK
    ps3-vuart: don't use PREPARE_WORK
    wireless/rt2x00: don't use PREPARE_WORK in rt2800usb.c
    workqueue: Remove deprecated __cancel_delayed_work()

    Linus Torvalds
     

30 Mar, 2014

2 commits

  • Pull Ceph fix from Sage Weil:
    "This drops a bad assert that a few users have been hitting but we've
    only recently been able to track down"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: drop an unsafe assertion

    Linus Torvalds
     
  • Olivier Bonvalet reported having repeated crashes due to a failed
    assertion he was hitting in rbd_img_obj_callback():

    Assertion failure in rbd_img_obj_callback() at line 2165:
    rbd_assert(which >= img_request->next_completion);

    With a lot of help from Olivier with reproducing the problem
    we were able to determine the object and image requests had
    already been completed (and often freed) at the point the
    assertion failed.

    There was a great deal of discussion on the ceph-devel mailing list
    about this. The problem only arose when there were two (or more)
    object requests in an image request, and the problem was always
    seen when the second request was being completed.

    The problem is due to a race in the window between setting the
    "done" flag on an object request and checking the image request's
    next completion value. When the first object request completes, it
    checks to see if its successor request is marked "done", and if
    so, that request is also completed. In the process, the image
    request's next_completion value is updated to reflect that both
    the first and second requests are completed. By the time the
    second request is able to check the next_completion value, it
    has been set to a value *greater* than its own "which" value,
    which caused an assertion to fail.

    Fix this problem by skipping over any completion processing
    unless the completing object request is the next one expected.
    Test only for inequality (not >=), and eliminate the bad
    assertion.

    Tested-by: Olivier Bonvalet
    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Reviewed-by: Ilya Dryomov

    Alex Elder
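
    The fix described above can be illustrated with a small user-space
    model. This is only a sketch under stated assumptions: the struct and
    function names below are hypothetical stand-ins, not the rbd driver's
    real code, and the locking the driver needs around next_completion is
    omitted. The point is the equality test: a completer that finds
    next_completion already past its own index simply has nothing to do.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OBJ_REQUESTS 8

/* Hypothetical, simplified stand-in for an rbd image request. */
struct img_request {
    bool done[MAX_OBJ_REQUESTS];    /* per-object "done" flags */
    unsigned int obj_request_count; /* must not exceed MAX_OBJ_REQUESTS */
    unsigned int next_completion;   /* index of the next object expected */
    unsigned int completed;         /* objects completed so far */
};

/* Step 1 of a completion: mark the object request as done. */
static void obj_mark_done(struct img_request *img, unsigned int which)
{
    assert(which < img->obj_request_count);
    img->done[which] = true;
}

/*
 * Step 2 of a completion: process completions in order.
 * Returns how many object requests this call completed.
 */
static unsigned int img_obj_callback(struct img_request *img,
                                     unsigned int which)
{
    unsigned int n = 0;

    /*
     * Test only for inequality. Another completer may already have
     * advanced next_completion past `which`; that is not an error,
     * so skip instead of asserting which >= next_completion.
     */
    if (which != img->next_completion)
        return 0;

    /* Complete this request and any consecutive successors marked done. */
    while (which < img->obj_request_count && img->done[which]) {
        img->completed++;
        which++;
        n++;
    }
    img->next_completion = which;
    return n;
}
```

    Replaying the reported interleaving (both requests marked done, the
    first one completing both, then the second one arriving late) makes
    the second call return 0 rather than trip an assertion.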
     

15 Mar, 2014

1 commit

  • If drivers do dynamic allocation in the hardware command init
    path, then we need to be able to handle and return failures.

    And if they do allocations or mappings in the init command path,
    then we need a cleanup function to free up that space at exit
    time. So add blk_mq_free_commands() as the cleanup function.

    This is required for the mtip32xx driver conversion to blk-mq.

    Signed-off-by: Jens Axboe

    Jens Axboe
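
    A user-space sketch of the init/cleanup pairing described above. All
    names here (struct cmd, init_commands(), free_commands(), the demo_*
    callbacks) are hypothetical and are not the blk-mq signatures. The
    unwinding of already-initialized commands on failure is a design
    choice of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-command state owned by a driver. */
struct cmd {
    void *priv;
};

typedef int (*init_cmd_fn)(struct cmd *cmd, void *data);
typedef void (*free_cmd_fn)(struct cmd *cmd, void *data);

/* Cleanup pass: call the driver's free hook on the first nr commands. */
static void free_commands(struct cmd *cmds, unsigned int nr,
                          free_cmd_fn free_fn, void *data)
{
    for (unsigned int i = 0; i < nr; i++)
        free_fn(&cmds[i], data);
}

/*
 * Init pass that can fail. On error, the commands initialized so far
 * are released again and the error code is returned to the caller.
 */
static int init_commands(struct cmd *cmds, unsigned int nr,
                         init_cmd_fn init_fn, free_cmd_fn free_fn,
                         void *data)
{
    for (unsigned int i = 0; i < nr; i++) {
        int ret = init_fn(&cmds[i], data);

        if (ret) {
            free_commands(cmds, i, free_fn, data);
            return ret;
        }
    }
    return 0;
}

/* Demo driver hooks: allocation fails once the budget is used up. */
struct demo_state {
    int budget; /* allocations still allowed */
    int live;   /* allocations currently outstanding */
};

static int demo_init(struct cmd *cmd, void *data)
{
    struct demo_state *st = data;

    if (st->budget == 0)
        return -ENOMEM;
    cmd->priv = malloc(64);
    if (!cmd->priv)
        return -ENOMEM;
    st->budget--;
    st->live++;
    return 0;
}

static void demo_free(struct cmd *cmd, void *data)
{
    struct demo_state *st = data;

    free(cmd->priv);
    cmd->priv = NULL;
    st->live--;
}
```

    With a budget of 2 and four commands, init_commands() returns -ENOMEM
    and leaves no allocation behind; with a sufficient budget it succeeds
    and a later free_commands() call releases everything.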
     

14 Mar, 2014

9 commits

  • This patch fixes 2 issues in the fast completion path:
    1) Possible double completions / double dma_unmap_sg() calls due to lack
    of atomicity in the check and subsequent dereference of the upper layer
    callback function. Fixed with cmpxchg before unmap and callback.
    2) Regression in unaligned IO constraining workaround for p420m devices.
    Fixed by checking if IO is unaligned and using proper semaphore if so.

    Signed-off-by: Sam Bradshaw
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Sam Bradshaw
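
    Issue 1 above is a check-then-use race on the completion callback.
    A minimal user-space sketch of the "claim it atomically first" idea
    follows, using C11 atomics in place of the kernel's cmpxchg(). The
    struct and function names are hypothetical, and the unmapped counter
    merely stands in for the dma_unmap_sg() call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*comp_fn)(void *data, int status);

/* Hypothetical stand-in for a driver command. */
struct command {
    _Atomic(comp_fn) comp_func; /* upper-layer callback, NULL once claimed */
    void *comp_data;
    int unmapped;               /* counts simulated unmap calls */
};

static void command_arm(struct command *cmd, comp_fn fn, void *data)
{
    atomic_init(&cmd->comp_func, fn);
    cmd->comp_data = data;
    cmd->unmapped = 0;
}

/*
 * Complete a command at most once. Only the caller whose
 * compare-and-exchange swaps the callback for NULL goes on to unmap
 * and invoke it; every other caller backs off.
 * Returns 1 if this call performed the completion, 0 otherwise.
 */
static int complete_once(struct command *cmd, int status)
{
    comp_fn fn = atomic_load(&cmd->comp_func);

    if (fn == NULL ||
        !atomic_compare_exchange_strong(&cmd->comp_func, &fn, (comp_fn)NULL))
        return 0;

    cmd->unmapped++;            /* stands in for dma_unmap_sg() */
    fn(cmd->comp_data, status);
    return 1;
}

/* Demo callback: counts how often it was invoked. */
static void count_completion(void *data, int status)
{
    (void)status;
    ++*(int *)data;
}
```

    Calling complete_once() twice on the same armed command runs the
    callback and the simulated unmap exactly once.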
     
  • If the buffers are unmapped after completing a request, then stale data
    might be in the request.

    Signed-off-by: Felipe Franciosi
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Felipe Franciosi
     
  • We need to set the queue bounce limit during the device initialization to
    prevent excessive bouncing on 32 bit architectures.

    Signed-off-by: Felipe Franciosi
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Felipe Franciosi
     
  • As a result of the deprecation of the MSI-X/MSI enablement functions
    pci_enable_msix() and pci_enable_msi_block() all drivers
    using these two interfaces need to be updated to use the
    new pci_enable_msi_range() or pci_enable_msi_exact()
    and pci_enable_msix_range() or pci_enable_msix_exact()
    interfaces.

    Signed-off-by: Alexander Gordeev
    Cc: Keith Busch
    Cc: Matthew Wilcox
    Cc: Jens Axboe
    Cc: linux-nvme@lists.infradead.org
    Cc: linux-pci@vger.kernel.org
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Alexander Gordeev
     
  • Currently the driver falls back to INTx mode when MSI-X
    initialization failed. This is a suboptimal behaviour
    for chips that also support MSI. This update changes that
    behaviour and falls back to MSI mode in case MSI-X mode
    initialization failed.

    Signed-off-by: Alexander Gordeev
    Cc: Mike Miller
    Cc: iss_storagedev@hp.com
    Cc: Jens Axboe
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Jens Axboe

    Alexander Gordeev
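
    The fallback order described above (MSI-X first, then MSI, then
    legacy INTx) can be sketched as follows. This is a user-space model:
    struct fake_hba and the fake_enable_*() helpers are hypothetical
    stand-ins for the PCI calls a real driver would make, and they only
    report whether the simulated device supports each mode.

```c
#include <assert.h>
#include <stdbool.h>

enum irq_mode { IRQ_MODE_INTX, IRQ_MODE_MSI, IRQ_MODE_MSIX };

/* Hypothetical device model: which interrupt modes can be enabled. */
struct fake_hba {
    bool msix_works;
    bool msi_works;
};

/* Stand-ins for the enablement calls: 0 on success, negative on error. */
static int fake_enable_msix(const struct fake_hba *h)
{
    return h->msix_works ? 0 : -1;
}

static int fake_enable_msi(const struct fake_hba *h)
{
    return h->msi_works ? 0 : -1;
}

/* Try the best mode first and step down one level per failure. */
static enum irq_mode pick_irq_mode(const struct fake_hba *h)
{
    if (fake_enable_msix(h) == 0)
        return IRQ_MODE_MSIX;
    if (fake_enable_msi(h) == 0)
        return IRQ_MODE_MSI;    /* previously skipped in favor of INTx */
    return IRQ_MODE_INTX;
}
```

    A device whose MSI-X setup fails but which supports MSI now ends up
    in MSI mode instead of dropping straight to INTx.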
     
  • interruptible_sleep_on is racy and going away. This replaces the one
    caller in the swim3 driver with the equivalent race-free
    wait_event_interruptible call. Since we're here already, this
    also fixes the case where we get interrupted from atomic context,
    which used to just spin in the loop.

    Signed-off-by: Arnd Bergmann
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • sleep_on() is inherently racy, and has been deprecated for a long time.
    This fixes two instances in the atari floppy driver:

    * fdc_wait/fdc_busy becomes an open-coded mutex. We cannot use the
    regular mutex since it gets released in interrupt context. The
    open-coded version using wait_event() and cmpxchg() is equivalent
    to the existing code but does the checks atomically, and we can
    now safely check the condition with irqs enabled.

    * format_wait becomes a completion, which is the natural structure
    here. The format ioctl waits for the background task to either
    complete or abort.

    This does not attempt to fix the preexisting bug of calling schedule
    with local interrupts disabled.

    Signed-off-by: Arnd Bergmann
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Michael Schmitz
    Signed-off-by: Jens Axboe

    Arnd Bergmann
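
    The open-coded mutex mentioned above boils down to an atomic
    compare-and-exchange on a busy flag. The following user-space sketch
    shows only that core, using C11 atomics; the function names are
    hypothetical, and the sleeping and wake-up that wait_event() would
    provide in the kernel are deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = held. Static storage, so it starts out as 0. */
static atomic_int fdc_busy;

/*
 * Try to take the lock. The check ("is it free?") and the update
 * ("mark it held") happen in a single atomic step, so two contenders
 * can never both succeed.
 */
static bool fdc_trylock(void)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&fdc_busy, &expected, 1);
}

/*
 * Release the lock. Unlike a regular mutex, nothing ties the release
 * to the context that took it, which is what allows releasing from an
 * interrupt handler in the driver.
 */
static void fdc_unlock(void)
{
    atomic_store(&fdc_busy, 0);
    /* A kernel version would wake up any waiters at this point. */
}
```

    A blocking lock would wrap fdc_trylock() in a wait loop; in the
    kernel that role is played by wait_event() with the cmpxchg() as its
    condition.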
     
  • sleep_on and its variants are going away. The use of sleep_on() in
    DAC960_V2_ExecuteUserCommand seems to be bogus because by the
    time we get there, the command has completed already and
    we just enter the timeout. Based on this interpretation, I concluded
    that we can replace it with a simple msleep(1000) and rearrange the
    code around it slightly.

    The interruptible_sleep_on_timeout in DAC960_gam_ioctl seems equivalent
    to the race-free version using wait_event_interruptible_timeout.
    I left the driver to return -EINTR rather than -ERESTARTSYS to preserve
    the timeout behavior.

    Signed-off-by: Arnd Bergmann
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • Commit "mtip32xx: Use pci_enable_msix_range() instead of
    pci_enable_msix()" was unnecessary, since the pci_enable_msi()
    function is not deprecated and is still preferable for
    enabling the single MSI mode. This update reverts to using the
    pci_enable_msi() function.

    Besides, the changelog for that commit was bogus, since
    mtip32xx driver uses MSI interrupt, not MSI-X.

    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: linux-pci@vger.kernel.org

    Signed-off-by: Jens Axboe

    Alexander Gordeev
     

11 Mar, 2014

1 commit

  • mtip_pci_probe() dumps the current CPU when loaded, but it does
    so in a preemptible context. Hence smp_processor_id() correctly
    warns:

    BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/155
    caller is mtip_pci_probe+0x53/0x880 [mtip32xx]

    Switch to raw_smp_processor_id(), since it's just informational
    and persistent accuracy isn't important.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

08 Mar, 2014

1 commit

  • Pull block fixes from Jens Axboe:
    "Small collection of fixes for 3.14-rc. It contains:

    - Three minor updates to blk-mq from Christoph.

    - Reduce number of unaligned (< 4kb) in-flight writes on mtip32xx to
    two. From Micron.

    - Make the blk-mq CPU notify spinlock raw, since it can't be a
    sleeper spinlock on RT. From Mike Galbraith.

    - Drop now bogus BUG_ON() for bio iteration with blk integrity. From
    Nic Bellinger.

    - Properly propagate the SYNC flag on requests. From Shaohua"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    blk-mq: add REQ_SYNC early
    rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock
    bio-integrity: Drop bio_integrity_verify BUG_ON in post bip->bip_iter world
    blk-mq: support partial I/O completions
    blk-mq: merge blk_mq_insert_request and blk_mq_run_request
    blk-mq: remove blk_mq_alloc_rq
    mtip32xx: Reduce the number of unaligned writes to 2

    Linus Torvalds
     

07 Mar, 2014

2 commits

  • PREPARE_[DELAYED_]WORK() are being phased out. They have few users
    and a nasty surprise in terms of reentrancy guarantee as workqueue
    considers work items to be different if they don't have the same work
    function.

    nvme_dev->reset_work is multiplexed with multiple work functions.
    Introduce nvme_reset_workfn() which invokes nvme_dev->reset_workfn and
    always use it as the work function and update the users to set the
    ->reset_workfn field instead of overriding the work function using
    PREPARE_WORK().

    It would probably be best to route this with other related updates
    through the workqueue tree.

    Compile tested.

    Signed-off-by: Tejun Heo
    Cc: Matthew Wilcox
    Cc: linux-nvme@lists.infradead.org

    Tejun Heo
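
    The pattern described above keeps one fixed work function and routes
    to the current handler through a field. Below is a user-space sketch
    of that trampoline. The work_struct here is a tiny hypothetical
    model, not the kernel's, and every fake_* name is made up for
    illustration.

```c
#include <assert.h>
#include <stddef.h>

struct work_struct;
typedef void (*work_func_t)(struct work_struct *work);

/* Minimal model of a work item: just its work function. */
struct work_struct {
    work_func_t func;
};

/* Hypothetical device with one multiplexed work item. */
struct fake_nvme_dev {
    struct work_struct reset_work;
    work_func_t reset_workfn;   /* the handler to run next */
    int last_action;            /* records which handler ran */
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The one and only work function: forwards to dev->reset_workfn. */
static void fake_reset_workfn(struct work_struct *work)
{
    struct fake_nvme_dev *dev =
        container_of(work, struct fake_nvme_dev, reset_work);

    dev->reset_workfn(work);
}

static void fake_do_reset(struct work_struct *work)
{
    container_of(work, struct fake_nvme_dev, reset_work)->last_action = 1;
}

static void fake_do_remove(struct work_struct *work)
{
    container_of(work, struct fake_nvme_dev, reset_work)->last_action = 2;
}

static void fake_dev_init(struct fake_nvme_dev *dev)
{
    dev->reset_work.func = fake_reset_workfn; /* never changes afterwards */
    dev->reset_workfn = fake_do_reset;
    dev->last_action = 0;
}

/* Stand-in for the workqueue executing a work item. */
static void fake_run_work(struct work_struct *work)
{
    work->func(work);
}
```

    Switching behavior is now a plain assignment to reset_workfn, while
    the work item keeps the same work function, so the work item's
    identity stays stable from the workqueue's point of view.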
     
  • PREPARE_[DELAYED_]WORK() are being phased out. They have few users
    and a nasty surprise in terms of reentrancy guarantee as workqueue
    considers work items to be different if they don't have the same work
    function.

    floppy has been multiplexing floppy_work and fd_timer with multiple
    work functions. Introduce floppy_work_workfn() and fd_timer_workfn()
    which invoke floppy_work_fn and fd_timer_fn respectively and always
    use the two functions as the work functions and update the users to
    set floppy_work_fn and fd_timer_fn instead of overriding work
    functions using PREPARE_[DELAYED_]WORK().

    It would probably be best to route this with other related updates
    through the workqueue tree.

    Lightly tested using qemu.

    Signed-off-by: Tejun Heo
    Acked-by: Jiri Kosina

    Tejun Heo
     

04 Mar, 2014

2 commits

  • zram_meta_alloc() can fail, so the caller must check its return value.
    Otherwise, the system will hang.

    Signed-off-by: Minchan Kim
    Acked-by: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Commit bf6bddf1924e ("mm: introduce compaction and migration for
    ballooned pages") introduces page_count(page) into memory compaction
    which dereferences page->first_page if PageTail(page).

    This results in a very rare NULL pointer dereference on the
    aforementioned page_count(page). Indeed, anything that does
    compound_head(), including page_count() is susceptible to racing with
    prep_compound_page() and seeing a NULL or dangling page->first_page
    pointer.

    This patch uses Andrea's implementation of compound_trans_head() that
    deals with such a race and makes it the default compound_head()
    implementation. This includes a read memory barrier that ensures that
    if PageTail(head) is true that we return a head page that is neither
    NULL nor dangling. The patch then adds a store memory barrier to
    prep_compound_page() to ensure page->first_page is set.

    This is the safest way to ensure we see the head page that we are
    expecting, PageTail(page) is already in the unlikely() path and the
    memory barriers are unfortunately required.

    Hugetlbfs is the exception, we don't enforce a store memory barrier
    during init since no race is possible.

    Signed-off-by: David Rientjes
    Cc: Holger Kiehl
    Cc: Christoph Lameter
    Cc: Rafael Aquini
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
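
    The barrier pairing described above can be approximated in user
    space with C11 atomics. This is a sketch under assumptions: struct
    page here is a two-field hypothetical model, the *_model function
    names are made up, and release/acquire ordering stands in for the
    kernel's store and read memory barriers. The writer publishes
    first_page before setting the tail flag; the reader checks the flag,
    reads the pointer, then rechecks the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-field model of a page descriptor. */
struct page {
    atomic_bool tail;                 /* models PageTail() */
    _Atomic(struct page *) first_page;
};

static void page_init(struct page *p)
{
    atomic_init(&p->tail, false);
    atomic_init(&p->first_page, NULL);
}

/* Writer side: set the head pointer, then publish the tail flag. */
static void prep_tail_page_model(struct page *p, struct page *head)
{
    atomic_store_explicit(&p->first_page, head, memory_order_relaxed);
    /* Release ordering: first_page is visible before tail is. */
    atomic_store_explicit(&p->tail, true, memory_order_release);
}

/* Reader side: never returns a NULL or stale head for a tail page. */
static struct page *compound_head_model(struct page *page)
{
    if (atomic_load_explicit(&page->tail, memory_order_acquire)) {
        struct page *head =
            atomic_load_explicit(&page->first_page, memory_order_relaxed);

        /* Order the pointer read before the recheck of the flag. */
        atomic_thread_fence(memory_order_acquire);
        if (atomic_load_explicit(&page->tail, memory_order_relaxed))
            return head;
    }
    return page;
}
```

    A page that is not (or no longer) a tail page is returned as its own
    head, which matches the fallback behavior the commit describes.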
     

22 Feb, 2014

7 commits

  • As a result of the deprecation of the MSI-X/MSI enablement functions
    pci_enable_msix() and pci_enable_msi_block() all drivers
    using these two interfaces need to be updated to use the
    new pci_enable_msi_range() and pci_enable_msix_range()
    interfaces.

    Signed-off-by: Alexander Gordeev
    Cc: Jens Axboe
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Kyungmin Park
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Jens Axboe

    Alexander Gordeev
     
  • Signed-off-by: Alexander Gordeev
    Cc: Jens Axboe
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Kyungmin Park
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Jens Axboe

    Alexander Gordeev
     
  • When enabling MSI-X interrupts fails due to lack of memory
    the call to pci_disable_msix() is missed and the device is
    left with MSI-X interrupts enabled while the driver assumes
    otherwise. This update fixes the described misbehaviour and
    cleans up the code of skd_release_msix() function.

    Signed-off-by: Alexander Gordeev
    Cc: Jens Axboe
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Kyungmin Park
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Jens Axboe

    Alexander Gordeev
     
  • When enabling MSI-X, interrupts are requested for SKD_MAX_MSIX_COUNT
    entries in skdev->msix_entries array, while the number of actually
    allocated entries is skdev->msix_count. This might lead to an
    out-of-bounds access if the number of allocated entries is less than
    SKD_MAX_MSIX_COUNT. This update fixes the described misbehaviour.

    Signed-off-by: Alexander Gordeev
    Cc: Jens Axboe
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Kyungmin Park
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Jens Axboe

    Alexander Gordeev
     
  • As a result of the deprecation of the MSI-X/MSI enablement functions
    pci_enable_msix() and pci_enable_msi_block() all drivers
    using these two interfaces need to be updated to use the
    new pci_enable_msi_range() and pci_enable_msix_range()
    interfaces.

    Signed-off-by: Alexander Gordeev
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Jens Axboe

    Alexander Gordeev
     
  • There is no need to call pci_disable_msi() in case
    the previous call to pci_enable_msi() failed

    Signed-off-by: Alexander Gordeev
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Jens Axboe

    Alexander Gordeev
     
  • Right now every resource has exactly one connection. But we are preparing
    for dynamic connections. I.e. in the future there can be resources without
    connections.

    However smatch points this out as 'variable dereferenced before check',
    which is correct.

    This issue was introduced in
    drbd: get_one_status(): Iterate over resource->devices instead of connection->peer_devices

    Reported-by: Dan Carpenter
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Philipp Reisner
    Signed-off-by: Jens Axboe

    Andreas Gruenbacher
     

19 Feb, 2014

1 commit


17 Feb, 2014

11 commits