13 Apr, 2016

28 commits

  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • The driver calls it with 0 for flags, since it doesn't have a writeback
    cache. Just remove the call, as it's a no-op right now.

    Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Switch to the newer interface, instead of using blk_queue_flush()
    directly.

    Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig
    Acked-by: Martin K. Petersen

    Jens Axboe
     
  • Jens Axboe
     
  • Add an internal helper and flag for setting whether a queue has
    write back caching, or write through (or none). Add a sysfs file
    to show this as well, and make it changeable from user space.

    This will replace the (awkward) blk_queue_flush() interface that
    drivers currently use to inform the block layer of write cache state
    and capabilities.

    Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • There are no callers outside the blk-mq code, so we can make
    it static.

    Signed-off-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     
  • Only a single tags array anyway.

    Signed-off-by: Keith Busch
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • blk-mq offers a tagset iterator so let's use that
    instead of using nvme_clear_queues.

    Note that nvme_queue_cancel_ios is renamed to nvme_cancel_io,
    as there is no longer a concept of a queue in this function (we
    also lost the print).

    Signed-off-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Acked-by: Keith Busch
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     
  • If the controller is degraded, the driver should stay out of the way so
    the user can recover the drive. This patch skips driver initiated async
    event requests when the drive is in this state.

    Signed-off-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • This moves nvme_setup_{flush,discard,rw} calls into a common
    nvme_setup_cmd() helper, so that we can eventually hide all the command
    setup in the core module and don't even need to update the fabrics
    drivers for any specific command type.

    Signed-off-by: Ming Lin
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lin
     
  • This rewrites nvme_setup_discard() with blk_add_request_payload().
    It allocates only the necessary amount (16 bytes) for the payload.

    Signed-off-by: Ming Lin
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lin
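
The 16-byte figure above matches the size of a single NVMe Dataset Management range entry. A minimal userspace sketch of that layout follows; the field names mirror the kernel's struct nvme_dsm_range, but the struct name, the host-endian types, and the make_discard_range() helper are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative mirror of one Dataset Management range entry.
 * The kernel uses little-endian __le32/__le64 fields; plain
 * fixed-width types stand in for them here.
 */
struct dsm_range {
    uint32_t cattr; /* context attributes */
    uint32_t nlb;   /* number of logical blocks in the range */
    uint64_t slba;  /* starting logical block address */
};

/* One range entry occupies exactly 16 bytes. */
_Static_assert(sizeof(struct dsm_range) == 16,
               "a single DSM range entry must be 16 bytes");

/* Hypothetical helper: describe one discard range. */
static struct dsm_range make_discard_range(uint64_t start_lba,
                                           uint32_t nr_blocks)
{
    struct dsm_range r;

    assert(nr_blocks > 0); /* an empty range is not meaningful here */
    r.cattr = 0;
    r.nlb = nr_blocks;
    r.slba = start_lba;
    return r;
}
```

A discard of a single contiguous extent needs only one such entry, which is why a payload of this size is enough.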
     
  • The helper returns the number of bytes that need to be mapped
    using PRPs/SGL entries.

    Signed-off-by: Ming Lin
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lin
     
  • When unloading the driver, nvme_disable_io_queues() calls nvme_delete_queue(),
    which sends an nvme_admin_delete_cq command to the admin sq. So when the command
    completes, the lock acquired by nvme_irq() actually belongs to the admin queue.

    The lock that nvme_del_cq_end() is trying to acquire, by contrast, belongs to
    an I/O queue. So it will not deadlock.

    This patch adds lock nesting notation to fix the following report.

    [ 109.840952] =============================================
    [ 109.846379] [ INFO: possible recursive locking detected ]
    [ 109.851806] 4.5.0+ #180 Tainted: G E
    [ 109.856533] ---------------------------------------------
    [ 109.861958] swapper/0/0 is trying to acquire lock:
    [ 109.866771] (&(&nvmeq->q_lock)->rlock){-.....}, at: [] nvme_del_cq_end+0x26/0x70 [nvme]
    [ 109.876535]
    [ 109.876535] but task is already holding lock:
    [ 109.882398] (&(&nvmeq->q_lock)->rlock){-.....}, at: [] nvme_irq+0x1b/0x50 [nvme]
    [ 109.891547]
    [ 109.891547] other info that might help us debug this:
    [ 109.898107] Possible unsafe locking scenario:
    [ 109.898107]
    [ 109.904056] CPU0
    [ 109.906515] ----
    [ 109.908974] lock(&(&nvmeq->q_lock)->rlock);
    [ 109.913381] lock(&(&nvmeq->q_lock)->rlock);
    [ 109.917787]
    [ 109.917787] *** DEADLOCK ***
    [ 109.917787]
    [ 109.923738] May be due to missing lock nesting notation
    [ 109.923738]
    [ 109.930558] 1 lock held by swapper/0/0:
    [ 109.934413] #0: (&(&nvmeq->q_lock)->rlock){-.....}, at: [] nvme_irq+0x1b/0x50 [nvme]
    [ 109.944010]
    [ 109.944010] stack backtrace:
    [ 109.948389] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E 4.5.0+ #180
    [ 109.955734] Hardware name: Dell Inc. OptiPlex 7010/0YXT71, BIOS A15 08/12/2013
    [ 109.962989] 0000000000000000 ffff88011e203c38 ffffffff81383d9c ffffffff81c13540
    [ 109.970478] ffffffff826711d0 ffff88011e203ce8 ffffffff810bb429 0000000000000046
    [ 109.977964] 0000000000000046 0000000000000000 0000000000b2e597 ffffffff81f4cb00
    [ 109.985453] Call Trace:
    [ 109.987911] [] dump_stack+0x85/0xc9
    [ 109.993711] [] __lock_acquire+0x19b9/0x1c60
    [ 109.999575] [] ? trace_hardirqs_off+0xd/0x10
    [ 110.005524] [] ? complete+0x3d/0x50
    [ 110.010688] [] lock_acquire+0x90/0xf0
    [ 110.016029] [] ? nvme_del_cq_end+0x26/0x70 [nvme]
    [ 110.022418] [] _raw_spin_lock_irqsave+0x4b/0x60
    [ 110.028632] [] ? nvme_del_cq_end+0x26/0x70 [nvme]
    [ 110.035019] [] nvme_del_cq_end+0x26/0x70 [nvme]
    [ 110.041232] [] blk_mq_end_request+0x35/0x60
    [ 110.047095] [] nvme_complete_rq+0x68/0x190 [nvme]
    [ 110.053481] [] __blk_mq_complete_request+0x8f/0x130
    [ 110.060043] [] blk_mq_complete_request+0x31/0x40
    [ 110.066343] [] __nvme_process_cq+0x83/0x240 [nvme]
    [ 110.072818] [] nvme_irq+0x25/0x50 [nvme]
    [ 110.078419] [] handle_irq_event_percpu+0x36/0x110
    [ 110.084804] [] handle_irq_event+0x37/0x60
    [ 110.090491] [] handle_edge_irq+0x93/0x150
    [ 110.096180] [] handle_irq+0xa6/0x130
    [ 110.101431] [] do_IRQ+0x5e/0x120
    [ 110.106333] [] common_interrupt+0x8c/0x8c

    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Ming Lin
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Ming Lin
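
The report above arises because both q_lock instances share one lock class, so the checker cannot tell the admin queue's lock from an I/O queue's lock. The following is a toy userspace model of that idea, not lockdep itself: every name in it is hypothetical, and it only illustrates why tagging the inner acquisition with a different subclass silences the warning. In the kernel this kind of annotation is usually written with the _nested spinlock variants such as spin_lock_irqsave_nested(); the log does not say which call the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 8

/* A held lock is identified by its class and a nesting subclass. */
struct held_lock {
    int class_id;
    int subclass;
};

struct lock_tracker {
    struct held_lock held[MAX_HELD];
    size_t depth;
};

/* True if a lock of the same class AND subclass is already held. */
static bool would_report_recursion(const struct lock_tracker *t,
                                   int class_id, int subclass)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->depth; i++) {
        if (t->held[i].class_id == class_id &&
            t->held[i].subclass == subclass)
            return true;
    }
    return false;
}

/* Record an acquisition; returns false if it would be flagged. */
static bool track_acquire(struct lock_tracker *t, int class_id, int subclass)
{
    if (t->depth == MAX_HELD || would_report_recursion(t, class_id, subclass))
        return false;
    t->held[t->depth].class_id = class_id;
    t->held[t->depth].subclass = subclass;
    t->depth++;
    return true;
}
```

With the admin queue's lock held at subclass 0, a second acquisition of the same class at subclass 0 is flagged, while the same acquisition at subclass 1 is accepted.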
     
  • Multiple users have reported device initialization failure due to the driver
    not receiving legacy PCI interrupts. This is not unique to any particular
    controller, but has been observed on multiple platforms.

    There have been no issues reported or observed with message signaled
    interrupts, so this patch attempts to use MSI-x during initialization,
    falling back to MSI. If that fails, legacy would become the default.

    The setup_io_queues error handling had to change as a result: the admin
    queue's msix_entry used to be initialized to the legacy IRQ. The case
    where nr_io_queues is 0 would fail request_irq when setting up the admin
    queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving
    the admin queue's msix_entry invalid. Instead, return success immediately.

    Reported-by: Tim Muhlemmer
    Reported-by: Jon Derrick
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe

    Keith Busch
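
The fallback order described above (MSI-x first, then MSI, then legacy interrupts) can be sketched in plain C as follows. This is only an illustration of the ordering: the struct, the hook names, and the stub functions are hypothetical and do not correspond to the driver's real code or to any PCI API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum irq_mode { IRQ_MODE_MSIX, IRQ_MODE_MSI, IRQ_MODE_LEGACY };

/* Hypothetical hooks: each returns true if that interrupt type was enabled. */
struct irq_setup_ops {
    bool (*try_msix)(void *dev);
    bool (*try_msi)(void *dev);
};

/* Prefer MSI-x, fall back to MSI, and use legacy only if both fail. */
static enum irq_mode pick_irq_mode(const struct irq_setup_ops *ops, void *dev)
{
    assert(ops != NULL);
    if (ops->try_msix && ops->try_msix(dev))
        return IRQ_MODE_MSIX;
    if (ops->try_msi && ops->try_msi(dev))
        return IRQ_MODE_MSI;
    return IRQ_MODE_LEGACY;
}

/* Stub hooks for experimenting with the ordering. */
static bool irq_stub_ok(void *dev)   { (void)dev; return true; }
static bool irq_stub_fail(void *dev) { (void)dev; return false; }
```

Passing stubs that fail for MSI-x but succeed for MSI yields IRQ_MODE_MSI; legacy is selected only when both hooks fail.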
     
  • It's useful to iterate over all the active tags in cases
    where we need to fail all of the queues' I/O.

    Signed-off-by: Sagi Grimberg
    [hch: carefully check for valid tagsets]
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     
  • We could kmalloc() the payload, so we need the offset within the page.

    Signed-off-by: Ming Lin
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lin
     
  • Commit 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use
    wb_domain aware operations") unintentionally changed this function's
    meaning from "are there more dirty pages than the background writeback
    threshold" to "are there more dirty pages than the writeback threshold".
    The background writeback threshold is typically half of the writeback
    threshold, so this had the effect of raising the number of dirty pages
    required to cause a writeback worker to perform background writeout.

    This can cause a very severe performance regression when a BDI uses
    BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker
    can now disagree on whether writeback should be initiated.

    For example, in a system having 1GB of RAM, a single spinning disk, and
    a "pass-through" FUSE filesystem mounted over the disk, application code
    mmapped a 128MB file on the disk and was randomly dirtying pages in that
    mapping.

    Because FUSE uses strictlimit and has a default max_ratio of only 1%,
    in balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the
    dirty_freerun_ceiling is the average of those, ~150. So, it pauses the
    dirtying processes when we have 151 dirty pages and wakes up a
    background writeback worker. But the worker tests the wrong threshold
    (200 instead of 100), so it does not initiate writeback and just
    returns.

    Thus, balance_dirty_pages keeps looping, sleeping and then waking up the
    worker who will do nothing. It remains stuck in this state until the few
    dirty pages that we have finally expire and we write them back for that
    reason. Then the whole process repeats, resulting in near-zero
    throughput through the FUSE BDI.

    The fix is to call the parameterized variant of wb_calc_thresh, so that
    the worker will do writeback if the bg_thresh is exceeded, which was the
    behavior before the referenced commit.

    Fixes: 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use
    wb_domain aware operations")
    Signed-off-by: Howard Cochran
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Howard Cochran
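
The stall described in the commit above can be reproduced with the message's own numbers (thresh ~200, bg_thresh ~100, freerun ceiling ~150, 151 dirty pages). The sketch below is a simplified userspace model under those assumptions; the struct and function names are hypothetical and the real kernel logic has many more inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-BDI limits, in pages (simplified model). */
struct wb_limits {
    unsigned long thresh;    /* writeback threshold */
    unsigned long bg_thresh; /* background writeback threshold */
};

/* The freerun ceiling is the average of the two thresholds. */
static unsigned long freerun_ceiling(const struct wb_limits *l)
{
    assert(l != NULL);
    return (l->thresh + l->bg_thresh) / 2;
}

/* The dirtying process is paused once it exceeds the freerun ceiling. */
static bool dirtier_must_pause(const struct wb_limits *l, unsigned long dirty)
{
    return dirty > freerun_ceiling(l);
}

/* Regressed worker check: compares against the full threshold. */
static bool worker_writes_back_regressed(const struct wb_limits *l,
                                         unsigned long dirty)
{
    return dirty > l->thresh;
}

/* Fixed worker check: compares against the background threshold. */
static bool worker_writes_back_fixed(const struct wb_limits *l,
                                     unsigned long dirty)
{
    return dirty > l->bg_thresh;
}
```

With limits of 200 and 100 and 151 dirty pages, the dirtier is paused while the regressed check declines to write anything back, which is the livelock described; the fixed check starts writeback at the same page count.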
     

11 Apr, 2016

5 commits

  • Linus Torvalds
     
  • Pull ARM fixes from Russell King:
    "A couple of small fixes, and wiring up the new syscalls which appeared
    during the merge window"

    * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
    ARM: 8550/1: protect idiv patching against undefined gcc behavior
    ARM: wire up preadv2 and pwritev2 syscalls
    ARM: SMP enable of cache maintanence broadcast

    Linus Torvalds
     
  • Pull MMC fixes from Ulf Hansson:
    "Here are a couple of mmc fixes intended for v4.6 rc3:

    MMC host:
    - sdhci: Fix regression setting power on Trats2 board
    - sdhci-pci: Add support and PCI IDs for more Broxton host controllers"

    * tag 'mmc-v4.6-rc1' of git://git.linaro.org/people/ulf.hansson/mmc:
    mmc: sdhci-pci: Add support and PCI IDs for more Broxton host controllers
    mmc: sdhci: Fix regression setting power on Trats2 board

    Linus Torvalds
     
  • Pull i2c fixes from Wolfram Sang:
    "Some bugfixes from I2C:

    - fix a uevent triggered boot problem by removing a useless debug
    print

    - fix sysfs-attributes of the new i2c-demux-pinctrl driver to follow
    standard kernel behaviour

    - fix a potential division-by-zero error (needed two takes)"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: jz4780: really prevent potential division by zero
    Revert "i2c: jz4780: prevent potential division by zero"
    i2c: jz4780: prevent potential division by zero
    i2c: mux: demux-pinctrl: Update docs to new sysfs-attributes
    i2c: mux: demux-pinctrl: Clean up sysfs attributes
    i2c: prevent endless uevent loop with CONFIG_I2C_DEBUG_CORE

    Linus Torvalds
     
  • This reverts commit 1028b55bafb7611dda1d8fed2aeca16a436b7dff.

    It's broken: it makes ext4 return an error at an invalid point, causing
    the readdir wrappers to write the position of the last successful
    directory entry into the position field, which means that the next
    readdir will now return that last successful entry _again_.

    You can only return fatal errors (that terminate the readdir directory
    walk) from within the filesystem readdir functions; the "normal" errors
    (that happen when the readdir buffer fills up, for example) happen in
    the iterator, where we know the position of the actual failing entry.

    I do have a very different patch that does the "signal_pending()"
    handling inside the iterator function where it is allowable, but while
    that one passes all the sanity checks, I screwed up something like four
    times while emailing it out, so I'm not going to commit it today.

    So my track record is not good enough, and the stars will have to align
    better before that one gets committed. And it would be good to get some
    review too, of course, since celestial alignments are always an iffy
    debugging model.

    IOW, let's just revert the commit that caused the problem for now.

    Reported-by: Greg Thelen
    Cc: Theodore Ts'o
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Apr, 2016

7 commits

  • Pull parisc fixes from Helge Deller:
    "Since commit 0de798584bde ("parisc: Use generic extable search and
    sort routines") module loading is broken on parisc, because the parisc
    module loader wasn't prepared for the new R_PARISC_PCREL32 relocations.

    In addition, due to that breakage, Mikulas Patocka noticed that
    handling exceptions from modules probably never worked on parisc. It
    was just masked by the fact that exceptions from modules don't happen
    during normal use.

    This patch series fixes those issues and survives the tests of the
    lib/test_user_copy kernel module test. Some patches are tagged for
    stable"

    * 'parisc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Update comment regarding relative extable support
    parisc: Unbreak handling exceptions from kernel modules
    parisc: Fix kernel crash with reversed copy_from_user()
    parisc: Avoid function pointers for kernel exception routines
    parisc: Handle R_PARISC_PCREL32 relocations in kernel modules

    Linus Torvalds
     
  • Pull libnvdimm fixes from Dan Williams:
    "Three fixes, the first two are tagged for -stable:

    - The ndctl utility/library gained expanded unit tests illuminating a
    long standing bug in the libnvdimm SMART data retrieval
    implementation.

    It has been broken since its initial implementation, now fixed.

    - Another one line fix for the detection of stale info blocks.

    Without this change userspace can get into a situation where it is
    unable to reconfigure a namespace.

    - Fix the badblock initialization path in the presence of the new (in
    v4.6-rc1) section alignment workarounds.

    Without this change badblocks will be reported at the wrong offset.

    These have received a build success report from the kbuild robot and
    have appeared in -next with no reported issues"

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignment
    libnvdimm, pfn: fix uuid validation
    libnvdimm: fix smart data retrieval

    Linus Torvalds
     
  • Pull GPIO fixes from Linus Walleij:
    "Here is a set of four GPIO fixes. The two fixes to the core are
    serious as they are regressing minor architectures.

    Core fixes:

    - Defer GPIO device setup until after gpiolib is initialized.

    It turns out that a few very tightly integrated GPIO platform
    drivers initialize so early (before core_initcall()) that the
    gpiolib isn't even initialized itself. That limits what the
    library can do, and we cannot reference uninitialized fields until
    later.

    Defer some of the initialization until right after the gpiolib is
    initialized in these (rare) cases.

    - As a consequence: do not use devm_* resources when allocating the
    states in the initial set-up of the gpiochip.

    Driver fixes:

    - In ACPI retrieval: ignore GpioInt when looking for output GPIOs.

    - Fix legacy builds on the PXA without a backing pin controller.

    - Use correct datatype on pca953x register writes"

    * tag 'gpio-v4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
    gpio: pca953x: Use correct u16 value for register word write
    gpiolib: Defer gpio device setup until after gpiolib initialization
    gpiolib: Do not use devm functions when registering gpio chip
    gpio: pxa: fix legacy non pinctrl aware builds
    gpio / ACPI: ignore GpioInt() GPIOs when requesting GPIO_OUT_*

    Linus Torvalds
     
  • Pull tty fixes from Greg KH:
    "Here are two tty fixes for issues found.

    One was due to a merge error in 4.6-rc1, and the other a regression
    fix for UML consoles that broke in 4.6-rc1.

    Both have been in linux-next for a while"

    * tag 'tty-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
    tty: Fix merge of "tty: Refactor tty_open()"
    tty: Fix UML console breakage

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are some USB fixes and new device ids for 4.6-rc3.

    Nothing major, the normal USB gadget fixes and usb-serial driver ids,
    along with some other fixes mixed in. All except the USB serial ids
    have been tested in linux-next, the id additions should be fine as
    they are 'trivial'"

    * tag 'usb-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
    USB: option: add "D-Link DWM-221 B1" device id
    USB: serial: cp210x: Adding GE Healthcare Device ID
    USB: serial: ftdi_sio: Add support for ICP DAS I-756xU devices
    usb: dwc3: keystone: drop dma_mask configuration
    usb: gadget: udc-core: remove manual dma configuration
    usb: dwc3: pci: add ID for one more Intel Broxton platform
    usb: renesas_usbhs: fix to avoid using a disabled ep in usbhsg_queue_done()
    usb: dwc2: do not override forced dr_mode in gadget setup
    usb: gadget: f_midi: unlock on error
    USB: digi_acceleport: do sanity checking for the number of ports
    USB: cypress_m8: add endpoint sanity check
    USB: mct_u232: add sanity checking in probe
    usb: fix regression in SuperSpeed endpoint descriptor parsing
    USB: usbip: fix potential out-of-bounds write
    usb: renesas_usbhs: disable TX IRQ before starting TX DMAC transfer
    usb: renesas_usbhs: avoid NULL pointer derefernce in usbhsf_pkt_handler()
    usb: gadget: f_midi: Fixed a bug when buflen was smaller than wMaxPacketSize
    usb: phy: qcom-8x16: fix regulator API abuse
    usb: ch9: Fix SSP Device Cap wFunctionalitySupport type
    usb: gadget: composite: Access SSP Dev Cap fields properly
    ...

    Linus Torvalds
     
  • Pull staging and IIO driver fixes from Greg KH:
    "Here are some IIO driver fixes, along with two staging driver fixes
    for 4.6-rc3.

    One staging driver patch reverts the deletion of a driver that
    happened in 4.6-rc1. We thought that laptop.org was dead, but it's
    still alive and kicking, and has users that were mad we broke their
    hardware by deleting a driver for their machines. So that driver is
    added back and everyone is happy again.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'staging-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    Revert "Staging: olpc_dcon: Remove obsolete driver"
    staging/rdma/hfi1: select CRC32
    iio: gyro: bmg160: fix buffer read values
    iio: gyro: bmg160: fix endianness when reading axes
    iio: accel: bmc150: fix endianness when reading axes
    iio: st_magn: always define ST_MAGN_TRIGGER_SET_STATE
    iio: fix config watermark initial value
    iio: health: max30100: correct FIFO check condition
    iio: imu: Fix inv_mpu6050 dependencies
    iio: adc: Fix build error of missing devm_ioremap_resource on UM
    iio: light: apds9960: correct FIFO check condition
    iio: adc: max1363: correct reference voltage
    iio: adc: max1363: add missing adc to max1363_id

    Linus Torvalds
     
  • Pull SCSI fixes from James Bottomley:
    "This is a set of eight fixes.

    Two are trivial gcc-6 updates (brace additions and unused variable
    removal). There are a couple of cxlflash regressions, a correction for
    sd being overly chatty on revalidation (causing excess log growth),
    a fix for a VPD issue which could crash USB devices because they seem
    very intolerant of VPD inquiries, an ALUA deadlock fix and an mpt3sas
    buffer overrun fix"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: Do not attach VPD to devices that don't support it
    sd: Fix excessive capacity printing on devices with blocks bigger than 512 bytes
    scsi_dh_alua: Fix a recently introduced deadlock
    scsi: Declare local symbols static
    cxlflash: Move to exponential back-off when cmd_room is not available
    cxlflash: Fix regression issue with re-ordering patch
    mpt3sas: Don't overreach ioc->reply_post[] during initialization
    aacraid: add missing curly braces

    Linus Torvalds