09 Oct, 2018

5 commits

  • There are a number of places in the lightnvm subsystem where the user
    iterates over the ppa list. Before iterating, the user must know whether
    the request covers a single LBA or multiple LBAs, because vector commands
    use either the nvm_rq ->ppa_addr or ->ppa_list field on command
    submission. This leads to open-coded if/else statements.

    Instead of having multiple if/else's, move the check into a function
    that can be called by its users.

    A nice side effect of this cleanup is that this patch fixes up a
    bunch of cases where we don't consider the single-ppa case in pblk.

    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
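
The helper described in this commit can be sketched roughly as follows. This is a minimal user-space illustration, not the kernel code: the struct definitions are simplified stand-ins, and the names rq_to_ppa_list and sum_ppas are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a physical page address. */
struct ppa_addr {
	uint64_t ppa;
};

/* Simplified stand-in for a request: a single address is stored inline,
 * several addresses are stored in a separate list. */
struct nvm_rq {
	struct ppa_addr ppa_addr;   /* valid when nr_ppas == 1 */
	struct ppa_addr *ppa_list;  /* valid when nr_ppas > 1 */
	unsigned int nr_ppas;
};

/* Hypothetical helper: hide the single/multiple distinction so callers
 * can always iterate over nr_ppas entries. */
static struct ppa_addr *rq_to_ppa_list(struct nvm_rq *rqd)
{
	return (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
}

/* Example caller: sum all addresses without an open-coded if/else. */
static uint64_t sum_ppas(struct nvm_rq *rqd)
{
	struct ppa_addr *list = rq_to_ppa_list(rqd);
	uint64_t sum = 0;

	for (unsigned int i = 0; i < rqd->nr_ppas; i++)
		sum += list[i].ppa;
	return sum;
}
```

Because the single-address case is handled inside the helper, a caller such as sum_ppas cannot forget it, which is the side effect the commit message mentions.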
     
  • Implement helpers to go from ppas to a chunk within a line and an
    address within a chunk.

    These helpers will be used in the patches adding trace support in pblk,
    which will be sent in this window.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • pblk implements two data paths for recovering line state: one for 1.2
    and another for 2.0. Instead of having pblk implement these, combine
    them in the core to reduce complexity and make them available to other
    targets.

    The new interface will adhere to the 2.0 chunk definition,
    including managing open chunks with an active write pointer. To provide
    this interface, a 1.2 device recovers the state of the chunks by
    manually detecting if a chunk is either free/open/closed/offline, and if
    open, scanning the flash pages sequentially to find the next writeable
    page. This process takes on average ~10 seconds on a device with 64 dies,
    1024 blocks and 60us read access time. The process can be parallelized
    but is left out for maintenance simplicity, as the 1.2 specification is
    deprecated. For 2.0 devices, the logic is maintained internally in the
    drive and retrieved through the 2.0 interface.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
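
The sequential scan for the next writable page can be pictured with the sketch below. It is an illustration under stated assumptions, not the core implementation: a chunk is modeled as an array of "page written" flags, the chunk state is derived from the recovered write pointer, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chunk states, mirroring the free/open/closed/offline
 * classification mentioned in the commit message. */
enum chunk_state { CHUNK_FREE, CHUNK_OPEN, CHUNK_CLOSED, CHUNK_OFFLINE };

/* Sequentially scan a chunk's pages and return the index of the first
 * unwritten page, i.e. the recovered write pointer. Returns npages if
 * every page has been written. */
static size_t find_write_pointer(const bool *page_written, size_t npages)
{
	size_t wp = 0;

	while (wp < npages && page_written[wp])
		wp++;
	return wp;
}

/* Assumption: classify a chunk from its write pointer. A bad block is
 * reported as offline regardless of its contents. */
static enum chunk_state classify_chunk(size_t wp, size_t npages, bool bad)
{
	if (bad)
		return CHUNK_OFFLINE;
	if (wp == 0)
		return CHUNK_FREE;
	return (wp == npages) ? CHUNK_CLOSED : CHUNK_OPEN;
}
```

The scan costs one page read per written page of an open chunk, which is why the recovery time grows with the number of chunks and the read access time.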
     
  • A 1.2 device is able to manage the logical to physical mapping
    table internally or leave it to the host.

    A target only supports one of those approaches, and therefore must
    check on initialization. Move this check to core so that each target
    does not have to implement it.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Add nvm_set_flags helper to enable core to appropriately
    set the command flags for read/write/erase depending on which version
    a drive supports.

    The flags argument can be distilled into the access hint,
    scrambling, and program/erase suspend. Replace the access hint with
    an "is_seq" parameter. The rest of the flags are dependent on the
    command opcode, which is trivial to detect and set.

    Signed-off-by: Matias Bjørling
    Reviewed-by: Javier González
    Signed-off-by: Jens Axboe

    Matias Bjørling
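
A rough sketch of how such a helper could derive the flags from the spec version, the opcode, and an "is_seq" hint is shown below. Every constant, type, and the exact opcode-to-flag mapping here is a hypothetical placeholder chosen for illustration; none of them are the kernel's values.

```c
#include <stdbool.h>

/* All names and values below are hypothetical placeholders. */
enum spec_version { SPEC_12, SPEC_20 };
enum opcode { OP_READ, OP_WRITE, OP_ERASE };

#define FLAG_SEQ_ACCESS 0x1 /* access hint for sequential I/O */
#define FLAG_SCRAMBLE   0x2 /* data scrambling */
#define FLAG_SUSPEND    0x4 /* program/erase suspend */

static int set_flags(enum spec_version version, enum opcode op, bool is_seq)
{
	int flags = 0;

	/* Assumption: 2.0 devices need no per-command flags. */
	if (version == SPEC_20)
		return 0;

	if (is_seq)
		flags |= FLAG_SEQ_ACCESS;

	/* Assumed mapping: the remaining flags follow from the opcode. */
	if (op == OP_READ)
		flags |= FLAG_SCRAMBLE | FLAG_SUSPEND;
	else if (op == OP_WRITE)
		flags |= FLAG_SCRAMBLE;

	return flags;
}
```

With this shape, a target only states whether an access is sequential, and the version-specific details stay in one place.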
     

02 Oct, 2018

1 commit

  • When an I/O is rejected by nvmf_check_ready() due to validation of the
    controller state, nvmf_fail_nonready_command() will normally return
    BLK_STS_RESOURCE to requeue and retry. However, if the controller is
    dying or the I/O is marked for NVMe multipath, the I/O is failed so that
    the controller can terminate or so that the I/O can be issued on a
    different path. Unfortunately, as this reject point is before the
    transport has accepted the command, blk-mq ends up completing the I/O
    and never calls nvme_complete_rq(), which is where multipath may preserve
    or re-route the I/O. As a result, the device user ends up seeing an
    EIO error.

    Example: single path connectivity, controller is under load, and a reset
    is induced. An I/O is received:

    a) while the reset state has been set but the queues have yet to be
    stopped; or
    b) after queues are started (at end of reset) but before the reconnect
    has completed.

    The I/O finishes with an EIO status.

    This patch makes the following changes:

    - Adds the HOST_PATH_ERROR pathing status from TP4028
    - Modifies the reject point such that it appears to queue successfully,
    but actually completes the I/O with the new pathing status and calls
    nvme_complete_rq().
    - nvme_complete_rq() recognizes the new status, avoids resetting the
    controller (likely was already done in order to get this new status),
    and calls the multipather to clear the current path that errored.
    This allows the next command (retry or new command) to select a new
    path if there is one.

    Signed-off-by: James Smart
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    James Smart
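
The changed reject point can be modeled in user space as follows. This is a simplified sketch of the control flow described in the commit message only; the types, fields, and function names are hypothetical stand-ins for the kernel's, and the real completion handler does considerably more.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
enum blk_status { STS_OK, STS_RESOURCE };
enum req_status { REQ_PENDING, REQ_HOST_PATH_ERROR };

struct ctrl {
	bool dying;
};

struct request {
	bool multipath;          /* I/O is marked for NVMe multipath */
	enum req_status status;
	bool completed;          /* set once the completion handler ran */
};

/* Stand-in for nvme_complete_rq(), where multipath could re-route. */
static void complete_rq(struct request *rq)
{
	rq->completed = true;
}

static enum blk_status fail_nonready_command(struct ctrl *ctrl,
					     struct request *rq)
{
	/* Normal case: ask the block layer to requeue and retry later. */
	if (!ctrl->dying && !rq->multipath)
		return STS_RESOURCE;

	/* Otherwise complete the request here with the pathing status and
	 * report success, so the completion handler is not bypassed. */
	rq->status = REQ_HOST_PATH_ERROR;
	complete_rq(rq);
	return STS_OK;
}
```

Returning success while completing the request with a path error is what lets the completion handler, rather than blk-mq, decide whether the I/O is retried on another path.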
     

01 Oct, 2018

1 commit

  • Merge -rc6 in, for two reasons:

    1) Resolve a trivial conflict in the blk-mq-tag.c documentation
    2) A few important regression fixes went into upstream directly, so
    they aren't in the 4.20 branch.

    Signed-off-by: Jens Axboe

    * tag 'v4.19-rc6': (780 commits)
    Linux 4.19-rc6
    MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
    cpufreq: qcom-kryo: Fix section annotations
    perf/core: Add sanity check to deal with pinned event failure
    xen/blkfront: correct purging of persistent grants
    Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
    selftests/powerpc: Fix Makefiles for headers_install change
    blk-mq: I/O and timer unplugs are inverted in blktrace
    dax: Fix deadlock in dax_lock_mapping_entry()
    x86/boot: Fix kexec booting failure in the SEV bit detection code
    bcache: add separate workqueue for journal_write to avoid deadlock
    drm/amd/display: Fix Edid emulation for linux
    drm/amd/display: Fix Vega10 lightup on S3 resume
    drm/amdgpu: Fix vce work queue was not cancelled when suspend
    Revert "drm/panel: Add device_link from panel device to DRM device"
    xen/blkfront: When purging persistent grants, keep them in the buffer
    clocksource/drivers/timer-atmel-pit: Properly handle error cases
    block: fix deadline elevator drain for zoned block devices
    ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
    drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
    ...

    Signed-off-by: Jens Axboe

    Jens Axboe
     

29 Sep, 2018

3 commits

  • Mark writes:
    "spi: Fixes for v4.19

    Quite a few fixes for the Renesas drivers in here, plus a fix for the
    Tegra driver and some documentation fixes for the recently added
    spi-mem code. The Tegra fix is relatively large but fairly
    straightforward and mechanical, it runs on probe so it's been
    reasonably well covered in -next testing."

    * tag 'spi-fix-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
    spi: spi-mem: Move the DMA-able constraint doc to the kerneldoc header
    spi: spi-mem: Add missing description for data.nbytes field
    spi: rspi: Fix interrupted DMA transfers
    spi: rspi: Fix invalid SPI use during system suspend
    spi: sh-msiof: Fix handling of write value for SISTR register
    spi: sh-msiof: Fix invalid SPI use during system suspend
    spi: gpio: Fix copy-and-paste error
    spi: tegra20-slink: explicitly enable/disable clock

    Greg Kroah-Hartman
     
  • Mark writes:
    "regulator: Fixes for 4.19

    A collection of fairly minor bug fixes here, a couple of driver
    specific ones plus two core fixes. There's one fix for the new
    suspend state code which fixes some confusion with constant values
    that are supposed to indicate noop operation and another fixing a
    race condition with the creation of sysfs files on new regulators."

    * tag 'regulator-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
    regulator: fix crash caused by null driver data
    regulator: Fix 'do-nothing' value for regulators without suspend state
    regulator: da9063: fix DT probing with constraints
    regulator: bd71837: Disable voltage monitoring for LDO3/4

    Greg Kroah-Hartman
     
  • Dave writes:
    "drm fixes for 4.19-rc6

    Looks like a pretty normal week for graphics,

    core: syncobj fix, panel link regression revert
    amd: suspend/resume fixes, EDID emulation fix
    mali-dp: NV12 writeback and vblank reset fixes
    etnaviv: DMA setup fix"

    * tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drm:
    drm/amd/display: Fix Edid emulation for linux
    drm/amd/display: Fix Vega10 lightup on S3 resume
    drm/amdgpu: Fix vce work queue was not cancelled when suspend
    Revert "drm/panel: Add device_link from panel device to DRM device"
    drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
    drm/malidp: Fix writeback in NV12
    drm: mali-dp: Call drm_crtc_vblank_reset on device init
    drm/etnaviv: add DMA configuration for etnaviv platform device

    Greg Kroah-Hartman
     

28 Sep, 2018

3 commits

  • Update device_add_disk() to take a 'groups' argument so that
    individual drivers can register a device with additional sysfs
    attributes.
    This avoids the race condition the driver would otherwise have if these
    groups were to be created with sysfs_add_groups().

    Signed-off-by: Martin Wilck
    Signed-off-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     
  • When debugging Kyber, it's really useful to know what latencies we've
    been having, how the domain depths have been adjusted, and if we've
    actually been throttling. Add three tracepoints, kyber_latency,
    kyber_adjust, and kyber_throttled, to record that.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • Commit 4bc6339a583c ("block: move blk_stat_add() to
    __blk_mq_end_request()") consolidated some calls using ktime_get() so
    we'd only need to call it once. Kyber's ->completed_request() hook also
    calls ktime_get(), so let's move it to the same place, too.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

27 Sep, 2018

4 commits

  • This reverts commit 0c08754b59da5557532d946599854e6df28edc22.

    commit 0c08754b59da
    ("drm/panel: Add device_link from panel device to DRM device")
    creates a circular dependency under these circumstances:

    1. The panel depends on dsi-host because it is MIPI-DSI child
    device.
    2. dsi-host depends on the drm parent device (connector->dev->dev)
    this should be allowed.
    3. drm parent dev (connector->dev->dev) depends on the panel
    after this patch.

    This makes the dependency circular and while it appears it
    does not affect any in-tree drivers (they do not seem to have
    dsi hosts depending on the same parent device) this does not
    seem right.

    As noted in a response from Andrzej Hajda, the intent is
    likely to make the panel dependent on the DRM device
    (connector->dev) not its parent. But we have no way of
    doing that since the DRM device doesn't contain any
    struct device on its own (arguably it should).

    Revert this until a proper approach is figured out.

    Cc: Jyri Sarha
    Cc: Eric Anholt
    Cc: Andrzej Hajda
    Signed-off-by: Linus Walleij
    Signed-off-by: Sean Paul
    Link: https://patchwork.freedesktop.org/patch/msgid/20180927124130.9102-1-linus.walleij@linaro.org

    Linus Walleij
     
  • This function will be used in a later patch to switch the struct
    request_queue q_usage_counter from killed back to live. In contrast
    to percpu_ref_reinit(), this new function does not require that the
    refcount is zero.

    Signed-off-by: Bart Van Assche
    Acked-by: Tejun Heo
    Reviewed-by: Ming Lei
    Cc: Christoph Hellwig
    Cc: Jianchao Wang
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • The RQF_PREEMPT flag is used for three purposes:
    - In the SCSI core, for making sure that power management requests
    are executed even if a device is in the "quiesced" state.
    - For domain validation by SCSI drivers that use the parallel port.
    - In the IDE driver, for IDE preempt requests.
    Rename "preempt-only" into "pm-only" because the primary purpose of
    this mode is power management. Since the power management core may
    but does not have to resume a runtime suspended device before
    performing system-wide suspend and since a later patch will set
    "pm-only" mode as long as a block device is runtime suspended, make
    it possible to set "pm-only" mode from more than one context. Since
    with this change scsi_device_quiesce() is no longer idempotent, make
    that function return early if it is called for a quiesced queue.

    Signed-off-by: Bart Van Assche
    Acked-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Cc: Jianchao Wang
    Cc: Johannes Thumshirn
    Cc: Alan Stern
    Signed-off-by: Jens Axboe

    Bart Van Assche
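
One way to picture "pm-only" mode being set from more than one context is a counter rather than a single flag. The sketch below is a user-space illustration using C11 atomics; the struct and function names are hypothetical and only loosely modeled on the description in the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical queue with a counter instead of a single flag, so that
 * several contexts can request "pm-only" mode independently. */
struct queue {
	atomic_int pm_only;
};

static void set_pm_only(struct queue *q)
{
	atomic_fetch_add(&q->pm_only, 1);
}

static void clear_pm_only(struct queue *q)
{
	atomic_fetch_sub(&q->pm_only, 1);
}

static bool queue_pm_only(struct queue *q)
{
	return atomic_load(&q->pm_only) > 0;
}

/* Hypothetical device: quiescing must not bump the counter twice. */
struct device {
	struct queue q;
	bool quiesced;
};

static void device_quiesce(struct device *dev)
{
	if (dev->quiesced)
		return;  /* already quiesced: keep the counter balanced */
	set_pm_only(&dev->q);
	dev->quiesced = true;
}

static void device_resume(struct device *dev)
{
	if (!dev->quiesced)
		return;
	clear_pm_only(&dev->q);
	dev->quiesced = false;
}
```

With a counter, setting the mode is no longer idempotent, which is why the quiesce path in this sketch returns early when the device is already quiesced.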
     
  • Move the code for runtime power management from blk-core.c into the
    new source file blk-pm.c. Move the corresponding declarations from
    <linux/blkdev.h> into <linux/blk-pm.h>. For CONFIG_PM=n, leave out
    the declarations of the functions that are not used in that mode.
    This patch not only reduces the number of #ifdefs in the block layer
    core code but also reduces the size of header file <linux/blkdev.h>
    and hence should help to reduce the build time of the Linux kernel
    if CONFIG_PM is not defined.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Jianchao Wang
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Cc: Alan Stern
    Signed-off-by: Jens Axboe

    Bart Van Assche
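
A common way to keep #ifdefs out of core code is to let the header provide empty inline stubs when the option is off. The sketch below illustrates that general pattern only; the struct and function names are hypothetical and are not the declarations that were actually moved.

```c
/* Sketch of a header that provides real power-management hooks only
 * when CONFIG_PM is defined; otherwise callers get empty inline stubs
 * and need no #ifdef of their own. All names here are hypothetical. */
struct pm_queue {
	int resume_requests;
};

#ifdef CONFIG_PM
static inline void pm_request_resume(struct pm_queue *q)
{
	q->resume_requests++;   /* placeholder for real resume logic */
}
#else
static inline void pm_request_resume(struct pm_queue *q)
{
	(void)q;                /* compiled out when CONFIG_PM is not set */
}
#endif

/* Core code calls the hook unconditionally. */
static inline void submit_io(struct pm_queue *q)
{
	pm_request_resume(q);
}
```

When CONFIG_PM is not defined, the stub compiles to nothing, so submit_io carries no power-management cost.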
     

26 Sep, 2018

2 commits


25 Sep, 2018

10 commits

  • This changes UAPI, breaking iwd and libell:

    ell/key.c: In function 'kernel_dh_compute':
    ell/key.c:205:38: error: 'struct keyctl_dh_params' has no member named 'private'; did you mean 'dh_private'?
    struct keyctl_dh_params params = { .private = private,
    ^~~~~~~
    dh_private

    This reverts commit 8a2336e549d385bb0b46880435b411df8d8200e8.

    Fixes: 8a2336e549d3 ("uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name")
    Signed-off-by: Lubomir Rintel
    Signed-off-by: David Howells
    cc: Randy Dunlap
    cc: Mat Martineau
    cc: Stephan Mueller
    cc: James Morris
    cc: "Serge E. Hallyn"
    cc: Mat Martineau
    cc: Andrew Morton
    cc: Linus Torvalds
    cc:
    Signed-off-by: James Morris
    Signed-off-by: Greg Kroah-Hartman

    Lubomir Rintel
     
  • Dave writes:
    "Networking fixes:

    1) Fix multiqueue handling of coalesce timer in stmmac, from Jose
    Abreu.

    2) Fix memory corruption in NFC, from Suren Baghdasaryan.

    3) Don't write reserved bits in ravb driver, from Kazuya Mizuguchi.

    4) SMC bug fixes from Karsten Graul, YueHaibing, and Ursula Braun.

    5) Fix TX done race in mvpp2, from Antoine Tenart.

    6) ipv6 metrics leak, from Wei Wang.

    7) Adjust firmware version requirements in mlxsw, from Petr Machata.

    8) Fix autonegotiation on resume in r8169, from Heiner Kallweit.

    9) Fixed missing entries when dumping /proc/net/if_inet6, from Jeff
    Barnhill.

    10) Fix double free in devlink, from Dan Carpenter.

    11) Fix ethtool regression from UFO feature removal, from Maciej
    Żenczykowski.

    12) Fix drivers that have a ndo_poll_controller() that captures the
    cpu entirely on loaded hosts by trying to drain all rx and tx
    queues, from Eric Dumazet.

    13) Fix memory corruption with jumbo frames in aquantia driver, from
    Friedemann Gerold."

    * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (79 commits)
    net: mvneta: fix the remaining Rx descriptor unmapping issues
    ip_tunnel: be careful when accessing the inner header
    mpls: allow routes on ip6gre devices
    net: aquantia: memory corruption on jumbo frames
    tun: remove ndo_poll_controller
    nfp: remove ndo_poll_controller
    bnxt: remove ndo_poll_controller
    bnx2x: remove ndo_poll_controller
    mlx5: remove ndo_poll_controller
    mlx4: remove ndo_poll_controller
    i40evf: remove ndo_poll_controller
    ice: remove ndo_poll_controller
    igb: remove ndo_poll_controller
    ixgb: remove ndo_poll_controller
    fm10k: remove ndo_poll_controller
    ixgbevf: remove ndo_poll_controller
    ixgbe: remove ndo_poll_controller
    bonding: use netpoll_poll_dev() helper
    netpoll: make ndo_poll_controller() optional
    rds: Fix build regression.
    ...

    Greg Kroah-Hartman
     
  • No need to pull in the BUG() definition.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Now that we don't need an override for BIOVEC_PHYS_MERGEABLE there is
    no need to drag this header in.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • We only use it in biovec_phys_mergeable and a m68k paravirt driver,
    so just opencode it there. Also remove the pointless unsigned long cast
    for the offset in the opencoded instances.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Geert Uytterhoeven
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • These two checks should always be performed together, so merge them into
    a single helper.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Turn the macro into an inline, move it to blk.h and simplify the
    arch hooks a bit.

    Also rename the function to biovec_phys_mergeable as there is no need
    to shout.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
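
Turning a function-like macro into an inline function gives the check typed arguments and a lower-case name. The sketch below shows only the physical-contiguity idea under simplifying assumptions: the struct is a hypothetical stand-in that stores a physical address directly, and the arch hooks mentioned in the commit message are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified bio vector: the physical start address is
 * stored directly instead of being derived from a page and an offset. */
struct bvec {
	uint64_t phys;  /* physical address of the first byte */
	uint32_t len;   /* length in bytes */
};

/* Inline function in place of a function-like macro: two vectors are
 * physically mergeable when the second starts where the first ends. */
static inline bool bvec_phys_mergeable(const struct bvec *a,
				       const struct bvec *b)
{
	return a->phys + a->len == b->phys;
}
```

Unlike a macro, the inline version evaluates each argument once and is type-checked by the compiler.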
     
  • No need to expose these helpers outside the block layer.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Keep it close to the actual users instead of exposing the function to all
    drivers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • No need to expose these to drivers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 Sep, 2018

1 commit

  • As diagnosed by Song Liu, ndo_poll_controller() can
    be very dangerous on loaded hosts, since the cpu
    calling ndo_poll_controller() might steal all NAPI
    contexts (for all RX/TX queues of the NIC). This capture
    can last for an unlimited amount of time, since one
    cpu is generally not able to drain all the queues under load.

    It seems that all networking drivers that use NAPI
    for their TX completions should not provide a ndo_poll_controller().

    NAPI drivers have netpoll support already handled
    in core networking stack, since netpoll_poll_dev()
    uses poll_napi(dev) to iterate through registered
    NAPI contexts for a device.

    This patch allows netpoll_poll_dev() to process NAPI
    contexts even for drivers not providing ndo_poll_controller(),
    allowing for following patches in NAPI drivers.

    Also we export netpoll_poll_dev() so that it can be called
    by bonding/team drivers in following patches.

    Reported-by: Song Liu
    Signed-off-by: Eric Dumazet
    Tested-by: Song Liu
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Sep, 2018

2 commits


22 Sep, 2018

8 commits

  • blkg reference counting now uses percpu_ref rather than atomic_t. Let's
    make this consistent with css_tryget. This renames blkg_try_get to
    blkg_tryget and makes it return a bool rather than the blkg or NULL.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
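
The "tryget returns a bool" convention can be illustrated with a small user-space sketch. A plain C11 atomic counter stands in for percpu_ref here, and the names obj_tryget and obj_put are hypothetical; the kernel implementation differs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; a plain atomic counter stands
 * in for the kernel's percpu_ref. */
struct obj {
	atomic_uint refcnt;   /* 0 means the object is being torn down */
};

/* Take a reference only if the count has not already dropped to zero.
 * Returns true on success, false otherwise. */
static bool obj_tryget(struct obj *o)
{
	unsigned int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->refcnt, 1);
}
```

A caller then writes "if (obj_tryget(o)) { ... obj_put(o); }" instead of comparing a returned pointer against NULL.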
     
  • Now that every bio is associated with a blkg, this puts the use of
    blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over
    the refcnt in blkg to use percpu_ref.

    Signed-off-by: Dennis Zhou
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     
  • blk_get_rl is responsible for identifying which request_list a request
    should be allocated to. Try get logic was added earlier, but
    semantically the logic was not changed.

    This patch makes better use of the bio already having a reference to the
    blkg in the hot path. The cold path uses a better fallback of
    blkg_lookup_create rather than just blkg_lookup and then falling back to
    the q->root_rl. If lookup_create fails with anything but -ENODEV, it
    falls back to q->root_rl.

    A clarifying comment is added to explain why q->root_rl is used rather
    than the root blkg's rl.

    Signed-off-by: Dennis Zhou
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     
  • The previous patch in this series removed carrying around a pointer to
    the css in blkg. However, the blkg association logic still relied on
    taking a reference on the css to ensure we wouldn't fail in getting a
    reference for the blkg.

    Here the implicit dependency on the css is removed. The association
    continues to rely on the tryget logic walking up the blkg tree. This
    streamlines the three ways that association can happen: normal, swap,
    and writeback.

    Acked-by: Tejun Heo
    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     
  • Prior patches ensured that all bios are now associated with some blkg.
    This now makes bio->bi_css unnecessary as blkg maintains a reference to
    the blkcg already.

    This patch removes the field bi_css and transfers corresponding uses to
    access via bi_blkg.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     
  • One of the goals of this series is to remove a separate reference to
    the css of the bio. This can and should be accessed via bio_blkcg. In
    this patch, the wbc_init_bio call is changed such that it must be called
    after a queue has been associated with the bio.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     
  • A prior patch in this series added blkg association to bios issued by
    cgroups. There are two other paths that we want to attribute work back
    to the appropriate cgroup: swap and writeback. Here we modify the way
    swap tags bios to include the blkg. Writeback will be tackled in the next
    patch.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     
  • bio_issue_init, among other things, initializes the timestamp for an IO.
    Rather than have this logic handled by policies, this consolidates it to
    be on the init paths (normal, clone, bounce clone).

    Signed-off-by: Dennis Zhou
    Acked-by: Tejun Heo
    Reviewed-by: Liu Bo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)