09 Oct, 2018
5 commits
-
There is a number of places in the lightnvm subsystem where the user
iterates over the ppa list. Before iterating, the user must know if it
is a single or multiple LBAs due to vector commands using either the
nvm_rq ->ppa_addr or ->ppa_list fields on command submission, which
leads to open-coding the if/else statement.Instead of having multiple if/else's, move it into a function that can
be called by its users.A nice side effect of this cleanup is that this patch fixes up a
bunch of cases where we don't consider the single-ppa case in pblk.Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Implement helpers to go from ppas to a chunk within a line and an
address within a chunk.These helpers will be used on the patches adding trace support in pblk,
which will be sent in this window.Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
pblk implements two data paths for recovery line state. One for 1.2
and another for 2.0, instead of having pblk implement these, combine
them in the core to reduce complexity and make available to other
targets.The new interface will adhere to the 2.0 chunk definition,
including managing open chunks with an active write pointer. To provide
this interface, a 1.2 device recovers the state of the chunks by
manually detecting if a chunk is either free/open/close/offline, and if
open, scanning the flash pages sequentially to find the next writeable
page. This process takes on average ~10 seconds on a device with 64 dies,
1024 blocks and 60us read access time. The process can be parallelized
but is left out for maintenance simplicity, as the 1.2 specification is
deprecated. For 2.0 devices, the logic is maintained internally in the
drive and retrieved through the 2.0 interface.Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
A 1.2 device is able to manage the logical to physical mapping
table internally or leave it to the host.A target only supports one of those approaches, and therefore must
check on initialization. Move this check to core to avoid each target
implement the check.Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Add nvm_set_flags helper to enable core to appropriately
set the command flags for read/write/erase depending on which version
a drive supports.The flags arguments can be distilled into the access hint,
scrambling, and program/erase suspend. Replace the access hint with
a "is_seq" parameter. The rest of the flags are dependent on the
command opcode, which is trivial to detect and set.Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe
02 Oct, 2018
1 commit
-
When an io is rejected by nvmf_check_ready() due to validation of the
controller state, the nvmf_fail_nonready_command() will normally return
BLK_STS_RESOURCE to requeue and retry. However, if the controller is
dying or the I/O is marked for NVMe multipath, the I/O is failed so that
the controller can terminate or so that the io can be issued on a
different path. Unfortunately, as this reject point is before the
transport has accepted the command, blk-mq ends up completing the I/O
and never calls nvme_complete_rq(), which is where multipath may preserve
or re-route the I/O. The end result is, the device user ends up seeing an
EIO error.Example: single path connectivity, controller is under load, and a reset
is induced. An I/O is received:a) while the reset state has been set but the queues have yet to be
stopped; or
b) after queues are started (at end of reset) but before the reconnect
has completed.The I/O finishes with an EIO status.
This patch makes the following changes:
- Adds the HOST_PATH_ERROR pathing status from TP4028
- Modifies the reject point such that it appears to queue successfully,
but actually completes the io with the new pathing status and calls
nvme_complete_rq().
- nvme_complete_rq() recognizes the new status, avoids resetting the
controller (likely was already done in order to get this new status),
and calls the multipather to clear the current path that errored.
This allows the next command (retry or new command) to select a new
path if there is one.Signed-off-by: James Smart
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
01 Oct, 2018
1 commit
-
Merge -rc6 in, for two reasons:
1) Resolve a trivial conflict in the blk-mq-tag.c documentation
2) A few important regression fixes went into upstream directly, so
they aren't in the 4.20 branch.Signed-off-by: Jens Axboe
* tag 'v4.19-rc6': (780 commits)
Linux 4.19-rc6
MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
cpufreq: qcom-kryo: Fix section annotations
perf/core: Add sanity check to deal with pinned event failure
xen/blkfront: correct purging of persistent grants
Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
selftests/powerpc: Fix Makefiles for headers_install change
blk-mq: I/O and timer unplugs are inverted in blktrace
dax: Fix deadlock in dax_lock_mapping_entry()
x86/boot: Fix kexec booting failure in the SEV bit detection code
bcache: add separate workqueue for journal_write to avoid deadlock
drm/amd/display: Fix Edid emulation for linux
drm/amd/display: Fix Vega10 lightup on S3 resume
drm/amdgpu: Fix vce work queue was not cancelled when suspend
Revert "drm/panel: Add device_link from panel device to DRM device"
xen/blkfront: When purging persistent grants, keep them in the buffer
clocksource/drivers/timer-atmel-pit: Properly handle error cases
block: fix deadline elevator drain for zoned block devices
ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
...Signed-off-by: Jens Axboe
29 Sep, 2018
3 commits
-
Mark writes:
"spi: Fixes for v4.19Quite a few fixes for the Renesas drivers in here, plus a fix for the
Tegra driver and some documentation fixes for the recently added
spi-mem code. The Tegra fix is relatively large but fairly
straightforward and mechanical, it runs on probe so it's been
reasonably well covered in -next testing."* tag 'spi-fix-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-mem: Move the DMA-able constraint doc to the kerneldoc header
spi: spi-mem: Add missing description for data.nbytes field
spi: rspi: Fix interrupted DMA transfers
spi: rspi: Fix invalid SPI use during system suspend
spi: sh-msiof: Fix handling of write value for SISTR register
spi: sh-msiof: Fix invalid SPI use during system suspend
spi: gpio: Fix copy-and-paste error
spi: tegra20-slink: explicitly enable/disable clock -
Mark writes:
"regulator: Fixes for 4.19A collection of fairly minor bug fixes here, a couple of driver
specific ones plus two core fixes. There's one fix for the new
suspend state code which fixes some confusion with constant values
that are supposed to indicate noop operation and another fixing a
race condition with the creation of sysfs files on new regulators."* tag 'regulator-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fix crash caused by null driver data
regulator: Fix 'do-nothing' value for regulators without suspend state
regulator: da9063: fix DT probing with constraints
regulator: bd71837: Disable voltage monitoring for LDO3/4 -
Dave writes:
"drm fixes for 4.19-rc6Looks like a pretty normal week for graphics,
core: syncobj fix, panel link regression revert
amd: suspend/resume fixes, EDID emulation fix
mali-dp: NV12 writeback and vblank reset fixes
etnaviv: DMA setup fix"* tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Fix Edid emulation for linux
drm/amd/display: Fix Vega10 lightup on S3 resume
drm/amdgpu: Fix vce work queue was not cancelled when suspend
Revert "drm/panel: Add device_link from panel device to DRM device"
drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
drm/malidp: Fix writeback in NV12
drm: mali-dp: Call drm_crtc_vblank_reset on device init
drm/etnaviv: add DMA configuration for etnaviv platform device
28 Sep, 2018
3 commits
-
Update device_add_disk() to take an 'groups' argument so that
individual drivers can register a device with additional sysfs
attributes.
This avoids race condition the driver would otherwise have if these
groups were to be created with sysfs_add_groups().Signed-off-by: Martin Wilck
Signed-off-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe -
When debugging Kyber, it's really useful to know what latencies we've
been having, how the domain depths have been adjusted, and if we've
actually been throttling. Add three tracepoints, kyber_latency,
kyber_adjust, and kyber_throttled, to record that.Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe -
Commit 4bc6339a583c ("block: move blk_stat_add() to
__blk_mq_end_request()") consolidated some calls using ktime_get() so
we'd only need to call it once. Kyber's ->completed_request() hook also
calls ktime_get(), so let's move it to the same place, too.Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe
27 Sep, 2018
4 commits
-
This reverts commit 0c08754b59da5557532d946599854e6df28edc22.
commit 0c08754b59da
("drm/panel: Add device_link from panel device to DRM device")
creates a circular dependency under these circumstances:1. The panel depends on dsi-host because it is MIPI-DSI child
device.
2. dsi-host depends on the drm parent device (connector->dev->dev)
this should be allowed.
3. drm parent dev (connector->dev->dev) depends on the panel
after this patch.This makes the dependency circular and while it appears it
does not affect any in-tree drivers (they do not seem to have
dsi hosts depending on the same parent device) this does not
seem right.As noted in a response from Andrzej Hajda, the intent is
likely to make the panel dependent on the DRM device
(connector->dev) not its parent. But we have no way of
doing that since the DRM device doesn't contain any
struct device on its own (arguably it should).Revert this until a proper approach is figured out.
Cc: Jyri Sarha
Cc: Eric Anholt
Cc: Andrzej Hajda
Signed-off-by: Linus Walleij
Signed-off-by: Sean Paul
Link: https://patchwork.freedesktop.org/patch/msgid/20180927124130.9102-1-linus.walleij@linaro.org -
This function will be used in a later patch to switch the struct
request_queue q_usage_counter from killed back to live. In contrast
to percpu_ref_reinit(), this new function does not require that the
refcount is zero.Signed-off-by: Bart Van Assche
Acked-by: Tejun Heo
Reviewed-by: Ming Lei
Cc: Christoph Hellwig
Cc: Jianchao Wang
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Signed-off-by: Jens Axboe -
The RQF_PREEMPT flag is used for three purposes:
- In the SCSI core, for making sure that power management requests
are executed even if a device is in the "quiesced" state.
- For domain validation by SCSI drivers that use the parallel port.
- In the IDE driver, for IDE preempt requests.
Rename "preempt-only" into "pm-only" because the primary purpose of
this mode is power management. Since the power management core may
but does not have to resume a runtime suspended device before
performing system-wide suspend and since a later patch will set
"pm-only" mode as long as a block device is runtime suspended, make
it possible to set "pm-only" mode from more than one context. Since
with this change scsi_device_quiesce() is no longer idempotent, make
that function return early if it is called for a quiesced queue.Signed-off-by: Bart Van Assche
Acked-by: Martin K. Petersen
Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Reviewed-by: Ming Lei
Cc: Jianchao Wang
Cc: Johannes Thumshirn
Cc: Alan Stern
Signed-off-by: Jens Axboe -
Move the code for runtime power management from blk-core.c into the
new source file blk-pm.c. Move the corresponding declarations from
into . For CONFIG_PM=n, leave out
the declarations of the functions that are not used in that mode.
This patch not only reduces the number of #ifdefs in the block layer
core code but also reduces the size of header file
and hence should help to reduce the build time of the Linux kernel
if CONFIG_PM is not defined.Signed-off-by: Bart Van Assche
Reviewed-by: Ming Lei
Reviewed-by: Christoph Hellwig
Cc: Jianchao Wang
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Alan Stern
Signed-off-by: Jens Axboe
26 Sep, 2018
2 commits
-
Having multiple externs in arch headers is not a good way to provide
a common interface.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Dan writes:
"libnvdimm/dax for 4.19-rc6* (2) fixes for the dax error handling updates that were merged for
v4.19-rc1. My mails to Al have been bouncing recently, so I do not have
his ack but the uaccess change is of the trivial / obviously correct
variety. The address_space_operations fixes a regression.* A filesystem-dax fix to correct the zero page lookup to be compatible
with non-x86 (mips and s390) architectures."* tag 'libnvdimm-fixes-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: Add missing address_space_operations
uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe()
filesystem-dax: Fix use of zero page
25 Sep, 2018
10 commits
-
This changes UAPI, breaking iwd and libell:
ell/key.c: In function 'kernel_dh_compute':
ell/key.c:205:38: error: 'struct keyctl_dh_params' has no member named 'private'; did you mean 'dh_private'?
struct keyctl_dh_params params = { .private = private,
^~~~~~~
dh_privateThis reverts commit 8a2336e549d385bb0b46880435b411df8d8200e8.
Fixes: 8a2336e549d3 ("uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name")
Signed-off-by: Lubomir Rintel
Signed-off-by: David Howells
cc: Randy Dunlap
cc: Mat Martineau
cc: Stephan Mueller
cc: James Morris
cc: "Serge E. Hallyn"
cc: Mat Martineau
cc: Andrew Morton
cc: Linus Torvalds
cc:
Signed-off-by: James Morris
Signed-off-by: Greg Kroah-Hartman -
Dave writes:
"Networking fixes:1) Fix multiqueue handling of coalesce timer in stmmac, from Jose
Abreu.2) Fix memory corruption in NFC, from Suren Baghdasaryan.
3) Don't write reserved bits in ravb driver, from Kazuya Mizuguchi.
4) SMC bug fixes from Karsten Graul, YueHaibing, and Ursula Braun.
5) Fix TX done race in mvpp2, from Antoine Tenart.
6) ipv6 metrics leak, from Wei Wang.
7) Adjust firmware version requirements in mlxsw, from Petr Machata.
8) Fix autonegotiation on resume in r8169, from Heiner Kallweit.
9) Fixed missing entries when dumping /proc/net/if_inet6, from Jeff
Barnhill.10) Fix double free in devlink, from Dan Carpenter.
11) Fix ethtool regression from UFO feature removal, from Maciej
Żenczykowski.12) Fix drivers that have a ndo_poll_controller() that captures the
cpu entirely on loaded hosts by trying to drain all rx and tx
queues, from Eric Dumazet.13) Fix memory corruption with jumbo frames in aquantia driver, from
Friedemann Gerold."* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (79 commits)
net: mvneta: fix the remaining Rx descriptor unmapping issues
ip_tunnel: be careful when accessing the inner header
mpls: allow routes on ip6gre devices
net: aquantia: memory corruption on jumbo frames
tun: remove ndo_poll_controller
nfp: remove ndo_poll_controller
bnxt: remove ndo_poll_controller
bnx2x: remove ndo_poll_controller
mlx5: remove ndo_poll_controller
mlx4: remove ndo_poll_controller
i40evf: remove ndo_poll_controller
ice: remove ndo_poll_controller
igb: remove ndo_poll_controller
ixgb: remove ndo_poll_controller
fm10k: remove ndo_poll_controller
ixgbevf: remove ndo_poll_controller
ixgbe: remove ndo_poll_controller
bonding: use netpoll_poll_dev() helper
netpoll: make ndo_poll_controller() optional
rds: Fix build regression.
... -
No need to pull in the BUG() defintion.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Now that we don't need an override for BIOVEC_PHYS_MERGEABLE there is
no need to drag this header in.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
We only use it in biovec_phys_mergeable and a m68k paravirt driver,
so just opencode it there. Also remove the pointless unsigned long cast
for the offset in the opencoded instances.Signed-off-by: Christoph Hellwig
Reviewed-by: Geert Uytterhoeven
Signed-off-by: Jens Axboe -
These two checks should always be performed together, so merge them into
a single helper.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Turn the macro into an inline, move it to blk.h and simplify the
arch hooks a bit.Also rename the function to biovec_phys_mergeable as there is no need
to shout.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
No need to expose these helpers outside the block layer.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Keep it close to the actual users instead of exposing the function to all
drivers.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
No need to expose these to drivers.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
24 Sep, 2018
1 commit
-
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.It seems that all networking drivers that do use NAPI
for their TX completions, should not provide a ndo_poll_controller().NAPI drivers have netpoll support already handled
in core networking stack, since netpoll_poll_dev()
uses poll_napi(dev) to iterate through registered
NAPI contexts for a device.This patch allows netpoll_poll_dev() to process NAPI
contexts even for drivers not providing ndo_poll_controller(),
allowing for following patches in NAPI drivers.Also we export netpoll_poll_dev() so that it can be called
by bonding/team drivers in following patches.Reported-by: Song Liu
Signed-off-by: Eric Dumazet
Tested-by: Song Liu
Signed-off-by: David S. Miller
23 Sep, 2018
2 commits
-
Lee writes:
"MFD fixes for v4.19
- Fix Dialog DA9063 regulator constraints issue causing failure in
probe
- Fix OMAP Device Tree compatible strings to match DT"* tag 'mfd-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: omap-usb-host: Fix dts probe of children
mfd: da9063: Fix DT probing with constraints -
Jens writes:
"Just a single fix in this pull request, fixing a regression in
/proc/diskstats caused by the unification of timestamps."* tag 'for-linus-20180922' of git://git.kernel.dk/linux-block:
block: use nanosecond resolution for iostat
22 Sep, 2018
8 commits
-
blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or NULL.Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe -
Now that every bio is associated with a blkg, this puts the use of
blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over
the refcnt in blkg to use percpu_ref.Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe -
blk_get_rl is responsible for identifying which request_list a request
should be allocated to. Try get logic was added earlier, but
semantically the logic was not changed.This patch makes better use of the bio already having a reference to the
blkg in the hot path. The cold path uses a better fallback of
blkg_lookup_create rather than just blkg_lookup and then falling back to
the q->root_rl. If lookup_create fails with anything but -ENODEV, it
falls back to q->root_rl.A clarifying comment is added to explain why q->root_rl is used rather
than the root blkg's rl.Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe -
The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.Acked-by: Tejun Heo
Signed-off-by: Dennis Zhou
Signed-off-by: Jens Axboe -
Prior patches ensured that all bios are now associated with some blkg.
This now makes bio->bi_css unnecessary as blkg maintains a reference to
the blkcg already.This patch removes the field bi_css and transfers corresponding uses to
access via bi_blkg.Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe -
One of the goals of this series is to remove a separate reference to
the css of the bio. This can and should be accessed via bio_blkcg. In
this patch, the wbc_init_bio call is changed such that it must be called
after a queue has been associated with the bio.Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe -
A prior patch in this series added blkg association to bios issued by
cgroups. There are two other paths that we want to attribute work back
to the appropriate cgroup: swap and writeback. Here we modify the way
swap tags bios to include the blkg. Writeback will be tackle in the next
patch.Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe -
bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Reviewed-by: Liu Bo
Signed-off-by: Jens Axboe