06 Sep, 2019
2 commits
-
If userspace requests that a target be removed, nvm_remove_tgt() will
iterate over nvm_devices to find the given target; if it is not
found, it should print an error.
Signed-off-by: Minwoo Im
Updated output string and patch description.
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
all the pr_() family can have this prefix by pr_fmt.
Signed-off-by: Minwoo Im
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
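As a rough userspace illustration of the pr_fmt mechanism mentioned above: the kernel's pr_*() macros pass their format string through pr_fmt(), so defining pr_fmt once per file prefixes every message. The sketch below mimics that with snprintf(). The "nvm: " prefix, the pr_err_buf() macro and report_missing_target() are assumptions made for the demo, not the kernel's definitions.

```c
#include <stdio.h>

/* Assumed prefix for this demo. In kernel code, pr_fmt() is defined
 * before the first #include so that every pr_*() call picks it up. */
#define pr_fmt(fmt) "nvm: " fmt

/* Hypothetical stand-in for pr_err(): formats into a caller-supplied
 * buffer instead of the kernel log so the result can be inspected. */
#define pr_err_buf(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)

/* Formats a "target not found" style message and returns the number
 * of characters written (excluding the terminating NUL). */
static int report_missing_target(char *buf, size_t size, const char *name)
{
	return pr_err_buf(buf, size, "failed to remove target %s\n", name);
}
```

Because the prefix is pasted in at compile time through string literal concatenation, call sites do not need to repeat it.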
09 Aug, 2019
1 commit
-
A previous commit correctly removed set-but-not-read variables, but
this left two new variables now unused. Kill them.
Fixes: ba6f7da99aaf ("lightnvm: remove set but not used variables 'data_len' and 'rq_len'")
Reported-by: Stephen Rothwell
Signed-off-by: Jens Axboe
08 Aug, 2019
1 commit
-
drivers/lightnvm/pblk-read.c: In function pblk_submit_read_gc:
drivers/lightnvm/pblk-read.c:423:6: warning: variable data_len set but not used [-Wunused-but-set-variable]
drivers/lightnvm/pblk-recovery.c: In function pblk_recov_scan_oob:
drivers/lightnvm/pblk-recovery.c:368:15: warning: variable rq_len set but not used [-Wunused-but-set-variable]

They are not used since commit 48e5da725581 ("lightnvm:
move metadata mapping to lower level driver")
Reported-by: Hulk Robot
Signed-off-by: YueHaibing
Signed-off-by: Jens Axboe
06 Aug, 2019
3 commits
-
There is no reason now not to use kvmalloc, so replace the internal
metadata allocation scheme.
Reviewed-by: Javier González
Reviewed-by: Christoph Hellwig
Signed-off-by: Hans Holmberg
Signed-off-by: Jens Axboe
-
Now that blk_rq_map_kern can map both kmem and vmem, move internal
metadata mapping down to the lower level driver.
Reviewed-by: Javier González
Reviewed-by: Christoph Hellwig
Signed-off-by: Hans Holmberg
Signed-off-by: Jens Axboe
-
Move the redundant sync handling interface and wait for a completion in
the lightnvm core instead.
Reviewed-by: Javier González
Reviewed-by: Christoph Hellwig
Signed-off-by: Hans Holmberg
Signed-off-by: Jens Axboe
21 Jun, 2019
2 commits
-
With gcc 4.1:
drivers/lightnvm/core.c: In function ‘nvm_remove_tgt’:
drivers/lightnvm/core.c:510: warning: ‘t’ is used uninitialized in this function

Indeed, if no NVM devices have been registered, t will be an
uninitialized pointer, and may be dereferenced later. A call to
nvm_remove_tgt() can be triggered from userspace by issuing the
NVM_DEV_REMOVE ioctl on the lightnvm control device.

Fix this by preinitializing t to NULL.
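
The pattern behind this fix can be sketched in plain C. This is a simplified illustration, not the kernel code: struct tgt, struct dev, find_tgt() and remove_tgt() are hypothetical stand-ins. The point is that the lookup result is well defined even when the device list is empty and the loop body never runs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct tgt { const char *name; };
struct dev { struct tgt *tgts; size_t nr_tgts; };

/* Returns the matching target, or NULL if there is none. Because 't'
 * starts out as NULL, the result is defined even when nr_devs is 0. */
static struct tgt *find_tgt(struct dev *devs, size_t nr_devs, const char *name)
{
	struct tgt *t = NULL;	/* the fix: never leave 't' uninitialized */
	size_t i, j;

	for (i = 0; i < nr_devs && !t; i++) {
		for (j = 0; j < devs[i].nr_tgts; j++) {
			if (strcmp(devs[i].tgts[j].name, name) == 0) {
				t = &devs[i].tgts[j];
				break;
			}
		}
	}
	return t;
}

/* Returns 0 on success, -1 if the target does not exist. */
static int remove_tgt(struct dev *devs, size_t nr_devs, const char *name)
{
	struct tgt *t = find_tgt(devs, nr_devs, name);

	if (!t)
		return -1;	/* nothing found: report instead of dereferencing */
	t->name = "";		/* placeholder for the real teardown */
	return 0;
}
```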
Fixes: 843f2edbdde085b4 ("lightnvm: do not remove instance under global lock")
Signed-off-by: Geert Uytterhoeven
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
bio_add_pc_page() may merge pages when a bio is padded due to a flush.
Fix iteration over the bio to free the correct pages in case of a merge.
Signed-off-by: Heiner Litz
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
05 Jun, 2019
1 commit
-
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program see the file copying if not
write to the free software foundation 675 mass ave cambridge ma
02139 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 3 file(s).
Signed-off-by: Thomas Gleixner
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190112.675111872@linutronix.de
Signed-off-by: Greg Kroah-Hartman
21 May, 2019
1 commit
-
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
07 May, 2019
26 commits
-
This patch replaces the few remaining usages of rqd->ppa_list[] with
the existing nvm_rq_to_ppa_list() helper. This is needed for theoretical
devices with ws_min/ws_opt equal to 1.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
This patch changes the approach to handling the partial read path.

In the old approach, merging of data from the ring buffer and the drive
was done entirely by the driver. This had some disadvantages: the code
was complex and relied on bio internals, so it was hard to maintain and
was strongly dependent on bio changes.

In the new approach, most of the handling is done by block layer
functions such as bio_split(), bio_chain() and generic_make_request(),
which is generally less complex and easier to maintain. Some more
details of the new approach follow.

When a read bio arrives, it is cloned for pblk internal purposes. All
the L2P mapping, which includes copying data from the ring buffer to the
bio and thus the bio_advance() calls, is done on the cloned bio, so the
original bio is untouched. If we find that we have a partial read case,
we still have the original bio untouched, so we can split it and
continue to process only the first part of it in the current context,
while the rest is issued as a separate bio request that is passed to
generic_make_request() for further processing.
Signed-off-by: Igor Konopko
Reviewed-by: Heiner Litz
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
Currently all the target instances are removed under the global nvm_lock.
This was needed to ensure that the nvm_dev struct will not be freed by
a hot unplug event during target removal. However, the current
implementation has some drawbacks: the same lock is used when a new nvme
subsystem is registered, so a long target removal on drive A can delay
the registration (and listing in the OS) of drive B, which has to wait
for that lock.

Now that we have a kref which ensures that nvm_dev will not be freed in
the meantime, we can easily drop this lock while we are removing nvm
targets.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
While the creation process is still in progress, the target is not yet
on the targets list. This makes it possible for the whole lightnvm
subsystem to be removed by a call to nvm_unregister() in the meantime,
which finally causes a kernel panic inside the target init function.

This patch changes the behaviour by adding a kref variable which tracks
all the users of the nvm_dev structure. When nvm_dev is allocated, the
kref value is set to 1. The value is then increased before every target
creation and decreased after target removal. The extra reference
is dropped when the nvm subsystem is unregistered.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
This patch ensures, based on the chunk table state and write pointer,
that smeta was fully written before even trying to read it.
Signed-off-by: Igor Konopko
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
This patch prepares the read path for the new approach to
partial read handling, which is simpler than the previous one.

The most important change is to move the handling of completed and
failed bios from pblk_make_rq() to the particular read and write
functions. This is needed because, after the partial read path changes,
the completed/failed bio will sometimes differ from the original one, so
we can no longer do this in pblk_make_rq().

The other changes are a small read path refactor to reduce the size
of the following patch with the partial read changes.

Generally, the goal of this patch is not to change the functionality,
but just to prepare the code for the following changes.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
Currently, when there is an IO error (or similar) on the GC read path,
pblk still moves the line that was under the GC process to the free state.
Such behaviour can lead to silent data mismatch issues.

With this patch, a line under the GC process on which some IO
errors occurred is put back in the closed state (instead of the free
state, as it was without this patch), and the L2P mapping for the
failed sectors is not updated.

Then, in case of any user IO to such failed sectors, pblk is
able to return at least a real IO error instead of stale data, as it
does right now.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
Currently, pblk padding uses an internal IO timeout that is smaller
than the default NVMe timeout. This can lead to various
use-after-free issues. Since in case of any IO timeout the NVMe and block
layers will handle the timeout by themselves and report it back to us,
there is no need to keep this internal timeout in pblk.
Signed-off-by: Igor Konopko
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
This patch changes the behaviour of recovery padding in order to
support the case where some IOs were already submitted to the drive and
some later ones were not submitted because an error was returned.

Currently, in case of errors we simply exit the pad function without
waiting for inflight IOs, which leads to a panic on inflight IO
completion.

After the changes we always wait for all the inflight IOs before
exiting the function.
Signed-off-by: Igor Konopko
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
Read errors are not correctly propagated. Errors are cleared before
returning control to the io submitter. Change the behaviour such that
all read errors except the high ecc read warning status are returned
appropriately.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
In case of OOB recovery, we can hit the scenario where all the data in
the line was written and some part of emeta was written too. In such
a case the pblk_update_line_wp() function will call the pblk_alloc_page()
function, which will cause left_msecs to be set to a value below zero
(since this field does not track the emeta region) and thus will lead to
multiple kernel warnings. This patch fixes that issue.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
In the write recovery path, there is a chance that the writer thread
is not active, so kick it immediately instead of waiting for the timer.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
In pblk_rb_tear_down_check() the spinlock functions are not
called in proper order.
Fixes: a4bd217 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
When we trigger nvm target removal during device hot unplug, there is
a chance of hitting a general protection fault. This is caused by the use
of an nvm_dev that may be freed from another (hot unplug) thread
(in the nvm_unregister function).

Introduce a lock in the nvme_ioctl_dev_remove function to prevent this
situation.
Signed-off-by: Marcin Dziegielewski
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
In the current implementation of l2p recovery, when we are after gc and
have an open line, we do not set the current data line properly (we set
the last line from the device instead of the last line ordered by seq_nr),
which results in a kernel panic and data corruption.
Signed-off-by: Marcin Dziegielewski
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
For large io where blk_queue_split needs to be called inside
pblk_rw_io, a bio is leaked because bio_endio is not called on the
newly allocated bio. One way to observe this is to mount an ext4
filesystem on the target and issue 1MB io with dd, e.g., dd bs=1MB
if=/dev/null of=/mount/myvolume. kmemleak reports:

unreferenced object 0xffff88803d7d0100 (size 256):
comm "kworker/u16:1", pid 68, jiffies 4294899333 (age 284.120s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 60 e8 31 81 88 ff ff .........`.1....
01 40 00 00 06 06 00 00 00 00 00 00 05 00 00 00 .@..............
backtrace:
[] kmem_cache_alloc+0x204/0x3c0
[] mempool_alloc_slab+0x1d/0x30
[] mempool_alloc+0x83/0x220
[] bio_alloc_bioset+0x229/0x320
[] bio_clone_fast+0x26/0xc0
[] bio_split+0x41/0x110
[] blk_queue_split+0x349/0x930
[] pblk_make_rq+0x1b5/0x1f0
[] generic_make_request+0x2f9/0x690
[] submit_bio+0x12e/0x1f0
[] ext4_io_submit+0x64/0x80
[] ext4_bio_write_page+0x32e/0x890
[] mpage_submit_page+0x65/0xc0
[] mpage_map_and_submit_buffers+0x171/0x330
[] ext4_writepages+0xd5e/0x1650
[] do_writepages+0x39/0xc0

In case there is a need for a split, blk_queue_split returns the newly
allocated bio to the caller by changing the value of the pointer passed
by reference, while the original is passed to generic_make_request.

Although pblk_rw_io's local variable bio* has changed and been passed to
pblk_submit_read and pblk_write_to_cache, work is done on this new
bio*, and pblk_rw_io returns NVM_IO_DONE, pblk_make_rq calls bio_endio
on the old bio* because it passed the bio pointer by value to pblk_rw_io.

pblk_rw_io is unfolded into pblk_make_rq so that there is no copying
of bio* and bio_endio is called on the correct bio*.
Signed-off-by: Chansol Kim
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
The current lightnvm and pblk implementation does not take into account
the NVMe max data transfer size, which can be smaller than 64*4K=256K.
There are existing NVMe controllers, compliant with the NVMe spec, whose
max data transfer size is lower than 256K (for example 128K). Such
controllers are not able to handle a command which contains 64 PPAs,
since the size of the DMAed buffer will be above the capabilities of
such a controller.
Signed-off-by: Igor Konopko
Reviewed-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
Currently, in case of read errors, bi_status is not set properly, which
leads to returning improper data to the layers above. This patch fixes
that by setting the proper status in case of read errors.

Also remove the unnecessary warn_once(), which does not make sense
in that place, since the user bio is not used for interaction with the
drive and thus bi_status will not be set here.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
The L2P table can be huge in many cases, since it typically requires 1GB
of DRAM for 1TB of drive. When there is not enough memory available, the
OOM killer turns on and kills random processes, which can be very
annoying for users.

This patch changes the flags for L2P table allocation in order to handle
this situation in a more user-friendly way.

GFP_KERNEL and __GFP_HIGHMEM are the default flags used in parameterless
vmalloc() calls, so they are also kept in this patch. Additionally, the
__GFP_NOWARN flag is added in order to hide the very long dmesg warning
in case of allocation failure. The most important flag introduced
in this patch is __GFP_RETRY_MAYFAIL, which causes the allocator
to try to use free memory and, if none is available, to drop caches, but
not to run the OOM killer.
Signed-off-by: Igor Konopko
Reviewed-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
The sector bits in the erase command may be uninitialized,
causing the erase LBA to be unaligned to the chunk size.

This is an unexpected situation, since erase shall always be chunk
aligned based on the OCSSD 2.0 specification.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
In the pblk_put_line_back function, a race condition with
__pblk_map_invalidate can make a line not part of any lists.

Fix this by resetting gc_list to null.
Fixes: a4bd217 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
Currently, when we fail on rq data allocation in gc, it skips moving
active data and moves the line straight to its free state, losing user
data in the process.

Move the data allocation to an earlier phase of GC, where we can still
fail gracefully by moving the line back to the closed state.
Signed-off-by: Igor Konopko
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
The smeta_ssec field in pblk_line is never used after it was replaced by
the function pblk_line_smeta_start().
Signed-off-by: Igor Konopko
Reviewed-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
Currently the L2P map size is calculated based on the total number of
available sectors, which is redundant, since it contains mapping for
overprovisioning as well (11% by default).

Change this size to the real capacity and thus reduce the memory
footprint significantly: with the default op value it is approx.
110MB of DRAM less for every 1TB of media.
Signed-off-by: Igor Konopko
Reviewed-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
A line is left unassigned to the block lists in case pblk_gc_line
returns an error.

This moves the line back to the appropriate list, which can then be
picked up by the garbage collector.
Signed-off-by: Igor Konopko
Reviewed-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
-
Fixes the GC error case when moving a line back to closed state
while releasing additional references.
Signed-off-by: Igor Konopko
Reviewed-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
11 Apr, 2019
1 commit
-
The introduction of multipage bio vectors broke pblk's partial read
logic due to it not being prepared for multipage bio vectors.

Use bio vector iterators instead of direct bio vector indexing.
Fixes: 07173c3ec276 ("block: enable multipage bvecs")
Reported-by: Klaus Jensen
Signed-off-by: Hans Holmberg
Updated description.
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
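To illustrate why direct indexing breaks once a vector entry can span several pages, here is a small userspace sketch. It is not the block layer API: struct seg, struct seg_iter, seg_iter_next() and the 4096-byte PAGE_SZ are hypothetical. The iterator tracks both the entry index and the bytes already consumed inside that entry, so a multipage entry yields several page-sized steps.

```c
#include <stddef.h>

#define PAGE_SZ 4096u	/* assumed page size for this sketch */

/* One vector entry; with multipage vectors it may cover several pages. */
struct seg { size_t len; };

/* Iterator position: entry index plus bytes consumed within that entry. */
struct seg_iter { size_t idx; size_t done; };

/* Advances by at most PAGE_SZ bytes without crossing an entry boundary.
 * Returns the number of bytes consumed, or 0 at the end of the vector. */
static size_t seg_iter_next(const struct seg *v, size_t n, struct seg_iter *it)
{
	size_t left, step;

	/* Skip entries that are fully consumed (or empty). */
	while (it->idx < n && it->done == v[it->idx].len) {
		it->idx++;
		it->done = 0;
	}
	if (it->idx >= n)
		return 0;

	left = v[it->idx].len - it->done;
	step = left < PAGE_SZ ? left : PAGE_SZ;
	it->done += step;
	return step;
}

/* Counts page-sized steps; indexing v[i] directly would see only n. */
static size_t count_pages(const struct seg *v, size_t n)
{
	struct seg_iter it = { 0, 0 };
	size_t pages = 0;

	while (seg_iter_next(v, n, &it))
		pages++;
	return pages;
}
```

With three entries of 4096, 8192 and 4096 bytes, a loop over the entry index visits three items, while the iterator visits four pages.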
07 Mar, 2019
1 commit
-
When calculating the maximum I/O size allowed into the buffer, consider
the write size (ws_opt) used by the write thread in order to cover the
case in which, due to flushes, the mem and subm pointers are misaligned
by (ws_opt - 1). This case currently translates into a stall when
an I/O of the largest possible size is submitted.
Fixes: f9f9d1ae2c66 ("lightnvm: pblk: prevent stall due to wb threshold")
Signed-off-by: Javier González
Signed-off-by: Jens Axboe
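The arithmetic can be sketched as follows. This is an assumption about one way to reserve the slack, not the formula used in pblk: rb_max_io() and its parameters are hypothetical names. The idea is that up to (ws_opt - 1) entries may be stuck behind a misaligned pointer, so that many slots are held back from the largest admitted I/O.

```c
#include <stddef.h>

/*
 * Hypothetical helper: largest I/O (in entries) admitted into a write
 * buffer of 'nr_entries' when the writer drains it in units of 'ws_opt'.
 * Holds back (ws_opt - 1) slots; returns 0 if nothing can be admitted.
 */
static size_t rb_max_io(size_t nr_entries, size_t ws_opt)
{
	size_t slack = ws_opt ? ws_opt - 1 : 0;

	return nr_entries > slack ? nr_entries - slack : 0;
}
```

For example, under these assumptions a 1024-entry buffer with ws_opt of 8 would admit I/Os of at most 1017 entries.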
11 Feb, 2019
1 commit
-
This patch fixes a race condition where a write is mapped to the last
sectors of a line. The write is synced to the device but the L2P is not
updated yet. When the line is garbage collected before the L2P update
is performed, the sectors are ignored by the GC logic and the line is
freed before all sectors are moved. When the L2P is finally updated, it
contains a mapping to a freed line, and subsequent reads of the
corresponding LBAs fail.

This patch introduces a per-line counter specifying the number of
sectors that are synced to the device but have not been updated in the
L2P. Lines with a counter greater than zero will not be selected
for GC.
Signed-off-by: Heiner Litz
Reviewed-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
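A minimal sketch of the counter scheme described above, in userspace C. The structure and function names (struct line, pending_l2p, pick_gc_victim() and so on) are hypothetical and the victim policy (fewest valid sectors) is an assumption. What matters is that a line with sectors synced to the device but not yet reflected in the L2P is never chosen for GC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-line state. */
struct line {
	unsigned int valid_secs;	/* sectors still holding live data */
	unsigned int pending_l2p;	/* synced to device, L2P not yet updated */
};

/* Called when 'n' sectors of the line have been written to the device. */
static void line_sectors_synced(struct line *l, unsigned int n)
{
	l->pending_l2p += n;
}

/* Called once the L2P has been updated for 'n' of those sectors. */
static void line_l2p_updated(struct line *l, unsigned int n)
{
	assert(l->pending_l2p >= n);
	l->pending_l2p -= n;
}

/* Returns the index of the line with the fewest valid sectors among
 * those with no pending L2P updates, or -1 if no line is eligible. */
static int pick_gc_victim(const struct line *lines, size_t nr_lines)
{
	int victim = -1;
	size_t i;

	for (i = 0; i < nr_lines; i++) {
		if (lines[i].pending_l2p > 0)
			continue;	/* L2P update in flight: not safe to GC */
		if (victim < 0 || lines[i].valid_secs < lines[victim].valid_secs)
			victim = (int)i;
	}
	return victim;
}
```

Once the pending count of a line drops back to zero, it becomes eligible again on the next selection pass.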