06 Sep, 2019

2 commits


09 Aug, 2019

1 commit

  • A previous commit correctly removed set-but-not-read variables, but
    this left two other variables unused. Kill them.

    Fixes: ba6f7da99aaf ("lightnvm: remove set but not used variables 'data_len' and 'rq_len'")
    Reported-by: Stephen Rothwell
    Signed-off-by: Jens Axboe

    Jens Axboe
     

08 Aug, 2019

1 commit

  • drivers/lightnvm/pblk-read.c: In function pblk_submit_read_gc:
    drivers/lightnvm/pblk-read.c:423:6: warning: variable data_len set but not used [-Wunused-but-set-variable]
    drivers/lightnvm/pblk-recovery.c: In function pblk_recov_scan_oob:
    drivers/lightnvm/pblk-recovery.c:368:15: warning: variable rq_len set but not used [-Wunused-but-set-variable]

    They have been unused since commit 48e5da725581 ("lightnvm:
    move metadata mapping to lower level driver").

    Reported-by: Hulk Robot
    Signed-off-by: YueHaibing
    Signed-off-by: Jens Axboe

    YueHaibing
     

06 Aug, 2019

3 commits


21 Jun, 2019

2 commits

  • With gcc 4.1:

    drivers/lightnvm/core.c: In function ‘nvm_remove_tgt’:
    drivers/lightnvm/core.c:510: warning: ‘t’ is used uninitialized in this function

    Indeed, if no NVM devices have been registered, t will be an
    uninitialized pointer, and may be dereferenced later. A call to
    nvm_remove_tgt() can be triggered from userspace by issuing the
    NVM_DEV_REMOVE ioctl on the lightnvm control device.

    Fix this by preinitializing t to NULL.

    Fixes: 843f2edbdde085b4 ("lightnvm: do not remove instance under global lock")
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Geert Uytterhoeven
     
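The fix above is the classic preinitialize-and-check pattern: when the lookup loop never runs, the pointer must already hold a well-defined value. A minimal userspace sketch (the hypothetical `find_tgt()` stands in for the lookup inside nvm_remove_tgt(); names and structs are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct tgt { const char *name; struct tgt *next; };

/* If 'head' is empty, the loop body never runs; without the NULL
 * preinitialization, 't' would be returned uninitialized. */
static struct tgt *find_tgt(struct tgt *head, const char *name)
{
    struct tgt *t = NULL;   /* the fix: preinitialize to NULL */

    for (; head; head = head->next) {
        if (strcmp(head->name, name) == 0) {
            t = head;
            break;
        }
    }
    return t;               /* NULL when nothing is registered */
}
```

The caller can then check for NULL instead of dereferencing garbage when no NVM device has been registered.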
  • bio_add_pc_page() may merge pages when a bio is padded due to a flush.
    Fix iteration over the bio to free the correct pages in case of a merge.

    Signed-off-by: Heiner Litz
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Heiner Litz
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program see the file copying if not
    write to the free software foundation 675 mass ave cambridge ma
    02139 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 3 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190112.675111872@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


07 May, 2019

26 commits

  • This patch replaces the few remaining direct usages of
    rqd->ppa_list[] with the existing nvm_rq_to_ppa_list() helper. This
    is needed for theoretical devices with ws_min/ws_opt equal to 1.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
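For context, the helper has roughly the following shape (sketched here against mock userspace structs; the real definitions live in include/linux/lightnvm.h). A single-PPA request stores its address inline rather than in the DMA-able list, which is why direct rqd->ppa_list[] indexing breaks once ws_min/ws_opt is 1:

```c
#include <assert.h>

/* Minimal mocks of the nvm_rq fields involved. */
struct ppa_addr { unsigned long long ppa; };
struct nvm_rq {
    unsigned nr_ppas;
    struct ppa_addr ppa_addr;    /* inline address, single-sector case */
    struct ppa_addr *ppa_list;   /* DMA-able list, multi-sector case */
};

/* Sketch of the helper: pick the inline address for a single-PPA
 * request, the list otherwise. */
static struct ppa_addr *nvm_rq_to_ppa_list(struct nvm_rq *rqd)
{
    return (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
}
```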
  • This patch changes the approach to handling the partial read path.

    In the old approach, merging of data from the round buffer and the
    drive was done entirely by pblk. This had some disadvantages - the
    code was complex and relied on bio internals, so it was hard to
    maintain and was strongly dependent on bio changes.

    In the new approach most of the handling is done by block layer
    functions such as bio_split(), bio_chain() and
    generic_make_request(), and it is generally less complex and easier
    to maintain. Below are some more details of the new approach.

    When a read bio arrives, it is cloned for pblk internal purposes. All
    the L2P mapping, which includes copying data from the round buffer to
    the bio and thus the bio_advance() calls, is done on the cloned bio,
    so the original bio is untouched. If we find that we have a partial
    read case, we still have the original bio untouched, so we can split
    it and continue to process only the first part of it in the current
    context, while the rest is submitted as a separate bio request, which
    is passed to generic_make_request() for further processing.

    Signed-off-by: Igor Konopko
    Reviewed-by: Heiner Litz
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
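The split decision can be sketched abstractly (a toy model, not pblk code; the `cached[]` array stands in for the per-sector L2P lookup results on the cloned bio):

```c
#include <assert.h>

/* Handle the longest prefix of sectors that share the same source
 * (write buffer vs. drive) in the current context, and resubmit the
 * remainder as a separate, chained bio. */
static int split_point(const int *cached, int nr_secs)
{
    int i;

    for (i = 1; i < nr_secs; i++)
        if (cached[i] != cached[0])
            break;
    return i;   /* sectors [0, i) are processed now */
}
```

In the real path, this boundary is where bio_split() cuts the original bio and the tail is handed to generic_make_request().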
  • Currently all the target instances are removed under the global
    nvm_lock. This was needed to ensure that the nvm_dev struct would not
    be freed by a hot unplug event during target removal. However, the
    current implementation has some drawbacks: since the same lock is
    used when a new NVMe subsystem is registered, we can have a situation
    where, due to a long target removal process on drive A, registration
    (and listing in the OS) of drive B takes a long time, since it has to
    wait for that lock.

    Now that we have a kref which ensures that nvm_dev will not be freed
    in the meantime, we can easily get rid of this lock while removing
    nvm targets.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • While the creation process is still in progress, the target is not
    yet on the targets list. This leaves a window in which the whole
    lightnvm subsystem can be removed by a nvm_unregister() call in the
    meantime, finally causing a kernel panic inside the target init
    function.

    This patch changes the behaviour by adding a kref which tracks all
    the users of the nvm_dev structure. When nvm_dev is allocated, the
    kref value is set to 1. The value is increased before every target
    creation and decreased after target removal. The extra reference is
    dropped when the nvm subsystem is unregistered.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
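The refcounting scheme described above can be sketched in userspace C (illustrative names and a plain atomic standing in for the kernel's kref; `freed` marks where kfree() would run):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mock_dev { atomic_int ref; bool freed; };

static void dev_init(struct mock_dev *d)
{
    atomic_init(&d->ref, 1);   /* base reference, held by the subsystem */
    d->freed = false;
}

static void dev_get(struct mock_dev *d)   /* taken before target create */
{
    atomic_fetch_add(&d->ref, 1);
}

static void dev_put(struct mock_dev *d)   /* dropped on target removal */
{
    if (atomic_fetch_sub(&d->ref, 1) == 1)
        d->freed = true;       /* last reference: kfree(dev) in kernel */
}
```

With this, nvm_unregister() only drops the base reference; the device stays alive until the in-flight target creation also drops its reference.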
  • This patch ensures that smeta was fully written before even trying
    to read it, based on the chunk table state and write pointer.

    Signed-off-by: Igor Konopko
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • This patch prepares the read path for the new approach to partial
    read handling, which is simpler compared with the previous one.

    The most important change is to move the handling of completed and
    failed bios from pblk_make_rq() to the particular read and write
    functions. This is needed since, after the partial read path changes,
    the completed/failed bio will sometimes differ from the original one,
    so we can no longer do this in pblk_make_rq().

    The other changes are a small read path refactor in order to reduce
    the size of the following patch with the partial read changes.

    Generally the goal of this patch is not to change functionality, but
    just to prepare the code for the following changes.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • Currently when there is an IO error (or similar) on the GC read
    path, pblk still moves the line that was under GC to the free state.
    Such behaviour can lead to silent data mismatch issues.

    With this patch, a line that was under GC and on which some IO errors
    occurred is put back into the closed state (instead of the free state
    as before), and the L2P mapping for the failed sectors is not
    updated.

    Then, in case of any user IOs to those failed sectors, pblk is at
    least able to return a real IO error instead of stale data as it does
    right now.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Reviewed-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • Currently during pblk padding an internal IO timeout is used that is
    shorter than the default NVMe timeout. This can lead to various
    use-after-free issues. Since in case of any IO timeout the NVMe and
    block layers will handle it themselves and report it back to the
    user, there is no need to keep this internal timeout in pblk.

    Signed-off-by: Igor Konopko
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • This patch changes the behaviour of recovery padding in order to
    support the case when some IOs were already submitted to the drive
    and subsequent ones were not submitted due to a returned error.

    Currently, in case of errors we simply exit the pad function without
    waiting for the inflight IOs, which leads to a panic on inflight IO
    completion.

    After the changes we always wait for all the inflight IOs before
    exiting the function.

    Signed-off-by: Igor Konopko
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • Read errors are not correctly propagated. Errors are cleared before
    returning control to the io submitter. Change the behaviour such that
    all read errors except the high ECC read warning status are returned
    appropriately.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Reviewed-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • In case of OOB recovery, we can hit a scenario where all the data in
    a line was written and some part of emeta was written too. In such a
    case the pblk_update_line_wp() function will call pblk_alloc_page(),
    which will cause left_msecs to be set to a value below zero (since
    this field does not track the emeta region) and thus will lead to
    multiple kernel warnings. This patch fixes that issue.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • In the write recovery path, there is a chance that the writer thread
    is not active; kick it immediately instead of waiting for the timer.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Reviewed-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • In pblk_rb_tear_down_check() the spinlock functions are not called
    in the proper order.

    Fixes: a4bd217 ("lightnvm: physical block device (pblk) target")
    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Reviewed-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • When we trigger nvm target removal during device hot unplug, there
    is a chance of hitting a general protection fault. This is caused by
    use of an nvm_dev that may be freed from another (hot unplug) thread
    (in the nvm_unregister function).

    Introduce a lock in the nvm_ioctl_dev_remove function to prevent
    this situation.

    Signed-off-by: Marcin Dziegielewski
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Marcin Dziegielewski
     
  • In the current implementation of L2P recovery, when we are after GC
    and have an open line, we do not set the current data line properly
    (we set the last line from the device instead of the last line
    ordered by seq_nr), and in consequence get a kernel panic and data
    corruption.

    Signed-off-by: Marcin Dziegielewski
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Marcin Dziegielewski
     
  • For large IOs, where blk_queue_split() needs to be called inside
    pblk_rw_io(), a bio leaks because bio_endio() is not called on the
    newly allocated bio. One way to observe this is to mount an ext4
    filesystem on the target and issue 1MB IOs with dd, e.g., dd bs=1MB
    if=/dev/zero of=/mount/myvolume. kmemleak reports:

    unreferenced object 0xffff88803d7d0100 (size 256):
    comm "kworker/u16:1", pid 68, jiffies 4294899333 (age 284.120s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 60 e8 31 81 88 ff ff .........`.1....
    01 40 00 00 06 06 00 00 00 00 00 00 05 00 00 00 .@..............
    backtrace:
    [] kmem_cache_alloc+0x204/0x3c0
    [] mempool_alloc_slab+0x1d/0x30
    [] mempool_alloc+0x83/0x220
    [] bio_alloc_bioset+0x229/0x320
    [] bio_clone_fast+0x26/0xc0
    [] bio_split+0x41/0x110
    [] blk_queue_split+0x349/0x930
    [] pblk_make_rq+0x1b5/0x1f0
    [] generic_make_request+0x2f9/0x690
    [] submit_bio+0x12e/0x1f0
    [] ext4_io_submit+0x64/0x80
    [] ext4_bio_write_page+0x32e/0x890
    [] mpage_submit_page+0x65/0xc0
    [] mpage_map_and_submit_buffers+0x171/0x330
    [] ext4_writepages+0xd5e/0x1650
    [] do_writepages+0x39/0xc0

    In case there is a need for a split, blk_queue_split() returns the
    newly allocated bio to the caller by changing the value of the
    pointer passed as a reference, while the original is passed to
    generic_make_request().

    Although pblk_rw_io()'s local bio pointer is updated and passed to
    pblk_submit_read() and pblk_write_to_cache(), and work is done on
    this new bio, pblk_make_rq() calls bio_endio() on the old bio,
    because the bio pointer was passed by value to pblk_rw_io().

    pblk_rw_io() is unfolded into pblk_make_rq() so that the bio pointer
    is not copied and bio_endio() is called on the correct bio.

    Signed-off-by: Chansol Kim
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Chansol Kim
     
  • The current lightnvm and pblk implementation does not account for
    the NVMe maximum data transfer size, which can be smaller than
    64 * 4K = 256K. There are existing, spec-compliant NVMe controllers
    whose maximum data transfer size is lower than 256K (for example
    128K). Such controllers are not able to handle a command which
    contains 64 PPAs, since the size of the DMAed buffer would exceed
    the capabilities of such a controller.

    Signed-off-by: Igor Konopko
    Reviewed-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
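The resulting per-command limit can be illustrated with a small calculation (the 4K sector size is an assumption for the example; the cap of 64 matches lightnvm's NVM_MAX_VLBA limit):

```c
#include <assert.h>

/* Clamp the number of PPAs per command to what the controller's max
 * data transfer size can actually carry. */
static unsigned max_ppas_per_cmd(unsigned max_transfer_bytes,
                                 unsigned sec_size)
{
    unsigned max = max_transfer_bytes / sec_size;

    return max < 64 ? max : 64;   /* 64 == NVM_MAX_VLBA */
}
```

A 128K controller with 4K sectors can thus carry at most 32 PPAs, half the previous hardcoded assumption.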
  • Currently in case of read errors, bi_status is not set properly,
    which leads to returning improper data to the layers above. This
    patch fixes that by setting the proper status in case of read errors.

    It also removes an unnecessary WARN_ONCE(), which does not make sense
    in that place, since the user bio is not used for interaction with
    the drive and thus bi_status will not be set there.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Reviewed-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • The L2P table can be huge in many cases, since it typically requires
    1GB of DRAM for 1TB of drive. When there is not enough memory
    available, the OOM killer turns on and kills random processes, which
    can be very annoying for users.

    This patch changes the flags for the L2P table allocation in order to
    handle this situation in a more user friendly way.

    GFP_KERNEL and __GFP_HIGHMEM are the default flags used in
    parameterless vmalloc() calls, so they are also kept in this patch.
    Additionally, the __GFP_NOWARN flag is added in order to suppress the
    very long dmesg warning in case of allocation failure. The most
    important flag introduced in this patch is __GFP_RETRY_MAYFAIL, which
    causes the allocator to try to use free memory, and to drop caches if
    none is available, but not to run the OOM killer.

    Signed-off-by: Igor Konopko
    Reviewed-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
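The flag combination described above can be sketched as follows. Note the bit values here are stand-ins, not the kernel's actual GFP bit positions; the point is which flags are OR-ed together for the L2P allocation:

```c
#include <assert.h>

/* Stand-in bit values; real ones are in include/linux/gfp.h. */
enum {
    GFP_KERNEL          = 1u << 0,
    __GFP_HIGHMEM       = 1u << 1,
    __GFP_NOWARN        = 1u << 2,  /* suppress allocation-failure splat */
    __GFP_RETRY_MAYFAIL = 1u << 3,  /* retry, drop caches, no OOM kill  */
};

/* The combination the patch passes for the L2P table allocation. */
static unsigned l2p_gfp_flags(void)
{
    return GFP_KERNEL | __GFP_HIGHMEM | __GFP_NOWARN | __GFP_RETRY_MAYFAIL;
}
```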
  • The sector bits in the erase command may be uninitialized, causing
    the erase LBA to be unaligned to the chunk size.

    This is an unexpected situation, since erases shall always be chunk
    aligned based on the OCSSD 2.0 specification.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Reviewed-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • In the pblk_put_line_back function, a race condition with
    __pblk_map_invalidate can make a line not part of any list.

    Resetting gc_list to null fixes the above issue.

    Fixes: a4bd217 ("lightnvm: physical block device (pblk) target")
    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Reviewed-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • Currently when we fail rq data allocation in GC, we skip moving the
    active data and move the line straight to its free state, losing
    user data in the process.

    Move the data allocation to an earlier phase of GC, where we can
    still fail gracefully by moving the line back to the closed state.

    Signed-off-by: Igor Konopko
    Reviewed-by: Javier González
    Reviewed-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • The smeta_ssec field in pblk_line has been unused since it was
    replaced by the pblk_line_smeta_start() function.

    Signed-off-by: Igor Konopko
    Reviewed-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • Currently L2P map size is calculated based on the total number of
    available sectors, which is redundant, since it contains mapping for
    overprovisioning as well (11% by default).

    Change this size to the real capacity and thus reduce the memory
    footprint significantly - with default op value it is approx.
    110MB of DRAM less for every 1TB of media.

    Signed-off-by: Igor Konopko
    Reviewed-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
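A back-of-envelope check of the numbers above, assuming a 4K sector and a 4-byte L2P entry (which yields the quoted 1GB of map per 1TB of media):

```c
#include <assert.h>

/* L2P map size for a given amount of mapped media. */
static unsigned long long l2p_bytes(unsigned long long media_bytes,
                                    unsigned sec_size,
                                    unsigned entry_size)
{
    return media_bytes / sec_size * entry_size;
}
```

Sizing the map by user-visible capacity instead of total sectors saves the overprovisioned share: with the default 11% op, about 0.11 * 1GB, i.e. the quoted ~110MB per 1TB of media.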
  • A line is left unassigned to the block lists in case pblk_gc_line
    returns an error.

    This moves the line back to the appropriate list, where it can then
    be picked up by the garbage collector.

    Signed-off-by: Igor Konopko
    Reviewed-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • Fixes the GC error case when moving a line back to closed state
    while releasing additional references.

    Signed-off-by: Igor Konopko
    Reviewed-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     

11 Apr, 2019

1 commit

  • The introduction of multipage bio vectors broke pblk's partial read
    logic, which was not prepared to handle them.

    Use bio vector iterators instead of direct bio vector indexing.

    Fixes: 07173c3ec276 ("block: enable multipage bvecs")
    Reported-by: Klaus Jensen
    Signed-off-by: Hans Holmberg
    Updated description.
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     

07 Mar, 2019

1 commit

  • When calculating the maximum I/O size allowed into the buffer,
    consider the write size (ws_opt) used by the write thread in order
    to cover the case in which, due to flushes, the mem and subm
    pointers are misaligned by up to (ws_opt - 1). This case currently
    translates into a stall when an I/O of the largest possible size is
    submitted.

    Fixes: f9f9d1ae2c66 ("lightnvm: pblk: prevent stall due to wb threshold")

    Signed-off-by: Javier González
    Signed-off-by: Jens Axboe

    Javier González
     
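The sizing above reduces to simple arithmetic (a toy version; real pblk works in ring-buffer entries and has additional thresholds):

```c
#include <assert.h>

/* The buffer must admit the largest I/O even when the mem and subm
 * pointers are misaligned by up to (ws_opt - 1) entries due to
 * flush padding. */
static unsigned max_io_entries(unsigned buf_entries, unsigned ws_opt)
{
    return buf_entries - (ws_opt - 1);
}
```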

11 Feb, 2019

1 commit

  • This patch fixes a race condition where a write is mapped to the
    last sectors of a line. The write is synced to the device, but the
    L2P is not updated yet. When the line is garbage collected before
    the L2P update is performed, the sectors are ignored by the GC logic
    and the line is freed before all sectors are moved. When the L2P is
    finally updated, it contains a mapping to a freed line, and
    subsequent reads of the corresponding LBAs fail.

    This patch introduces a per line counter specifying the number of
    sectors that are synced to the device but have not been updated in the
    L2P. Lines with a counter of greater than zero will not be selected
    for GC.

    Signed-off-by: Heiner Litz
    Reviewed-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Heiner Litz
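The per-line counter described above can be sketched in userspace C (illustrative names, not pblk's actual field or function names):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mock_line { atomic_int sec_to_update; };

static void line_init(struct mock_line *l)
{
    atomic_init(&l->sec_to_update, 0);
}

static void write_synced(struct mock_line *l)  /* synced to device */
{
    atomic_fetch_add(&l->sec_to_update, 1);
}

static void l2p_updated(struct mock_line *l)   /* L2P entry written */
{
    atomic_fetch_sub(&l->sec_to_update, 1);
}

/* GC skips any line whose counter is non-zero, closing the window
 * between device sync and L2P update. */
static bool gc_may_pick(struct mock_line *l)
{
    return atomic_load(&l->sec_to_update) == 0;
}
```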