25 Sep, 2020

1 commit

  • BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to
    decide if ->rw_page can be used on a block device. Just check for
    the method instead. The only complication is that zram needs a second
    set of block_device_operations as it can switch between modes that
    actually support ->rw_page and those that don't.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
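
The idea in the commit above, testing for the method itself rather than a separate capability flag, can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct blk_ops`, `can_use_rw_page()`, and the two ops tables are hypothetical names standing in for `block_device_operations` and the zram case, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct block_device_operations. */
struct blk_ops {
    int (*submit)(void *dev, const void *buf, size_t len);
    /* Optional method: NULL when synchronous page I/O is not supported. */
    int (*rw_page)(void *dev, void *page, bool is_write);
};

static int demo_submit(void *dev, const void *buf, size_t len)
{
    (void)dev; (void)buf; (void)len;
    return 0;
}

static int demo_rw_page(void *dev, void *page, bool is_write)
{
    (void)dev; (void)page; (void)is_write;
    return 0;
}

/* Two tables, one per mode, so the device can switch between them. */
static const struct blk_ops ops_with_rw_page = {
    .submit = demo_submit,
    .rw_page = demo_rw_page,
};

static const struct blk_ops ops_without_rw_page = {
    .submit = demo_submit,
    /* .rw_page is left NULL on purpose. */
};

/* The capability check is simply a test for the method's presence. */
static bool can_use_rw_page(const struct blk_ops *ops)
{
    assert(ops != NULL);
    return ops->rw_page != NULL;
}
```

A caller would pick one of the two tables when the device changes mode, and the swap-side check reduces to `can_use_rw_page(ops)`.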
     

02 Sep, 2020

1 commit

  • The nvdimm block drivers abuse revalidate_disk in a strange way, and
    totally unrelated to what other drivers do. Simplify this by just
    calling nvdimm_revalidate_disk (which seems rather misnamed) from the
    probe routines, as the additional bdev size revalidation is pointless
    at this point, and remove the revalidate_disk methods given that
    it can only be triggered from add_disk, which is right before the
    manual calls.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Josef Bacik
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

15 Aug, 2020

1 commit

  • This function returns the number of bytes in a THP. It is like
    page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE
    is disabled.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: William Kucharski
    Reviewed-by: Zi Yan
    Cc: David Hildenbrand
    Cc: Mike Kravetz
    Cc: "Kirill A. Shutemov"
    Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
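
A minimal userspace sketch of the compile-time behaviour described above. The `struct page` stand-in with an `order` field and the fixed `PAGE_SIZE` value are assumptions for illustration; the kernel helper is implemented differently.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL /* assumed base page size for this sketch */

/* Hypothetical stand-in for struct page; 'order' is log2 of the page count. */
struct page {
    unsigned int order;
};

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* With THP enabled, the size depends on the compound order. */
static inline unsigned long thp_size(const struct page *page)
{
    assert(page != NULL);
    return PAGE_SIZE << page->order;
}
#else
/* With THP disabled, the helper folds to a compile-time constant. */
static inline unsigned long thp_size(const struct page *page)
{
    (void)page;
    return PAGE_SIZE;
}
#endif
```

Built without `-DCONFIG_TRANSPARENT_HUGEPAGE`, every call collapses to `PAGE_SIZE`, so callers need no `#ifdef` of their own.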
     

01 Jul, 2020

1 commit

  • The make_request_fn is a little weird in that it sits directly in
    struct request_queue instead of an operation vector. Replace it with
    a block_device_operations method called submit_bio (which describes much
    better what it does). Also remove the request_queue argument to it, as
    the queue can be derived pretty trivially from the bio.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
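
The shape of this change can be sketched as follows. All type and function names here (`struct disk`, `struct disk_ops`, `bio_queue()`, `submit()`) are hypothetical simplifications, chosen only to show the method living in an operations vector and the queue being recovered from the bio instead of being passed in.

```c
#include <assert.h>
#include <stddef.h>

struct bio;

struct request_queue {
    int id;
};

struct disk_ops {
    /* No queue argument: it can be recovered from the bio. */
    int (*submit_bio)(struct bio *bio);
};

struct disk {
    struct request_queue *queue;
    const struct disk_ops *ops;
};

struct bio {
    struct disk *disk;
};

/* Derive the queue from the bio. */
static struct request_queue *bio_queue(const struct bio *bio)
{
    assert(bio != NULL && bio->disk != NULL);
    return bio->disk->queue;
}

/* Example driver method: returns the id of the queue it found. */
static int demo_submit_bio(struct bio *bio)
{
    return bio_queue(bio)->id;
}

static const struct disk_ops demo_ops = {
    .submit_bio = demo_submit_bio,
};

/* Dispatch through the operations vector rather than a queue field. */
static int submit(struct bio *bio)
{
    return bio->disk->ops->submit_bio(bio);
}
```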
     

14 Jun, 2020

1 commit


27 May, 2020

1 commit


14 May, 2020

1 commit


28 Mar, 2020

1 commit

  • Current make_request based drivers use either blk_alloc_queue_node or
    blk_alloc_queue to allocate a queue, and then set up the make_request_fn
    function pointer and a few parameters using the blk_queue_make_request
    helper. Simplify this by passing the make_request pointer to
    blk_alloc_queue, and while at it merge the _node variant into the main
    helper by always passing a node_id, and remove the superfluous gfp_mask
    parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

18 Nov, 2019

1 commit

  • drivers/nvdimm/btt.c: In function 'btt_read_pg':
    drivers/nvdimm/btt.c:1264:8: warning: variable 'rc' set but not used
    [-Wunused-but-set-variable]
    int rc;
    ^~

    Add a ratelimited message in case a storm of errors is encountered.

    Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing")
    Signed-off-by: Qian Cai
    Reviewed-by: Vishal Verma
    Link: https://lore.kernel.org/r/1572530719-32161-1-git-send-email-cai@lca.pw
    Signed-off-by: Dan Williams

    Qian Cai
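
As a rough illustration of what "ratelimited" means here, the sketch below lets a limited burst of messages through per time window and counts the rest as suppressed. It also makes use of the return code that the compiler warning flagged as unused. The structure, the function names, and the message text are assumptions for this example; the kernel uses its own ratelimit helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct ratelimit {
    long interval;      /* window length, in caller-defined ticks */
    int burst;          /* messages allowed per window */
    long window_start;  /* tick at which the current window began */
    int printed;        /* messages emitted in the current window */
    int suppressed;     /* messages dropped so far */
};

/* Returns true if a message may be emitted at time 'now'. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
    assert(rl != NULL);
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->suppressed++;
    return false;
}

/* Report a failed operation, but only as often as the limiter allows. */
static bool report_error(struct ratelimit *rl, long now, int rc, unsigned int lba)
{
    if (!ratelimit_ok(rl, now))
        return false;
    fprintf(stderr, "error %d while handling block %#x\n", rc, lba);
    return true;
}
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real caller would read a clock.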
     

15 Nov, 2019

1 commit

  • The nvdimm core currently maps the full namespace to an ioremap range
    while probing the namespace mode. This can result in probe failures on
    architectures that have limited ioremap space.

    For example, with a large btt namespace that consumes most of the I/O remap
    range, depending on the sequence of namespace initialization, the user
    can find a pfn namespace initialization failure due to unavailable I/O
    remap space, which the nvdimm core uses for temporary mapping.

    nvdimm core can avoid this failure by only mapping the reserved info
    block area to check for pfn superblock type and map the full namespace
    resource only before using the namespace.

    Given that personalities like BTT can be layered on top of any namespace
    type, create a generic form of devm_nsio_enable (devm_namespace_enable)
    and use it inside the per-personality attach routines. Now
    devm_namespace_enable() is always paired with disable unless the mapping
    is going to be used for long term runtime access.

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20191017073308.32645-1-aneesh.kumar@linux.ibm.com
    [djbw: reworks to move devm_namespace_{en,dis}able into *attach helpers]
    Reported-by: kbuild test robot
    Link: https://lore.kernel.org/r/20191031105741.102793-2-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     

25 Sep, 2019

1 commit

  • An nd_label->dpa issue was observed when trying to enable a namespace
    created with a little-endian kernel on a big-endian kernel. That made me run
    `sparse` on the rest of the code, and the other changes are the result of that.

    Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing")
    Fixes: 9dedc73a4658 ("libnvdimm/btt: Fix LBA masking during 'free list' population")
    Reviewed-by: Vishal Verma
    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190809074726.27815-1-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 263 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

01 Mar, 2019

2 commits

  • The Linux BTT implementation assumes that log entries will never have
    the 'zero' flag set, and indeed it never sets that flag for log entries
    itself.

    However, the UEFI spec is ambiguous on the exact format of the LBA field
    of a log entry, specifically as to whether it should include the
    additional flag bits or not. While a zero bit doesn't make sense in the
    context of a log entry, other BTT implementations might still have it set.

    If an implementation does happen to have it set, we would happily read
    it in as the next block to write to for writes. Since a high bit is set,
    it pushes the block number out of the range of an 'arena', and we fail
    such a write with an EIO.

    Follow the robustness principle, and tolerate such implementations by
    stripping out the zero flag when populating the free list during
    initialization. Additionally, use the same stripped out entries for
    detection of incomplete writes and map restoration that happens at this
    stage.

    Add a sysfs file 'log_zero_flags' that indicates the ability to accept
    such a layout to userspace applications. This enables 'ndctl
    check-namespace' to recognize whether the kernel is able to handle zero
    flags, or whether it should attempt a fix-up under the --repair option.

    Cc: Dan Williams
    Reported-by: Dexuan Cui
    Reported-by: Pedro d'Aquino Filocre F S Barbuda
    Tested-by: Dexuan Cui
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • We call btt_log_read() twice, once to get the 'old' log entry, and again
    to get the 'new' entry. However, we have no use for the 'old' entry, so
    remove it.

    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

28 Sep, 2018

1 commit

  • Update device_add_disk() to take a 'groups' argument so that
    individual drivers can register a device with additional sysfs
    attributes.
    This avoids the race condition the driver would otherwise have if these
    groups were to be created with sysfs_add_groups().

    Signed-off-by: Martin Wilck
    Signed-off-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     

18 Jul, 2018

1 commit

  • c11f0c0b5bb9 ("block/mm: make bdev_ops->rw_page() take a bool for
    read/write") replaced @op with boolean @is_write, which limited the
    amount of information going into ->rw_page() and more importantly
    page_endio(), which removed the need to expose block internals to mm.

    Unfortunately, we want to track discards separately and @is_write
    isn't enough information. This patch updates bdev_ops->rw_page() to
    take REQ_OP instead but leaves page_endio() to take bool @is_write.
    This allows the block part of operations to have enough information
    while not leaking it to mm.

    Signed-off-by: Tejun Heo
    Cc: Mike Christie
    Cc: Minchan Kim
    Cc: Dan Williams
    Signed-off-by: Jens Axboe

    Tejun Heo
     

06 Apr, 2018

1 commit

  • Pull block layer updates from Jens Axboe:
    "It's a pretty quiet round this time, which is nice. This contains:

    - series from Bart, cleaning up the way we set/test/clear atomic
    queue flags.

    - series from Bart, fixing races between gendisk and queue
    registration and removal.

    - set of bcache fixes and improvements from various folks, by way of
    Michael Lyle.

    - set of lightnvm updates from Matias, most of it being the 1.2 to
    2.0 transition.

    - removal of unused DIO flags from Nikolay.

    - blk-mq/sbitmap memory ordering fixes from Omar.

    - divide-by-zero fix for BFQ from Paolo.

    - minor documentation patches from Randy.

    - timeout fix from Tejun.

    - Alpha "can't write a char atomically" fix from Mikulas.

    - set of NVMe fixes by way of Keith.

    - bsg and bsg-lib improvements from Christoph.

    - a few sed-opal fixes from Jonas.

    - cdrom check-disk-change deadlock fix from Maurizio.

    - various little fixes, comment fixes, etc from various folks"

    * tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits)
    blk-mq: Directly schedule q->timeout_work when aborting a request
    blktrace: fix comment in blktrace_api.h
    lightnvm: remove function name in strings
    lightnvm: pblk: remove some unnecessary NULL checks
    lightnvm: pblk: don't recover unwritten lines
    lightnvm: pblk: implement 2.0 support
    lightnvm: pblk: implement get log report chunk
    lightnvm: pblk: rename ppaf* to addrf*
    lightnvm: pblk: check for supported version
    lightnvm: implement get log report chunk helpers
    lightnvm: make address conversions depend on generic device
    lightnvm: add support for 2.0 address format
    lightnvm: normalize geometry nomenclature
    lightnvm: complete geo structure with maxoc*
    lightnvm: add shorten OCSSD version in geo
    lightnvm: add minor version to generic geometry
    lightnvm: simplify geometry structure
    lightnvm: pblk: refactor init/exit sequences
    lightnvm: Avoid validation of default op value
    lightnvm: centralize permission check for lightnvm ioctl
    ...

    Linus Torvalds
     

09 Mar, 2018

1 commit

  • This patch has been generated as follows:

    for verb in set_unlocked clear_unlocked set clear; do
        replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \
            $(git grep -lw queue_flag_${verb} drivers block/bsg*)
    done

    Except for protecting all queue flag changes with the queue lock
    this patch does not change any functionality.

    Cc: Mike Snitzer
    Cc: Shaohua Li
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Ming Lei
    Signed-off-by: Bart Van Assche
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Johannes Thumshirn
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

08 Mar, 2018

1 commit

  • Prior to 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
    we needed to temporarily add a zero-capacity disk before registering for
    blk-integrity. But adding a zero-capacity disk caused the partition
    table scanning to bail early, and this resulted in partitions not coming
    up after a probe of the BTT or blk namespaces.

    We can now register for integrity before the disk has been added, and
    this fixes the rescan problems.

    Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
    Reported-by: Dariusz Dokupil
    Cc:
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

03 Feb, 2018

1 commit


20 Jan, 2018

1 commit

  • When a sector mode namespace is initially created, the arena's err_lock
    is not initialized. If, on the other hand, the namespace already
    exists, the mutex is initialized. To fix the issue, I moved the mutex
    initialization into the arena_alloc, which is called by both
    discover_arenas and create_arenas.

    This was discovered on an older kernel where mutex_trylock checks the
    count to determine whether the lock is held. Because the data structure
    is kzalloc-d, that count was 0 (held), and I/O to the device would hang
    forever waiting for the lock to be released (see btt_write_pg, for
    example). Current kernels have a different mutex implementation that
    checks for a non-null owner, and so this doesn't show up as a problem.
    If that lock were ever contended, it might cause issues, but you'd have
    to be really unlucky, I think.

    Signed-off-by: Jeff Moyer
    Signed-off-by: Dan Williams

    Jeff Moyer
     

22 Dec, 2017

1 commit

  • Due to a spec misinterpretation, the Linux implementation of the BTT log
    area had a different padding scheme from other implementations, such as
    UEFI and NVML.

    This fixes the padding scheme, and defaults to it for new BTT layouts.
    We attempt to detect the padding scheme in use when probing for an
    existing BTT. If we detect the older/incompatible scheme, we continue
    using it.

    Reported-by: Juston Li
    Cc: Dan Williams
    Cc:
    Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

16 Nov, 2017

1 commit

  • As discussed at

    https://lkml.kernel.org/r/

    someday we will remove rw_page(). If so, we need something to detect
    such super-fast storage on which synchronous IO operations like the
    current rw_page are always a win.

    Introduces BDI_CAP_SYNCHRONOUS_IO to indicate such devices. With it, we
    could use various optimization techniques.

    Link: http://lkml.kernel.org/r/1505886205-9671-3-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Cc: Ilya Dryomov
    Cc: Jens Axboe
    Cc: Sergey Senozhatsky
    Cc: Huang Ying
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

12 Sep, 2017

1 commit

  • Pull libnvdimm from Dan Williams:
    "A rework of media error handling in the BTT driver and other updates.
    It has appeared in a few -next releases and collected some late-
    breaking build-error and warning fixups as a result.

    Summary:

    - Media error handling support in the Block Translation Table (BTT)
    driver is reworked to address sleeping-while-atomic locking and
    memory-allocation-context conflicts.

    - The dax_device lookup overhead for xfs and ext4 is moved out of the
    iomap hot-path to a mount-time lookup.

    - A new 'ecc_unit_size' sysfs attribute is added to advertise the
    read-modify-write boundary property of a persistent memory range.

    - Preparatory fix-ups for arm and powerpc pmem support are included
    along with other miscellaneous fixes"

    * tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
    libnvdimm, btt: fix format string warnings
    libnvdimm, btt: clean up warning and error messages
    ext4: fix null pointer dereference on sbi
    libnvdimm, nfit: move the check on nd_reserved2 to the endpoint
    dax: fix FS_DAX=n BLOCK=y compilation
    libnvdimm: fix integer overflow static analysis warning
    libnvdimm, nd_blk: remove mmio_flush_range()
    libnvdimm, btt: rework error clearing
    libnvdimm: fix potential deadlock while clearing errors
    libnvdimm, btt: cache sector_size in arena_info
    libnvdimm, btt: ensure that flags were also unchanged during a map_read
    libnvdimm, btt: refactor map entry operations with macros
    libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
    libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute
    ext4: perform dax_device lookup at mount
    ext2: perform dax_device lookup at mount
    xfs: perform dax_device lookup at mount
    dax: introduce a fs_dax_get_by_bdev() helper
    libnvdimm, btt: check memory allocation failure
    libnvdimm, label: fix index block size calculation
    ...

    Linus Torvalds
     

10 Sep, 2017

1 commit

  • Fix format warnings (seen on i386) in nvdimm/btt.c:

    ../drivers/nvdimm/btt.c: In function ‘btt_map_init’:
    ../drivers/nvdimm/btt.c:430:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Wformat=]
    dev_WARN_ONCE(to_dev(arena), size < 512,
    ^
    ../drivers/nvdimm/btt.c: In function ‘btt_log_init’:
    ../drivers/nvdimm/btt.c:474:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Wformat=]
    dev_WARN_ONCE(to_dev(arena), size < 512,
    ^

    Fixes: 86652d2eb347 ("libnvdimm, btt: clean up warning and error messages")
    Reported-by: Arnd Bergmann
    Reported-by: kbuild test robot
    Cc: Vishal Verma
    Signed-off-by: Randy Dunlap
    Signed-off-by: Dan Williams

    Randy Dunlap
     

08 Sep, 2017

1 commit

  • Convert all WARN* style messages to dev_WARN, and for errors in the IO
    paths, use dev_err_ratelimited. Also remove some BUG_ONs in the IO path
    and replace them with the above - no need to crash the machine in case
    of an unaligned IO.

    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Vishal Verma
     

07 Sep, 2017

1 commit

  • The .rw_page in struct block_device_operations is used by the swap
    subsystem to read/write the page contents from/into the corresponding
    swap slot in the swap device. To support the THP (Transparent Huge
    Page) swap optimization, .rw_page is enhanced to read/write a THP
    if possible.

    Link: http://lkml.kernel.org/r/20170724051840.2309-6-ying.huang@intel.com
    Signed-off-by: "Huang, Ying"
    Reviewed-by: Ross Zwisler [for brd.c, zram_drv.c, pmem.c]
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dan Williams
    Cc: Vishal L Verma
    Cc: Jens Axboe
    Cc: "Kirill A . Shutemov"
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: Shaohua Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     

01 Sep, 2017

5 commits

  • Clearing errors or badblocks during a BTT write requires sending an ACPI
    DSM, which means potentially sleeping. Since a BTT IO happens in atomic
    context (preemption disabled, spinlocks may be held), we cannot perform
    error clearing in the course of an IO. Due to this, error clearing for
    BTT IOs has hitherto been disabled.

    In this patch we move error clearing out of the atomic section, and thus
    re-enable error clearing with BTTs. When we are about to add a block to
    the free list, we check if it was previously marked as an error, and if
    it was, we add it to the freelist, but also set a flag that says error
    clearing will be required. We then drop the lane (ending the atomic
    context), and send a zero buffer so that the error can be cleared. The
    error flag in the free list is protected by the nd 'lane', and is set
    only by a thread while it holds that lane. When the error is cleared,
    the flag is cleared, but while holding a mutex for that freelist index.

    When writing, we check for two things -
    1/ If the freelist mutex is held or if the error flag is set. If so,
    this is an error block that is being (or about to be) cleared.
    2/ If the block is a known badblock based on nsio->bb

    The second check is required because the BTT map error flag for a map
    entry only gets set when an error LBA is read. If we write to a new
    location that may not have the map error flag set, but still might be in
    the region's badblock list, we can trigger an EIO on the write, which is
    undesirable and completely avoidable.

    Cc: Jeff Moyer
    Cc: Toshi Kani
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • In preparation for the error clearing rework, add sector_size in the
    arena_info struct.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • In btt_map_read, we read the map twice to make sure that the map entry
    didn't change after we added it to the read tracking table. In
    anticipation of expanding the use of the error bit, also make sure that
    the error and zero flags are constant across the two map reads.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • Add helpers for converting a raw map entry to just the block number, or
    either of the 'e' or 'z' flags in preparation for actually using the
    error flag to mark blocks with media errors.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • The IO context conversion for rw_bytes missed a case in the BTT write
    path (btt_map_write) which should've been marked as atomic.

    In reality this should not cause a problem, because map writes are too
    small for nsio_rw_bytes to attempt error clearing, but it should be
    fixed for posterity.

    Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
    things like the nfit unit tests, which don't actually sleep, can catch
    bugs like this.

    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
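
The double-read check from the btt_map_read commit of this date (the third entry above) can be sketched in a single-threaded model. Everything here is an assumption for illustration: the flag layout, the `race_hook` callback that simulates a concurrent writer, and the use of plain loads where a real implementation would need proper ordering and atomic access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the top two bits are the 'z' and 'e' flags. */
#define ENT_LBA_MASK UINT32_C(0x3fffffff)

/* Callback used to simulate another writer between the two reads. */
typedef void (*race_hook)(uint32_t *map, void *ctx);

/*
 * Read map[lba] until two consecutive reads agree on the whole raw entry
 * (block number and both flag bits). The block number is published in
 * *rtt_slot between the two reads, mirroring a read tracking table.
 */
static uint32_t stable_map_read(uint32_t *map, size_t lba, uint32_t *rtt_slot,
                                race_hook hook, void *ctx)
{
    assert(map != NULL && rtt_slot != NULL);

    uint32_t first = map[lba];
    for (;;) {
        *rtt_slot = first & ENT_LBA_MASK;
        if (hook)
            hook(map, ctx);
        uint32_t second = map[lba];
        if (second == first)
            return first;
        first = second; /* entry changed underneath us: retry */
    }
}

/* Example hook: sets the error flag of entry 0 a limited number of times. */
static void flip_error_flag(uint32_t *map, void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        map[0] |= UINT32_C(1) << 30;
        (*remaining)--;
    }
}
```

Comparing the full raw value, rather than only the block number, is what makes a flag-only change between the two reads visible.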
     

30 Aug, 2017

1 commit

  • Check memory allocation failures and return -ENOMEM in such cases, as
    already done a few lines below for another memory allocation.

    This avoids a NULL pointer dereference.

    Cc:
    Fixes: 14e494542636 ("libnvdimm, btt: BTT updates for UEFI 2.7 format")
    Signed-off-by: Christophe JAILLET
    Reviewed-by: Vishal Verma
    Signed-off-by: Dan Williams

    Christophe Jaillet
     

12 Jul, 2017

1 commit

  • Pull more block updates from Jens Axboe:
    "This is a followup for block changes, that didn't make the initial
    pull request. It's a bit of a mixed bag, this contains:

    - A followup pull request from Sagi for NVMe. Outside of fixups for
    NVMe, it also includes a series for ensuring that we properly
    quiesce hardware queues when browsing live tags.

    - Set of integrity fixes from Dmitry (mostly), fixing various issues
    for folks using DIF/DIX.

    - Fix for a bug introduced in cciss, with the req init changes. From
    Christoph.

    - Fix for a bug in BFQ, from Paolo.

    - Two followup fixes for lightnvm/pblk from Javier.

    - Depth fix from Ming for blk-mq-sched.

    - Also from Ming, performance fix for mtip32xx that was introduced
    with the dynamic initialization of commands"

    * 'for-linus' of git://git.kernel.dk/linux-block: (44 commits)
    block: call bio_uninit in bio_endio
    nvmet: avoid unneeded assignment of submit_bio return value
    nvme-pci: add module parameter for io queue depth
    nvme-pci: compile warnings in nvme_alloc_host_mem()
    nvmet_fc: Accept variable pad lengths on Create Association LS
    nvme_fc/nvmet_fc: revise Create Association descriptor length
    lightnvm: pblk: remove unnecessary checks
    lightnvm: pblk: control I/O flow also on tear down
    cciss: initialize struct scsi_req
    null_blk: fix error flow for shared tags during module_init
    block: Fix __blkdev_issue_zeroout loop
    nvme-rdma: unconditionally recycle the request mr
    nvme: split nvme_uninit_ctrl into stop and uninit
    virtio_blk: quiesce/unquiesce live IO when entering PM states
    mtip32xx: quiesce request queues to make sure no submissions are inflight
    nbd: quiesce request queues to make sure no submissions are inflight
    nvme: kick requeue list when requeueing a request instead of when starting the queues
    nvme-pci: quiesce/unquiesce admin_q instead of start/stop its hw queues
    nvme-loop: quiesce/unquiesce admin_q instead of start/stop its hw queues
    nvme-fc: quiesce/unquiesce admin_q instead of start/stop its hw queues
    ...

    Linus Torvalds
     

08 Jul, 2017

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "libnvdimm updates for the latest ACPI and UEFI specifications. This
    pull request also includes new 'struct dax_operations' enabling to
    undo the abuse of copy_user_nocache() for copy operations to pmem.

    The dax work originally missed 4.12 to address concerns raised by Al.

    Summary:

    - Introduce the _flushcache() family of memory copy helpers and use
    them for persistent memory write operations on x86. The
    _flushcache() semantic indicates that the cache is either bypassed
    for the copy operation (movnt) or any lines dirtied by the copy
    operation are written back (clwb, clflushopt, or clflush).

    - Extend dax_operations with ->copy_from_iter() and ->flush()
    operations. These operations and other infrastructure updates allow
    all persistent memory specific dax functionality to be pushed into
    libnvdimm and the pmem driver directly. It also allows dax-specific
    sysfs attributes to be linked to a host device, for example:
    /sys/block/pmem0/dax/write_cache

    - Add support for the new NVDIMM platform/firmware mechanisms
    introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
    namespace label format, extensions to the address-range-scrub
    command set, new error injection commands, and a new BTT
    (block-translation-table) layout. These updates support inter-OS
    and pre-OS compatibility.

    - Fix a longstanding memory corruption bug in nfit_test.

    - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
    capable.

    - Miscellaneous fixes and small updates across libnvdimm and the nfit
    driver.

    Acknowledgements that came after the branch was pushed: commit
    6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks'
    sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
    "

    * tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
    libnvdimm, namespace: record 'lbasize' for pmem namespaces
    acpi/nfit: Issue Start ARS to retrieve existing records
    libnvdimm: New ACPI 6.2 DSM functions
    acpi, nfit: Show bus_dsm_mask in sysfs
    libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
    acpi, nfit: Enable DSM pass thru for root functions.
    libnvdimm: passthru functions clear to send
    libnvdimm, btt: convert some info messages to warn/err
    libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
    libnvdimm: fix the clear-error check in nsio_rw_bytes
    libnvdimm, btt: fix btt_rw_page not returning errors
    acpi, nfit: quiet invalid block-aperture-region warnings
    libnvdimm, btt: BTT updates for UEFI 2.7 format
    acpi, nfit: constify *_attribute_group
    libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
    libnvdimm, pmem, dax: export a cache control attribute
    dax: convert to bitmask for flags
    dax: remove default copy_from_iter fallback
    libnvdimm, nfit: enable support for volatile ranges
    libnvdimm, pmem: fix persistence warning
    ...

    Linus Torvalds
     

04 Jul, 2017

2 commits

  • Currently, if someone tries to advance a bvec beyond its size, we simply
    emit a WARN_ONCE and continue to iterate beyond the bvec array boundaries.
    This means that we end up dereferencing or corrupting a random memory
    region.

    A sane reaction would be to propagate the error back to the calling
    context, but bvec_iter_advance's calling context is not always good for
    error handling. For safety reasons, truncate the iterator size to zero,
    which will break the external iteration loop and prevent unpredictable
    memory range corruption. And even if the caller ignores the error, it will
    corrupt its own bvecs, not others.

    This patch does:
    - Return an error back to the caller in the hope that it will react to it
    - Truncate the iterator size

    The code was added a long time ago in 4550dd6c; luckily no one hit it
    in real life :)

    Signed-off-by: Dmitry Monakhov
    Reviewed-by: Ming Lei
    Reviewed-by: Martin K. Petersen
    [hch: switch to true/false returns instead of errno values]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     
  • Currently all integrity prep hooks are open-coded, and if prepare fails
    we ignore its error code and fail the bio with EIO. Let's return the real
    error to the upper layer, so that the caller may react accordingly.

    In fact, no one wants to use bio_integrity_prep() without
    bio_integrity_enabled(), so it is reasonable to fold them into one function.

    Signed-off-by: Dmitry Monakhov
    Reviewed-by: Martin K. Petersen
    [hch: merged with the latest block tree,
    return bool from bio_integrity_prep]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
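
For the bvec_iter_advance change in the first commit of this date, the "truncate and report" behaviour can be sketched with a simplified segment iterator. The types and field names are hypothetical, and the sketch assumes `size` never exceeds the bytes actually left in the segment array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct seg {
    size_t len;
};

struct seg_iter {
    size_t size; /* bytes remaining overall */
    size_t idx;  /* current segment */
    size_t done; /* bytes consumed in the current segment */
};

/*
 * Advance the iterator by 'bytes'. On an attempt to run past the end,
 * truncate the remaining size to zero so any outer loop terminates,
 * and return false instead of walking off the array.
 */
static bool seg_iter_advance(const struct seg *segs, struct seg_iter *it,
                             size_t bytes)
{
    assert(segs != NULL && it != NULL);
    if (bytes > it->size) {
        it->size = 0;
        return false;
    }
    while (bytes) {
        size_t avail = segs[it->idx].len - it->done;
        size_t step = bytes < avail ? bytes : avail;

        bytes -= step;
        it->size -= step;
        it->done += step;
        if (it->done == segs[it->idx].len) {
            it->idx++;
            it->done = 0;
        }
    }
    return true;
}
```

A loop of the form `while (it.size) { ... }` stops after a failed advance even if the caller never checks the return value.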
     

01 Jul, 2017

1 commit


30 Jun, 2017

2 commits

  • btt_rw_page was not propagating errors from btt_do_bvec, resulting in any
    IO errors via the rw_page path going unnoticed. The pmem driver recently
    fixed this in e10624f ("pmem: fail io-requests to known bad blocks"),
    but the same problem in BTT went neglected.

    Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
    Cc:
    Cc: Toshi Kani
    Cc: Dan Williams
    Cc: Jeff Moyer
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • The UEFI 2.7 specification defines an updated BTT metadata format,
    bumping the revision to 2.0. Add support for the new format, while
    retaining compatibility for the old 1.1 format.

    Cc: Toshi Kani
    Cc: Linda Knippers
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
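
For the btt_rw_page fix in the first commit of this date, the pattern is to return the transfer's result and to signal completion only on success. The sketch below uses hypothetical `demo_*` stand-ins and a simulated bad sector; it is not the driver code.

```c
#include <assert.h>
#include <errno.h>

/* Counts how often the completion hook ran. */
static int endio_calls;

static void demo_page_endio(void)
{
    endio_calls++;
}

/* Stand-in for the per-page transfer: fails on the simulated bad sector. */
static int demo_do_bvec(unsigned long sector, unsigned long bad_sector)
{
    return sector == bad_sector ? -EIO : 0;
}

/* Propagate the transfer result instead of unconditionally returning 0. */
static int demo_rw_page(unsigned long sector, unsigned long bad_sector)
{
    int rc = demo_do_bvec(sector, bad_sector);

    assert(rc <= 0); /* errors are negative errno values in this sketch */
    if (rc == 0)
        demo_page_endio();
    return rc;
}
```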