18 Jul, 2017

1 commit

  • __add_badblock_range() does not account for sector alignment when
    it sets 'num_sectors'. Therefore, an ARS error record range
    spanning two sectors is set to a single sector length,
    which leaves the 2nd sector unprotected.

    Change __add_badblock_range() to set 'num_sectors' properly.

    Cc:
    Fixes: 0caeef63e6d2 ("libnvdimm: Add a poison list and export badblocks")
    Signed-off-by: Toshi Kani
    Reviewed-by: Vishal Verma
    Signed-off-by: Dan Williams

    Toshi Kani
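The arithmetic behind this fix can be sketched as follows, assuming 512-byte sectors. `badblock_sector_range()` is a hypothetical helper for illustration, not the kernel's __add_badblock_range(). It shows why the sector count has to be derived from both ends of the byte range rather than from its length alone.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* assumed: 512-byte sectors */

/*
 * Hypothetical helper: convert the byte range [addr, addr + len) into the
 * first sector it touches and the number of sectors it covers. Using both
 * endpoints keeps a short range that straddles a sector boundary from being
 * recorded as a single sector.
 */
static void badblock_sector_range(uint64_t addr, uint64_t len,
				  uint64_t *start_sector, uint64_t *num_sectors)
{
	uint64_t first, last;

	assert(len > 0);
	first = addr >> SECTOR_SHIFT;
	last = (addr + len - 1) >> SECTOR_SHIFT;

	*start_sector = first;
	*num_sectors = last - first + 1;
}
```

For example, a 24-byte record at byte offset 500 covers bytes 500-523, i.e. sectors 0 and 1, so the sketch reports two sectors where a length-only calculation would report one.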
     

12 Jul, 2017

1 commit

  • Pull more block updates from Jens Axboe:
    "This is a followup for block changes, that didn't make the initial
    pull request. It's a bit of a mixed bag, this contains:

    - A followup pull request from Sagi for NVMe. Outside of fixups for
    NVMe, it also includes a series for ensuring that we properly
    quiesce hardware queues when browsing live tags.

    - Set of integrity fixes from Dmitry (mostly), fixing various issues
    for folks using DIF/DIX.

    - Fix for a bug introduced in cciss, with the req init changes. From
    Christoph.

    - Fix for a bug in BFQ, from Paolo.

    - Two followup fixes for lightnvm/pblk from Javier.

    - Depth fix from Ming for blk-mq-sched.

    - Also from Ming, performance fix for mtip32xx that was introduced
    with the dynamic initialization of commands"

    * 'for-linus' of git://git.kernel.dk/linux-block: (44 commits)
    block: call bio_uninit in bio_endio
    nvmet: avoid unneeded assignment of submit_bio return value
    nvme-pci: add module parameter for io queue depth
    nvme-pci: compile warnings in nvme_alloc_host_mem()
    nvmet_fc: Accept variable pad lengths on Create Association LS
    nvme_fc/nvmet_fc: revise Create Association descriptor length
    lightnvm: pblk: remove unnecessary checks
    lightnvm: pblk: control I/O flow also on tear down
    cciss: initialize struct scsi_req
    null_blk: fix error flow for shared tags during module_init
    block: Fix __blkdev_issue_zeroout loop
    nvme-rdma: unconditionally recycle the request mr
    nvme: split nvme_uninit_ctrl into stop and uninit
    virtio_blk: quiesce/unquiesce live IO when entering PM states
    mtip32xx: quiesce request queues to make sure no submissions are inflight
    nbd: quiesce request queues to make sure no submissions are inflight
    nvme: kick requeue list when requeueing a request instead of when starting the queues
    nvme-pci: quiesce/unquiesce admin_q instead of start/stop its hw queues
    nvme-loop: quiesce/unquiesce admin_q instead of start/stop its hw queues
    nvme-fc: quiesce/unquiesce admin_q instead of start/stop its hw queues
    ...

    Linus Torvalds
     

08 Jul, 2017

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "libnvdimm updates for the latest ACPI and UEFI specifications. This
    pull request also includes a new 'struct dax_operations' that makes it
    possible to undo the abuse of copy_user_nocache() for copy operations to pmem.

    The dax work originally missed 4.12 to address concerns raised by Al.

    Summary:

    - Introduce the _flushcache() family of memory copy helpers and use
    them for persistent memory write operations on x86. The
    _flushcache() semantic indicates that the cache is either bypassed
    for the copy operation (movnt) or any lines dirtied by the copy
    operation are written back (clwb, clflushopt, or clflush).

    - Extend dax_operations with ->copy_from_iter() and ->flush()
    operations. These operations and other infrastructure updates allow
    all persistent memory specific dax functionality to be pushed into
    libnvdimm and the pmem driver directly. It also allows dax-specific
    sysfs attributes to be linked to a host device, for example:
    /sys/block/pmem0/dax/write_cache

    - Add support for the new NVDIMM platform/firmware mechanisms
    introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
    namespace label format, extensions to the address-range-scrub
    command set, new error injection commands, and a new BTT
    (block-translation-table) layout. These updates support inter-OS
    and pre-OS compatibility.

    - Fix a longstanding memory corruption bug in nfit_test.

    - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
    capable.

    - Miscellaneous fixes and small updates across libnvdimm and the nfit
    driver.

    Acknowledgements that came after the branch was pushed: commit
    6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks'
    sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
    "

    * tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
    libnvdimm, namespace: record 'lbasize' for pmem namespaces
    acpi/nfit: Issue Start ARS to retrieve existing records
    libnvdimm: New ACPI 6.2 DSM functions
    acpi, nfit: Show bus_dsm_mask in sysfs
    libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
    acpi, nfit: Enable DSM pass thru for root functions.
    libnvdimm: passthru functions clear to send
    libnvdimm, btt: convert some info messages to warn/err
    libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
    libnvdimm: fix the clear-error check in nsio_rw_bytes
    libnvdimm, btt: fix btt_rw_page not returning errors
    acpi, nfit: quiet invalid block-aperture-region warnings
    libnvdimm, btt: BTT updates for UEFI 2.7 format
    acpi, nfit: constify *_attribute_group
    libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
    libnvdimm, pmem, dax: export a cache control attribute
    dax: convert to bitmask for flags
    dax: remove default copy_from_iter fallback
    libnvdimm, nfit: enable support for volatile ranges
    libnvdimm, pmem: fix persistence warning
    ...

    Linus Torvalds
     

04 Jul, 2017

4 commits

  • Dan Williams
     
  • Commit f979b13c3cc5 ("libnvdimm, label: honor the lba size specified in
    v1.2 labels") neglected to update the 'lbasize' in the label when the
    namespace sector_size attribute was written. We need this value in the
    label for inter-OS / pre-OS compatibility.

    Fixes: f979b13c3cc5 ("libnvdimm, label: honor the lba size specified in v1.2 labels")
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Currently, if someone tries to advance a bvec beyond its size, we simply
    dump a WARN_ONCE and continue to iterate beyond the bvec array boundaries.
    This means that we end up dereferencing/corrupting a random memory
    region.

    The sane reaction would be to propagate an error back to the calling
    context, but bvec_iter_advance's calling context is not always good for
    error handling. For safety reasons, truncate the iterator size to zero,
    which breaks the external iteration loop and prevents unpredictable
    memory corruption. Even if the caller ignores the error, it will
    corrupt its own bvecs, not others'.

    This patch does:
    - Return an error back to the caller in the hope that it will react to it
    - Truncate the iterator size

    The code was added a long time ago in 4550dd6c; luckily no one has hit it
    in real life :)

    Signed-off-by: Dmitry Monakhov
    Reviewed-by: Ming Lei
    Reviewed-by: Martin K. Petersen
    [hch: switch to true/false returns instead of errno values]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     
  • Currently all integrity prep hooks are open-coded, and if prepare fails
    we ignore its return code and fail the bio with EIO. Let's return the real
    error to the upper layer, so that the caller may react accordingly.

    In fact no one wants to use bio_integrity_prep() w/o bio_integrity_enabled(),
    so it is reasonable to fold the two into one function.

    Signed-off-by: Dmitry Monakhov
    Reviewed-by: Martin K. Petersen
    [hch: merged with the latest block tree,
    return bool from bio_integrity_prep]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
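A simplified user-space model of the bvec_iter_advance() behaviour described above. It assumes a hypothetical `seg`/`seg_iter` pair in place of `struct bio_vec`/`struct bvec_iter` and does not reproduce the kernel code: an over-long advance is refused, the remaining size is truncated to zero so the caller's loop ends, and the function returns false (matching the true/false return noted in the trailer).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bio_vec and struct bvec_iter. */
struct seg {
	size_t len;
};

struct seg_iter {
	size_t idx;  /* current segment */
	size_t done; /* bytes consumed within the current segment */
	size_t size; /* bytes remaining; assumed to match what segs[] holds */
};

/*
 * Advance by 'bytes'. If that would run past the remaining size, truncate
 * the size to zero (ending any loop that checks it) and report failure
 * instead of walking off the end of the segment array.
 */
static bool seg_iter_advance(const struct seg *segs, struct seg_iter *iter,
			     size_t bytes)
{
	if (bytes > iter->size) {
		iter->size = 0;
		return false;
	}

	iter->size -= bytes;
	while (bytes) {
		size_t left = segs[iter->idx].len - iter->done;
		size_t step = bytes < left ? bytes : left;

		bytes -= step;
		iter->done += step;
		if (iter->done == segs[iter->idx].len) {
			iter->idx++;
			iter->done = 0;
		}
	}
	return true;
}
```

Truncating instead of clamping is the conservative choice here: a caller that ignores the return value still stops iterating rather than touching memory outside its own segments.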
     

01 Jul, 2017

4 commits

  • Have dsm functions called via the pass thru mechanism also
    be checked against clear to send.

    Signed-off-by: Jerry Hoemann
    Signed-off-by: Dan Williams

    Jerry Hoemann
     
  • Some critical messages, such as IO errors and metadata failures, were
    printed with dev_info. Make them louder by upgrading them to dev_warn or
    dev_err.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • We need to hold a reference on the 'dirent' until we are sure there are
    no more notifications that will be sent. As noted in the new comments we
    take advantage of the fact that the references are taken and dropped
    under device_lock() and that nd_device_notify() holds device_lock() over
    new badblocks notifications. The notifications that happen when
    badblocks are cleared only occur while the device is active.

    Also take the opportunity to fix up the error messages to report the
    user visible effect of a sysfs_get_dirent() failure.

    Fixes: 975750a98c26 ("libnvdimm, pmem: Add sysfs notifications to badblocks")
    Cc: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     
  • A leftover from the 'bandaid' fix that disabled BTT error clearing in
    rw_bytes resulted in an incorrect check. After we converted these checks
    over to use the NVDIMM_IO_ATOMIC flag, the ndns->claim check was both
    redundant and incorrect. Remove it.

    Fixes: 3ae3d67ba705 ("libnvdimm: add an atomic vs process context flag to rw_bytes")
    Cc:
    Cc: Dave Jiang
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
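Once the ndns->claim test is gone, the error-clearing decision can be sketched as a single predicate. This is a hedged model, not nsio_rw_bytes() itself: it assumes the 512-byte alignment and size requirement described in the 11 May BTT commit below, and `SKETCH_IO_ATOMIC` is a hypothetical stand-in for NVDIMM_IO_ATOMIC.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IO_ATOMIC 0x1u /* hypothetical stand-in for NVDIMM_IO_ATOMIC */

/*
 * Clearing a media error takes locks within ACPI, so it is only attempted
 * from process context (no atomic flag) and only for writes that are
 * 512-byte aligned and sized.
 */
static bool may_clear_error(uint64_t offset, uint64_t size, unsigned int flags)
{
	if (flags & SKETCH_IO_ATOMIC)
		return false;
	return size != 0 && (offset % 512) == 0 && (size % 512) == 0;
}
```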
     

30 Jun, 2017

5 commits

  • btt_rw_page was not propagating errors from btt_do_bvec, resulting in any
    IO errors via the rw_page path going unnoticed. The pmem driver recently
    fixed this in e10624f ("pmem: fail io-requests to known bad blocks"),
    but the same problem in BTT was neglected.

    Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
    Cc:
    Cc: Toshi Kani
    Cc: Dan Williams
    Cc: Jeff Moyer
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • This state is already visible by userspace since the BLK region will not
    be enabled, and it is otherwise benign as it usually indicates that the
    DIMM is not configured.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • The UEFI 2.7 specification defines an updated BTT metadata format,
    bumping the revision to 2.0. Add support for the new format, while
    retaining compatibility for the old 1.1 format.

    Cc: Toshi Kani
    Cc: Linda Knippers
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • The pmem driver attaches to both persistent and volatile memory ranges
    advertised by the ACPI NFIT. When the region is volatile it is redundant
    to spend cycles flushing caches at fsync(). Check if the hosting region
    is volatile and do not set dax_write_cache() if it is.

    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Christoph Hellwig
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The dax_flush() operation can be turned into a nop on platforms where
    firmware arranges for cpu caches to be flushed on a power-fail event.
    The ACPI 6.2 specification defines a mechanism for the platform to
    indicate this capability so the kernel can select the proper default.
    However, for other platforms, the administrator must toggle this setting
    manually.

    Given this flush setting is a dax-specific mechanism we advertise it
    through a 'dax' attribute group hanging off a host device. For example,
    a 'pmem0' block-device gets a 'dax' sysfs-subdirectory with a
    'write_cache' attribute to control response to dax cache flush requests.
    This is similar to the 'queue/write_cache' attribute that appears under
    block devices.

    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Suggested-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
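The write_cache policy from the last two commits can be modelled in a few lines. The sketch assumes a hypothetical `struct dax_model` rather than the kernel's dax_device: the setting defaults to off for a volatile region, an administrator can toggle it (as the dax/write_cache attribute under /sys/block/pmem0/ allows), and the flush operation becomes a no-op while it is off.

```c
#include <stdbool.h>

/* Hypothetical model of a dax host device, not the kernel structure. */
struct dax_model {
	bool write_cache;     /* mirrors the dax/write_cache attribute */
	unsigned int flushes; /* counts flushes actually performed */
};

/* A volatile region gains nothing from flushing, so default it off. */
static void dax_model_init(struct dax_model *dax, bool region_is_volatile)
{
	dax->write_cache = !region_is_volatile;
	dax->flushes = 0;
}

/* Administrator toggle, e.g. when firmware flushes caches on power loss. */
static void dax_model_set_write_cache(struct dax_model *dax, bool enable)
{
	dax->write_cache = enable;
}

/* ->flush(): skipped entirely unless the write cache is marked enabled. */
static void dax_model_flush(struct dax_model *dax)
{
	if (!dax->write_cache)
		return;
	dax->flushes++; /* stands in for writing back dirty cache lines */
}
```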
     

28 Jun, 2017

5 commits

  • Allow volatile nfit ranges to participate in all the same infrastructure
    provided for persistent memory regions. The resulting namespace
    device will still be called "pmem", but the parent region type will be
    "nd_volatile". This is in preparation for disabling the dax ->flush()
    operation in the pmem driver when it is hosted on a volatile range.

    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Christoph Hellwig
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The pmem driver assumes that if platform firmware describes the memory
    devices associated with a persistent memory range and
    CONFIG_ARCH_HAS_PMEM_API=y, it has all the mechanisms necessary to
    flush data to a power-fail safe zone. We warn if the firmware does not
    describe memory devices, but we also need to warn if the architecture
    does not claim pmem support.

    Cc: Jeff Moyer
    Cc: Christoph Hellwig
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Now that all callers of the pmem api have been converted to dax helpers that
    call back to the pmem driver, we can remove include/linux/pmem.h and
    asm/pmem.h.

    Cc:
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: Oliver O'Halloran
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Kill this globally defined wrapper and move to libnvdimm so that we can
    ultimately remove include/linux/pmem.h and asm/pmem.h.

    Cc:
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     
  • We only call blk_queue_bounce for request-based drivers, so stop messing
    with it for make_request based drivers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

16 Jun, 2017

12 commits

  • With all handling of the CONFIG_ARCH_HAS_PMEM_API case being moved to
    libnvdimm and the pmem driver directly we do not need to provide global
    wrappers and fallbacks in the CONFIG_ARCH_HAS_PMEM_API=n case. The pmem
    driver will simply not link to arch_wb_cache_pmem() in that case. Same
    as before, pmem flushing is only defined for x86_64, via
    clean_cache_range(), but it is straightforward to add other archs in the
    future.

    arch_wb_cache_pmem() is an exported function since the pmem module needs
    to find it, but it is privately declared in drivers/nvdimm/pmem.h because
    there are no consumers outside of the pmem driver.

    Cc:
    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Oliver O'Halloran
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Suggested-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Filesystem-DAX flushes caches whenever it writes to the address returned
    through dax_direct_access() and when writing back dirty radix entries.
    That flushing is only required in the pmem case, so add a dax operation
    to allow pmem to take this extra action, but skip it for other dax
    capable devices that do not provide a flush routine.

    An example for this differentiation might be a volatile ram disk where
    there is no expectation of persistence. In fact the pmem driver itself might
    front such an address range specified by the NFIT. So, this "no flush"
    property might be something passed down by the bus / libnvdimm.

    Cc: Christoph Hellwig
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Sysfs "badblocks" information may be updated at run-time when:
    - MCE, SCI, and sysfs "scrub" may add new bad blocks
    - Writes and ioctl() may clear bad blocks

    Add support to send sysfs notifications to sysfs "badblocks" file
    under region and pmem directories when their badblocks information
    is re-evaluated (but is not necessarily changed) during run-time.

    Signed-off-by: Toshi Kani
    Cc: Vishal Verma
    Cc: Linda Knippers
    Signed-off-by: Dan Williams

    Toshi Kani
     
  • The rules for which version of the label specification is in effect at
    any given point in time are as follows:

    1/ If a DIMM has an existing / valid index block then the version
    specified is used, regardless of whether it is a previous version.

    2/ By default when the kernel is initializing new index blocks the
    latest specification version (v1.2 at time of writing) is used.

    3/ An environment that wants to force create v1.1 label-sets must
    arrange for userspace to disable all active regions / namespaces /
    dimms and write a valid set of v1.1 index blocks to the dimms.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Starting with v1.2 labels, 'address abstractions' can be hinted via an
    address abstraction id that implies an info-block format. The standard
    address abstraction in the specification is the v2 format of the
    Block-Translation-Table (BTT). Support for that is saved for a later
    patch; for now we add support for the Linux-supported address
    abstractions BTT (v1), PFN, and DAX.

    The new 'holder_class' attribute for namespace devices is added for
    tooling to specify the 'abstraction_guid' to store in the namespace label.
    For v1.1 labels this field is undefined and any setting of
    'holder_class' away from the default 'none' value will only have effect
    until the driver is unloaded. Setting 'holder_class' requires that
    whatever device tries to claim the namespace must be of the specified
    class.

    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The v1.2 namespace label specification adds a fletcher checksum to each
    label instance. Add generation and validation support for the new field.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • The v1.2 namespace label specification requires 'nlabel' and 'position'
    to be valid for the first ("lowest dpa") label in the set. It also
    requires all non-first labels to set those fields to 0xff.

    Linux does not much care if these values are correct, because we can
    just trust the count of labels with the matching uuid like the v1.1
    case. However, we set them correctly in case other environments care.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Starting with the v1.2 definition of namespace labels, the isetcookie
    field is populated and validated for blk-aperture namespaces. This adds
    some safety against inadvertent copying of namespace labels from one
    DIMM-device to another.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • The type_guid refers to the "Address Range Type GUID" for the region
    backing a namespace as defined by the ACPI NFIT (NVDIMM Firmware Interface
    Table). This 'type' identifier specifies an access mechanism for the
    given namespace. This capability replaces the confusing usage of the
    'NSLABEL_FLAG_LOCAL' flag to indicate a block-aperture-mode namespace.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Previously we only honored the lba size for blk-aperture mode
    namespaces. For pmem namespaces the lba size was just assumed to be 512.
    With the new v1.2 label definition and compatibility with other
    operating environments, the ->lbasize property is now respected for pmem
    namespaces.

    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The interleave-set-cookie algorithm is extended to incorporate all the
    same components that are used to generate an nvdimm unique-id. For
    backwards compatibility we still maintain the old v1.1 definition.

    Reported-by: Nicholas Moulin
    Reported-by: Kaushik Kanetkar
    Signed-off-by: Dan Williams

    Dan Williams
     
  • In support of improved interoperability between operating systems and pre-boot
    environments, the Intel-proposed NVDIMM Namespace Specification [1] has been
    adopted and modified into the UEFI 2.7 NVDIMM Label Protocol [2].

    Update the definitions of the namespace label data structures so that the new
    format can be supported alongside the existing label format.

    The new specification changes the default label size to 256 bytes, so
    everywhere that relied on sizeof(struct nd_namespace_label) must now use the
    sizeof_namespace_label() helper.

    There should be no functional differences from these changes as the
    default is still the v1.1 128-byte format. Future patches will move the
    default to the v1.2 definition.

    [1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
    [2]: http://www.uefi.org/sites/default/files/resources/UEFI_Spec_2_7.pdf

    Signed-off-by: Dan Williams

    Dan Williams
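The v1.2 label checksum mentioned above is a Fletcher-style 64-bit sum. The sketch assumes the common construction over 32-bit words (a running sum in the low half, a sum of running sums in the high half) and assumes the checksum is computed with the checksum field itself zeroed. `struct label_sketch` is a hypothetical layout, not `struct nd_namespace_label`, and host byte order is used where on-media labels are little-endian.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fletcher-style checksum over 32-bit words; len assumed a multiple of 4. */
static uint64_t fletcher64(const void *addr, size_t len)
{
	const unsigned char *p = addr;
	uint32_t lo32 = 0;
	uint64_t hi32 = 0;
	size_t i;

	for (i = 0; i + 4 <= len; i += 4) {
		uint32_t word;

		memcpy(&word, p + i, sizeof(word)); /* avoids unaligned loads */
		lo32 += word;
		hi32 += lo32;
	}
	return hi32 << 32 | lo32;
}

/* Hypothetical label: only the placement of the checksum matters here. */
struct label_sketch {
	uint8_t payload[56];
	uint64_t checksum;
};

/* Generation: checksum the label with its checksum field zeroed. */
static void label_set_checksum(struct label_sketch *l)
{
	l->checksum = 0;
	l->checksum = fletcher64(l, sizeof(*l));
}

/* Validation: recompute over a copy with the field zeroed and compare. */
static bool label_checksum_ok(const struct label_sketch *l)
{
	struct label_sketch tmp = *l;

	tmp.checksum = 0;
	return fletcher64(&tmp, sizeof(tmp)) == l->checksum;
}
```

Zeroing the field before summing lets generation and validation run the same computation over the same bytes.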
     

13 Jun, 2017

1 commit


10 Jun, 2017

1 commit

  • The pmem driver has a need to transfer data with a persistent memory
    destination and be able to rely on the fact that the destination writes are not
    cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
    (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
    to ensure data-writes have reached a power-fail-safe zone in the platform. The
    fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
    around and fence previous writes with an "sfence".

    Implement __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
    memcpy_flushcache, which guarantee that the destination buffer is not dirty in
    the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
    will be used to replace the "pmem api" (include/linux/pmem.h +
    arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
    and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
    config symbol; otherwise they fall back to copy_from_iter_nocache() and plain
    memcpy().

    This is meant to satisfy the concern from Linus that if a driver wants to do
    something beyond the normal nocache semantics it should be something private to
    that driver [1], and Al's concern that anything uaccess related belongs with
    the rest of the uaccess code [2].

    The first consumer of this interface is a new 'copy_from_iter' dax operation so
    that pmem can inject cache maintenance operations without imposing this
    overhead on other dax-capable drivers.

    [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
    [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

    Cc:
    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
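The _flushcache() contract (no dirty destination lines left in the cpu cache on completion) comes down to knowing which cache lines a copy touched. The model below is a hypothetical user-space sketch, not the x86 implementation, which uses non-temporal stores or clwb/clflushopt/clflush. It assumes 64-byte cache lines and reports each line through a callback so the arithmetic can be checked; note that a short copy straddling a line boundary dirties two lines.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE 64 /* assumed cache-line size */

typedef void (*writeback_fn)(uintptr_t line, void *ctx);

/*
 * Hypothetical model: copy, then "write back" every cache line that the
 * destination range [dst, dst + n) overlaps.
 */
static void memcpy_flushcache_model(void *dst, const void *src, size_t n,
				    writeback_fn writeback, void *ctx)
{
	uintptr_t start, end, line;

	memcpy(dst, src, n);
	if (n == 0)
		return;

	start = (uintptr_t)dst & ~(uintptr_t)(CACHELINE - 1);
	end = (uintptr_t)dst + n;
	for (line = start; line < end; line += CACHELINE)
		writeback(line, ctx);
}

/* Example callback: count how many lines would be written back. */
static void count_line(uintptr_t line, void *ctx)
{
	(void)line;
	++*(unsigned int *)ctx;
}
```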
     

09 Jun, 2017

1 commit

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep around at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
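The kernel pairs blk_status_t with conversion helpers (blk_status_to_errno() and errno_to_blk_status()). The sketch below only models the table-driven idea behind such a conversion, using a made-up subset of status codes whose numeric values are arbitrary and are not the kernel's.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative subset only; numeric values are arbitrary. */
enum sketch_status {
	STS_OK = 0,
	STS_NOTSUPP,
	STS_NOSPC,
	STS_IOERR,
	STS_COUNT,
};

/* One table drives both directions, so the two mappings cannot drift. */
static const int status_errno[STS_COUNT] = {
	[STS_OK]      = 0,
	[STS_NOTSUPP] = -EOPNOTSUPP,
	[STS_NOSPC]   = -ENOSPC,
	[STS_IOERR]   = -EIO,
};

static int sketch_status_to_errno(enum sketch_status s)
{
	if ((unsigned int)s >= STS_COUNT)
		return -EIO;
	return status_errno[s];
}

static enum sketch_status sketch_errno_to_status(int err)
{
	size_t i;

	for (i = 0; i < STS_COUNT; i++)
		if (status_errno[i] == err)
			return (enum sketch_status)i;
	return STS_IOERR; /* unrecognised errors collapse to a generic I/O error */
}
```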
     

05 Jun, 2017

1 commit

  • Hoist the libnvdimm helper as an inline helper to linux/uuid.h
    using an auxiliary const variable uuid_null in lib/uuid.c.

    [hch: also add the guid variant. Both do the same but I'd like
    to keep casts to a minimum]

    The common helper uses the new abstract type uuid_t * instead of
    u8 *.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Amir Goldstein
    [hch: added guid_is_null]
    Signed-off-by: Christoph Hellwig
    Acked-by: Dan Williams
    Reviewed-by: Andy Shevchenko

    Christoph Hellwig
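The shape of the hoisted helper, an equality test against one shared all-zero constant rather than a per-caller byte loop, can be sketched in user space. The types and names here (`uuid16_t`, `uuid16_null`, `uuid16_is_null`) are hypothetical renames chosen to avoid clashing with any libc `uuid_t`; they mirror, but are not, the kernel's uuid_t, uuid_null and uuid_is_null().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte uuid_t. */
typedef struct {
	uint8_t b[16];
} uuid16_t;

/* One shared all-zero constant, like uuid_null in lib/uuid.c. */
static const uuid16_t uuid16_null = { { 0 } };

static inline bool uuid16_equal(const uuid16_t *a, const uuid16_t *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

static inline bool uuid16_is_null(const uuid16_t *u)
{
	return uuid16_equal(u, &uuid16_null);
}
```

Taking a typed pointer instead of u8 * is what lets the compiler reject a buffer of the wrong size being passed in.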
     

13 May, 2017

1 commit

  • Pull libnvdimm fixes from Dan Williams:
    "Incremental fixes and a small feature addition on top of the main
    libnvdimm 4.12 pull request:

    - Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
    The size regression is fixed by moving all dax helpers into the
    dax-core and only specifying "select DAX" for FS_DAX and
    dax-capable drivers. He also asked for clarification of the
    NR_DEV_DAX config option which, on closer look, does not need to be
    a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
    for good measure.

    - Ben's attention to detail on -stable patch submissions caught a
    case where the recent fixes to arch_copy_from_iter_pmem() missed a
    condition where we strand dirty data in the cache. This is tagged
    for -stable and will also be included in the rework of the pmem api
    to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.

    - Vishal adds a feature that missed the initial pull due to pending
    review feedback. It allows the kernel to clear media errors when
    initializing a BTT (atomic sector update driver) instance on a pmem
    namespace.

    - Ross noticed that the dax_device + dax_operations conversion broke
    __dax_zero_page_range(). The nvdimm unit tests fail to check this
    path, but xfstests immediately trips over it. No excuse for missing
    this before submitting the 4.12 pull request.

    These all pass the nvdimm unit tests and an xfstests spot check. The
    set has received a build success notification from the kbuild robot"

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    filesystem-dax: fix broken __dax_zero_page_range() conversion
    libnvdimm, btt: ensure that initializing metadata clears poison
    libnvdimm: add an atomic vs process context flag to rw_bytes
    x86, pmem: Fix cache flushing for iovec write < 8 bytes
    device-dax: kill NR_DEV_DAX
    block, dax: move "select DAX" from BLOCK to FS_DAX
    device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX

    Linus Torvalds
     

11 May, 2017

2 commits

  • If we had badblocks/poison in the metadata area of a BTT, recreating the
    BTT would not clear the poison in all cases, notably the flog area. This
    is because rw_bytes will only clear errors if the request being sent
    down is 512B aligned and sized.

    Make sure that when writing the map and info blocks, the rw_bytes being
    sent are of the correct size/alignment. For the flog, instead of doing
    the smaller log_entry writes only, first do a 'wipe' of the entire area
    by writing zeroes in large enough chunks so that errors get cleared.

    Cc: Andy Rudoff
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • nsio_rw_bytes can clear media errors, but this cannot be done while we
    are in an atomic context due to locking within ACPI. From the BTT,
    ->rw_bytes may be called either from atomic or process context depending
    on whether the calls happen during initialization or during IO.

    During init, we want to ensure error clearing happens, and the flag
    marking process context allows nsio_rw_bytes to do that. When called
    during IO, we're in atomic context, and error clearing can be skipped.

    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
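The flog "wipe" from the first commit above can be sketched as zeroing an area using only sector-aligned, sector-sized writes, so an error-clearing write path accepts every chunk. `wipe_area()`, the `write_fn` hook and the 4096-byte chunk size are assumptions for illustration, not the BTT code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR 512u
#define WIPE_CHUNK 4096u /* assumed; any multiple of SECTOR works */

/* Hypothetical write hook with an rw_bytes-like shape. */
typedef int (*write_fn)(void *ctx, uint64_t offset, const void *buf,
			size_t len);

/*
 * Zero [offset, offset + len) in chunks that are each 512-byte aligned and
 * sized. Returns 0, -EINVAL if the area itself is misaligned, or the first
 * error returned by the write hook.
 */
static int wipe_area(write_fn do_write, void *ctx, uint64_t offset,
		     uint64_t len)
{
	static const unsigned char zeroes[WIPE_CHUNK];

	if (offset % SECTOR || len % SECTOR)
		return -EINVAL;

	while (len) {
		size_t chunk = len < WIPE_CHUNK ? (size_t)len : WIPE_CHUNK;
		int rc = do_write(ctx, offset, zeroes, chunk);

		if (rc)
			return rc;
		offset += chunk;
		len -= chunk;
	}
	return 0;
}

/* Example hook: counts writes and notes any misaligned one. */
struct wipe_log {
	unsigned int writes;
	int misaligned;
};

static int log_write(void *ctx, uint64_t offset, const void *buf, size_t len)
{
	struct wipe_log *log = ctx;

	(void)buf;
	log->writes++;
	if (offset % SECTOR || len % SECTOR)
		log->misaligned = 1;
	return 0;
}
```

Only after such a wipe would the smaller log_entry writes follow; those can stay unaligned because the poison has already been cleared.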