29 Mar, 2018

1 commit

  • commit 3ffb0ba9b567a8efb9a04ed3d1ec15ff333ada22 upstream.

    Prior to 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
    we needed to temporarily add a zero-capacity disk before registering for
    blk-integrity. But adding a zero-capacity disk caused the partition
    table scanning to bail early, and this resulted in partitions not coming
    up after a probe of the BTT or blk namespaces.

    We can now register for integrity before the disk has been added, and
    this fixes the rescan problems.

    Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
    Reported-by: Dariusz Dokupil
    Cc:
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Vishal Verma
     

04 Jul, 2017

2 commits

  • Currently, if someone tries to advance a bvec beyond its size, we simply
    dump a WARN_ONCE and continue to iterate beyond the bvec array
    boundaries. This means that we end up dereferencing or corrupting a
    random memory region.

    A sane reaction would be to propagate the error back to the calling
    context, but bvec_iter_advance's calling context is not always suited to
    error handling. For safety, truncate the iterator size to zero, which
    breaks the external iteration loop and prevents unpredictable memory
    range corruption. Even if the caller ignores the error, it will corrupt
    its own bvecs, not others'.

    This patch does:
    - Return an error to the caller in the hope that it will react to it
    - Truncate iterator size

    The code was added a long time ago in 4550dd6c; luckily no one has hit
    it in real life :)

    Signed-off-by: Dmitry Monakhov
    Reviewed-by: Ming Lei
    Reviewed-by: Martin K. Petersen
    [hch: switch to true/false returns instead of errno values]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     
  • Currently all integrity prep hooks are open-coded, and if prepare fails
    we ignore its return code and fail the bio with EIO. Let's return the
    real error to the upper layer so that the caller can react accordingly.

    In fact no one wants to use bio_integrity_prep() without
    bio_integrity_enabled(), so it is reasonable to fold them into one
    function.

    Signed-off-by: Dmitry Monakhov
    Reviewed-by: Martin K. Petersen
    [hch: merged with the latest block tree,
    return bool from bio_integrity_prep]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
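
The truncate-on-overrun behaviour described in the bvec_iter_advance commit above can be illustrated with a minimal userspace sketch. The types and names here (struct seg, struct seg_iter, seg_iter_advance) are hypothetical stand-ins for illustration, not the kernel's definitions; the sketch assumes the iterator's size never exceeds the sum of the segment lengths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel's bio_vec / bvec_iter. */
struct seg {
    unsigned int len;   /* bytes in this segment */
};

struct seg_iter {
    unsigned int size;  /* bytes remaining in the whole iteration */
    unsigned int idx;   /* index of the current segment */
    unsigned int done;  /* bytes already consumed in the current segment */
};

/*
 * Advance the iterator by 'bytes'. An attempt to move past the remaining
 * size truncates the iterator to zero and returns false, so an outer loop
 * of the form "while (it.size)" stops instead of walking off the array.
 */
static bool seg_iter_advance(const struct seg *segs, struct seg_iter *it,
                             unsigned int bytes)
{
    assert(segs && it);

    if (bytes > it->size) {
        it->size = 0;
        return false;
    }

    while (bytes) {
        unsigned int left = segs[it->idx].len - it->done;
        unsigned int step = bytes < left ? bytes : left;

        bytes -= step;
        it->size -= step;
        it->done += step;

        /* Move to the next segment once this one is fully consumed. */
        if (it->done == segs[it->idx].len) {
            it->done = 0;
            it->idx++;
        }
    }
    return true;
}
```

A caller that ignores the return value still sees a zero size afterwards, which is the property the commit relies on to contain the damage.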
     

28 Jun, 2017

1 commit


09 Jun, 2017

1 commit

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep around at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
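
As a rough illustration of what a dedicated status type makes possible, the sketch below converts between errno values and a small status enum through one table. The enum, its values, and the function names are hypothetical; the kernel's real blk_status_t codes and helpers differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch only. */
enum model_status {
    MODEL_STS_OK = 0,
    MODEL_STS_NOSPC,
    MODEL_STS_TIMEOUT,
    MODEL_STS_IOERR,        /* catch-all for anything unmapped */
};

/* Single table used for both directions of the conversion. */
static const struct {
    int err;                /* 0 or a negative errno */
    enum model_status sts;
} model_map[] = {
    { 0,          MODEL_STS_OK },
    { -ENOSPC,    MODEL_STS_NOSPC },
    { -ETIMEDOUT, MODEL_STS_TIMEOUT },
    { -EIO,       MODEL_STS_IOERR },
};

#define MODEL_MAP_LEN (sizeof(model_map) / sizeof(model_map[0]))

/* Unknown errno values collapse to the generic I/O error status. */
static enum model_status model_errno_to_status(int err)
{
    for (size_t i = 0; i < MODEL_MAP_LEN; i++)
        if (model_map[i].err == err)
            return model_map[i].sts;
    return MODEL_STS_IOERR;
}

/* Unknown status values collapse to -EIO. */
static int model_status_to_errno(enum model_status sts)
{
    for (size_t i = 0; i < MODEL_MAP_LEN; i++)
        if (model_map[i].sts == sts)
            return model_map[i].err;
    return -EIO;
}
```

Keeping the two directions on one table means a private or unexpected value can only ever surface as the catch-all, never as a misread errno.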
     

11 May, 2017

1 commit

  • nsio_rw_bytes can clear media errors, but this cannot be done while we
    are in an atomic context due to locking within ACPI. From the BTT,
    ->rw_bytes may be called either from atomic or process context depending
    on whether the calls happen during initialization or during IO.

    During init, we want to ensure error clearing happens, and the flag
    marking process context allows nsio_rw_bytes to do that. When called
    during IO, we're in atomic context, and error clearing can be skipped.

    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
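
The context-dependent behaviour can be sketched in userspace as follows. The flag name MODEL_IO_ATOMIC, the struct, and the functions are hypothetical, and returning -EIO when clearing is skipped is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag: set by callers that are in atomic context. */
#define MODEL_IO_ATOMIC 0x1UL

struct model_ns {
    bool has_poison;    /* a known media error covers the target range */
    int clears;         /* number of times error clearing ran */
};

/* Stand-in for error clearing, which may sleep in the real driver. */
static void model_clear_poison(struct model_ns *ns)
{
    ns->has_poison = false;
    ns->clears++;
}

/*
 * Model of a write through ->rw_bytes. In process context (flags == 0)
 * the media error is cleared before the write proceeds. In atomic
 * context clearing is skipped; this sketch reports -EIO in that case.
 */
static int model_write_bytes(struct model_ns *ns, unsigned long flags)
{
    assert(ns);

    if (ns->has_poison) {
        if (flags & MODEL_IO_ATOMIC)
            return -EIO;
        model_clear_poison(ns);
    }
    return 0;
}
```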
     

29 Jul, 2016

1 commit

  • Pull libnvdimm updates from Dan Williams:

    - Replace pcommit with ADR / directed-flushing.

    The pcommit instruction, which has not shipped on any product, is
    deprecated. Instead, the requirement is that platforms implement
    either ADR, or provide one or more flush addresses per nvdimm.

    ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
    to the memory controller on a power-fail event.

    Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
    Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
    A flush hint is an mmio address that when written and fenced assures
    that all previous posted writes targeting a given dimm have been
    flushed to media.

    - On-demand ARS (address range scrub).

    Linux uses the results of the ACPI ARS commands to track bad blocks
    in pmem devices. When latent errors are detected we re-scrub the
    media to refresh the bad block list; userspace can also request a
    re-scrub at any time.

    - Support for the Microsoft DSM (device specific method) command
    format.

    - Support for EDK2/OVMF virtual disk device memory ranges.

    - Various fixes and cleanups across the subsystem.

    * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
    libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
    nfit: do an ARS scrub on hitting a latent media error
    nfit: move to nfit/ sub-directory
    nfit, libnvdimm: allow an ARS scrub to be triggered on demand
    libnvdimm: register nvdimm_bus devices with an nd_bus driver
    pmem: clarify a debug print in pmem_clear_poison
    x86/insn: remove pcommit
    Revert "KVM: x86: add pcommit support"
    nfit, tools/testing/nvdimm/: unify shutdown paths
    libnvdimm: move ->module to struct nvdimm_bus_descriptor
    nfit: cleanup acpi_nfit_init calling convention
    nfit: fix _FIT evaluation memory leak + use after free
    tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
    tools/testing/nvdimm: add virtual ramdisk range
    acpi, nfit: treat virtual ramdisk SPA as pmem region
    pmem: kill __pmem address space
    pmem: kill wmb_pmem()
    libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
    fs/dax: remove wmb_pmem()
    libnvdimm, pmem: flush posted-write queues on shutdown
    ...

    Linus Torvalds
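
The "written and fenced" sequence for a flush hint can be pictured with the userspace sketch below. It is only an analogy: an ordinary volatile variable stands in for the mmio flush hint address, C11 fences stand in for the kernel's barriers, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Model of a directed flush: order earlier stores ahead of the hint
 * write, write the hint, then fence again so the write is not reordered
 * past later operations. 'flush_hint' is a plain variable here, not a
 * real mmio register.
 */
static void model_flush_hint_write(volatile uint64_t *flush_hint)
{
    assert(flush_hint);

    atomic_thread_fence(memory_order_seq_cst);
    *flush_hint = 1;
    atomic_thread_fence(memory_order_seq_cst);
}
```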
     

28 Jun, 2016

1 commit

  • For block drivers that specify a parent device, convert them to use
    device_add_disk().

    This conversion was done with the following semantic patch:

    @@
    struct gendisk *disk;
    expression E;
    @@

    - disk->driverfs_dev = E;
    ...
    - add_disk(disk);
    + device_add_disk(E, disk);

    @@
    struct gendisk *disk;
    expression E1, E2;
    @@

    - disk->driverfs_dev = E1;
    ...
    E2 = disk;
    ...
    - add_disk(E2);
    + device_add_disk(E1, E2);

    ...plus some manual fixups for a few missed conversions.

    Cc: Jens Axboe
    Cc: Keith Busch
    Cc: Michael S. Tsirkin
    Cc: David Woodhouse
    Cc: David S. Miller
    Cc: James Bottomley
    Cc: Ross Zwisler
    Cc: Konrad Rzeszutek Wilk
    Cc: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

16 Jun, 2016

1 commit

  • Clean up needless calls to the action routine by letting
    devm_add_action_or_reset() call it automatically. This does cause the
    disk to be registered and immediately unregistered when a memory allocation
    fails, but the block layer should be prepared for such an event.

    Reported-by: Sudip Mukherjee
    Signed-off-by: Dan Williams

    Dan Williams
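
The "or reset" pattern mentioned above can be modelled in a few lines of userspace C: if registering the cleanup action fails, the action is run immediately so the caller does not have to. All names below are hypothetical, and the fail_alloc field exists only to simulate a failed allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*model_action_t)(void *data);

struct model_dev {
    model_action_t action;  /* cleanup to run when the device goes away */
    void *data;
    bool fail_alloc;        /* test hook: simulate allocation failure */
};

/* Register a cleanup action; fails when the simulated allocation fails. */
static int model_add_action(struct model_dev *dev, model_action_t action,
                            void *data)
{
    if (dev->fail_alloc)
        return -ENOMEM;
    dev->action = action;
    dev->data = data;
    return 0;
}

/* Same as above, but run the action right away if registration fails. */
static int model_add_action_or_reset(struct model_dev *dev,
                                     model_action_t action, void *data)
{
    int rc = model_add_action(dev, action, data);

    if (rc)
        action(data);
    return rc;
}

struct model_disk {
    bool registered;
    int unregister_calls;
};

/* Cleanup action for the sketch: unregister the disk. */
static void model_release_disk(void *data)
{
    struct model_disk *disk = data;

    disk->registered = false;
    disk->unregister_calls++;
}
```

On the failure path the disk ends up registered and then unregistered in quick succession, which is the side effect the commit message calls out.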
     

23 Apr, 2016

7 commits


10 Mar, 2016

1 commit


08 Nov, 2015

1 commit


29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not being persistent
    when bios are queued up and of not being passed along from child to
    parent bio in the ever more popular chaining scenario. Having both
    mechanisms available has the additional drawback of utterly confusing
    driver authors and introducing bugs where various I/O submitters only
    deal with one of them, and the others have to add boilerplate code to
    deal with both kinds of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
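
To show why storing the error in the bio itself helps with chaining, here is a minimal userspace model in which a child's error is copied to its parent on completion. The struct and function are hypothetical simplifications, and the "first error wins" policy is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bio with an in-struct error field. */
struct model_bio {
    int error;                  /* 0 or a negative errno */
    int remaining;              /* own completion plus outstanding children */
    struct model_bio *parent;   /* NULL for a top-level bio */
    bool completed;
};

/*
 * Complete a bio. When the last reference drops, the error stored in the
 * bio is propagated to the parent (unless the parent already has one) and
 * the parent's own completion is attempted in turn.
 */
static void model_bio_endio(struct model_bio *bio)
{
    while (bio) {
        struct model_bio *parent;

        if (--bio->remaining > 0)
            return;

        bio->completed = true;
        parent = bio->parent;
        if (!parent)
            return;
        if (!parent->error)
            parent->error = bio->error;
        bio = parent;
    }
}
```

Because the error lives in the struct, it survives while the parent is still waiting on its own completion, which a value passed only as a callback argument would not.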
     

26 Jun, 2015

4 commits

  • Upon detection of an unarmed dimm in a region, arrange for descendant
    BTT, PMEM, or BLK instances to be read-only. A dimm is primarily marked
    "unarmed" via flags passed by platform firmware (NFIT).

    The flags in the NFIT memory device sub-structure indicate the state of
    the data on the nvdimm relative to its energy source or last "flush to
    persistence". For the most part there is nothing the driver can do but
    advertise the state of these flags in sysfs and emit a message if
    firmware indicates that the contents of the device may be corrupted.
    However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
    the block devices incorporating that nvdimm to be marked read-only.
    This is a safe default as the data is still available and new writes are
    held off until the administrator either forces read-write mode, or the
    energy source becomes armed.

    A 'read_only' attribute is added to REGION devices to allow for
    overriding the default read-only policy of all descendant block devices.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • This is disabled by default as the overhead is prohibitive, but if the
    user takes the action to turn it on we'll oblige.

    Reviewed-by: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Support multiple block sizes (sector + metadata) for nd_blk in the
    same way as done for the BTT. Add the idea of an 'internal' lbasize,
    which is properly aligned and padded, and store metadata in this space.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • The libnvdimm implementation handles allocating dimm address space (DPA)
    between PMEM and BLK mode interfaces. After DPA has been allocated from
    a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
    as a struct bio based block device. Unlike PMEM, BLK is required to
    handle platform specific details like mmio register formats and memory
    controller interleave. For this reason the libnvdimm generic nd_blk
    driver calls back into the bus provider to carry out the I/O.

    This initial implementation handles the BLK interface defined by the
    ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
    DCR (dimm control region), BDW (block data window), IDT (interleave
    descriptor) NFIT structures and the hardware register format.
    [1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
    [2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    Cc: Andy Lutomirski
    Cc: Boaz Harrosh
    Cc: H. Peter Anvin
    Cc: Jens Axboe
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Signed-off-by: Ross Zwisler
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Dan Williams

    Ross Zwisler
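
The 'internal' lbasize idea from the nd_blk block-size commit above can be sketched as simple arithmetic. The 64-byte alignment, the rule that the data portion is 512 or 4096 bytes, and all names below are assumptions made for illustration, not values taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alignment for the internal (on-media) block size. */
#define MODEL_INT_LBASIZE_ALIGNMENT 64u

/* Metadata bytes: whatever exceeds the 512- or 4096-byte data portion. */
static uint32_t model_meta_size(uint32_t lbasize)
{
    assert(lbasize >= 512);
    return lbasize - (lbasize >= 4096 ? 4096u : 512u);
}

/* Sector size exposed to the block layer (data portion only). */
static uint32_t model_sector_size(uint32_t lbasize)
{
    return lbasize - model_meta_size(lbasize);
}

/* External lbasize rounded up to the assumed alignment. */
static uint32_t model_internal_lbasize(uint32_t lbasize)
{
    uint32_t a = MODEL_INT_LBASIZE_ALIGNMENT;

    return (lbasize + a - 1) / a * a;
}
```

With these assumptions, a 520-byte block carries 8 bytes of metadata and occupies 576 bytes internally, so the metadata fits in the padded space alongside the 512-byte sector.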