19 Jul, 2019

1 commit

  • For good reason, the standard device_lock() is marked
    lockdep_set_novalidate_class() because there is simply no sane way to
    describe the myriad ways the device_lock() is ordered with other locks.
    However, that leaves subsystems that know their own local device_lock()
    ordering rules to find lock ordering mistakes manually. Instead,
    introduce an optional / additional lockdep-enabled lock that a subsystem
    can acquire in all the same paths that the device_lock() is acquired.

    A conversion of the NFIT driver and NVDIMM subsystem to a
    lockdep-validated device_lock() scheme is included. The
    debug_nvdimm_lock() implementation provides the correct lock-class and
    stacking order for the libnvdimm device topology hierarchy.

    Yes, this is a hack, but hopefully it is a useful hack for other
    subsystems' device_lock() debug sessions. Quoting Greg:

    "Yeah, it feels a bit hacky but it's really up to a subsystem to mess up
    using it as much as anything else, so user beware :)

    I don't object to it if it makes things easier for you to debug."

    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Will Deacon
    Cc: Dave Jiang
    Cc: Keith Busch
    Cc: Peter Zijlstra
    Cc: Vishal Verma
    Cc: "Rafael J. Wysocki"
    Cc: Greg Kroah-Hartman
    Signed-off-by: Dan Williams
    Acked-by: Greg Kroah-Hartman
    Reviewed-by: Ira Weiny
    Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of version 2 of the gnu general public license as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 64 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

07 Mar, 2018

1 commit

  • Dynamic debug can be instructed to add the function name to the debug
    output using the +f switch, so there is no need for the libnvdimm
    modules to do it again. If a user decides to add the +f switch for
    libnvdimm's dynamic debug this results in double prints of the function
    name.

    Reported-by: Johannes Thumshirn
    Reported-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
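
The double print described above can be reproduced with a small userspace sketch. It assumes that the +f flag prepends "function: " to the message, which only approximates the real dynamic debug prefix; emit_debug() and count_name() are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the debug core: when flag_f is set it
 * prepends "func: " to the message (assumed prefix format).
 */
static int emit_debug(char *buf, size_t len, bool flag_f,
		      const char *func, const char *msg)
{
	if (flag_f)
		return snprintf(buf, len, "%s: %s", func, msg);
	return snprintf(buf, len, "%s", msg);
}

/* Count how many times the function name appears in an emitted line. */
static int count_name(const char *line, const char *func)
{
	size_t flen = strlen(func);
	int n = 0;

	if (flen == 0)
		return 0;
	for (const char *p = strstr(line, func); p; p = strstr(p + flen, func))
		n++;
	return n;
}
```

A message that already embeds the function name shows it twice once the flag is set; dropping the name from the message text leaves a single copy.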
     

03 Nov, 2017

1 commit

  • nfit_test needs to use the poison list manipulation code as well. Make
    it more generic and in the process rename poison to badrange, and move
    all the related helpers to a new file.

    Signed-off-by: Dave Jiang
    [vishal: Add badrange.o to nfit_test's Kbuild]
    [vishal: add a missed include in bus.c for the new badrange functions]
    [vishal: rename all instances of 'be' to 'bre']
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Dave Jiang
     

12 Aug, 2017

1 commit


18 Jul, 2017

1 commit

  • __add_badblock_range() does not account for sector alignment when
    it sets 'num_sectors'. Therefore, an ARS error record range
    spanning across two sectors is set to a single sector length,
    which leaves the 2nd sector unprotected.

    Change __add_badblock_range() to set 'num_sectors' properly.

    Cc:
    Fixes: 0caeef63e6d2 ("libnvdimm: Add a poison list and export badblocks")
    Signed-off-by: Toshi Kani
    Reviewed-by: Vishal Verma
    Signed-off-by: Dan Williams

    Toshi Kani
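
The alignment problem in the fix above comes down to counting every sector that a byte range touches rather than dividing its length by the sector size. The following is a minimal sketch of that arithmetic, assuming 512-byte sectors; sectors_spanned() is a hypothetical helper and not the kernel's __add_badblock_range().

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumption: 512-byte sectors */

/*
 * Hypothetical helper: number of sectors touched by the byte range
 * [offset, offset + len). An unaligned range of one sector's length
 * can straddle two sectors, so both must be counted.
 */
static uint64_t sectors_spanned(uint64_t offset, uint64_t len)
{
	assert(len > 0);

	uint64_t first = offset / SECTOR_SIZE;
	uint64_t last = (offset + len - 1) / SECTOR_SIZE;

	return last - first + 1;
}
```

For example, a 512-byte error record starting at byte offset 256 touches sectors 0 and 1, so the helper returns 2 where a plain len / 512 would return 1.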
     

04 Jul, 2017

1 commit


28 Jun, 2017

1 commit

  • Allow volatile nfit ranges to participate in all the same infrastructure
    provided for persistent memory regions. A resulting namespace
    device will still be called "pmem", but the parent region type will be
    "nd_volatile". This is in preparation for disabling the dax ->flush()
    operation in the pmem driver when it is hosted on a volatile range.

    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Christoph Hellwig
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

16 Jun, 2017

1 commit

  • Starting with v1.2 labels, 'address abstractions' can be hinted via an
    address abstraction id that implies an info-block format. The standard
    address abstraction in the specification is the v2 format of the
    Block-Translation-Table (BTT). Support for that is saved for a later
    patch; for now we add support for the Linux supported address
    abstractions BTT (v1), PFN, and DAX.

    The new 'holder_class' attribute for namespace devices is added for
    tooling to specify the 'abstraction_guid' to store in the namespace label.
    For v1.1 labels this field is undefined and any setting of
    'holder_class' away from the default 'none' value will only have effect
    until the driver is unloaded. Setting 'holder_class' requires that
    whatever device tries to claim the namespace must be of the specified
    class.

    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

14 Apr, 2017

1 commit

  • The following warning results from holding a lane spinlock,
    preempt_disable(), or the btt map spinlock and then trying to take the
    reconfig_mutex to walk the poison list and potentially add new entries.

    BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
    in_atomic(): 1, irqs_disabled(): 0, pid: 17159, name: dd
    [..]
    Call Trace:
    dump_stack+0x85/0xc8
    ___might_sleep+0x184/0x250
    __might_sleep+0x4a/0x90
    __mutex_lock+0x58/0x9b0
    ? nvdimm_bus_lock+0x21/0x30 [libnvdimm]
    ? __nvdimm_bus_badblocks_clear+0x2f/0x60 [libnvdimm]
    ? acpi_nfit_forget_poison+0x79/0x80 [nfit]
    ? _raw_spin_unlock+0x27/0x40
    mutex_lock_nested+0x1b/0x20
    nvdimm_bus_lock+0x21/0x30 [libnvdimm]
    nvdimm_forget_poison+0x25/0x50 [libnvdimm]
    nvdimm_clear_poison+0x106/0x140 [libnvdimm]
    nsio_rw_bytes+0x164/0x270 [libnvdimm]
    btt_write_pg+0x1de/0x3e0 [nd_btt]
    ? blk_queue_enter+0x30/0x290
    btt_make_request+0x11a/0x310 [nd_btt]
    ? blk_queue_enter+0xb7/0x290
    ? blk_queue_enter+0x30/0x290
    generic_make_request+0x118/0x3b0

    A spinlock is introduced to protect the poison list, so touching the
    list no longer requires acquiring the reconfig_mutex. The add_poison()
    function has been broken out into two helper functions: one to allocate
    the poison entry and the other to append the entry. This allows us to
    drop the poison_lock in the non-I/O path and continue to allocate the
    poison entry with GFP_KERNEL. We will use GFP_NOWAIT in the I/O path in
    order to satisfy being in atomic context.

    Reviewed-by: Vishal Verma
    Signed-off-by: Dave Jiang
    Signed-off-by: Dan Williams

    Dave Jiang
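
The allocate-then-append split described above can be sketched in userspace. This is only an illustration under stated assumptions: a C11 atomic_flag stands in for the kernel spinlock, malloc() stands in for a GFP_KERNEL allocation, and every name below is hypothetical rather than taken from the libnvdimm source.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct poison_entry {
	uint64_t start, length;
	struct poison_entry *next;
};

struct poison_list {
	atomic_flag lock; /* toy spinlock standing in for the kernel one */
	struct poison_entry *head;
};

static void poison_list_init(struct poison_list *pl)
{
	atomic_flag_clear(&pl->lock);
	pl->head = NULL;
}

static void list_lock(struct poison_list *pl)
{
	while (atomic_flag_test_and_set_explicit(&pl->lock, memory_order_acquire))
		; /* spin until the flag is released */
}

static void list_unlock(struct poison_list *pl)
{
	atomic_flag_clear_explicit(&pl->lock, memory_order_release);
}

/* Helper 1: allocate the entry. May block, so no lock is held here. */
static struct poison_entry *entry_alloc(uint64_t start, uint64_t length)
{
	struct poison_entry *pe = malloc(sizeof(*pe));

	if (pe) {
		pe->start = start;
		pe->length = length;
		pe->next = NULL;
	}
	return pe;
}

/* Helper 2: append at the tail. The caller must hold the list lock. */
static void entry_append(struct poison_list *pl, struct poison_entry *pe)
{
	struct poison_entry **pp = &pl->head;

	while (*pp)
		pp = &(*pp)->next;
	*pp = pe;
}

/* Non-I/O path: allocate first, then take the lock only to link. */
static int add_poison(struct poison_list *pl, uint64_t start, uint64_t length)
{
	struct poison_entry *pe = entry_alloc(start, length);

	if (!pe)
		return -1;
	list_lock(pl);
	entry_append(pl, pe);
	list_unlock(pl);
	return 0;
}
```

Keeping the allocation outside the critical section is what lets the slow path use a blocking allocator while the lock itself stays safe to take from atomic context.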
     

13 Apr, 2017

1 commit

  • Provide a mechanism to clear the poison list via the ndctl
    ND_CMD_CLEAR_ERROR call. We will update the poison list and also the
    badblocks at region level if the region is in dax mode, or in pmem mode
    and not active. In other words, we force badblocks to be cleared through
    write requests if the address is currently accessed through a block
    device; otherwise it can only be done via the ioctl+dsm path.

    Signed-off-by: Dave Jiang
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dave Jiang
     

19 Oct, 2016

1 commit

  • nd_iostat_start() and nd_iostat_end() implement the same functionality
    that generic_start_io_acct() and generic_end_io_acct() already provide.

    Change nd_iostat_start() and nd_iostat_end() to call the generic iostat
    interfaces. There is no change in the nd interfaces.

    Signed-off-by: Toshi Kani
    Cc: Andrew Morton
    Cc: Alexander Viro
    Cc: Dave Chinner
    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Toshi Kani
     

08 Oct, 2016

1 commit


01 Oct, 2016

1 commit

  • nvdimm_clear_poison cleared the user-visible badblocks, and sent
    commands to the NVDIMM to clear the areas marked as 'poison', but it
    neglected to clear the same areas from the internal poison_list which is
    used to marshal ARS results before sorting them by namespace. As a
    result, once on-demand ARS functionality was added:

    37b137f nfit, libnvdimm: allow an ARS scrub to be triggered on demand

    A scrub triggered from either sysfs or an MCE was found to be adding
    stale entries that had been cleared from gendisk->badblocks, but were
    still present in nvdimm_bus->poison_list. Additionally, the stale entries
    could be triggered into producing stale disk->badblocks by simply disabling
    and re-enabling the namespace or region.

    This adds the missing step of clearing poison_list entries when clearing
    poison, so that it is always in sync with badblocks.

    Fixes: 37b137f ("nfit, libnvdimm: allow an ARS scrub to be triggered on demand")
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
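
Keeping a range list in sync with cleared areas means handling four overlap cases per entry: fully covered, head cleared, tail cleared, and a hole in the middle that splits the entry. The sketch below illustrates those cases on a fixed-size array; the structure, the capacity, and forget_range() are hypothetical and do not reproduce the kernel's poison_list code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 16 /* assumption: small fixed capacity for the sketch */

struct range {
	uint64_t start, len; /* covers [start, start + len) */
};

struct range_list {
	struct range r[MAX_RANGES];
	size_t n;
};

/*
 * Remove [start, start + len) from every entry in the list.
 * Returns 0 on success, -1 if a split would exceed the capacity.
 */
static int forget_range(struct range_list *rl, uint64_t start, uint64_t len)
{
	uint64_t clr_end = start + len;
	size_t i = 0;

	while (i < rl->n) {
		struct range *e = &rl->r[i];
		uint64_t e_end = e->start + e->len;

		if (clr_end <= e->start || start >= e_end) {
			i++; /* no overlap */
		} else if (start <= e->start && clr_end >= e_end) {
			/* fully covered: delete by moving the last entry here */
			*e = rl->r[--rl->n];
		} else if (start <= e->start) {
			/* head cleared: keep the tail */
			e->len = e_end - clr_end;
			e->start = clr_end;
			i++;
		} else if (clr_end >= e_end) {
			/* tail cleared: keep the head */
			e->len = start - e->start;
			i++;
		} else {
			/* middle cleared: split into head and tail */
			if (rl->n == MAX_RANGES)
				return -1;
			e->len = start - e->start;
			rl->r[rl->n++] = (struct range){ clr_end, e_end - clr_end };
			i++;
		}
	}
	return 0;
}
```

Deleting by swapping in the last entry keeps the sketch short but does not preserve ordering; a linked list, as a kernel implementation would more likely use, avoids that trade-off.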
     

22 Sep, 2016

1 commit


24 Jul, 2016

2 commits

  • Normally, an ARS (Address Range Scrub) only happens at
    boot/initialization time. There can however arise situations where a
    bus-wide rescan is needed - notably, in the case of discovering a latent
    media error, we should do a full rescan to figure out what other sectors
    are bad, and thus potentially avoid triggering an mce on them in the
    future. Also provide a sysfs trigger to start a bus-wide scrub.

    Cc: Rafael J. Wysocki
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • A recent effort to add a new nvdimm bus provider attribute highlighted a
    race between interrogating nvdimm_bus->nd_desc and nvdimm_bus tear down.
    The typical way to handle these races is to take the device_lock() in
    the attribute method and validate that the device is still active. In
    order for a device to be 'active' it needs to be associated with a
    driver. So, we create the small boilerplate for a driver and register
    nvdimm_bus devices on the 'nvdimm_bus_type' bus.

    A result of this change is that ndbusX devices now appear under
    /sys/bus/nd/devices. In fact this makes /sys/class/nd somewhat
    redundant, but removing that will need to take a long deprecation period
    given its use by ndctl binaries in the field.

    This change naturally pulls code from drivers/nvdimm/core.c to
    drivers/nvdimm/bus.c, so it is a nice code organization clean-up as
    well.

    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

22 Jul, 2016

1 commit


08 Jul, 2016

1 commit

  • In preparation for generically mapping flush hint addresses for both the
    BLK and PMEM use case, provide a generic / reference counted mapping
    api. Given the fact that a dimm may belong to multiple regions (PMEM
    and BLK), the flush hint addresses need to be held valid as long as any
    region associated with the dimm is active. This is similar to the
    existing BLK-region case where multiple BLK-regions may share an
    aperture mapping. Up-level this shared / reference-counted mapping
    capability from the nfit driver to a core nvdimm capability.

    This eliminates the need for the nd_blk_region.disable() callback. Note
    that the removal of nfit_spa_map() and related infrastructure is
    deferred to a later patch.

    Signed-off-by: Dan Williams

    Dan Williams
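
A shared, reference-counted mapping of the kind described above can be sketched as a lookup table keyed by offset and size: the first user creates the mapping, later users only bump a count, and the mapping is torn down when the last user drops it. The code below is a single-threaded userspace illustration with hypothetical names; malloc() stands in for the actual I/O mapping call and no locking is shown.

```c
#include <stdint.h>
#include <stdlib.h>

struct shared_map {
	uint64_t offset;
	size_t size;
	unsigned int refs;
	void *vaddr; /* stand-in for the mapped address */
	struct shared_map *next;
};

static struct shared_map *maps; /* all live mappings */

/* Return the existing mapping for (offset, size) or create a new one. */
static void *map_get(uint64_t offset, size_t size)
{
	struct shared_map *m;

	for (m = maps; m; m = m->next) {
		if (m->offset == offset && m->size == size) {
			m->refs++;
			return m->vaddr;
		}
	}

	m = malloc(sizeof(*m));
	if (!m)
		return NULL;
	m->vaddr = malloc(size); /* assumption: placeholder for a real mapping */
	if (!m->vaddr) {
		free(m);
		return NULL;
	}
	m->offset = offset;
	m->size = size;
	m->refs = 1;
	m->next = maps;
	maps = m;
	return m->vaddr;
}

/* Drop one reference; tear the mapping down when the last user leaves. */
static void map_put(uint64_t offset, size_t size)
{
	struct shared_map **pp;

	for (pp = &maps; *pp; pp = &(*pp)->next) {
		struct shared_map *m = *pp;

		if (m->offset != offset || m->size != size)
			continue;
		if (--m->refs == 0) {
			*pp = m->next;
			free(m->vaddr);
			free(m);
		}
		return;
	}
}
```

With this shape, two regions that share a dimm can each call map_get() for the same range and neither needs a dedicated disable callback to know when the mapping may go away.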
     

07 Jul, 2016

1 commit


22 May, 2016

1 commit


21 May, 2016

1 commit

  • ida instances allocate some internal memory for ->free_bitmap in
    addition to the base 'struct ida'. Use ida_destroy() to release that
    memory at module_exit().

    Reported-by: Johannes Thumshirn
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

29 Apr, 2016

1 commit

  • Clarify the distinction between "commands", the ioctls userspace calls
    to request the kernel take some action on a given dimm device, and
    "_DSMs", the actual function numbers used in the firmware interface to
    the DIMM. _DSMs are ACPI specific whereas commands are Linux kernel
    generic.

    This is in preparation for breaking the 1:1 implicit relationship
    between the kernel ioctl number space and the firmware specific function
    numbers.

    Cc: Jerry Hoemann
    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

08 Apr, 2016

1 commit

  • When section alignment padding is in effect we need to shift / truncate
    the range that is queried for poison by the 'start_pad' or 'end_trunc'
    reservations.

    It's easiest if we just pass in an adjusted resource range rather than
    deriving it from the passed in namespace. With the resource range
    resolution pushed out to the caller we can also push the
    namespace-to-region lookup to the caller and drop the implicit pmem-type
    assumption about the passed in namespace object.

    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

06 Mar, 2016

2 commits


10 Jan, 2016

3 commits

  • If a device will ever have badblocks it should always have a badblocks
    instance available. So, similar to md, embed a badblocks instance in
    pmem_device. This reduces pointer chasing in the i/o fast path, and
    simplifies the init path.

    Reported-by: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     
  • If the badblocks list runs out of space it simply means that software is
    unable to intercept all errors. This is no different than the latent
    discovery of new badblocks case and should not be an initialization
    failure condition.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • During region creation, perform Address Range Scrubs (ARS) for the SPA
    (System Physical Address) ranges to retrieve known poison locations from
    firmware. Add a new data structure 'nd_poison' which is used as a list
    in nvdimm_bus to store these poison locations.

    When creating a pmem namespace, if there is any known poison associated
    with its physical address space, convert the poison ranges to bad sectors
    that are exposed using the badblocks interface.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
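
Converting a poison range into bad sectors for a namespace involves clipping the range to the namespace's physical span and then expressing the result as a sector offset and count relative to the namespace start. The following is a minimal sketch of that conversion, assuming 512-byte sectors; the types and poison_to_sectors() are hypothetical, not the kernel's helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumption: 512-byte sectors */

struct span {
	uint64_t start, len; /* bytes [start, start + len) */
};

struct bad_sectors {
	uint64_t first, count; /* sectors relative to the namespace start */
};

/*
 * Clip a poison span to the namespace span and convert the overlap to
 * sectors. Returns false when the two spans do not intersect.
 */
static bool poison_to_sectors(struct span ns, struct span poison,
			      struct bad_sectors *out)
{
	uint64_t ns_end = ns.start + ns.len;
	uint64_t p_end = poison.start + poison.len;
	uint64_t lo = poison.start > ns.start ? poison.start : ns.start;
	uint64_t hi = p_end < ns_end ? p_end : ns_end;

	if (lo >= hi)
		return false;

	out->first = (lo - ns.start) / SECTOR_SIZE;
	out->count = (hi - 1 - ns.start) / SECTOR_SIZE - out->first + 1;
	return true;
}
```

Poison that begins before the namespace is reported starting at sector 0, and poison that lies entirely outside it produces no bad sectors.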
     

22 Oct, 2015

3 commits

  • The libnvdimm-btt and nvme drivers use blk_integrity to reserve space
    for per-sector metadata, but sometimes without protection checksums.
    This property is generically useful, so teach the block core to
    internally specify a nop profile if one is not provided at registration
    time.

    Cc: Keith Busch
    Cc: Matthew Wilcox
    Suggested-by: Christoph Hellwig
    [hch: kill the local nvme nop profile as well]
    Acked-by: Martin K. Petersen
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Dan Williams
     
  • Up until now the integrity profile has been dynamically allocated and
    attached to struct gendisk after the disk has been made active.

    This causes problems because NVMe devices need to register the profile
    prior to the partition table being read due to a mandatory metadata
    buffer requirement. In addition, DM goes through hoops to deal with
    preallocating, but not initializing integrity profiles.

    Since the integrity profile is small (4 bytes + a pointer), Christoph
    suggested moving it to struct gendisk proper. This requires several
    changes:

    - Moving the blk_integrity definition to genhd.h.

    - Inlining blk_integrity in struct gendisk.

    - Removing the dynamic allocation code.

    - Adding helper functions which allow gendisk to set up and tear down
    the integrity sysfs dir when a disk is added/deleted.

    - Adding a blk_integrity_revalidate() callback for updating the stable
    pages bdi setting.

    - The calls that depend on whether a device has an integrity profile or
    not now key off of the bi->profile pointer.

    - Simplifying the integrity support routines in DM (Mike Snitzer).

    Signed-off-by: Martin K. Petersen
    Reported-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Mike Snitzer
    Cc: Dan Williams
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • We previously made a complete copy of a device's data integrity profile
    even though several of the fields inside the blk_integrity struct are
    pointers to fixed template entries in t10-pi.c.

    Split the static and per-device portions so that we can reference the
    template directly.

    Signed-off-by: Martin K. Petersen
    Reported-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Cc: Dan Williams
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

26 Jun, 2015

3 commits

  • This is disabled by default as the overhead is prohibitive, but if the
    user takes the action to turn it on we'll oblige.

    Reviewed-by: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Support multiple block sizes (sector + metadata) for nd_blk in the
    same way as done for the BTT. Add the idea of an 'internal' lbasize,
    which is properly aligned and padded, and store metadata in this space.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
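
The 'internal' lbasize idea above can be illustrated with simple rounding arithmetic: the external block size (sector plus metadata) is padded up to an alignment boundary, and each block's metadata lives in the space after the sector data. The sketch assumes a 64-byte alignment, which is an illustrative value rather than one stated in the commit, and all function names are hypothetical.

```c
#include <stdint.h>

#define INTERNAL_ALIGN 64u /* assumption: illustrative alignment */

/* Round the external lbasize (sector + metadata) up to the alignment. */
static uint32_t internal_lbasize(uint32_t lbasize)
{
	return (lbasize + INTERNAL_ALIGN - 1) / INTERNAL_ALIGN * INTERNAL_ALIGN;
}

/* Bytes of metadata carried with each sector. */
static uint32_t meta_size(uint32_t lbasize, uint32_t sector_size)
{
	return lbasize - sector_size;
}

/* Byte offset of a block on the media, using the padded size as stride. */
static uint64_t block_offset(uint64_t lba, uint32_t lbasize)
{
	return lba * internal_lbasize(lbasize);
}
```

Under these assumptions a 520-byte block (512 data + 8 metadata) occupies 576 bytes on the media, so block 3 starts at byte 1728.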
     
  • Support multiple block sizes (sector + metadata) using the blk integrity
    framework. This registers a new integrity template that defines the
    protection information tuple size based on the configured metadata size,
    and simply acts as a passthrough for protection information generated by
    another layer. The metadata is written to the storage as-is, and read back
    with each sector.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

25 Jun, 2015

5 commits

  • A blk label set describes a namespace comprised of one or more
    discontiguous dpa ranges on a single dimm. They may alias with one or
    more pmem interleave sets that include the given dimm.

    This is the runtime/volatile configuration infrastructure for sysfs
    manipulation of 'alt_name', 'uuid', 'size', and 'sector_size'. A later
    patch will make these settings persistent by writing back the label(s).

    Unlike pmem namespaces, multiple blk namespaces can be created per
    region. Once a blk namespace has been created a new seed device
    (unconfigured child of a parent blk region) is instantiated. As long as
    a region has 'available_size' != 0 new child namespaces may be created.

    Cc: Greg KH
    Cc: Neil Brown
    Acked-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • A complete label set is a PMEM-label per-dimm per-interleave-set where
    all the UUIDs match and the interleave set cookie matches the hosting
    interleave set.

    Present sysfs attributes for manipulation of a PMEM-namespace's
    'alt_name', 'uuid', and 'size' attributes. A later patch will make
    these settings persistent by writing back the label.

    Note that PMEM allocations grow forwards from the start of an interleave
    set (lowest dimm-physical-address (DPA)). BLK-namespaces that alias
    with a PMEM interleave set will grow allocations backward from the
    highest DPA.

    Cc: Greg KH
    Cc: Neil Brown
    Acked-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
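
The allocation directions noted above (PMEM growing forward from the lowest DPA, aliased BLK growing backward from the highest) amount to a two-ended allocator over one address space. Here is a minimal sketch with hypothetical names; it tracks only the two frontiers and ignores labels, alignment, and freeing.

```c
#include <stdbool.h>
#include <stdint.h>

struct dpa_space {
	uint64_t pmem_end;  /* first free DPA above the PMEM allocations */
	uint64_t blk_start; /* lowest DPA already used by BLK allocations */
};

/* PMEM grows forward from the lowest DPA. */
static bool alloc_pmem(struct dpa_space *s, uint64_t size, uint64_t *dpa)
{
	if (size > s->blk_start - s->pmem_end)
		return false;
	*dpa = s->pmem_end;
	s->pmem_end += size;
	return true;
}

/* BLK grows backward from the highest DPA. */
static bool alloc_blk(struct dpa_space *s, uint64_t size, uint64_t *dpa)
{
	if (size > s->blk_start - s->pmem_end)
		return false;
	s->blk_start -= size;
	*dpa = s->blk_start;
	return true;
}
```

Growing from opposite ends lets either type expand without relocating the other until the two frontiers meet.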
     
  • On platforms that have firmware support for reading/writing per-dimm
    label space, a portion of the dimm may be accessible via an interleave
    set PMEM mapping in addition to the dimm's BLK (block-data-window
    aperture(s)) interface. A label, stored in a "configuration data
    region" on the dimm, disambiguates which dimm addresses are accessed
    through which exclusive interface.

    Add infrastructure that allows the kernel to block modifications to a
    label in the set while any member dimm is active. Note that this is
    meant only for enforcing "no modifications of active labels" via the
    coarse ioctl command. Adding/deleting namespaces from an active
    interleave set is always possible via sysfs.

    Another aspect of tracking interleave sets is tracking their integrity
    when DIMMs in a set are physically re-ordered. For this purpose we
    generate an "interleave-set cookie" that can be recorded in a label and
    validated against the current configuration. It is the bus provider
    implementation's responsibility to calculate the interleave set cookie
    and attach it to a given region.

    Cc: Neil Brown
    Cc:
    Cc: Greg KH
    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Acked-by: Christoph Hellwig
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The libnvdimm region driver is an intermediary driver that translates
    non-volatile "region"s into "namespace" sub-devices that are surfaced by
    persistent memory block-device drivers (PMEM and BLK).

    ACPI 6 introduces the concept that a given nvdimm may simultaneously
    offer multiple access modes to its media through direct PMEM load/store
    access, or windowed BLK mode. Existing nvdimms mostly implement a PMEM
    interface, some offer a BLK-like mode, but never both as ACPI 6 defines.
    If an nvdimm is single interfaced, then there is no need for dimm
    metadata labels. For these devices we can take the region boundaries
    directly to create a child namespace device (nd_namespace_io).

    Acked-by: Christoph Hellwig
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     
  • * Implement the device-model infrastructure for loading modules and
    attaching drivers to nvdimm devices. This is a simple association of a
    nd-device-type number with a driver that has a bitmask of supported
    device types. To facilitate userspace bind/unbind operations 'modalias'
    and 'devtype', that also appear in the uevent, are added as generic
    sysfs attributes for all nvdimm devices. The reason for the device-type
    number is to support sub-types within a given parent devtype, be it a
    vendor-specific sub-type or otherwise.

    * The first consumer of this infrastructure is the driver
    for dimm devices. It simply uses control messages to retrieve and
    store the configuration-data image (label set) from each dimm.

    Note: nd_device_register() arranges for asynchronous registration of
    nvdimm bus devices by default.

    Cc: Greg KH
    Cc: Neil Brown
    Acked-by: Christoph Hellwig
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
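
The association of a device-type number with a driver's bitmask of supported types, as described above, reduces to a single bit test. The sketch below illustrates that matching rule; the type numbers, struct layout, and function name are hypothetical and are not the definitions used by libnvdimm.

```c
#include <stdbool.h>

/* Hypothetical device-type numbers for illustration only. */
enum demo_dev_type {
	DEMO_TYPE_DIMM = 1,
	DEMO_TYPE_REGION_PMEM = 2,
	DEMO_TYPE_REGION_BLK = 3,
};

struct demo_driver {
	const char *name;
	unsigned long type_mask; /* bit N set => supports device type N */
};

/* A driver matches a device when the device's type bit is in its mask. */
static bool demo_match(const struct demo_driver *drv, enum demo_dev_type type)
{
	return (drv->type_mask >> type) & 1UL;
}
```

A single driver can then claim several device types by setting more than one bit, for example a region driver whose mask covers both the PMEM and BLK region types.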