28 Mar, 2020

1 commit

  • Current make_request based drivers use either blk_alloc_queue_node or
    blk_alloc_queue to allocate a queue, and then set up the make_request_fn
    function pointer and a few parameters using the blk_queue_make_request
    helper. Simplify this by passing the make_request pointer to
    blk_alloc_queue, and while at it merge the _node variant into the main
    helper by always passing a node_id, and remove the superfluous gfp_mask
    parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
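    A minimal userspace sketch of the allocation pattern described above. It assumes hypothetical stand-in types and names (sketch_alloc_queue, a struct request_queue with a make_request field, SKETCH_NUMA_NO_NODE) and is not the kernel's blk_alloc_queue(). It mirrors only the shape of the change: the callback and node id are passed at allocation time, and there is no gfp_mask argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's block-layer definitions. */
#define SKETCH_NUMA_NO_NODE (-1)

struct bio;
struct request_queue;
typedef int (*make_request_fn)(struct request_queue *q, struct bio *bio);

struct request_queue {
	make_request_fn make_request;	/* set once, at allocation time */
	int node_id;			/* always passed, no separate _node variant */
};

/* One helper: callback plus node id, no gfp_mask parameter. */
static struct request_queue *sketch_alloc_queue(make_request_fn fn, int node_id)
{
	struct request_queue *q;

	if (!fn)
		return NULL;	/* a make_request driver must supply its callback */

	q = calloc(1, sizeof(*q));
	if (!q)
		return NULL;
	q->make_request = fn;
	q->node_id = node_id;
	return q;
}

/* Example driver callback that accepts every bio. */
static int demo_make_request(struct request_queue *q, struct bio *bio)
{
	(void)q;
	(void)bio;
	return 0;
}
```

    In this model a driver calls sketch_alloc_queue(demo_make_request, SKETCH_NUMA_NO_NODE) instead of allocating first and patching the function pointer afterwards, so a queue never exists without its callback.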
     

01 Feb, 2020

1 commit

  • After the removal of the device-public infrastructure there are only 2
    ->page_free() callbacks in the kernel. One of those is a
    device-private callback in the nouveau driver, the other is a generic
    wakeup needed in the DAX case. In the hopes that all ->page_free()
    callbacks can be migrated to common core kernel functionality, move the
    device-private specific actions in __put_devmap_managed_page() under the
    is_device_private_page() conditional, including the ->page_free()
    callback. For the other page types just open-code the generic wakeup.

    Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
    does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
    cases.

    Link: http://lkml.kernel.org/r/20200107224558.2362728-4-jhubbard@nvidia.com
    Signed-off-by: Dan Williams
    Signed-off-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jérôme Glisse
    Cc: Jan Kara
    Cc: Ira Weiny
    Cc: Alex Williamson
    Cc: Aneesh Kumar K.V
    Cc: Björn Töpel
    Cc: Daniel Vetter
    Cc: Hans Verkuil
    Cc: Jason Gunthorpe
    Cc: Jens Axboe
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Leon Romanovsky
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
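    The reorganized put path can be pictured with a small userspace model. Everything here is hypothetical (sketch_page, the sketch_memory_type values, the callback stand-ins), and the model assumes, for illustration only, that a device page counts as idle once its reference count drops to 1. It shows only the new branching: device-private pages go through the driver's ->page_free(), and every other type gets the open-coded generic wakeup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device memory types named in the commit. */
enum sketch_memory_type {
	SKETCH_MEMORY_DEVICE_PRIVATE,
	SKETCH_MEMORY_DEVICE_FSDAX,
	SKETCH_MEMORY_DEVICE_DEVDAX,
	SKETCH_MEMORY_DEVICE_PCI_P2PDMA,
};

struct sketch_page {
	enum sketch_memory_type type;
	int refcount;
	bool driver_freed;	/* set by the ->page_free() stand-in */
	bool waiters_woken;	/* set by the generic wakeup stand-in */
};

/* Stand-in for a device-private driver's ->page_free() callback. */
static void sketch_page_free(struct sketch_page *page)
{
	page->driver_freed = true;
}

/* Stand-in for the generic wakeup (needed for FSDAX, harmless elsewhere). */
static void sketch_wake_up(struct sketch_page *page)
{
	page->waiters_woken = true;
}

static void sketch_put_devmap_managed_page(struct sketch_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;

	/* Assumption for this model: a count of 1 means the page went idle. */
	if (page->refcount != 1)
		return;

	if (page->type == SKETCH_MEMORY_DEVICE_PRIVATE)
		sketch_page_free(page);	/* device-private specific path */
	else
		sketch_wake_up(page);	/* open-coded generic wakeup */
}
```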
     

02 Dec, 2019

2 commits

  • Pull libnvdimm updates from Dan Williams:
    "The highlight this cycle is continuing integration fixes for PowerPC
    and some resulting optimizations.

    Summary:

    - Updates to better support vmalloc space restrictions on PowerPC
    platforms.

    - Cleanups to move common sysfs attributes to core 'struct
    device_type' objects.

    - Export the 'target_node' attribute (the effective numa node if pmem
    is marked online) for regions and namespaces.

    - Miscellaneous fixups and optimizations"

    * tag 'libnvdimm-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
    MAINTAINERS: Remove Keith from NVDIMM maintainers
    libnvdimm: Export the target_node attribute for regions and namespaces
    dax: Add numa_node to the default device-dax attributes
    libnvdimm: Simplify root read-only definition for the 'resource' attribute
    dax: Simplify root read-only definition for the 'resource' attribute
    dax: Create a dax device_type
    libnvdimm: Move nvdimm_bus_attribute_group to device_type
    libnvdimm: Move nvdimm_attribute_group to device_type
    libnvdimm: Move nd_mapping_attribute_group to device_type
    libnvdimm: Move nd_region_attribute_group to device_type
    libnvdimm: Move nd_numa_attribute_group to device_type
    libnvdimm: Move nd_device_attribute_group to device_type
    libnvdimm: Move region attribute group definition
    libnvdimm: Move attribute groups to device type
    libnvdimm: Remove prototypes for nonexistent functions
    libnvdimm/btt: fix variable 'rc' set but not used
    libnvdimm/pmem: Delete include of nd-core.h
    libnvdimm/namespace: Differentiate between probe mapping and runtime mapping
    libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe
    libnvdimm: Trivial comment fix
    ...

    Linus Torvalds
     
  • Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
    "As part of the cleanup of some remaining y2038 issues, I came to
    fs/compat_ioctl.c, which still has a couple of commands that need
    support for time64_t.

    In completely unrelated work, I spent time on cleaning up parts of
    this file in the past, moving things out into drivers instead.

    After Al Viro reviewed an earlier version of this series and did a lot
    more of that cleanup, I decided to try to completely eliminate the
    rest of it and move it all into drivers.

    This series incorporates some of Al's work and many patches of my own,
    but in the end stops short of actually removing the last part, which
    is the scsi ioctl handlers. I have patches for those as well, but they
    need more testing or possibly a rewrite"

    * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
    scsi: sd: enable compat ioctls for sed-opal
    pktcdvd: add compat_ioctl handler
    compat_ioctl: move SG_GET_REQUEST_TABLE handling
    compat_ioctl: ppp: move simple commands into ppp_generic.c
    compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
    compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
    compat_ioctl: unify copy-in of ppp filters
    tty: handle compat PPP ioctls
    compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
    compat_ioctl: handle SIOCOUTQNSD
    af_unix: add compat_ioctl support
    compat_ioctl: reimplement SG_IO handling
    compat_ioctl: move WDIOC handling into wdt drivers
    fs: compat_ioctl: move FITRIM emulation into file systems
    gfs2: add compat_ioctl support
    compat_ioctl: remove unused convert_in_user macro
    compat_ioctl: remove last RAID handling code
    compat_ioctl: remove /dev/raw ioctl translation
    compat_ioctl: remove PCI ioctl translation
    compat_ioctl: remove joystick ioctl translation
    ...

    Linus Torvalds
     

20 Nov, 2019

7 commits

  • Aneesh points out that some platforms may have "local" attached
    persistent memory and "remote" persistent memory that map to the same
    "online" node, or persistent memory devices with different performance
    properties. In this case 'numa_node' is identical for the two instances,
    but 'target_node' is differentiated so platform firmware can communicate
    distinct performance properties per range. Expose 'target_node' by
    default to allow for disambiguation of devices that share the same
    numa_map_to_online_node() result.

    Reported-by: "Aneesh Kumar K.V"
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157401274500.43284.2369509941678577768.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Rather than update the permission in ->is_visible(), set the permission
    directly at declaration time.

    Cc: Ira Weiny
    Cc: Vishal Verma
    Signed-off-by: Dan Williams
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309905534.1582359.13927459228885931097.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • A 'struct device_type' instance can carry default attributes for the
    device. Use this facility to remove the export of
    nvdimm_bus_attribute_group and put the responsibility on the core rather
    than leaf implementations to define this attribute.

    Cc: Ira Weiny
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Vishal Verma
    Cc: Aneesh Kumar K.V
    Signed-off-by: Dan Williams
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309903815.1582359.6418211876315050283.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • A 'struct device_type' instance can carry default attributes for the
    device. Use this facility to remove the export of
    nvdimm_attribute_group and put the responsibility on the core rather
    than leaf implementations to define this attribute.

    Cc: Ira Weiny
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Vishal Verma
    Cc: Aneesh Kumar K.V
    Signed-off-by: Dan Williams
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309903201.1582359.10966209746585062329.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • A 'struct device_type' instance can carry default attributes for the
    device. Use this facility to remove the export of
    nd_mapping_attribute_group and put the responsibility on the core rather
    than leaf implementations to define this attribute.

    Cc: Ira Weiny
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Vishal Verma
    Cc: Aneesh Kumar K.V
    Signed-off-by: Dan Williams
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309902686.1582359.6749533709859492704.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • A 'struct device_type' instance can carry default attributes for the
    device. Use this facility to remove the export of
    nd_region_attribute_group and put the responsibility on the core rather
    than leaf implementations to define this attribute.

    Cc: Ira Weiny
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Vishal Verma
    Cc: Aneesh Kumar K.V
    Signed-off-by: Dan Williams
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309902169.1582359.16828508538444551337.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • A 'struct device_type' instance can carry default attributes for the
    device. Use this facility to remove the export of
    nd_numa_attribute_group and put the responsibility on the core rather
    than leaf implementations to define this attribute.

    Cc: Ira Weiny
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Vishal Verma
    Cc: Aneesh Kumar K.V
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157401269537.43284.14411189404186877352.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Dan Williams
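    The 'struct device_type' default-attribute idea used by this series can be modeled outside the kernel. The sketch below uses hypothetical types (sketch_attribute, sketch_attribute_group, sketch_device_type) rather than the driver core's, and shows only the shape: a const, statically initialized, NULL-terminated list of attribute groups hangs off the type, so core code can find every default attribute without leaf drivers exporting the groups themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for sysfs attributes and groups. */
struct sketch_attribute {
	const char *name;
};

struct sketch_attribute_group {
	const struct sketch_attribute *const *attrs;	/* NULL-terminated */
};

struct sketch_device_type {
	const char *name;
	const struct sketch_attribute_group *const *groups;	/* NULL-terminated */
};

/* A statically initialized, const attribute group owned by the core. */
static const struct sketch_attribute numa_node_attr = { "numa_node" };
static const struct sketch_attribute target_node_attr = { "target_node" };

static const struct sketch_attribute *const numa_attrs[] = {
	&numa_node_attr,
	&target_node_attr,
	NULL,
};

static const struct sketch_attribute_group numa_group = { numa_attrs };

/* The device type carries the groups, so leaf drivers need not export them. */
static const struct sketch_attribute_group *const region_groups[] = {
	&numa_group,
	NULL,
};

static const struct sketch_device_type sketch_region_type = {
	"region",
	region_groups,
};

/* Core-side lookup: does this type publish an attribute with this name? */
static int sketch_type_has_attr(const struct sketch_device_type *type,
				const char *name)
{
	const struct sketch_attribute_group *const *grp;
	const struct sketch_attribute *const *attr;

	assert(type != NULL);
	for (grp = type->groups; grp && *grp; grp++)
		for (attr = (*grp)->attrs; *attr; attr++)
			if (strcmp((*attr)->name, name) == 0)
				return 1;
	return 0;
}
```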
     

18 Nov, 2019

6 commits

  • A 'struct device_type' instance can carry default attributes for the
    device. Use this facility to remove the export of
    nd_device_attribute_group and put the responsibility on the core rather
    than leaf implementations to define this attribute.

    For regions this creates a new nd_region_attribute_groups[] added to the
    per-region device-type instances.

    Cc: Ira Weiny
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Vishal Verma
    Cc: Aneesh Kumar K.V
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309901138.1582359.12909354140826530394.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Dan Williams
     
  • In preparation for moving region attributes from device attribute groups
    to the region device-type, reorder the declaration so that it can be
    referenced by the device-type definition without forward declarations.
    No functional changes are intended to result from this change.

    Cc: Ira Weiny
    Cc: Vishal Verma
    Signed-off-by: Dan Williams
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309900624.1582359.6929998072035982264.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • Statically initialize the attribute groups for each libnvdimm
    device_type. This is a preparation step for removing unnecessary exports
    of attributes that can be included in the device_type by default.

    Also take the opportunity to mark 'struct device_type' instances const.

    Cc: Ira Weiny
    Cc: Vishal Verma
    Signed-off-by: Dan Williams
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309900111.1582359.2445687530383470348.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • These functions don't exist, so remove the prototypes for them.

    Signed-off-by: Alastair D'Silva
    Reviewed-by: Andrew Donnellan
    Reviewed-by: Frederic Barrat
    Link: https://lore.kernel.org/r/20191025044721.16617-3-alastair@au1.ibm.com
    Signed-off-by: Dan Williams

    Alastair D'Silva
     
  • drivers/nvdimm/btt.c: In function 'btt_read_pg':
    drivers/nvdimm/btt.c:1264:8: warning: variable 'rc' set but not used
    [-Wunused-but-set-variable]
    int rc;
    ^~

    Add a ratelimited message in case a storm of errors is encountered.

    Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing")
    Signed-off-by: Qian Cai
    Reviewed-by: Vishal Verma
    Link: https://lore.kernel.org/r/1572530719-32161-1-git-send-email-cai@lca.pw
    Signed-off-by: Dan Williams

    Qian Cai
     
  • The entire point of nd-core.h is to hide functionality that no leaf
    driver should touch. In fact, the commit that added it had no need to
    include it.

    Fixes: 06e8ccdab15f ("acpi: nfit: Add support for detect platform...")
    Cc: Ira Weiny
    Cc: Dave Jiang
    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

15 Nov, 2019

3 commits

  • The nvdimm core currently maps the full namespace to an ioremap range
    while probing the namespace mode. This can result in probe failures on
    architectures that have limited ioremap space.

    For example, with a large btt namespace that consumes most of the I/O
    remap range, depending on the sequence of namespace initialization, the
    user can hit a pfn namespace initialization failure due to unavailable
    I/O remap space, which the nvdimm core uses for temporary mapping.

    The nvdimm core can avoid this failure by mapping only the reserved info
    block area to check for the pfn superblock type, and mapping the full
    namespace resource only before using the namespace.

    Given that personalities like BTT can be layered on top of any namespace
    type, create a generic form of devm_nsio_enable (devm_namespace_enable)
    and use it inside the per-personality attach routines. Now
    devm_namespace_enable() is always paired with disable unless the mapping
    is going to be used for long-term runtime access.

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20191017073308.32645-1-aneesh.kumar@linux.ibm.com
    [djbw: reworks to move devm_namespace_{en,dis}able into *attach helpers]
    Reported-by: kbuild test robot
    Link: https://lore.kernel.org/r/20191031105741.102793-2-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     
  • The nvdimm core uses nd_pfn_validate() when looking for a devdax or
    fsdax namespace. In this case, device resources are allocated against
    the nd_namespace_io dev. In order to allow remapping of the range in
    nd_pfn_clear_memmap_errors(), move the device memmap area clearing into
    pfn namespace initialization. With this, device resources are allocated
    against nd_pfn and we can use nd_pfn->dev for remapping.

    This also avoids calling nd_pfn_clear_memmap_errors() twice: once while
    probing the namespace and again while initializing a pfn namespace.

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20191101032728.113001-1-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     
  • Don't leave claim_class set to an invalid value if an error occurs in
    btt_claim_class().

    While we are here, change the return type of __holder_class_store() to be
    clear about the values it is returning.

    This was found via code inspection.

    Reported-by: Dan Carpenter
    Reviewed-by: Vishal Verma
    Signed-off-by: Ira Weiny
    Link: https://lore.kernel.org/r/20190925211348.14082-1-ira.weiny@intel.com
    Signed-off-by: Dan Williams

    Ira Weiny
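    The fix follows a common "compute into a local, commit only on success" pattern. A minimal sketch with hypothetical names (sketch_ns, sketch_claim_class, sketch_holder_class_store); the parser accepting only "btt", "pfn", "dax" or an empty string is an illustrative assumption, not the kernel's exact rule set.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical claim classes; not the kernel's enum. */
enum sketch_claim_class {
	SKETCH_CLAIM_NONE,
	SKETCH_CLAIM_BTT,
	SKETCH_CLAIM_PFN,
	SKETCH_CLAIM_DAX,
};

struct sketch_ns {
	enum sketch_claim_class claim_class;
};

/* Parse a holder class name into *out; returns 0 or -EINVAL. */
static int sketch_parse_class(const char *buf, enum sketch_claim_class *out)
{
	if (strcmp(buf, "btt") == 0)
		*out = SKETCH_CLAIM_BTT;
	else if (strcmp(buf, "pfn") == 0)
		*out = SKETCH_CLAIM_PFN;
	else if (strcmp(buf, "dax") == 0)
		*out = SKETCH_CLAIM_DAX;
	else if (buf[0] == '\0')
		*out = SKETCH_CLAIM_NONE;
	else
		return -EINVAL;
	return 0;
}

/* Returns 0 or a negative errno; the namespace is only updated on success. */
static int sketch_holder_class_store(struct sketch_ns *ns, const char *buf)
{
	enum sketch_claim_class c;
	int rc;

	assert(ns != NULL && buf != NULL);
	rc = sketch_parse_class(buf, &c);
	if (rc)
		return rc;	/* ns->claim_class keeps its previous, valid value */
	ns->claim_class = c;
	return 0;
}
```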
     

07 Nov, 2019

1 commit

  • In preparation for handling platform differentiated memory types beyond
    persistent memory, uplevel the "region" identifier to a global number
    space. This enables a device-dax instance to be registered to any memory
    type with guaranteed unique names.

    Signed-off-by: Dan Williams
    Acked-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki

    Dan Williams
     

23 Oct, 2019

1 commit

  • The .ioctl and .compat_ioctl file operations have the same prototype so
    they can both point to the same function, which works great almost all
    the time when all the commands are compatible.

    One exception is the s390 architecture, where a compat pointer is only
    31 bits wide, and converting it into a 64-bit pointer requires calling
    compat_ptr(). Most drivers here will never run on s390, but since we now
    have a generic helper for it, it's easy enough to use it consistently.

    I double-checked all these drivers to ensure that all ioctl arguments
    are used as pointers or are ignored, but are not interpreted as integer
    values.

    Acked-by: Jason Gunthorpe
    Acked-by: Daniel Vetter
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Greg Kroah-Hartman
    Acked-by: David Sterba
    Acked-by: Darren Hart (VMware)
    Acked-by: Jonathan Cameron
    Acked-by: Bjorn Andersson
    Acked-by: Dan Williams
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
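    To make the s390 case concrete: a compat pointer there is a 31-bit address, so converting it means clearing the top bit of the 32-bit value, while most other 64-bit architectures simply zero-extend it. The helpers below are a hypothetical userspace model of that conversion and of a compat_ptr_ioctl()-style wrapper that forwards to the native handler; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sketch_compat_uptr_t;

/* s390 model: only the low 31 bits of a compat pointer form the address. */
static uint64_t sketch_compat_ptr_s390(sketch_compat_uptr_t uptr)
{
	return (uint64_t)(uptr & 0x7fffffffu);
}

/* Typical 64-bit model: the 32-bit value is zero-extended unchanged. */
static uint64_t sketch_compat_ptr_generic(sketch_compat_uptr_t uptr)
{
	return (uint64_t)uptr;
}

typedef long (*sketch_ioctl_fn)(unsigned int cmd, uint64_t arg);

/* Forward a compat ioctl to the native handler after pointer conversion. */
static long sketch_compat_ptr_ioctl(sketch_ioctl_fn native, unsigned int cmd,
				    uint64_t arg)
{
	return native(cmd, sketch_compat_ptr_s390((sketch_compat_uptr_t)arg));
}

/* Demo native handler that just reports the pointer value it received. */
static long sketch_echo_ioctl(unsigned int cmd, uint64_t arg)
{
	(void)cmd;
	return (long)arg;
}
```

    This also shows why the drivers must treat the argument as a pointer: an integer argument with bit 31 set would be silently altered by the conversion.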
     

30 Sep, 2019

1 commit

  • More libnvdimm updates from Dan Williams:

    - Complete the reworks to interoperate with powerpc dynamic huge page
    sizes

    - Fix a crash due to missed accounting for the powerpc 'struct
    page'-memmap mapping granularity

    - Fix badblock initialization for volatile (DRAM emulated) pmem ranges

    - Stop triggering request_key() notifications to userspace when
    NVDIMM-security is disabled / not present

    - Miscellaneous small fixups

    * tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    libnvdimm/region: Enable MAP_SYNC for volatile regions
    libnvdimm: prevent nvdimm from requesting key when security is disabled
    libnvdimm/region: Initialize bad block for volatile namespaces
    libnvdimm/nfit_test: Fix acpi_handle redefinition
    libnvdimm/altmap: Track namespace boundaries in altmap
    libnvdimm: Fix endian conversion issues 
    libnvdimm/dax: Pick the right alignment default when creating dax devices
    powerpc/book3s64: Export has_transparent_hugepage() related functions.

    Linus Torvalds
     

25 Sep, 2019

6 commits

  • Some environments want to use a host tmpfs/ramdisk to back guest pmem.
    While the data is not persisted relative to the host it *is* persisted
    relative to guest crashes / reboots. The guest is free to use dax and
    MAP_SYNC to keep filesystem metadata consistent with dax accesses
    without requiring guest fsync(). The guest can also observe that the
    region is volatile and skip cache flushing as global visibility is
    enough to "persist" data relative to the host staying alive over guest
    reset events.

    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: Pankaj Gupta
    Link: https://lore.kernel.org/r/20190924114327.14700-1-aneesh.kumar@linux.ibm.com
    [djbw: reword the changelog]
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     
  • The current implementation attempts to request keys from the keyring
    even when security is not enabled. Change the behavior so that the key
    request is skipped when security is disabled.

    Error messages seen when no keys are installed and libnvdimm is loaded:

    request-key[4598]: Cannot find command to construct key 661489677
    request-key[4606]: Cannot find command to construct key 34713726

    Cc: stable@vger.kernel.org
    Fixes: 4c6926a23b76 ("acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMs")
    Signed-off-by: Dave Jiang
    Link: https://lore.kernel.org/r/156934642272.30222.5230162488753445916.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dan Williams

    Dave Jiang
     
  • We check for bad blocks during namespace init, and that check uses the
    region's bad block list. We need to initialize the bad block list for
    volatile regions for this to work. We also observe the lockdep warning
    below, because the lock is not initialized correctly when we skip bad
    block init for volatile regions.

    INFO: trying to register non-static key.
    the code is fine but needs lockdep annotation.
    turning off the locking correctness validator.
    CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
    Call Trace:
    [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
    [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
    [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
    [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
    [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
    [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
    [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
    [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
    [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
    [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
    [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
    [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
    [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
    [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
    [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
    [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
    [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
    [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
    [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
    [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
    [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
    [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
    [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     
  • With a PFN_MODE_PMEM namespace, the memmap area is allocated from the
    device area. Some architectures map the memmap area with a large page
    size. On architectures like ppc64, a 16MB page used for the memmap
    mapping can map 262144 pfns, which covers a 16G namespace.

    When populating the memmap region with 16MB pages from the device area,
    make sure the allocated space is not used to map resources outside this
    namespace. Such usage of the device area will prevent a namespace destroy.

    Add the resource end pfn to the altmap and use it to check whether the
    memmap area allocation can map a pfn outside the namespace. On ppc64, in
    that case we fall back to allocation from memory.

    This fixes the kernel crash reported below:

    [ 132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0
    [ 133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000
    [ 133.464760] Faulting instruction address: 0xc00000000007580c
    [ 133.464766] Oops: Kernel access of bad area, sig: 11 [#1]
    [ 133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
    .....
    [ 133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0
    [ 133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0
    [ 133.464910] Call Trace:
    [ 133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable)
    [ 133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240
    [ 133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590
    [ 133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160
    [ 133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0
    [ 133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50
    [ 133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400
    [ 133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250
    [ 133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110
    [ 133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70
    [ 133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0
    [ 133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250
    [ 133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70
    [ 133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250

    djbw: Aneesh notes that this crash can likely be triggered in any kernel that
    supports 'papr_scm', so flagging that commit for -stable consideration.

    Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
    Cc:
    Reported-by: Sachin Sant
    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: Pankaj Gupta
    Tested-by: Santosh Sivaraj
    Reviewed-by: Johannes Thumshirn
    Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
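    The 16MB / 262144 / 16G figures above are consistent with a 64-byte struct page and a 64K base page (the crash log shows PAGE_SIZE=64K; the struct page size is an assumption here). The sketch works through that arithmetic and models the end-pfn check as a hypothetical helper: a memmap block may be carved from the device area only if every pfn it describes stays inside the namespace; otherwise the caller falls back to regular memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ppc64 values: 64K base pages and a 64-byte struct page. */
#define SKETCH_PAGE_SIZE        (64ULL << 10)
#define SKETCH_STRUCT_PAGE_SIZE 64ULL
#define SKETCH_MEMMAP_MAPPING   (16ULL << 20)	/* one 16MB vmemmap page */

/* How many pfns' struct pages fit in one memmap mapping. */
static uint64_t sketch_pfns_per_mapping(uint64_t mapping, uint64_t struct_page)
{
	assert(struct_page != 0);
	return mapping / struct_page;
}

/* How many bytes of namespace one memmap mapping describes. */
static uint64_t sketch_bytes_covered(uint64_t mapping, uint64_t struct_page,
				     uint64_t page_size)
{
	return sketch_pfns_per_mapping(mapping, struct_page) * page_size;
}

/*
 * Hypothetical end-pfn check: a memmap block describing
 * [start_pfn, start_pfn + nr_pfns) may come from the device area only if it
 * does not run past end_pfn (exclusive here); otherwise fall back to memory.
 */
static bool sketch_altmap_can_back(uint64_t start_pfn, uint64_t nr_pfns,
				   uint64_t end_pfn)
{
	return start_pfn + nr_pfns <= end_pfn;
}
```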
     
  • An nd_label->dpa issue was observed when trying to enable, on a
    big-endian kernel, a namespace created with a little-endian kernel. That
    made me run `sparse` on the rest of the code, and the other changes are
    the result of that.

    Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing")
    Fixes: 9dedc73a4658 ("libnvdimm/btt: Fix LBA masking during 'free list' population")
    Reviewed-by: Vishal Verma
    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190809074726.27815-1-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     
  • Allow the architecture to provide the supported alignments, and use
    hugepage alignment only if hugepages are supported. Right now we depend on
    compile-time configs, whereas this patch switches to runtime discovery.

    Architectures like ppc64 can have THP enabled in code, but then have the
    hugepage size disabled by the hypervisor. This allows us to create dax
    devices with PAGE_SIZE alignment in this case.

    Existing dax namespaces with alignment larger than PAGE_SIZE will fail to
    initialize in this specific case. We still allow fsdax namespace
    initialization.

    With respect to identifying whether to enable hugepage faults for a dax
    device: if THP is enabled at compile time, we default to taking a hugepage
    fault, and if the dax fault handler finds the fault size > alignment, it
    retries with a PAGE_SIZE fault size.

    This also addresses the failure scenario below on ppc64:

    ndctl create-namespace --mode=devdax | grep align
    "align":16777216,
    "align":16777216

    cat /sys/devices/ndbus0/region0/dax0.0/supported_alignments
    65536 16777216

    daxio.static-debug -z -o /dev/dax0.0
    Bus error (core dumped)

    $ dmesg | tail
    lpar: Failed hash pte insert with error -4
    hash-mmu: mm: Hashing failure ! EA=0x7fff17000000 access=0x8000000000000006 current=daxio
    hash-mmu: trap=0x300 vsid=0x22cb7a3 ssize=1 base psize=2 psize 10 pte=0xc000000501002b86
    daxio[3860]: bus error (7) at 7fff17000000 nip 7fff973c007c lr 7fff973bff34 code 2 in libpmem.so.1.0.0[7fff973b0000+20000]
    daxio[3860]: code: 792945e4 7d494b78 e95f0098 7d494b78 f93f00a0 4800012c e93f0088 f93f0120
    daxio[3860]: code: e93f00a0 f93f0128 e93f0120 e95f0128 e93f0088 39290008 f93f0110

    The failure was due to the guest kernel using the wrong page size.

    The namespaces created with 16M alignment will appear as below on a config with
    16M page size disabled.

    $ ndctl list -Ni
    [
    {
    "dev":"namespace0.1",
    "mode":"fsdax",
    "map":"dev",
    "size":5351931904,
    "uuid":"fc6e9667-461a-4718-82b4-69b24570bddb",
    "align":16777216,
    "blockdev":"pmem0.1",
    "supported_alignments":[
    65536
    ]
    },
    {
    "dev":"namespace0.0",
    "mode":"fsdax",
    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190905154603.10349-8-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
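    The runtime policy described above can be sketched in two small hypothetical helpers: one checks a namespace alignment against the alignments the platform reports as supported (compare the supported_alignments output above), and one picks the fault size, falling back to the base page size when the requested fault size exceeds the device alignment. Names and signatures are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Is this namespace alignment among those the platform reports? */
static bool sketch_align_supported(uint64_t align, const uint64_t *supported,
				   size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (supported[i] == align)
			return true;
	return false;
}

/*
 * Try the requested (possibly huge) fault size; if it is larger than the
 * device alignment, retry with the base page size instead.
 */
static uint64_t sketch_pick_fault_size(uint64_t requested, uint64_t dev_align,
				       uint64_t page_size)
{
	assert(page_size != 0);
	if (requested > dev_align)
		return page_size;
	return requested;
}
```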
     

22 Sep, 2019

2 commits

  • Pull libnvdimm updates from Dan Williams:
    "Some reworks to better support nvdimms on powerpc and an nvdimm
    security interface update:

    - Rework the nvdimm core to accommodate architectures with different
    page sizes and ones that can change supported huge page sizes at
    boot time rather than a compile time constant.

    - Introduce a distinct 'frozen' attribute for the nvdimm security
    state since it is independent of the locked state.

    - Miscellaneous fixups"

    * tag 'libnvdimm-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    libnvdimm: Use PAGE_SIZE instead of SZ_4K for align check
    libnvdimm/label: Remove the dpa align check
    libnvdimm/pfn_dev: Add page size and struct page size to pfn superblock
    libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page size change
    libnvdimm/pmem: Advance namespace seed for specific probe errors
    libnvdimm/region: Rewrite _probe_success() to _advance_seeds()
    libnvdimm/security: Consolidate 'security' operations
    libnvdimm/security: Tighten scope of nvdimm->busy vs security operations
    libnvdimm/security: Introduce a 'frozen' attribute
    libnvdimm, region: Use struct_size() in kzalloc()
    tools/testing/nvdimm: Fix fallthrough warning
    libnvdimm/of_pmem: Provide a unique name for bus provider

    Linus Torvalds
     
  • Pull hmm updates from Jason Gunthorpe:
    "This is more cleanup and consolidation of the hmm APIs and the very
    strongly related mmu_notifier interfaces. Many places across the tree
    using these interfaces are touched in the process. Beyond that a
    cleanup to the page walker API and a few memremap related changes
    round out the series:

    - General improvement of hmm_range_fault() and related APIs, more
    documentation, bug fixes from testing, API simplification &
    consolidation, and unused API removal

    - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE,
    and make them internal kconfig selects

    - Hoist a lot of code related to mmu notifier attachment out of
    drivers by using a refcount get/put attachment idiom and remove the
    convoluted mmu_notifier_unregister_no_release() and related APIs.

    - General API improvement for the migrate_vma API and revision of its
    only user in nouveau

    - Annotate mmu_notifiers with lockdep and sleeping region debugging

    Two series unrelated to HMM or mmu_notifiers came along due to
    dependencies:

    - Allow pagemap's memremap_pages family of APIs to work without
    providing a struct device

    - Make walk_page_range() and related use a constant structure for
    function pointers"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits)
    libnvdimm: Enable unit test infrastructure compile checks
    mm, notifier: Catch sleeping/blocking for !blockable
    kernel.h: Add non_block_start/end()
    drm/radeon: guard against calling an unpaired radeon_mn_unregister()
    csky: add missing brackets in a macro for tlb.h
    pagewalk: use lockdep_assert_held for locking validation
    pagewalk: separate function pointers from iterator data
    mm: split out a new pagewalk.h header from mm.h
    mm/mmu_notifiers: annotate with might_sleep()
    mm/mmu_notifiers: prime lockdep
    mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end
    mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports
    mm/hmm: hmm_range_fault() infinite loop
    mm/hmm: hmm_range_fault() NULL pointer bug
    mm/hmm: fix hmm_range_fault()'s handling of swapped out pages
    mm/mmu_notifiers: remove unregister_no_release
    RDMA/odp: remove ib_ucontext from ib_umem
    RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
    RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
    RDMA/mlx5: Use ib_umem_start instead of umem.address
    ...

    Linus Torvalds
     

07 Sep, 2019

1 commit

  • The infrastructure to mock core libnvdimm routines for unit testing
    purposes is prone to bitrot relative to refactoring of that core. Arrange
    for the unit test core to be built when CONFIG_COMPILE_TEST=y. This does
    not result in a functional unit test environment; it is only a helper for
    0day to catch unit test build regressions.

    Note that there are a few x86isms in the implementation, so this does not
    bother compile-testing architectures other than 64-bit x86.

    Link: https://lore.kernel.org/r/156763690875.2556198.15786177395425033830.stgit@dwillia2-desk3.amr.corp.intel.com
    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Dan Williams
     

06 Sep, 2019

6 commits

  • Some architectures use a page size other than 4K. Use PAGE_SIZE
    to make sure ranges are correctly aligned.

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190905154603.10349-7-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     
  • There is no strict requirement for slot_valid() to check for page
    alignment, and the check would seem to actively hurt cross-page-size
    compatibility. Delete the check and rely on checksum validation.

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190905154603.10349-6-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     
  • This is needed so that pmem probe doesn't wrongly initialize a namespace
    that lacks enough reserved space to hold struct pages for the current
    kernel.

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190905154603.10349-5-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     
  • Namespaces created in PFN_MODE_PMEM mode store struct pages in the reserved
    block area, so the reservation must account for the right struct page
    size. Instead of depending directly on sizeof(struct page), which can
    change with different kernel config options, use the maximum struct page
    size (64 bytes) when calculating the reserved block area. This makes sure
    a pmem device can be used across kernels built with different configs.

    If the assumed maximum struct page size changes, the reserved block
    allocation for newly created namespaces must be updated accordingly.

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190905154603.10349-4-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     
  • In order to support marking namespaces with unsupported feature/version
    disabled, the nvdimm core should advance the namespace seed on these
    probe failures. Otherwise, these failed namespaces will be considered
    seed namespaces and will be wrongly used when creating new namespaces.

    Return -EOPNOTSUPP from the pmem probe callback to indicate namespace
    initialization failures due to a pfn superblock feature/version mismatch.

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190905154603.10349-3-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     
  • The nd_region_probe_success() helper mixes seed management with
    nvdimm->busy tracking. Given that the 'busy' increment is handled
    internally in the nd_region driver 'probe' path, move the decrement to the
    'remove' path. With that cleanup the routine can be renamed to the more
    descriptive nd_region_advance_seeds().

    The change is prompted by an upcoming need to optionally advance the
    seeds on events other than 'probe' success.

    Cc: "Aneesh Kumar K.V"
    Signed-off-by: Dan Williams
    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190905154603.10349-2-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Dan Williams
     

30 Aug, 2019

2 commits

  • The security operations are exported from libnvdimm/security.c to
    libnvdimm/dimm_devs.c, and libnvdimm/security.c is optionally compiled
    based on the CONFIG_NVDIMM_KEYS config symbol.

    Rather than export the operations across compile objects, just move the
    __security_store() entry point to live with the helpers.

    Acked-by: Jeff Moyer
    Reviewed-by: Dave Jiang
    Link: https://lore.kernel.org/r/156686730515.184120.10522747907309996674.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Dan Williams
     
  • An attempt to freeze DIMMs currently runs afoul of default blocking of
    all security operations in the entry to the 'store' routine for the
    'security' sysfs attribute.

    The blanket blocking of all security operations while the DIMM is in
    active use in a region is too restrictive. The only security operations
    that need to be aware of the ->busy state are those that mutate the
    state of data, i.e. erase and overwrite.

    Refactor the ->busy checks to be applied at the common entry point in
    __security_store() rather than in each of the helper routines, so that
    freeze can run regardless of busy state.

    Reviewed-by: Dave Jiang
    Reviewed-by: Jeff Moyer
    Link: https://lore.kernel.org/r/156686729996.184120.3458026302402493937.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Dan Williams