07 Dec, 2016

1 commit

  • Hugh notes in response to commit 4cb19355ea19 "device-dax: fail all
    private mapping attempts":

    "I think that is more restrictive than you intended: haven't tried, but I
    believe it rejects a PROT_READ, MAP_SHARED, O_RDONLY fd mmap, leaving no
    way to mmap /dev/dax without write permission to it."

    Indeed it does restrict read-only mappings. Switch to checking
    VM_MAYSHARE, not VM_SHARED.

    Cc:
    Cc: Dave Hansen
    Cc: Pawel Lebioda
    Fixes: 4cb19355ea19 ("device-dax: fail all private mapping attempts")
    Reported-by: Hugh Dickins
    Signed-off-by: Dan Williams

    Dan Williams
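
The difference between the two flags can be illustrated with a small userspace model. This is only a sketch: the flag values and the `model_*` and `*_check_allows` names are hypothetical, and the flag derivation is a simplified approximation of what the core mm does for a shared mapping of a file that was not opened for writing (VM_SHARED is dropped, VM_MAYSHARE is kept).

```c
#include <stdbool.h>

/* Placeholder flag bits for the model; not the kernel's actual values. */
#define MODEL_VM_SHARED   0x1u
#define MODEL_VM_MAYSHARE 0x2u

/*
 * Simplified model of the flags a mapping ends up with. A MAP_SHARED
 * mapping of an fd without write permission keeps MAYSHARE but loses
 * SHARED, which is exactly the case Hugh describes.
 */
unsigned int model_vm_flags(bool map_shared, bool fd_writable)
{
    unsigned int flags = 0;

    if (map_shared) {
        flags |= MODEL_VM_SHARED | MODEL_VM_MAYSHARE;
        if (!fd_writable)
            flags &= ~MODEL_VM_SHARED;
    }
    return flags;
}

/* Model of the check from 4cb19355ea19: anything without SHARED fails. */
bool old_check_allows(unsigned int flags)
{
    return (flags & MODEL_VM_SHARED) != 0;
}

/* Model of the check after this fix: only private mappings fail. */
bool new_check_allows(unsigned int flags)
{
    return (flags & MODEL_VM_MAYSHARE) != 0;
}
```

In this model a MAP_SHARED mapping of an O_RDONLY fd is rejected by the old check and accepted by the new one, while a MAP_PRIVATE mapping is rejected by both.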
     

17 Nov, 2016

1 commit

  • The device-dax implementation originally tried to be tricky and allow
    private read-only mappings, but in the process allowed writable
    MAP_PRIVATE + MAP_NORESERVE mappings. For simplicity and predictability
    just fail all private mapping attempts since device-dax memory is
    statically allocated and will never support overcommit.

    Cc:
    Cc: Dave Hansen
    Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
    Reported-by: Pawel Lebioda
    Signed-off-by: Dan Williams

    Dan Williams
     

29 Oct, 2016

1 commit

  • If the dax_pmem driver is passed a resource that is already busy the
    driver probe attempt should fail with a message like the following:

    dax_pmem dax0.1: could not reserve region [mem 0x100000000-0x11fffffff]

    However, if we do not catch the error we crash for the obvious reason of
    accessing memory that is not mapped.

    BUG: unable to handle kernel paging request at ffffc90020001000
    IP: __memcpy+0x12/0x20
    [..]
    Call Trace:
    ? nsio_rw_bytes+0x60/0x180
    nd_pfn_validate+0x75/0x320
    nvdimm_setup_pfn+0xb9/0x5d0
    ? devm_nsio_enable+0xff/0x110
    dax_pmem_probe+0x59/0x260

    Cc:
    Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory")
    Reported-by: Dave Hansen
    Signed-off-by: Dan Williams

    Dan Williams
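
The failure mode above comes down to ignoring the result of a reservation step. A minimal userspace sketch of the intended control flow follows. All names here (`model_request_region`, `model_probe`, the fixed-size table) are hypothetical and stand in for the kernel's resource handling, which is not reproduced.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_BUSY 8

/* Inclusive physical ranges that are already claimed in this model. */
static struct { uint64_t start, end; } model_busy[MODEL_MAX_BUSY];
static size_t model_nr_busy;

/* Claim [start, end]; fail if it overlaps a claimed range or no slot is left. */
bool model_request_region(uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < model_nr_busy; i++)
        if (start <= model_busy[i].end && model_busy[i].start <= end)
            return false;
    if (model_nr_busy == MODEL_MAX_BUSY)
        return false;
    model_busy[model_nr_busy].start = start;
    model_busy[model_nr_busy].end = end;
    model_nr_busy++;
    return true;
}

/*
 * Probe model: propagate the reservation failure instead of continuing.
 * Continuing would mean touching a range that was never mapped.
 */
int model_probe(uint64_t start, uint64_t end)
{
    if (!model_request_region(start, end))
        return -EBUSY;
    /* Only past this point would it be safe to map and read the range. */
    return 0;
}
```

Probing the same range twice succeeds the first time and returns -EBUSY the second time, rather than proceeding to access the range.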
     

28 Oct, 2016

2 commits

  • We need to wait until the percpu_ref is released before exit. Otherwise,
    we sometimes lose the race and trigger this new warning that was added
    in v4.9 (commit a67823c1ed10 "percpu-refcount: init ->confirm_switch
    member properly"):

    WARNING: CPU: 0 PID: 3629 at lib/percpu-refcount.c:107 percpu_ref_exit+0x51/0x60
    [..]
    Call Trace:
    dump_stack+0x85/0xc2
    __warn+0xcb/0xf0
    warn_slowpath_null+0x1d/0x20
    percpu_ref_exit+0x51/0x60
    dax_pmem_percpu_exit+0x1a/0x50 [dax_pmem]
    devm_action_release+0xf/0x20

    Cc:
    Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory")
    Signed-off-by: Dan Williams

    Dan Williams
     
  • A bugfix just tried to address a randconfig build problem and introduced
    a variant of the same problem: with CONFIG_LIBNVDIMM=y and
    CONFIG_NVDIMM_DAX=m, the nvdimm module now fails to link:

    drivers/nvdimm/built-in.o: In function `to_nd_device_type':
    bus.c:(.text+0x1b5d): undefined reference to `is_nd_dax'
    drivers/nvdimm/built-in.o: In function `nd_region_notify_driver_action.constprop.2':
    region_devs.c:(.text+0x6b6c): undefined reference to `is_nd_dax'
    region_devs.c:(.text+0x6b8c): undefined reference to `to_nd_dax'
    drivers/nvdimm/built-in.o: In function `nd_region_probe':
    region.c:(.text+0x70f3): undefined reference to `nd_dax_create'
    drivers/nvdimm/built-in.o: In function `mode_show':
    namespace_devs.c:(.text+0xa196): undefined reference to `is_nd_dax'
    drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
    (.text+0xa55f): undefined reference to `is_nd_dax'
    drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
    (.text+0xa56e): undefined reference to `to_nd_dax'

    This reverts the earlier fix, making NVDIMM_DAX a 'bool' option again
    as it should be (it gets linked into the libnvdimm module). To fix
    the original problem, I'm adding a dependency on LIBNVDIMM to
    DEV_DAX_PMEM, which ensures we can't have that one built-in if the
    rest is a module.

    Fixes: 4e65e9381c7a ("/dev/dax: fix Kconfig dependency build breakage")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Arnd Bergmann
     

08 Oct, 2016

3 commits


04 Sep, 2016

1 commit

  • pgoff_to_phys() validates both the starting address and the length
    of the mapping against the resource list. We need to check for a
    mapping size of PMD_SIZE, not PAGE_SIZE, in the pmd fault path.

    Signed-off-by: Dan Williams

    Dan Williams
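
Why the size argument matters can be shown with a simplified model. The sketch below assumes a single resource range and 4 KiB pages with 2 MiB pmd mappings. The `model_*` names are hypothetical, and the real pgoff_to_phys() walks a resource list rather than one range.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL
#define MODEL_PMD_SIZE  (2ULL * 1024 * 1024)
#define MODEL_INVALID   UINT64_MAX

/* One inclusive physical range standing in for the resource list. */
struct model_res {
    uint64_t start;
    uint64_t end;
};

/*
 * Translate a page offset to a physical address, but only if the whole
 * span [phys, phys + size) fits inside the resource. Checking with too
 * small a size lets a large mapping run past the end of the range.
 */
uint64_t model_pgoff_to_phys(const struct model_res *res, uint64_t pgoff,
                             uint64_t size)
{
    uint64_t len = res->end - res->start + 1;
    uint64_t off = pgoff * MODEL_PAGE_SIZE;

    if (size == 0 || off >= len || size > len - off)
        return MODEL_INVALID;
    return res->start + off;
}
```

With a 3 MiB resource, an offset of 2 MiB passes a PAGE_SIZE check but fails a PMD_SIZE check, since only 1 MiB remains. That is the case a pmd fault must reject.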
     

27 Aug, 2016

1 commit

  • The data offset for a dax region needs to account for a reservation in
    the resource range. Otherwise, device-dax is allowing mappings directly
    into the memmap or device-info-block area with crash signatures like the
    following:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: get_zone_device_page+0x11/0x30
    Call Trace:
    follow_devmap_pmd+0x298/0x2c0
    follow_page_mask+0x275/0x530
    __get_user_pages+0xe3/0x750
    __gfn_to_pfn_memslot+0x1b2/0x450 [kvm]
    tdp_page_fault+0x130/0x280 [kvm]
    kvm_mmu_page_fault+0x5f/0xf0 [kvm]
    handle_ept_violation+0x94/0x180 [kvm_intel]
    vmx_handle_exit+0x1d3/0x1440 [kvm_intel]
    kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm]
    kvm_vcpu_ioctl+0x33c/0x620 [kvm]
    do_vfs_ioctl+0xa2/0x5d0
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x1a/0xa4

    Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory")
    Link: http://lkml.kernel.org/r/147205536732.1606.8994275381938837346.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Reported-by: Abhilash Kumar Mulumudi
    Reported-by: Toshi Kani
    Tested-by: Toshi Kani
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
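
The idea of the fix can be sketched as shrinking the mappable range so that it begins after the reserved area. This is a userspace model under stated assumptions: `model_span`, `model_skip_reservation` and `model_span_contains` are hypothetical names, and the reservation size is passed in directly rather than read from on-media metadata.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive physical range that userspace mappings may target. */
struct model_span {
    uint64_t start;
    uint64_t end;
};

/*
 * Advance the start of the range past a reservation of data_offset bytes
 * (for example a memmap or info block at the front of the range).
 * Returns false if the reservation would consume the whole range.
 */
bool model_skip_reservation(struct model_span *span, uint64_t data_offset)
{
    uint64_t len = span->end - span->start + 1;

    if (data_offset >= len)
        return false;
    span->start += data_offset;
    return true;
}

/* True if phys is inside the mappable range. */
bool model_span_contains(const struct model_span *span, uint64_t phys)
{
    return phys >= span->start && phys <= span->end;
}
```

After the adjustment, addresses inside the reserved area no longer count as mappable, so a fault can never be resolved to the memmap or info block.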
     

24 Aug, 2016

8 commits


07 Jul, 2016

1 commit

  • If devm_add_action() fails, we are explicitly calling the cleanup to free
    the resources allocated. Use the helper devm_add_action_or_reset()
    and return directly in case of error, since the cleanup function
    has already been called by the helper if there was any error.

    Reported-by: Sudip Mukherjee
    Signed-off-by: Vikas C Sajjan
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Sajjan, Vikas C
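
The pattern the helper encapsulates can be sketched in userspace as follows. This is a model, not the kernel API: `model_dev`, `model_add_action` and `model_add_action_or_reset` are hypothetical stand-ins, and the fixed-size action table exists only so that registration can fail in a predictable way.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_ACTIONS 4

typedef void (*model_action_fn)(void *data);

/* Minimal stand-in for a device with a list of release actions. */
struct model_dev {
    model_action_fn actions[MODEL_MAX_ACTIONS];
    void *data[MODEL_MAX_ACTIONS];
    size_t nr;
};

/* Register a release action; fails once the table is full. */
int model_add_action(struct model_dev *dev, model_action_fn fn, void *data)
{
    if (dev->nr == MODEL_MAX_ACTIONS)
        return -ENOMEM;
    dev->actions[dev->nr] = fn;
    dev->data[dev->nr] = data;
    dev->nr++;
    return 0;
}

/*
 * Register the action, and if registration fails run it immediately so
 * the caller can simply return the error without its own cleanup call.
 */
int model_add_action_or_reset(struct model_dev *dev, model_action_fn fn,
                              void *data)
{
    int ret = model_add_action(dev, fn, data);

    if (ret)
        fn(data);
    return ret;
}

/* Example release action: counts how many times it ran. */
void model_count_release(void *data)
{
    ++*(int *)data;
}
```

A caller then reduces to `rc = model_add_action_or_reset(...); if (rc) return rc;`, with no explicit cleanup on the error path.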
     

21 May, 2016

2 commits

  • The "Device DAX" core enables dax mappings of performance / feature
    differentiated memory. An open mapping or file handle keeps the backing
    struct device live, but new mappings are only possible while the device
    is enabled. Faults are handled under rcu_read_lock to synchronize
    with the enabled state of the device.

    Similar to the filesystem-dax case the backing memory may optionally
    have struct page entries. However, unlike fs-dax there is no support
    for private mappings, or mappings that are not backed by media (see
    use of zero-page in fs-dax).

    Mappings are always guaranteed to match the alignment of the dax_region.
    If the dax_region is configured to have a 2MB alignment, all mappings
    are guaranteed to be backed by a pmd entry. Contrast this determinism
    with the fs-dax case where pmd mappings are opportunistic. If userspace
    attempts to force a misaligned mapping, the driver will fail the mmap
    attempt. See dax_dev_check_vma() for other scenarios that are rejected,
    like MAP_PRIVATE mappings.

    Cc: Hannes Reinecke
    Cc: Jeff Moyer
    Cc: Christoph Hellwig
    Cc: Andrew Morton
    Cc: Dave Hansen
    Cc: Ross Zwisler
    Acked-by: "Paul E. McKenney"
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Device DAX is the device-centric analogue of Filesystem DAX
    (CONFIG_FS_DAX). It allows memory ranges to be allocated and mapped
    without need of an intervening file system. Device DAX is strict,
    precise and predictable. Specifically this interface:

    1/ Guarantees fault granularity with respect to a given page size (pte,
    pmd, or pud) set at configuration time.

    2/ Enforces deterministic behavior by being strict about what fault
    scenarios are supported.

    For example, by forcing MADV_DONTFORK semantics and omitting MAP_PRIVATE
    support, device-dax guarantees that a mapping always behaves/performs the
    same once established. It is the "what you see is what you get" access
    mechanism to differentiated memory, in contrast to filesystem DAX, which
    has filesystem-specific implementation semantics.

    Persistent memory is the first target, but the mechanism is also
    targeted for exclusive allocations of performance differentiated memory
    ranges.

    This commit is limited to the base device driver infrastructure to
    associate a dax device with a pmem range.

    Cc: Jeff Moyer
    Cc: Christoph Hellwig
    Cc: Andrew Morton
    Cc: Dave Hansen
    Cc: Ross Zwisler
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
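
The alignment rule described in the Device DAX core commit above (a misaligned mapping attempt fails) can be sketched as a mask test on the mapping boundaries. This is a simplified model: `model_check_alignment` is a hypothetical name, the alignment is assumed to be a power of two, and the other conditions that dax_dev_check_vma() rejects are not modeled.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_ALIGN_2M (2ULL * 1024 * 1024)

/*
 * Accept a mapping only if both its start and end fall on the region's
 * alignment boundary. align must be a power of two for the mask to work.
 */
int model_check_alignment(uint64_t vm_start, uint64_t vm_end, uint64_t align)
{
    uint64_t mask = align - 1;

    if ((vm_start & mask) || (vm_end & mask))
        return -EINVAL;
    return 0;
}
```

With a 2 MiB alignment, a mapping from 0x200000 to 0x400000 is accepted, while one that starts at 0x201000 or ends at 0x300000 is rejected. This is what lets every accepted mapping be backed by pmd entries.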