20 Jan, 2017

1 commit

  • commit f931ab479dd24cf7a2c6e2df19778406892591fb upstream.

    Both arch_add_memory() and arch_remove_memory() expect a single threaded
    context.

    For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
    not hold any locks over this check and branch:

    if (pgd_val(*pgd)) {
            pud = (pud_t *)pgd_page_vaddr(*pgd);
            paddr_last = phys_pud_init(pud, __pa(vaddr),
                                       __pa(vaddr_end),
                                       page_size_mask);
            continue;
    }

    pud = alloc_low_page();
    paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
                               page_size_mask);

    The result is that two threads calling devm_memremap_pages()
    simultaneously can end up colliding on pgd initialization. This leads
    to crash signatures like the following where the loser of the race
    initializes the wrong pgd entry:

    BUG: unable to handle kernel paging request at ffff888ebfff0000
    IP: memcpy_erms+0x6/0x10
    PGD 2f8e8fc067 PUD 0
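
    The fix upstream serializes these paths. A minimal sketch of the shape
    of that serialization (using the mem_hotplug_begin()/mem_hotplug_done()
    helpers; illustrative, not the verbatim patch):

    /* devm_memremap_pages(): hold the hotplug lock across arch_add_memory() */
    mem_hotplug_begin();
    error = arch_add_memory(nid, align_start, align_size, true);
    mem_hotplug_done();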
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     

10 Sep, 2016

1 commit

  • track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
    uncacheable, rendering them impractical for application usage. DAX-pte
    mappings are cached, and the goal of establishing DAX-pmd mappings is to
    attain more performance, not dramatically less (three orders of magnitude
    slower).

    track_pfn_insert() relies on a previous call to reserve_memtype() to
    establish the expected page_cache_mode for the range. While memremap()
    arranges for reserve_memtype() to be called, devm_memremap_pages() does
    not. So, teach track_pfn_insert() and untrack_pfn() how to handle
    tracking without a vma, and arrange for devm_memremap_pages() to
    establish the write-back-cache reservation in the memtype tree.
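
    A minimal sketch of the vma-less fallback in track_pfn_insert() (x86
    PAT internals; names as in arch/x86/mm/pat.c of that era, shown for
    illustration only):

    int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot, pfn_t pfn)
    {
            enum page_cache_mode pcm;

            if (!pat_enabled())
                    return 0;

            /* with no vma, consult the memtype tree seeded by reserve_memtype() */
            pcm = lookup_memtype(pfn_t_to_phys(pfn));
            *prot = __pgprot((pgprot_val(*prot) & (~_PAGE_CACHE_MASK)) |
                             cachemode2protval(pcm));
            return 0;
    }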

    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Nilesh Choudhury
    Cc: Kirill A. Shutemov
    Reported-by: Toshi Kani
    Reported-by: Kai Zhang
    Acked-by: Andrew Morton
    Signed-off-by: Dan Williams

    Dan Williams
     

29 Jul, 2016

2 commits

  • Pull libnvdimm updates from Dan Williams:

    - Replace pcommit with ADR / directed-flushing.

    The pcommit instruction, which has not shipped on any product, is
    deprecated. Instead, the requirement is that platforms implement
    either ADR, or provide one or more flush addresses per nvdimm.

    ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
    to the memory controller on a power-fail event.

    Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
    Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
    A flush hint is an mmio address that, when written and fenced, assures
    that all previous posted writes targeting a given dimm have been
    flushed to media (see the sketch after this list).

    - On-demand ARS (address range scrub).

    Linux uses the results of the ACPI ARS commands to track bad blocks
    in pmem devices. When latent errors are detected, we re-scrub the
    media to refresh the bad block list; userspace can also request a
    re-scrub at any time.

    - Support for the Microsoft DSM (device specific method) command
    format.

    - Support for EDK2/OVMF virtual disk device memory ranges.

    - Various fixes and cleanups across the subsystem.
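
    As a concrete illustration of the flush-hint mechanism noted above,
    the write-and-fence sequence looks roughly like this (cf.
    nvdimm_flush() in drivers/nvdimm/region_devs.c; a sketch, and
    'flush_hint_mmio' is a placeholder for the ioremap'd hint address):

    /* fence prior posted writes, then poke the hint address */
    wmb();
    writeq(1, flush_hint_mmio);     /* any value; the write itself flushes */
    wmb();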

    * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
    libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
    nfit: do an ARS scrub on hitting a latent media error
    nfit: move to nfit/ sub-directory
    nfit, libnvdimm: allow an ARS scrub to be triggered on demand
    libnvdimm: register nvdimm_bus devices with an nd_bus driver
    pmem: clarify a debug print in pmem_clear_poison
    x86/insn: remove pcommit
    Revert "KVM: x86: add pcommit support"
    nfit, tools/testing/nvdimm/: unify shutdown paths
    libnvdimm: move ->module to struct nvdimm_bus_descriptor
    nfit: cleanup acpi_nfit_init calling convention
    nfit: fix _FIT evaluation memory leak + use after free
    tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
    tools/testing/nvdimm: add virtual ramdisk range
    acpi, nfit: treat virtual ramdisk SPA as pmem region
    pmem: kill __pmem address space
    pmem: kill wmb_pmem()
    libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
    fs/dax: remove wmb_pmem()
    libnvdimm, pmem: flush posted-write queues on shutdown
    ...

    Linus Torvalds
     
  • Now that ZONE_DEVICE depends on SPARSEMEM_VMEMMAP we can simplify some
    ifdef guards to just ZONE_DEVICE.
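
    For example (a representative simplification, not an exhaustive list):

    /* before */
    #if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_SPARSEMEM_VMEMMAP)

    /* after */
    #ifdef CONFIG_ZONE_DEVICE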

    Link: http://lkml.kernel.org/r/146687646788.39261.8020536391978771940.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Reported-by: Vlastimil Babka
    Cc: Eric Sandeen
    Cc: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

25 Jun, 2016

1 commit

  • Currently phys_to_pfn_t() is an exported symbol to allow nfit_test to
    override it and indicate that nfit_test-pmem is not device-mapped. Now,
    we want to enable nfit_test to operate without DMA_CMA and the pmem it
    provides will no longer be physically contiguous, i.e. won't be capable
    of supporting direct_access requests larger than a page. Make
    pmem_direct_access() a weak symbol so that it can be replaced by the
    tools/testing/nvdimm/ version, and move phys_to_pfn_t() to a static
    inline now that it no longer needs to be overridden.
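
    The weak-symbol pattern in question, with a hypothetical signature for
    illustration (the real prototype lives in the pmem driver):

    /* default definition; tools/testing/nvdimm/ links a strong override */
    long __weak pmem_direct_access(struct block_device *bdev, sector_t sector,
                                   void **kaddr, pfn_t *pfn)
    {
            return -ENXIO;  /* placeholder body for this sketch */
    }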

    Acked-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

04 Apr, 2016

1 commit

  • Currently, the memremap code serves MEMREMAP_WB mappings directly from
    the kernel direct mapping, unless the region is in high memory, in which
    case it falls back to using ioremap_cache(). However, the semantics of
    ioremap_cache() are not unambiguously defined, and on ARM, it will
    actually result in a mapping type that differs from the attributes used
    for the linear mapping, and for this reason, the ioremap_cache() call
    fails if the region is part of the memory managed by the kernel.

    So instead, implement an optional hook 'arch_memremap_wb' whose default
    implementation calls ioremap_cache() as before, but which can be
    overridden by the architecture to do what is appropriate for it.
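
    The shape of the hook (per kernel/memremap.c of that era; a sketch):

    #ifndef arch_memremap_wb
    static void *arch_memremap_wb(resource_size_t offset, unsigned long size)
    {
            return (__force void *)ioremap_cache(offset, size);
    }
    #endif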

    Acked-by: Dan Williams
    Signed-off-by: Ard Biesheuvel

    Ard Biesheuvel
     

23 Mar, 2016

2 commits

  • Add a flag to memremap() for writecombine mappings. Mappings satisfied
    by this flag will not be cached, however writes may be delayed or
    combined into more efficient bursts. This is most suitable for buffers
    written sequentially by the CPU for use by other DMA devices.
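
    Hypothetical usage, mapping a CPU-written buffer write-combined
    (phys_addr and buf_size are placeholders):

    void *buf = memremap(phys_addr, buf_size, MEMREMAP_WC);

    if (buf) {
            memset(buf, 0, buf_size);       /* sequential CPU writes */
            memunmap(buf);
    }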

    Signed-off-by: Brian Starkey
    Reviewed-by: Catalin Marinas
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brian Starkey
     
  • These patches implement a MEMREMAP_WC flag for memremap(), which can be
    used to obtain writecombine mappings. This is then used for setting up
    dma_coherent_mem regions which use the DMA_MEMORY_MAP flag.

    The motivation is to fix an alignment fault on arm64, and the suggestion
    to implement MEMREMAP_WC for this case was made at [1]. That particular
    issue is handled in patch 4, which makes sure that the appropriate
    memset function is used when zeroing allocations mapped as IO memory.

    This patch (of 4):

    Don't modify the flags input argument to memremap(). MEMREMAP_WB is
    already a special case so we can check for it directly instead of
    clearing flag bits in each mapper.
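
    In code terms, the mappers can test the bit directly (a sketch):

    /* before: 'flags &= ~MEMREMAP_WB;' in each mapper as it consumed the bit */
    if (flags & MEMREMAP_WB)
            addr = try_ram_remap(offset, size);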

    Signed-off-by: Brian Starkey
    Cc: Catalin Marinas
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brian Starkey
     

16 Mar, 2016

1 commit

  • Commit 4b94ffdc4163 ("x86, mm: introduce vmem_altmap to augment
    vmemmap_populate()"), introduced the to_vmem_altmap() function.

    The comments in this function contain two typos (one misspelling of the
    Kconfig option CONFIG_SPARSEMEM_VMEMMAP, and one missing letter 'n'),
    let's fix them up.

    Signed-off-by: Andreas Ziegler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Ziegler
     

15 Mar, 2016

1 commit

  • Pull ram resource handling changes from Ingo Molnar:
    "Core kernel resource handling changes to support NVDIMM error
    injection.

    This tree introduces a new I/O resource type, IORESOURCE_SYSTEM_RAM,
    for System RAM while keeping the current IORESOURCE_MEM type bit set
    for all memory-mapped ranges (including System RAM) for backward
    compatibility.

    With this resource flag it no longer takes a strcmp() loop through the
    resource tree to find "System RAM" resources.

    The new resource type is then used to extend ACPI/APEI error injection
    facility to also support NVDIMM"

    * 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ACPI/EINJ: Allow memory error injection to NVDIMM
    resource: Kill walk_iomem_res()
    x86/kexec: Remove walk_iomem_res() call with GART type
    x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search
    resource: Add walk_iomem_res_desc()
    memremap: Change region_intersects() to take @flags and @desc
    arm/samsung: Change s3c_pm_run_res() to use System RAM type
    resource: Change walk_system_ram() to use System RAM type
    drivers: Initialize resource entry to zero
    xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM
    kexec: Set IORESOURCE_SYSTEM_RAM for System RAM
    arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
    ia64: Set System RAM type and descriptor
    x86/e820: Set System RAM type and descriptor
    resource: Add I/O resource descriptor
    resource: Handle resource flags properly
    resource: Add System RAM resource type

    Linus Torvalds
     

10 Mar, 2016

3 commits

  • In memremap's helper function try_ram_remap(), we dereference a struct
    page pointer that was derived from a PFN that is known to be covered by
    a 'System RAM' iomem region, and is thus assumed to be a 'valid' PFN,
    i.e., a PFN that has a struct page associated with it and is covered by
    the kernel direct mapping.

    However, the assumption that there is a 1:1 relation between the System
    RAM iomem region and the kernel direct mapping is not universally valid
    on all architectures, and on ARM and arm64, 'System RAM' may include
    regions for which pfn_valid() returns false.

    Generally speaking, both __va() and pfn_to_page() should only ever be
    called on PFNs/physical addresses for which pfn_valid() returns true, so
    add that check to try_ram_remap().
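
    A minimal sketch of the check added to try_ram_remap() (close to the
    upstream shape; illustrative):

    static void *try_ram_remap(resource_size_t offset, size_t size)
    {
            unsigned long pfn = PHYS_PFN(offset);

            /* only valid, direct-mapped pfns may be served from __va() */
            if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)))
                    return __va(offset);
            return NULL;    /* caller falls back to ioremap_cache() */
    }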

    Signed-off-by: Ard Biesheuvel
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ard Biesheuvel
     
  • The check for whether we overlap "System RAM" needs to be done at
    section granularity. For example a system with the following mapping:

    100000000-37bffffff : System RAM
    37c000000-837ffffff : Persistent Memory

    ...is unable to use devm_memremap_pages() as it would result in two
    zones colliding within a given section.
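
    A sketch of the section-aligned overlap check (assuming SECTION_SIZE
    rounding as upstream; illustrative):

    resource_size_t align_start = res->start & ~(SECTION_SIZE - 1);
    resource_size_t align_size = ALIGN(resource_size(res), SECTION_SIZE);

    /* test the whole section range, not just the raw resource */
    is_ram = region_intersects(align_start, align_size,
                               IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);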

    Signed-off-by: Dan Williams
    Cc: Ross Zwisler
    Reviewed-by: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Given we have uninitialized list_heads being passed to list_add(), it
    is inevitable that those uninitialized values occasionally match the
    poison value, especially since a list_add() operation will seed the
    stack with the poison value for later stack allocations to trip over.

    For example, see these two false positive reports:

    list_add attempted on force-poisoned entry
    WARNING: at lib/list_debug.c:34
    [..]
    NIP [c00000000043c390] __list_add+0xb0/0x150
    LR [c00000000043c38c] __list_add+0xac/0x150
    Call Trace:
    __list_add+0xac/0x150 (unreliable)
    __down+0x4c/0xf8
    down+0x68/0x70
    xfs_buf_lock+0x4c/0x150 [xfs]

    list_add attempted on force-poisoned entry(0000000000000500),
    new->next == d0000000059ecdb0, new->prev == 0000000000000500
    WARNING: at lib/list_debug.c:33
    [..]
    NIP [c00000000042db78] __list_add+0xa8/0x140
    LR [c00000000042db74] __list_add+0xa4/0x140
    Call Trace:
    __list_add+0xa4/0x140 (unreliable)
    rwsem_down_read_failed+0x6c/0x1a0
    down_read+0x58/0x60
    xfs_log_commit_cil+0x7c/0x600 [xfs]

    Fixes: commit 5c2c2587b132 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup")
    Signed-off-by: Dan Williams
    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

26 Feb, 2016

1 commit

  • Pull libnvdimm fixes from Dan Williams:

    - Two fixes for compatibility with the ACPI 6.1 specification.

    Without these fixes multi-interface DIMMs will fail to be probed, and
    address range scrub commands to find memory errors will give results
    that the kernel will misinterpret. For multi-interface DIMMs Linux
    will accept either the original 6.0 implementation or 6.1.

    For address range scrub we'll only support 6.1 since ACPI formalized
    this DSM differently than the original example [1] implemented in
    v4.2. The expectation is that production systems will only ever ship
    the ACPI 6.1 address range scrub command definition.

    - The wider async address range scrub work targeting 4.6 discovered
    that the original synchronous implementation in 4.5 is not sizing its
    return buffer correctly.

    - Arnd caught that my recent fix to the size of the pfn_t flags missed
    updating the flags variable used in the pmem driver.

    - Toshi found that we mishandle the memremap() return value in
    devm_memremap().

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    nvdimm: use 'u64' for pfn flags
    devm_memremap: Fix error value when memremap failed
    nfit: update address range scrub commands to the acpi 6.1 format
    libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing
    nfit: fix multi-interface dimm handling, acpi6.1 compatibility

    Linus Torvalds
     

24 Feb, 2016

1 commit

  • devm_memremap() returns an ERR_PTR() value in case of error.
    However, it returns NULL when memremap() fails. This causes
    callers, such as the pmem driver, to proceed and oops later.

    Change devm_memremap() to return ERR_PTR(-ENXIO) when memremap()
    failed.
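
    A sketch of the corrected error path:

    addr = memremap(offset, size, flags);
    if (!addr) {
            devres_free(ptr);
            return ERR_PTR(-ENXIO);         /* was: return NULL */
    }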

    Signed-off-by: Toshi Kani
    Cc: Andrew Morton
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Toshi Kani
     

19 Feb, 2016

1 commit

  • The pmem driver calls devm_memremap() to map a persistent memory range.
    When the pmem driver is unloaded, this memremap'd range is not released
    so the kernel will leak a vma.

    Fix devm_memremap_release() to handle a given memremap'd address
    properly.
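
    A sketch of the corrected release path (the devres data stores the
    mapped address, so it must be dereferenced before unmapping):

    static void devm_memremap_release(struct device *dev, void *res)
    {
            memunmap(*(void **)res);
    }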

    Signed-off-by: Toshi Kani
    Acked-by: Dan Williams
    Cc: Christoph Hellwig
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     

12 Feb, 2016

1 commit

  • The pfn_t type uses an unsigned long to store a pfn + flags value. On a
    64-bit platform the upper 12 bits of an unsigned long are never used for
    storing the value of a pfn. However, this is not true on highmem
    platforms, where all 32 bits of a pfn value are used to address a 44-bit
    physical address space. A pfn_t therefore needs to store a 64-bit value.
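
    The resulting definition, roughly (cf. the pfn_t type upstream):

    typedef struct {
            u64 val;        /* was 'unsigned long'; must hold pfn + flags */
    } pfn_t;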

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=112211
    Fixes: 01c8f1c44b83 ("mm, dax, gpu: convert vm_insert_mixed to pfn_t")
    Signed-off-by: Dan Williams
    Reported-by: Stuart Foster
    Reported-by: Julian Margetson
    Tested-by: Julian Margetson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

01 Feb, 2016

1 commit

  • A dma_addr_t is potentially smaller than a phys_addr_t on some archs.
    Don't truncate the address when doing the pfn conversion.
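
    A sketch of the widened conversion, taking phys_addr_t rather than
    dma_addr_t so no bits are lost (illustrative):

    static inline pfn_t phys_to_pfn_t(phys_addr_t addr, u64 flags)
    {
            return __pfn_to_pfn_t(addr >> PAGE_SHIFT, flags);
    }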

    Cc: Ross Zwisler
    Reported-by: Matthew Wilcox
    [willy: fix pfn_t_to_phys as well]
    Signed-off-by: Dan Williams

    Dan Williams
     

30 Jan, 2016

2 commits

  • Change region_intersects() to identify a target with @flags and
    @desc, instead of @name with strcmp().

    Change the callers of region_intersects(), memremap() and
    devm_memremap(), to set IORESOURCE_SYSTEM_RAM in @flags and
    IORES_DESC_NONE in @desc when searching System RAM.

    Also, export region_intersects() so that the ACPI EINJ error
    injection driver can call this function in a later patch.
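
    An example call after the change, matching on flags and descriptor
    instead of a name string (a sketch):

    /* was: region_intersects(offset, size, "System RAM") */
    ret = region_intersects(offset, size, IORESOURCE_SYSTEM_RAM,
                            IORES_DESC_NONE);
    if (ret != REGION_INTERSECTS)
            return NULL;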

    Signed-off-by: Toshi Kani
    Signed-off-by: Borislav Petkov
    Acked-by: Dan Williams
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jakub Sitnicki
    Cc: Jan Kara
    Cc: Jiang Liu
    Cc: Kees Cook
    Cc: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm
    Link: http://lkml.kernel.org/r/1453841853-11383-13-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Toshi Kani
     
  • to_vmem_altmap() needs to return valid results until
    arch_remove_memory() completes. It also needs to be valid for any pfn
    in a section regardless of whether that pfn maps to data. This escape
    was a result of a bug in the unit test.

    The signature of this bug is that free_pagetable() fails to retrieve a
    vmem_altmap and goes off into the weeds:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] get_pfnblock_flags_mask+0x49/0x60
    [..]
    Call Trace:
    [] free_hot_cold_page+0x97/0x1d0
    [] __free_pages+0x2a/0x40
    [] free_pagetable+0x8c/0xd4
    [] remove_pagetable+0x37a/0x808
    [] vmemmap_free+0x10/0x20

    Fixes: 4b94ffdc4163 ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()")
    Cc: Andrew Morton
    Reported-by: Jeff Moyer
    Signed-off-by: Dan Williams

    Dan Williams
     

16 Jan, 2016

5 commits

  • A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
    has established a devm_memremap_pages() mapping, i.e. when the pfn_t
    return from ->direct_access() has PFN_DEV and PFN_MAP set. Later, when
    encountering _PAGE_DEVMAP during a page table walk we lookup and pin a
    struct dev_pagemap instance to keep the result of pfn_to_page() valid
    until put_page().
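
    The lookup-and-pin pattern during a page table walk, roughly (a sketch
    of the get_user_pages() fast path; illustrative):

    if (pte_devmap(pte)) {
            /* pin the pagemap so pfn_to_page() stays valid */
            pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
            if (!pgmap)
                    return 0;       /* device disabled: fall back to slow path */
    }
    /* ... use the page ... */
    put_dev_pagemap(pgmap);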

    Signed-off-by: Dan Williams
    Tested-by: Logan Gunthorpe
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically
    mapped pfn-range (devm_memremap_pages()) while the resulting struct page
    objects are in use. Unlike get_page() it may fail if the device is, or
    is in the process of being, disabled. While the initial lookup of the
    range may be an expensive list walk, the result is cached to speed up
    subsequent lookups which are likely to be in the same mapped range.

    devm_memremap_pages() now requires a reference counter to be specified
    at init time. For pmem this means moving request_queue allocation into
    pmem_alloc() so the existing queue usage counter can track "device
    pages".

    ZONE_DEVICE pages always have an elevated count and will never be on an
    lru reclaim list. That space in 'struct page' can be redirected for
    other uses, but for safety introduce a poison value that will always
    trip __list_add() to assert. This allows half of the struct list_head
    storage to be reclaimed with some assurance to back up the assumption
    that the page count never goes to zero and a list_add() is never
    attempted.

    Signed-off-by: Dan Williams
    Tested-by: Logan Gunthorpe
    Cc: Dave Hansen
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In support of providing struct page for large persistent memory
    capacities, use struct vmem_altmap to change the default policy for
    allocating memory for the memmap array. The default vmemmap_populate()
    allocates page table storage area from the page allocator. Given
    persistent memory capacities relative to DRAM it may not be feasible to
    store the memmap in 'System Memory'. Instead vmem_altmap represents
    pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
    requests.
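
    The structure itself, roughly as introduced (a sketch of its fields):

    struct vmem_altmap {
            const unsigned long base_pfn;   /* first pfn of the mapped range */
            const unsigned long reserve;    /* pages reserved, never allocated */
            unsigned long free;             /* free device pages remaining */
            unsigned long align;            /* pages lost to alignment */
            unsigned long alloc;            /* pages already handed out */
    };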

    Signed-off-by: Dan Williams
    Reported-by: kbuild test robot
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • There are several scenarios where we need to retrieve and update
    metadata associated with a given devm_memremap_pages() mapping, and the
    only lookup key available is a pfn in the range:

    1/ We want to augment vmemmap_populate() (called via arch_add_memory())
    to allocate memmap storage from pre-allocated pages reserved by the
    device driver. At vmemmap_alloc_block_buf() time it grabs device pages
    rather than page allocator pages. This is in support of
    devm_memremap_pages() mappings where the memmap is too large to fit in
    main memory (i.e. large persistent memory devices).

    2/ Taking a reference against the mapping when inserting device pages
    into the address_space radix of a given inode. This facilitates
    unmap_mapping_range() and truncate_inode_pages() operations when the
    driver is tearing down the mapping.

    3/ get_user_pages() operations on ZONE_DEVICE memory require taking a
    reference against the mapping so that the driver teardown path can
    revoke and drain usage of device pages.

    Signed-off-by: Dan Williams
    Tested-by: Logan Gunthorpe
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • For the purpose of communicating the optional presence of a 'struct
    page' for the pfn returned from ->direct_access(), introduce a type that
    encapsulates a page-frame-number plus flags. These flags contain the
    historical "page_link" encoding for a scatterlist entry, but can also
    denote "device memory". Where "device memory" is a set of pfns that are
    not part of the kernel's linear mapping by default, but are accessed via
    the same memory controller as ram.

    The motivation for this new type is large capacity persistent memory
    that needs struct page entries in the 'memmap' to support 3rd party DMA
    (i.e. O_DIRECT I/O with a persistent memory source/target). However,
    we also need it in support of maintaining a list of mapped inodes which
    need to be unmapped at driver teardown or freeze_bdev() time.
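
    The flag bits live in the otherwise-unused high bits of the value
    (following the shape of include/linux/pfn_t.h after the later 64-bit
    widening; a sketch):

    #define PFN_SG_CHAIN   (1ULL << (BITS_PER_LONG_LONG - 1))
    #define PFN_SG_LAST    (1ULL << (BITS_PER_LONG_LONG - 2))
    #define PFN_DEV        (1ULL << (BITS_PER_LONG_LONG - 3))
    #define PFN_MAP        (1ULL << (BITS_PER_LONG_LONG - 4))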

    Signed-off-by: Dan Williams
    Cc: Christoph Hellwig
    Cc: Dave Hansen
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

11 Nov, 2015

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "Outside of the new ACPI-NFIT hot-add support this pull request is more
    notable for what it does not contain, than what it does. There were a
    handful of development topics this cycle, dax get_user_pages, dax
    fsync, and raw block dax, that need more more iteration and will wait
    for 4.5.

    The patches to make devm and the pmem driver NUMA aware have been in
    -next for several weeks. The hot-add support has not, but is
    contained to the NFIT driver and is passing unit tests. The coredump
    support is straightforward and was looked over by Jeff. All of it has
    received a 0day build success notification across 107 configs.

    Summary:

    - Add support for the ACPI 6.0 NFIT hot add mechanism to process
    updates of the NFIT at runtime.

    - Teach the coredump implementation how to filter out DAX mappings.

    - Introduce NUMA hints for allocations made by the pmem driver, and
    as a side effect all devm allocations now hint their NUMA node by
    default"

    * tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    coredump: add DAX filtering for FDPIC ELF coredumps
    coredump: add DAX filtering for ELF coredumps
    acpi: nfit: Add support for hot-add
    nfit: in acpi_nfit_init, break on a 0-length table
    pmem, memremap: convert to numa aware allocations
    devm_memremap_pages: use numa_mem_id
    devm: make allocations numa aware by default
    devm_memremap: convert to return ERR_PTR
    devm_memunmap: use devres_release()
    pmem: kill memremap_pmem()
    x86, mm: quiet arch_add_memory()

    Linus Torvalds
     

27 Oct, 2015

1 commit

  • Currently memremap checks if the range is "System RAM" and returns the
    kernel linear address. This is broken for highmem platforms where a
    range may be "System RAM", but is not part of the kernel linear mapping.
    Fall back to ioremap_cache() in these cases, to let the arch code attempt
    to handle it.

    Note that ARM ioremap will WARN when attempting to remap ram, and in
    that case the caller needs to be fixed. For this reason, existing
    ioremap_cache() usages for ARM are already trained to avoid attempts to
    remap ram.

    The impact of this bug is low for now since the pmem driver is the only
    user of memremap(), but this is important to fix before more conversions
    to memremap arrive in 4.4.
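
    The resulting flow for a MEMREMAP_WB request, roughly (a sketch):

    /* serve from the linear map when possible, else try a cacheable remap */
    addr = try_ram_remap(offset, size);
    if (!addr)
            addr = ioremap_cache(offset, size);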

    Cc: Rafael J. Wysocki
    Reported-by: Ard Biesheuvel
    Acked-by: Ard Biesheuvel
    Signed-off-by: Dan Williams

    Dan Williams
     

28 Aug, 2015

1 commit

  • This behaves like devm_memremap except that it ensures we have page
    structures available that can back the region.
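
    Hypothetical driver usage ('res' describes the persistent memory
    range; signature as originally introduced):

    void *addr = devm_memremap_pages(dev, res);

    if (IS_ERR(addr))
            return PTR_ERR(addr);
    /* pfn_to_page() is now valid for pfns in 'res' */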

    Signed-off-by: Christoph Hellwig
    [djbw: catch attempts to remap RAM, drop flags]
    Signed-off-by: Dan Williams

    Christoph Hellwig
     

15 Aug, 2015

2 commits

  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Christoph Hellwig
     
  • Existing users of ioremap_cache() are mapping memory that is known in
    advance to not have i/o side effects. These users are forced to cast
    away the __iomem annotation, or otherwise neglect to fix the sparse
    errors thrown when dereferencing pointers to this memory. Provide
    memremap() as a non __iomem annotated ioremap_*() in the case when
    ioremap is otherwise a pointer to cacheable memory. Empirically,
    ioremap_*() call sites are seeking memory-like semantics
    (e.g. speculative reads and prefetching permitted).

    memremap() is a break from the ioremap implementation pattern of adding
    a new memremap_*() for each mapping type and having silent
    compatibility fallbacks. Instead, the implementation defines flags
    that are passed to the central memremap() and if a mapping type is not
    supported by an arch memremap returns NULL.

    We introduce a memremap prototype as a trivial wrapper of
    ioremap_cache() and ioremap_wt(). Later, once all ioremap_cache() and
    ioremap_wt() usage has been removed from drivers we teach archs to
    implement arch_memremap() with the ability to strictly enforce the
    mapping type.
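
    Hypothetical usage ('phys', 'size', 'buf', and 'len' are placeholders):

    void *addr = memremap(phys, size, MEMREMAP_WB);

    if (addr) {
            memcpy(buf, addr, len); /* plain pointer: no __iomem casts */
            memunmap(addr);
    }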

    Cc: Arnd Bergmann
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams