01 Feb, 2016

1 commit

  • A dma_addr_t is potentially smaller than a phys_addr_t on some archs.
    Don't truncate the address when doing the pfn conversion.
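
    A minimal sketch of the fix pattern (hypothetical helper name, not the
    in-tree pfn_t accessors): promote the pfn to phys_addr_t before shifting
    so the computation happens at full physical-address width.

        /* Hypothetical example: the cast must come before the shift,
         * otherwise the result is truncated to the narrower type on
         * configurations where dma_addr_t (or unsigned long) is 32-bit. */
        static inline phys_addr_t example_pfn_to_phys(unsigned long pfn)
        {
                return (phys_addr_t)pfn << PAGE_SHIFT;
        }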

    Cc: Ross Zwisler
    Reported-by: Matthew Wilcox
    [willy: fix pfn_t_to_phys as well]
    Signed-off-by: Dan Williams

    Dan Williams
     

30 Jan, 2016

1 commit

  • to_vmem_altmap() needs to return valid results until
    arch_remove_memory() completes. It also needs to be valid for any pfn
    in a section regardless of whether that pfn maps to data. This escape
    was a result of a bug in the unit test.

    The signature of this bug is that free_pagetable() fails to retrieve a
    vmem_altmap and goes off into the weeds:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] get_pfnblock_flags_mask+0x49/0x60
    [..]
    Call Trace:
    [] free_hot_cold_page+0x97/0x1d0
    [] __free_pages+0x2a/0x40
    [] free_pagetable+0x8c/0xd4
    [] remove_pagetable+0x37a/0x808
    [] vmemmap_free+0x10/0x20
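
    A hedged sketch of the consumer this crash implicates, roughly following
    the x86 free_pagetable() shape from this series ('page' and 'order' are
    assumed locals): the free path must be able to ask, for any pfn it is
    handed, whether the backing pages came from an altmap reservation and
    return them there rather than to the page allocator.

        struct vmem_altmap *altmap = to_vmem_altmap((unsigned long)page);

        if (altmap)
                vmem_altmap_free(altmap, 1UL << order);  /* back to the device reservation */
        else
                free_pages((unsigned long)page_address(page), order);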

    Fixes: 4b94ffdc4163 ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()")
    Cc: Andrew Morton
    Reported-by: Jeff Moyer
    Signed-off-by: Dan Williams

    Dan Williams
     

16 Jan, 2016

5 commits

  • A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
    has established a devm_memremap_pages() mapping, i.e. when the pfn_t
    returned from ->direct_access() has PFN_DEV and PFN_MAP set. Later, when
    encountering _PAGE_DEVMAP during a page table walk we look up and pin a
    struct dev_pagemap instance to keep the result of pfn_to_page() valid
    until put_page().
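
    A hedged sketch of the walk-side pattern being described (simplified
    from the get_user_pages() path; locking and error unwinding elided):

        struct dev_pagemap *pgmap = NULL;
        struct page *page;

        if (pte_devmap(pte)) {
                /* pin the hosting mapping; fails if the device is being disabled */
                pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
                if (!pgmap)
                        return 0;
        }
        page = pte_page(pte);
        get_page(page);
        if (pgmap)
                put_dev_pagemap(pgmap);  /* the page reference now keeps the map live */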

    Signed-off-by: Dan Williams
    Tested-by: Logan Gunthorpe
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically
    mapped pfn-range (devm_memremap_pages()) while the resulting struct page
    objects are in use. Unlike get_page() it may fail if the device is, or
    is in the process of being, disabled. While the initial lookup of the
    range may be an expensive list walk, the result is cached to speed up
    subsequent lookups which are likely to be in the same mapped range.

    devm_memremap_pages() now requires a reference counter to be specified
    at init time. For pmem this means moving request_queue allocation into
    pmem_alloc() so the existing queue usage counter can track "device
    pages".

    ZONE_DEVICE pages always have an elevated count and will never be on an
    lru reclaim list. That space in 'struct page' can be redirected for
    other uses, but for safety introduce a poison value that will always
    trip __list_add() to assert. This allows half of the struct list_head
    storage to be reclaimed with some assurance to back up the assumption
    that the page count never goes to zero and a list_add() is never
    attempted.
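
    A hedged driver-side sketch of the new contract ('res' and 'q' are
    illustrative; for pmem the reference is the request_queue usage
    counter): the percpu_ref supplied at init time is what
    get_dev_pagemap() callers pin and what the teardown path drains.

        void *addr = devm_memremap_pages(dev, res, &q->q_usage_counter, NULL);

        if (IS_ERR(addr))
                return PTR_ERR(addr);
        /* get_dev_pagemap() users now pin q->q_usage_counter for this range */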

    Signed-off-by: Dan Williams
    Tested-by: Logan Gunthorpe
    Cc: Dave Hansen
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In support of providing struct page for large persistent memory
    capacities, use struct vmem_altmap to change the default policy for
    allocating memory for the memmap array. The default vmemmap_populate()
    allocates page table storage area from the page allocator. Given
    persistent memory capacities relative to DRAM it may not be feasible to
    store the memmap in 'System Memory'. Instead vmem_altmap represents
    pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
    requests.
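
    A sketch of the bookkeeping this implies (close to, but not guaranteed
    to match, the in-tree definition): the device sets aside a range of its
    own pfns and vmemmap_alloc_block_buf() carves memmap storage out of
    that range instead of asking the page allocator.

        struct vmem_altmap {
                const unsigned long base_pfn;  /* first pfn of the device range */
                const unsigned long reserve;   /* pfns set aside, never handed out */
                unsigned long free;            /* pfns available for memmap storage */
                unsigned long align;           /* pfns lost to alignment padding */
                unsigned long alloc;           /* pfns already consumed by the memmap */
        };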

    Signed-off-by: Dan Williams
    Reported-by: kbuild test robot
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • There are several scenarios where we need to retrieve and update
    metadata associated with a given devm_memremap_pages() mapping, and the
    only lookup key available is a pfn in the range (a sketch of the
    per-mapping record follows the list):

    1/ We want to augment vmemmap_populate() (called via arch_add_memory())
    to allocate memmap storage from pre-allocated pages reserved by the
    device driver. At vmemmap_alloc_block_buf() time it grabs device pages
    rather than page allocator pages. This is in support of
    devm_memremap_pages() mappings where the memmap is too large to fit in
    main memory (i.e. large persistent memory devices).

    2/ Taking a reference against the mapping when inserting device pages
    into the address_space radix of a given inode. This facilitates
    unmap_mapping_range() and truncate_inode_pages() operations when the
    driver is tearing down the mapping.

    3/ get_user_pages() operations on ZONE_DEVICE memory require taking a
    reference against the mapping so that the driver teardown path can
    revoke and drain usage of device pages.
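
    The per-mapping record those lookups return might look roughly like the
    following (the field set approximates the struct dev_pagemap of this
    series; treat it as a sketch):

        struct dev_pagemap {
                struct vmem_altmap *altmap;  /* pre-allocated memmap pages, if any */
                const struct resource *res;  /* physical range of the mapping */
                struct percpu_ref *ref;      /* pinned by get_dev_pagemap() users */
                struct device *dev;          /* device that established the mapping */
        };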

    Signed-off-by: Dan Williams
    Tested-by: Logan Gunthorpe
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • For the purpose of communicating the optional presence of a 'struct
    page' for the pfn returned from ->direct_access(), introduce a type that
    encapsulates a page-frame-number plus flags. These flags contain the
    historical "page_link" encoding for a scatterlist entry, but can also
    denote "device memory". Where "device memory" is a set of pfns that are
    not part of the kernel's linear mapping by default, but are accessed via
    the same memory controller as ram.
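
    A hedged sketch of the encoding (the exact bit assignments are
    illustrative): the pfn lives in the low bits and the flags occupy
    otherwise-unused high bits of a 64-bit value.

        typedef struct {
                u64 val;  /* pfn in the low bits, flags in the high bits */
        } pfn_t;

        #define PFN_SG_CHAIN  (1ULL << (BITS_PER_LONG_LONG - 1))  /* scatterlist chain */
        #define PFN_SG_LAST   (1ULL << (BITS_PER_LONG_LONG - 2))  /* scatterlist end */
        #define PFN_DEV       (1ULL << (BITS_PER_LONG_LONG - 3))  /* device memory */
        #define PFN_MAP       (1ULL << (BITS_PER_LONG_LONG - 4))  /* has a memmap */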

    The motivation for this new type is large capacity persistent memory
    that needs struct page entries in the 'memmap' to support 3rd party DMA
    (i.e. O_DIRECT I/O with a persistent memory source/target). However,
    we also need it in support of maintaining a list of mapped inodes which
    need to be unmapped at driver teardown or freeze_bdev() time.

    Signed-off-by: Dan Williams
    Cc: Christoph Hellwig
    Cc: Dave Hansen
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

11 Nov, 2015

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "Outside of the new ACPI-NFIT hot-add support this pull request is more
    notable for what it does not contain, than what it does. There were a
    handful of development topics this cycle, dax get_user_pages, dax
    fsync, and raw block dax, that need more iteration and will wait
    for 4.5.

    The patches to make devm and the pmem driver NUMA aware have been in
    -next for several weeks. The hot-add support has not, but is
    contained to the NFIT driver and is passing unit tests. The coredump
    support is straightforward and was looked over by Jeff. All of it has
    received a 0day build success notification across 107 configs.

    Summary:

    - Add support for the ACPI 6.0 NFIT hot add mechanism to process
    updates of the NFIT at runtime.

    - Teach the coredump implementation how to filter out DAX mappings.

    - Introduce NUMA hints for allocations made by the pmem driver, and
    as a side effect all devm allocations now hint their NUMA node by
    default"

    * tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    coredump: add DAX filtering for FDPIC ELF coredumps
    coredump: add DAX filtering for ELF coredumps
    acpi: nfit: Add support for hot-add
    nfit: in acpi_nfit_init, break on a 0-length table
    pmem, memremap: convert to numa aware allocations
    devm_memremap_pages: use numa_mem_id
    devm: make allocations numa aware by default
    devm_memremap: convert to return ERR_PTR
    devm_memunmap: use devres_release()
    pmem: kill memremap_pmem()
    x86, mm: quiet arch_add_memory()

    Linus Torvalds
     

27 Oct, 2015

1 commit

  • Currently memremap checks if the range is "System RAM" and returns the
    kernel linear address. This is broken for highmem platforms where a
    range may be "System RAM", but is not part of the kernel linear mapping.
    Fallback to ioremap_cache() in these cases, to let the arch code attempt
    to handle it.
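
    A hedged sketch of the check being described (simplified; the helper
    name is illustrative): only lowmem pages have a kernel linear address,
    so highmem falls through to ioremap_cache().

        static void *try_ram_remap(resource_size_t offset, size_t size)
        {
                struct page *page = pfn_to_page(offset >> PAGE_SHIFT);

                /* only lowmem can be returned via the linear mapping */
                if (!PageHighMem(page))
                        return __va(offset);
                return NULL;  /* caller falls back to ioremap_cache() */
        }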

    Note that ARM ioremap will WARN when attempting to remap ram, and in
    that case the caller needs to be fixed. For this reason, existing
    ioremap_cache() usages for ARM are already trained to avoid attempts to
    remap ram.

    The impact of this bug is low for now since the pmem driver is the only
    user of memremap(), but this is important to fix before more conversions
    to memremap arrive in 4.4.

    Cc: Rafael J. Wysocki
    Reported-by: Ard Biesheuvel
    Acked-by: Ard Biesheuvel
    Signed-off-by: Dan Williams

    Dan Williams
     

10 Oct, 2015

4 commits


28 Aug, 2015

1 commit

  • This behaves like devm_memremap except that it ensures we have page
    structures available that can back the region.

    Signed-off-by: Christoph Hellwig
    [djbw: catch attempts to remap RAM, drop flags]
    Signed-off-by: Dan Williams

    Christoph Hellwig
     

15 Aug, 2015

2 commits

  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Christoph Hellwig
     
  • Existing users of ioremap_cache() are mapping memory that is known in
    advance to not have i/o side effects. These users are forced to cast
    away the __iomem annotation, or otherwise neglect to fix the sparse
    errors thrown when dereferencing pointers to this memory. Provide
    memremap() as a non __iomem annotated ioremap_*() in the case when
    ioremap is otherwise a pointer to cacheable memory. Empirically,
    ioremap_<cacheable-type>() call sites are seeking memory-like semantics
    (e.g. speculative reads, and prefetching permitted).

    memremap() is a break from the ioremap implementation pattern of adding
    a new memremap_<type>() for each mapping type and having silent
    compatibility fall backs. Instead, the implementation defines flags
    that are passed to the central memremap() and if a mapping type is not
    supported by an arch memremap returns NULL.
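
    A hedged usage sketch ('res' is an illustrative resource; MEMREMAP_WB is
    one of the initial flags): the caller requests a write-back cacheable
    mapping and must cope with NULL when the arch cannot honor the type.

        void *addr = memremap(res->start, resource_size(res), MEMREMAP_WB);

        if (!addr)
                return -ENOMEM;
        /* addr is a plain (non-__iomem) pointer and can be dereferenced directly */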

    We introduce a memremap prototype as a trivial wrapper of
    ioremap_cache() and ioremap_wt(). Later, once all ioremap_cache() and
    ioremap_wt() usage has been removed from drivers, we teach archs to
    implement arch_memremap() with the ability to strictly enforce the
    mapping type.

    Cc: Arnd Bergmann
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams