20 Jan, 2017

1 commit

  • commit f931ab479dd24cf7a2c6e2df19778406892591fb upstream.

    Both arch_add_memory() and arch_remove_memory() expect a single threaded
    context.

    For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
    not hold any locks over this check and branch:

    if (pgd_val(*pgd)) {
            pud = (pud_t *)pgd_page_vaddr(*pgd);
            paddr_last = phys_pud_init(pud, __pa(vaddr),
                                       __pa(vaddr_end),
                                       page_size_mask);
            continue;
    }

    pud = alloc_low_page();
    paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
                               page_size_mask);

    The result is that two threads calling devm_memremap_pages()
    simultaneously can end up colliding on pgd initialization. This leads
    to crash signatures like the following where the loser of the race
    initializes the wrong pgd entry:

    BUG: unable to handle kernel paging request at ffff888ebfff0000
    IP: memcpy_erms+0x6/0x10
    PGD 2f8e8fc067 PUD 0
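
    The fix upstream serializes these paths. A minimal sketch of the shape
    of that serialization (using the mem_hotplug_begin()/mem_hotplug_done()
    helpers; illustrative, not the verbatim patch):

    /* devm_memremap_pages(): hold the hotplug lock across arch_add_memory() */
    mem_hotplug_begin();
    error = arch_add_memory(nid, align_start, align_size, true);
    mem_hotplug_done();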
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     

10 Sep, 2016

1 commit

  • track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
    uncacheable, rendering them impractical for application usage. DAX-pte
    mappings are cached, and the goal of establishing DAX-pmd mappings is to
    attain more performance, not dramatically less (three orders of magnitude
    slower).

    track_pfn_insert() relies on a previous call to reserve_memtype() to
    establish the expected page_cache_mode for the range. While memremap()
    arranges for reserve_memtype() to be called, devm_memremap_pages() does
    not. So, teach track_pfn_insert() and untrack_pfn() how to handle
    tracking without a vma, and arrange for devm_memremap_pages() to
    establish the write-back-cache reservation in the memtype tree.
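
    A minimal sketch of the vma-less fallback in track_pfn_insert() (x86
    PAT internals; names as in arch/x86/mm/pat.c of that era, shown for
    illustration only):

    int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot, pfn_t pfn)
    {
            enum page_cache_mode pcm;

            if (!pat_enabled())
                    return 0;

            /* with no vma, consult the memtype tree seeded by reserve_memtype() */
            pcm = lookup_memtype(pfn_t_to_phys(pfn));
            *prot = __pgprot((pgprot_val(*prot) & (~_PAGE_CACHE_MASK)) |
                             cachemode2protval(pcm));
            return 0;
    }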

    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Nilesh Choudhury
    Cc: Kirill A. Shutemov
    Reported-by: Toshi Kani
    Reported-by: Kai Zhang
    Acked-by: Andrew Morton
    Signed-off-by: Dan Williams

    Dan Williams
     

29 Jul, 2016

2 commits

  • Pull libnvdimm updates from Dan Williams:

    - Replace pcommit with ADR / directed-flushing.

    The pcommit instruction, which has not shipped on any product, is
    deprecated. Instead, the requirement is that platforms implement
    either ADR, or provide one or more flush addresses per nvdimm.

    ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
    to the memory controller on a power-fail event.

    Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
    Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
    A flush hint is an mmio address that, when written and fenced, assures
    that all previous posted writes targeting a given dimm have been
    flushed to media (see the sketch after this list).

    - On-demand ARS (address range scrub).

    Linux uses the results of the ACPI ARS commands to track bad blocks
    in pmem devices. When latent errors are detected, we re-scrub the
    media to refresh the bad block list; userspace can also request a
    re-scrub at any time.

    - Support for the Microsoft DSM (device specific method) command
    format.

    - Support for EDK2/OVMF virtual disk device memory ranges.

    - Various fixes and cleanups across the subsystem.
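
    As a concrete illustration of the flush-hint mechanism noted above,
    the write-and-fence sequence looks roughly like this (cf.
    nvdimm_flush() in drivers/nvdimm/region_devs.c; a sketch, and
    'flush_hint_mmio' is a placeholder for the ioremap'd hint address):

    /* fence prior posted writes, then poke the hint address */
    wmb();
    writeq(1, flush_hint_mmio);     /* any value; the write itself flushes */
    wmb();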

    * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
    libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
    nfit: do an ARS scrub on hitting a latent media error
    nfit: move to nfit/ sub-directory
    nfit, libnvdimm: allow an ARS scrub to be triggered on demand
    libnvdimm: register nvdimm_bus devices with an nd_bus driver
    pmem: clarify a debug print in pmem_clear_poison
    x86/insn: remove pcommit
    Revert "KVM: x86: add pcommit support"
    nfit, tools/testing/nvdimm/: unify shutdown paths
    libnvdimm: move ->module to struct nvdimm_bus_descriptor
    nfit: cleanup acpi_nfit_init calling convention
    nfit: fix _FIT evaluation memory leak + use after free
    tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
    tools/testing/nvdimm: add virtual ramdisk range
    acpi, nfit: treat virtual ramdisk SPA as pmem region
    pmem: kill __pmem address space
    pmem: kill wmb_pmem()
    libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
    fs/dax: remove wmb_pmem()
    libnvdimm, pmem: flush posted-write queues on shutdown
    ...

    Linus Torvalds
     
  • Now that ZONE_DEVICE depends on SPARSEMEM_VMEMMAP we can simplify some
    ifdef guards to just ZONE_DEVICE.
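
    For example (a representative simplification, not an exhaustive list):

    /* before */
    #if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_SPARSEMEM_VMEMMAP)

    /* after */
    #ifdef CONFIG_ZONE_DEVICE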

    Link: http://lkml.kernel.org/r/146687646788.39261.8020536391978771940.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Reported-by: Vlastimil Babka
    Cc: Eric Sandeen
    Cc: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

25 Jun, 2016

1 commit

  • Currently phys_to_pfn_t() is an exported symbol to allow nfit_test to
    override it and indicate that nfit_test-pmem is not device-mapped. Now,
    we want to enable nfit_test to operate without DMA_CMA and the pmem it
    provides will no longer be physically contiguous, i.e. won't be capable
    of supporting direct_access requests larger than a page. Make
    pmem_direct_access() a weak symbol so that it can be replaced by the
    tools/testing/nvdimm/ version, and move phys_to_pfn_t() to a static
    inline now that it no longer needs to be overridden.
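
    The weak-symbol pattern in question, with a hypothetical signature for
    illustration (the real prototype lives in the pmem driver):

    /* default definition; tools/testing/nvdimm/ links a strong override */
    long __weak pmem_direct_access(struct block_device *bdev, sector_t sector,
                                   void **kaddr, pfn_t *pfn)
    {
            return -ENXIO;  /* placeholder body for this sketch */
    }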

    Acked-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

04 Apr, 2016

1 commit

  • Currently, the memremap code serves MEMREMAP_WB mappings directly from
    the kernel direct mapping, unless the region is in high memory, in which
    case it falls back to using ioremap_cache(). However, the semantics of
    ioremap_cache() are not unambiguously defined, and on ARM, it will
    actually result in a mapping type that differs from the attributes used
    for the linear mapping, and for this reason, the ioremap_cache() call
    fails if the region is part of the memory managed by the kernel.

    So instead, implement an optional hook 'arch_memremap_wb' whose default
    implementation calls ioremap_cache() as before, but which can be
    overridden by the architecture to do what is appropriate for it.
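
    The shape of the hook (per kernel/memremap.c of that era; a sketch):

    #ifndef arch_memremap_wb
    static void *arch_memremap_wb(resource_size_t offset, unsigned long size)
    {
            return (__force void *)ioremap_cache(offset, size);
    }
    #endif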

    Acked-by: Dan Williams
    Signed-off-by: Ard Biesheuvel

    Ard Biesheuvel
     

23 Mar, 2016

2 commits

  • Add a flag to memremap() for writecombine mappings. Mappings satisfied
    by this flag will not be cached, however writes may be delayed or
    combined into more efficient bursts. This is most suitable for buffers
    written sequentially by the CPU for use by other DMA devices.
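
    Hypothetical usage, mapping a CPU-written buffer write-combined
    (phys_addr and buf_size are placeholders):

    void *buf = memremap(phys_addr, buf_size, MEMREMAP_WC);

    if (buf) {
            memset(buf, 0, buf_size);       /* sequential CPU writes */
            memunmap(buf);
    }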

    Signed-off-by: Brian Starkey
    Reviewed-by: Catalin Marinas
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brian Starkey
     
  • These patches implement a MEMREMAP_WC flag for memremap(), which can be
    used to obtain writecombine mappings. This is then used for setting up
    dma_coherent_mem regions which use the DMA_MEMORY_MAP flag.

    The motivation is to fix an alignment fault on arm64, and the suggestion
    to implement MEMREMAP_WC for this case was made at [1]. That particular
    issue is handled in patch 4, which makes sure that the appropriate
    memset function is used when zeroing allocations mapped as IO memory.

    This patch (of 4):

    Don't modify the flags input argument to memremap(). MEMREMAP_WB is
    already a special case so we can check for it directly instead of
    clearing flag bits in each mapper.
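
    In code terms, the mappers can test the bit directly (a sketch):

    /* before: 'flags &= ~MEMREMAP_WB;' in each mapper as it consumed the bit */
    if (flags & MEMREMAP_WB)
            addr = try_ram_remap(offset, size);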

    Signed-off-by: Brian Starkey
    Cc: Catalin Marinas
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brian Starkey
     

16 Mar, 2016

1 commit

  • Commit 4b94ffdc4163 ("x86, mm: introduce vmem_altmap to augment
    vmemmap_populate()"), introduced the to_vmem_altmap() function.

    The comments in this function contain two typos (one misspelling of the
    Kconfig option CONFIG_SPARSEMEM_VMEMMAP, and one missing letter 'n'),
    let's fix them up.

    Signed-off-by: Andreas Ziegler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Ziegler
     

15 Mar, 2016

1 commit

  • Pull ram resource handling changes from Ingo Molnar:
    "Core kernel resource handling changes to support NVDIMM error
    injection.

    This tree introduces a new I/O resource type, IORESOURCE_SYSTEM_RAM,
    for System RAM while keeping the current IORESOURCE_MEM type bit set
    for all memory-mapped ranges (including System RAM) for backward
    compatibility.

    With this resource flag it no longer takes a strcmp() loop through the
    resource tree to find "System RAM" resources.

    The new resource type is then used to extend ACPI/APEI error injection
    facility to also support NVDIMM"

    * 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ACPI/EINJ: Allow memory error injection to NVDIMM
    resource: Kill walk_iomem_res()
    x86/kexec: Remove walk_iomem_res() call with GART type
    x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search
    resource: Add walk_iomem_res_desc()
    memremap: Change region_intersects() to take @flags and @desc
    arm/samsung: Change s3c_pm_run_res() to use System RAM type
    resource: Change walk_system_ram() to use System RAM type
    drivers: Initialize resource entry to zero
    xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM
    kexec: Set IORESOURCE_SYSTEM_RAM for System RAM
    arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
    ia64: Set System RAM type and descriptor
    x86/e820: Set System RAM type and descriptor
    resource: Add I/O resource descriptor
    resource: Handle resource flags properly
    resource: Add System RAM resource type

    Linus Torvalds
     

10 Mar, 2016

3 commits

  • In memremap's helper function try_ram_remap(), we dereference a struct
    page pointer that was derived from a PFN that is known to be covered by
    a 'System RAM' iomem region, and is thus assumed to be a 'valid' PFN,
    i.e., a PFN that has a struct page associated with it and is covered by
    the kernel direct mapping.

    However, the assumption that there is a 1:1 relation between the System
    RAM iomem region and the kernel direct mapping is not universally valid
    on all architectures, and on ARM and arm64, 'System RAM' may include
    regions for which pfn_valid() returns false.

    Generally speaking, both __va() and pfn_to_page() should only ever be
    called on PFNs/physical addresses for which pfn_valid() returns true, so
    add that check to try_ram_remap().
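
    A minimal sketch of the check added to try_ram_remap() (close to the
    upstream shape; illustrative):

    static void *try_ram_remap(resource_size_t offset, size_t size)
    {
            unsigned long pfn = PHYS_PFN(offset);

            /* only valid, direct-mapped pfns may be served from __va() */
            if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)))
                    return __va(offset);
            return NULL;    /* caller falls back to ioremap_cache() */
    }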

    Signed-off-by: Ard Biesheuvel
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ard Biesheuvel
     
  • The check for whether we overlap "System RAM" needs to be done at
    section granularity. For example a system with the following mapping:

    100000000-37bffffff : System RAM
    37c000000-837ffffff : Persistent Memory

    ...is unable to use devm_memremap_pages() as it would result in two
    zones colliding within a given section.
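
    A sketch of the section-aligned overlap check (assuming SECTION_SIZE
    rounding as upstream; illustrative):

    resource_size_t align_start = res->start & ~(SECTION_SIZE - 1);
    resource_size_t align_size = ALIGN(resource_size(res), SECTION_SIZE);

    /* test the whole section range, not just the raw resource */
    is_ram = region_intersects(align_start, align_size,
                               IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);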

    Signed-off-by: Dan Williams
    Cc: Ross Zwisler
    Reviewed-by: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Given we have uninitialized list_heads being passed to list_add(), it
    is inevitable that those uninitialized values occasionally match the
    poison value, especially since a list_add() operation will seed the
    stack with the poison value for later stack allocations to trip over.

    For example, see these two false positive reports:

    list_add attempted on force-poisoned entry
    WARNING: at lib/list_debug.c:34
    [..]
    NIP [c00000000043c390] __list_add+0xb0/0x150
    LR [c00000000043c38c] __list_add+0xac/0x150
    Call Trace:
    __list_add+0xac/0x150 (unreliable)
    __down+0x4c/0xf8
    down+0x68/0x70
    xfs_buf_lock+0x4c/0x150 [xfs]

    list_add attempted on force-poisoned entry(0000000000000500),
    new->next == d0000000059ecdb0, new->prev == 0000000000000500
    WARNING: at lib/list_debug.c:33
    [..]
    NIP [c00000000042db78] __list_add+0xa8/0x140
    LR [c00000000042db74] __list_add+0xa4/0x140
    Call Trace:
    __list_add+0xa4/0x140 (unreliable)
    rwsem_down_read_failed+0x6c/0x1a0
    down_read+0x58/0x60
    xfs_log_commit_cil+0x7c/0x600 [xfs]

    Fixes: commit 5c2c2587b132 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup")
    Signed-off-by: Dan Williams
    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

26 Feb, 2016

1 commit

  • Pull libnvdimm fixes from Dan Williams:

    - Two fixes for compatibility with the ACPI 6.1 specification.

    Without these fixes multi-interface DIMMs will fail to be probed, and
    address range scrub commands to find memory errors will give results
    that the kernel will misinterpret. For multi-interface DIMMs Linux
    will accept either the original 6.0 implementation or 6.1.

    For address range scrub we'll only support 6.1 since ACPI formalized
    this DSM differently than the original example [1] implemented in
    v4.2. The expectation is that production systems will only ever ship
    the ACPI 6.1 address range scrub command definition.

    - The wider async address range scrub work targeting 4.6 discovered
    that the original synchronous implementation in 4.5 is not sizing its
    return buffer correctly.

    - Arnd caught that my recent fix to the size of the pfn_t flags missed
    updating the flags variable used in the pmem driver.

    - Toshi found that we mishandle the memremap() return value in
    devm_memremap().

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    nvdimm: use 'u64' for pfn flags
    devm_memremap: Fix error value when memremap failed
    nfit: update address range scrub commands to the acpi 6.1 format
    libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing
    nfit: fix multi-interface dimm handling, acpi6.1 compatibility

    Linus Torvalds
     

24 Feb, 2016

1 commit

  • devm_memremap() returns an ERR_PTR() value in case of error.
    However, it returns NULL when memremap() fails. This causes
    callers, such as the pmem driver, to proceed and oops later.

    Change devm_memremap() to return ERR_PTR(-ENXIO) when memremap()
    failed.
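
    A sketch of the corrected error path:

    addr = memremap(offset, size, flags);
    if (!addr) {
            devres_free(ptr);
            return ERR_PTR(-ENXIO);         /* was: return NULL */
    }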

    Signed-off-by: Toshi Kani
    Cc: Andrew Morton
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Toshi Kani
     

19 Feb, 2016

1 commit

  • The pmem driver calls devm_memremap() to map a persistent memory range.
    When the pmem driver is unloaded, this memremap'd range is not released
    so the kernel will leak a vma.

    Fix devm_memremap_release() to handle a given memremap'd address
    properly.
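
    A sketch of the corrected release path (the devres data stores the
    mapped address, so it must be dereferenced before unmapping):

    static void devm_memremap_release(struct device *dev, void *res)
    {
            memunmap(*(void **)res);
    }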

    Signed-off-by: Toshi Kani
    Acked-by: Dan Williams
    Cc: Christoph Hellwig
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     

12 Feb, 2016

1 commit

  • The pfn_t type uses an unsigned long to store a pfn + flags value. On a
    64-bit platform the upper 12 bits of an unsigned long are never used for
    storing the value of a pfn. However, this is not true on highmem
    platforms, where all 32 bits of a pfn value are used to address a 44-bit
    physical address space. A pfn_t therefore needs to store a 64-bit value.
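
    The resulting definition, roughly (cf. the pfn_t type upstream):

    typedef struct {
            u64 val;        /* was 'unsigned long'; must hold pfn + flags */
    } pfn_t;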

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=112211
    Fixes: 01c8f1c44b83 ("mm, dax, gpu: convert vm_insert_mixed to pfn_t")
    Signed-off-by: Dan Williams
    Reported-by: Stuart Foster
    Reported-by: Julian Margetson
    Tested-by: Julian Margetson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

01 Feb, 2016

1 commit

  • A dma_addr_t is potentially smaller than a phys_addr_t on some archs.
    Don't truncate the address when doing the pfn conversion.
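
    A sketch of the widened conversion, taking phys_addr_t rather than
    dma_addr_t so no bits are lost (illustrative):

    static inline pfn_t phys_to_pfn_t(phys_addr_t addr, u64 flags)
    {
            return __pfn_to_pfn_t(addr >> PAGE_SHIFT, flags);
    }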

    Cc: Ross Zwisler
    Reported-by: Matthew Wilcox
    [willy: fix pfn_t_to_phys as well]
    Signed-off-by: Dan Williams

    Dan Williams
     

30 Jan, 2016

2 commits

  • Change region_intersects() to identify a target with @flags and
    @desc, instead of @name with strcmp().

    Change the callers of region_intersects(), memremap() and
    devm_memremap(), to set IORESOURCE_SYSTEM_RAM in @flags and
    IORES_DESC_NONE in @desc when searching System RAM.

    Also, export region_intersects() so that the ACPI EINJ error
    injection driver can call this function in a later patch.
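
    An example call after the change, matching on flags and descriptor
    instead of a name string (a sketch):

    /* was: region_intersects(offset, size, "System RAM") */
    ret = region_intersects(offset, size, IORESOURCE_SYSTEM_RAM,
                            IORES_DESC_NONE);
    if (ret != REGION_INTERSECTS)
            return NULL;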

    Signed-off-by: Toshi Kani
    Signed-off-by: Borislav Petkov
    Acked-by: Dan Williams
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jakub Sitnicki
    Cc: Jan Kara
    Cc: Jiang Liu
    Cc: Kees Cook
    Cc: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Michal Hocko
    Cc: Naoya Horiguchi
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm
    Link: http://lkml.kernel.org/r/1453841853-11383-13-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Toshi Kani
     
  • to_vmem_altmap() needs to return valid results until
    arch_remove_memory() completes. It also needs to be valid for any pfn
    in a section regardless of whether that pfn maps to data. This escape
    was a result of a bug in the unit test.

    The signature of this bug is that free_pagetable() fails to retrieve a
    vmem_altmap and goes off into the weeds:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] get_pfnblock_flags_mask+0x49/0x60
    [..]
    Call Trace:
    [] free_hot_cold_page+0x97/0x1d0
    [] __free_pages+0x2a/0x40
    [] free_pagetable+0x8c/0xd4
    [] remove_pagetable+0x37a/0x808
    [] vmemmap_free+0x10/0x20

    Fixes: 4b94ffdc4163 ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()")
    Cc: Andrew Morton
    Reported-by: Jeff Moyer
    Signed-off-by: Dan Williams

    Dan Williams
     

16 Jan, 2016

5 commits

  • A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
    has established a devm_memremap_pages() mapping, i.e. when the pfn_t
    return from ->direct_access() has PFN_DEV and PFN_MAP set. Later, when
    encountering _PAGE_DEVMAP during a page table walk we lookup and pin a
    struct dev_pagemap instance to keep the result of pfn_to_page() valid
    until put_page().
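
    The lookup-and-pin pattern during a page table walk, roughly (a sketch
    of the get_user_pages() fast path; illustrative):

    if (pte_devmap(pte)) {
            /* pin the pagemap so pfn_to_page() stays valid */
            pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
            if (!pgmap)
                    return 0;       /* device disabled: fall back to slow path */
    }
    /* ... use the page ... */
    put_dev_pagemap(pgmap);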

    Signed-off-by: Dan Williams
    Tested-by: Logan Gunthorpe
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically
    mapped pfn-range (devm_memremap_pages()) while the resulting struct page
    objects are in use. Unlike get_page() it may fail if the device is, or
    is in the process of being, disabled. While the initial lookup of the
    range may be an expensive list walk, the result is cached to speed up
    subsequent lookups which are likely to be in the same mapped range.

    devm_memremap_pages() now requires a reference counter to be specified
    at init time. For pmem this means moving request_queue allocation into
    pmem_alloc() so the existing queue usage counter can track "device
    pages".

    ZONE_DEVICE pages always have an elevated count and will never be on an
    lru reclaim list. That space in 'struct page' can be redirected for
    other uses, but for safety introduce a poison value that will always
    trip __list_add() to assert. This allows half of the struct list_head
    storage to be reclaimed with some assurance to back up the assumption
    that the page count never goes to zero and a list_add() is never
    attempted.

    Signed-off-by: Dan Williams
    Tested-by: Logan Gunthorpe
    Cc: Dave Hansen
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In support of providing struct page for large persistent memory
    capacities, use struct vmem_altmap to change the default policy for
    allocating memory for the memmap array. The default vmemmap_populate()
    allocates page table storage area from the page allocator. Given
    persistent memory capacities relative to DRAM it may not be feasible to
    store the memmap in 'System Memory'. Instead vmem_altmap represents
    pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
    requests.
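
    The structure itself, roughly as introduced (a sketch of its fields):

    struct vmem_altmap {
            const unsigned long base_pfn;   /* first pfn of the mapped range */
            const unsigned long reserve;    /* pages reserved, never allocated */
            unsigned long free;             /* free device pages remaining */
            unsigned long align;            /* pages lost to alignment */
            unsigned long alloc;            /* pages already handed out */
    };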

    Signed-off-by: Dan Williams
    Reported-by: kbuild test robot
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • There are several scenarios where we need to retrieve and update
    metadata associated with a given devm_memremap_pages() mapping, and the
    only lookup key available is a pfn in the range:

    1/ We want to augment vmemmap_populate() (called via arch_add_memory())
    to allocate memmap storage from pre-allocated pages reserved by the
    device driver. At vmemmap_alloc_block_buf() time it grabs device pages
    rather than page allocator pages. This is in support of
    devm_memremap_pages() mappings where the memmap is too large to fit in
    main memory (i.e. large persistent memory devices).

    2/ Taking a reference against the mapping when inserting device pages
    into the address_space radix of a given inode. This facilitates
    unmap_mapping_range() and truncate_inode_pages() operations when the
    driver is tearing down the mapping.

    3/ get_user_pages() operations on ZONE_DEVICE memory require taking a
    reference against the mapping so that the driver teardown path can
    revoke and drain usage of device pages.

    Signed-off-by: Dan Williams
    Tested-by: Logan Gunthorpe
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • For the purpose of communicating the optional presence of a 'struct
    page' for the pfn returned from ->direct_access(), introduce a type that
    encapsulates a page-frame-number plus flags. These flags contain the
    historical "page_link" encoding for a scatterlist entry, but can also
    denote "device memory". Where "device memory" is a set of pfns that are
    not part of the kernel's linear mapping by default, but are accessed via
    the same memory controller as ram.

    The motivation for this new type is large capacity persistent memory
    that needs struct page entries in the 'memmap' to support 3rd party DMA
    (i.e. O_DIRECT I/O with a persistent memory source/target). However,
    we also need it in support of maintaining a list of mapped inodes which
    need to be unmapped at driver teardown or freeze_bdev() time.
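
    The flag bits live in the otherwise-unused high bits of the value
    (following the shape of include/linux/pfn_t.h after the later 64-bit
    widening; a sketch):

    #define PFN_SG_CHAIN   (1ULL << (BITS_PER_LONG_LONG - 1))
    #define PFN_SG_LAST    (1ULL << (BITS_PER_LONG_LONG - 2))
    #define PFN_DEV        (1ULL << (BITS_PER_LONG_LONG - 3))
    #define PFN_MAP        (1ULL << (BITS_PER_LONG_LONG - 4))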

    Signed-off-by: Dan Williams
    Cc: Christoph Hellwig
    Cc: Dave Hansen
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

11 Nov, 2015

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "Outside of the new ACPI-NFIT hot-add support this pull request is more
    notable for what it does not contain, than what it does. There were a
    handful of development topics this cycle, dax get_user_pages, dax
    fsync, and raw block dax, that need more more iteration and will wait
    for 4.5.

    The patches to make devm and the pmem driver NUMA aware have been in
    -next for several weeks. The hot-add support has not, but is
    contained to the NFIT driver and is passing unit tests. The coredump
    support is straightforward and was looked over by Jeff. All of it has
    received a 0day build success notification across 107 configs.

    Summary:

    - Add support for the ACPI 6.0 NFIT hot add mechanism to process
    updates of the NFIT at runtime.

    - Teach the coredump implementation how to filter out DAX mappings.

    - Introduce NUMA hints for allocations made by the pmem driver, and
    as a side effect all devm allocations now hint their NUMA node by
    default"

    * tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    coredump: add DAX filtering for FDPIC ELF coredumps
    coredump: add DAX filtering for ELF coredumps
    acpi: nfit: Add support for hot-add
    nfit: in acpi_nfit_init, break on a 0-length table
    pmem, memremap: convert to numa aware allocations
    devm_memremap_pages: use numa_mem_id
    devm: make allocations numa aware by default
    devm_memremap: convert to return ERR_PTR
    devm_memunmap: use devres_release()
    pmem: kill memremap_pmem()
    x86, mm: quiet arch_add_memory()

    Linus Torvalds
     

27 Oct, 2015

1 commit

  • Currently memremap checks if the range is "System RAM" and returns the
    kernel linear address. This is broken for highmem platforms where a
    range may be "System RAM", but is not part of the kernel linear mapping.
    Fall back to ioremap_cache() in these cases, to let the arch code attempt
    to handle it.

    Note that ARM ioremap will WARN when attempting to remap ram, and in
    that case the caller needs to be fixed. For this reason, existing
    ioremap_cache() usages for ARM are already trained to avoid attempts to
    remap ram.

    The impact of this bug is low for now since the pmem driver is the only
    user of memremap(), but this is important to fix before more conversions
    to memremap arrive in 4.4.
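
    The resulting flow for a MEMREMAP_WB request, roughly (a sketch):

    /* serve from the linear map when possible, else try a cacheable remap */
    addr = try_ram_remap(offset, size);
    if (!addr)
            addr = ioremap_cache(offset, size);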

    Cc: Rafael J. Wysocki
    Reported-by: Ard Biesheuvel
    Acked-by: Ard Biesheuvel
    Signed-off-by: Dan Williams

    Dan Williams
     

28 Aug, 2015

1 commit

  • This behaves like devm_memremap except that it ensures we have page
    structures available that can back the region.
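
    Hypothetical driver usage ('res' describes the persistent memory
    range; signature as originally introduced):

    void *addr = devm_memremap_pages(dev, res);

    if (IS_ERR(addr))
            return PTR_ERR(addr);
    /* pfn_to_page() is now valid for pfns in 'res' */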

    Signed-off-by: Christoph Hellwig
    [djbw: catch attempts to remap RAM, drop flags]
    Signed-off-by: Dan Williams

    Christoph Hellwig
     

15 Aug, 2015

2 commits

  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Christoph Hellwig
     
  • Existing users of ioremap_cache() are mapping memory that is known in
    advance to not have i/o side effects. These users are forced to cast
    away the __iomem annotation, or otherwise neglect to fix the sparse
    errors thrown when dereferencing pointers to this memory. Provide
    memremap() as a non __iomem annotated ioremap_*() in the case when
    ioremap is otherwise a pointer to cacheable memory. Empirically,
    ioremap_*() call sites are seeking memory-like semantics
    (e.g. speculative reads and prefetching permitted).

    memremap() is a break from the ioremap implementation pattern of adding
    a new memremap_*() for each mapping type and having silent
    compatibility fallbacks. Instead, the implementation defines flags
    that are passed to the central memremap() and if a mapping type is not
    supported by an arch memremap returns NULL.

    We introduce a memremap prototype as a trivial wrapper of
    ioremap_cache() and ioremap_wt(). Later, once all ioremap_cache() and
    ioremap_wt() usage has been removed from drivers we teach archs to
    implement arch_memremap() with the ability to strictly enforce the
    mapping type.
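
    Hypothetical usage ('phys', 'size', 'buf', and 'len' are placeholders):

    void *addr = memremap(phys, size, MEMREMAP_WB);

    if (addr) {
            memcpy(buf, addr, len); /* plain pointer: no __iomem casts */
            memunmap(addr);
    }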

    Cc: Arnd Bergmann
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams