25 Sep, 2019
2 commits
-
This patch is (hopefully) the first step to enable THP for non-shmem
filesystems.This patch enables an application to put part of its text sections to THP
via madvise, for example:madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);
We tried to reuse the logic for THP on tmpfs.
Currently, write is not supported for non-shmem THP. khugepaged will only
process vma with VM_DENYWRITE. sys_mmap() ignores VM_DENYWRITE requests
(see ksys_mmap_pgoff). The only way to create vma with VM_DENYWRITE is
execve(). This requirement limits non-shmem THP to text sections.The next patch will handle writes, which would only happen when the all
the vmas with VM_DENYWRITE are unmapped.An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this
feature.[songliubraving@fb.com: fix build without CONFIG_SHMEM]
Link: http://lkml.kernel.org/r/F53407FB-96CC-42E8-9862-105C92CC2B98@fb.com
[songliubraving@fb.com: fix double unlock in collapse_file()]
Link: http://lkml.kernel.org/r/B960CBFA-8EFC-4DA4-ABC5-1977FFF2CA57@fb.com
Link: http://lkml.kernel.org/r/20190801184244.3169074-7-songliubraving@fb.com
Signed-off-by: Song Liu
Acked-by: Rik van Riel
Acked-by: Kirill A. Shutemov
Acked-by: Johannes Weiner
Cc: Stephen Rothwell
Cc: Dan Carpenter
Cc: Hillf Danton
Cc: Hugh Dickins
Cc: William Kucharski
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Patch series "mm: remove quicklist page table caches".
A while ago Nicholas proposed to remove quicklist page table caches [1].
I've rebased his patch on the curren upstream and switched ia64 and sh to
use generic versions of PTE allocation.[1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com
This patch (of 3):
Remove page table allocator "quicklists". These have been around for a
long time, but have not got much traction in the last decade and are only
used on ia64 and sh architectures.The numbers in the initial commit look interesting but probably don't
apply anymore. If anybody wants to resurrect this it's in the git
history, but it's unhelpful to have this code and divergent allocator
behaviour for minor archs.Also it might be better to instead make more general improvements to page
allocator if this is still so slow.Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Nicholas Piggin
Signed-off-by: Mike Rapoport
Cc: Tony Luck
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Aug, 2019
1 commit
-
CONFIG_MIGRATE_VMA_HELPER guards helpers that are required for proper
devic private memory support. Remove the option and just check for
CONFIG_DEVICE_PRIVATE instead.Link: https://lore.kernel.org/r/20190814075928.23766-11-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Tested-by: Ralph Campbell
Signed-off-by: Jason Gunthorpe
08 Aug, 2019
2 commits
-
Make HMM_MIRROR an option that is selected by drivers wanting to use it
instead of a user visible option as it is just a low-level implementation
detail.Link: https://lore.kernel.org/r/20190806160554.14046-15-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe -
There isn't really any architecture specific code in this page table walk
implementation, so drop the dependencies.Link: https://lore.kernel.org/r/20190806160554.14046-14-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
17 Jul, 2019
1 commit
-
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
with the long-out-of-date comment can lead to the impression than an
architecture may just enable it (since __add_pages() now "comprehends
device memory" for itself) and expect things to work.In practice, however, ZONE_DEVICE users have little chance of
functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
dependency so the real situation is clearer.Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy
Acked-by: Dan Williams
Reviewed-by: Ira Weiny
Acked-by: Oliver O'Halloran
Reviewed-by: Anshuman Khandual
Cc: Michael Ellerman
Cc: Catalin Marinas
Cc: David Hildenbrand
Cc: Jerome Glisse
Cc: Michal Hocko
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Jul, 2019
1 commit
-
Pull HMM updates from Jason Gunthorpe:
"Improvements and bug fixes for the hmm interface in the kernel:- Improve clarity, locking and APIs related to the 'hmm mirror'
feature merged last cycle. In linux-next we now see AMDGPU and
nouveau to be using this API.- Remove old or transitional hmm APIs. These are hold overs from the
past with no users, or APIs that existed only to manage cross tree
conflicts. There are still a few more of these cleanups that didn't
make the merge window cut off.- Improve some core mm APIs:
- export alloc_pages_vma() for driver use
- refactor into devm_request_free_mem_region() to manage
DEVICE_PRIVATE resource reservations
- refactor duplicative driver code into the core dev_pagemap
struct- Remove hmm wrappers of improved core mm APIs, instead have drivers
use the simplified API directly- Remove DEVICE_PUBLIC
- Simplify the kconfig flow for the hmm users and core code"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
mm: remove the HMM config option
mm: sort out the DEVICE_PRIVATE Kconfig mess
mm: simplify ZONE_DEVICE page private data
mm: remove hmm_devmem_add
mm: remove hmm_vma_alloc_locked_page
nouveau: use devm_memremap_pages directly
nouveau: use alloc_page_vma directly
PCI/P2PDMA: use the dev_pagemap internal refcount
device-dax: use the dev_pagemap internal refcount
memremap: provide an optional internal refcount in struct dev_pagemap
memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
memremap: remove the data field in struct dev_pagemap
memremap: add a migrate_to_ram method to struct dev_pagemap_ops
memremap: lift the devmap_enable manipulation into devm_memremap_pages
memremap: pass a struct dev_pagemap to ->kill and ->cleanup
memremap: move dev_pagemap callbacks into a separate structure
memremap: validate the pagemap type passed to devm_memremap_pages
mm: factor out a devm_request_free_mem_region helper
mm: export alloc_pages_vma
...
13 Jul, 2019
4 commits
-
While only powerpc supports the hugepd case, the code is pretty generic
and I'd like to keep all GUP internals in one place.Link: http://lkml.kernel.org/r/20190625143715.1689-15-hch@lst.de
Signed-off-by: Christoph Hellwig
Cc: Andrey Konovalov
Cc: Benjamin Herrenschmidt
Cc: David Miller
Cc: James Hogan
Cc: Jason Gunthorpe
Cc: Khalid Aziz
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Paul Burton
Cc: Paul Mackerras
Cc: Ralf Baechle
Cc: Rich Felker
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Always build mm/gup.c so that we don't have to provide separate nommu
stubs. Also merge the get_user_pages_fast and __get_user_pages_fast stubs
when HAVE_FAST_GUP into the main implementations, which will never call
the fast path if HAVE_FAST_GUP is not set.This also ensures the new put_user_pages* helpers are available for nommu,
as those are currently missing, which would create a problem as soon as we
actually grew users for it.Link: http://lkml.kernel.org/r/20190625143715.1689-13-hch@lst.de
Signed-off-by: Christoph Hellwig
Cc: Andrey Konovalov
Cc: Benjamin Herrenschmidt
Cc: David Miller
Cc: James Hogan
Cc: Jason Gunthorpe
Cc: Khalid Aziz
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Paul Burton
Cc: Paul Mackerras
Cc: Ralf Baechle
Cc: Rich Felker
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We only support the generic GUP now, so rename the config option to
be more clear, and always use the mm/Kconfig definition of the
symbol and select it from the arch Kconfigs.Link: http://lkml.kernel.org/r/20190625143715.1689-11-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Khalid Aziz
Reviewed-by: Jason Gunthorpe
Cc: Andrey Konovalov
Cc: Benjamin Herrenschmidt
Cc: David Miller
Cc: James Hogan
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Paul Burton
Cc: Paul Mackerras
Cc: Ralf Baechle
Cc: Rich Felker
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The split low/high access is the only non-READ_ONCE version of gup_get_pte
that did show up in the various arch implemenations. Lift it to common
code and drop the ifdef based arch override.Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Cc: Andrey Konovalov
Cc: Benjamin Herrenschmidt
Cc: David Miller
Cc: James Hogan
Cc: Khalid Aziz
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Paul Burton
Cc: Paul Mackerras
Cc: Ralf Baechle
Cc: Rich Felker
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Jul, 2019
4 commits
-
The migrate_vma helper is only used by noveau to migrate device private
pages around. Other HMM_MIRROR users like amdgpu or infiniband don't
need it.Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Signed-off-by: Jason Gunthorpe -
All the mm/hmm.c code is better keyed off HMM_MIRROR. Also let nouveau
depend on it instead of the mix of a dummy dependency symbol plus the
actually selected one. Drop various odd dependencies, as the code is
pretty portable.Signed-off-by: Christoph Hellwig
Reviewed-by: Ira Weiny
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Signed-off-by: Jason Gunthorpe -
The ZONE_DEVICE support doesn't depend on anything HMM related, just on
various bits of arch support as indicated by the architecture. Also
don't select the option from nouveau as it isn't present in many setups,
and depend on it instead.Signed-off-by: Christoph Hellwig
Reviewed-by: Ira Weiny
Reviewed-by: Dan Williams
Signed-off-by: Jason Gunthorpe -
The code hasn't been used since it was added to the tree, and doesn't
appear to actually be usable.Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Acked-by: Michal Hocko
Reviewed-by: Dan Williams
Tested-by: Dan Williams
Signed-off-by: Jason Gunthorpe
02 Jul, 2019
1 commit
-
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Signed-off-by: Jason Gunthorpe
15 Jun, 2019
1 commit
-
We need to pick up post-rc1 changes to various document files so they don't
get lost in Mauro's massive RST conversion push.
09 Jun, 2019
1 commit
-
Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.Signed-off-by: Mauro Carvalho Chehab
Reviewed-by: Wolfram Sang
Reviewed-by: Sven Van Asbroeck
Reviewed-by: Bhupesh Sharma
Acked-by: Mark Brown
Signed-off-by: Jonathan Corbet
21 May, 2019
1 commit
-
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:GPL-2.0-only
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
15 May, 2019
6 commits
-
The help describing the memory model selection is outdated. It still says
that SPARSEMEM is experimental and DISCONTIGMEM is a preferred over
SPARSEMEM.Update the help text for the relevant options:
* add a generic help for the "Memory Model" prompt
* add description for FLATMEM
* reduce the description of DISCONTIGMEM and add a deprecation note
* prefer SPARSEMEM over DISCONTIGMEMLink: http://lkml.kernel.org/r/1556188531-20728-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Most architectures do not need the memblock memory after the page
allocator is initialized, but only few enable ARCH_DISCARD_MEMBLOCK in the
arch Kconfig.Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the
logic makes it clear which architectures actually use memblock after
system initialization and skips the necessity to add ARCH_DISCARD_MEMBLOCK
to the architectures that are still missing that option.Link: http://lkml.kernel.org/r/1556102150-32517-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Acked-by: Michael Ellerman (powerpc)
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Richard Kuo
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Geert Uytterhoeven
Cc: Ralf Baechle
Cc: Paul Burton
Cc: James Hogan
Cc: Ley Foon Tan
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Eric Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add 2 new Kconfig variables that are not used by anyone. I check that
various make ARCH=somearch allmodconfig do work and do not complain. This
new Kconfig needs to be added first so that device drivers that depend on
HMM can be updated.Once drivers are updated then I can update the HMM Kconfig to depend on
this new Kconfig in a followup patch.This is about solving Kconfig for HMM given that device driver are
going through their own tree we want to avoid changing them from the mm
tree. So plan is:1 - Kernel release N add the new Kconfig to mm/Kconfig (this patch)
2 - Kernel release N+1 update driver to depend on new Kconfig ie
stop using ARCH_HASH_HMM and start using ARCH_HAS_HMM_MIRROR
and ARCH_HAS_HMM_DEVICE (one or the other or both depending
on the driver)
3 - Kernel release N+2 remove ARCH_HASH_HMM and do final Kconfig
update in mm/KconfigLink: http://lkml.kernel.org/r/20190417211141.17580-1-jglisse@redhat.com
Signed-off-by: Jérôme Glisse
Cc: Guenter Roeck
Cc: Leon Romanovsky
Cc: Jason Gunthorpe
Cc: Ralph Campbell
Cc: John Hubbard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
'default n' is the default value for any bool or tristate Kconfig
setting so there is no need to write it explicitly.Also since commit f467c5640c29 ("kconfig: only write '# CONFIG_FOO
is not set' for visible symbols") the Kconfig behavior is the same
regardless of 'default n' being present or not:...
One side effect of (and the main motivation for) this change is making
the following two definitions behave exactly the same:config FOO
boolconfig FOO
bool
default nWith this change, neither of these will generate a
'# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
That might make it clearer to people that a bare 'default n' is
redundant.
...Link: http://lkml.kernel.org/r/c3385916-e4d4-37d3-b330-e6b7dff83a52@samsung.com
Signed-off-by: Bartlomiej Zolnierkiewicz
Reviewed-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
To avoid random config build issue, select mmu notifier when HMM is
selected. In any cases when HMM get selected it will be by users that
will also wants the mmu notifier.Link: http://lkml.kernel.org/r/20190403193318.16478-2-jglisse@redhat.com
Signed-off-by: Jérôme Glisse
Acked-by: Balbir Singh
Cc: Ralph Campbell
Cc: John Hubbard
Cc: Dan Williams
Cc: Arnd Bergmann
Cc: Dan Carpenter
Cc: Ira Weiny
Cc: Matthew Wilcox
Cc: Souptick Joarder
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This condition allows to define alloc_contig_range, so simplify it into a
more accurate naming.Link: http://lkml.kernel.org/r/20190327063626.18421-4-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti
Suggested-by: Vlastimil Babka
Acked-by: Vlastimil Babka
Cc: Andy Lutomirsky
Cc: Aneesh Kumar K.V
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Dave Hansen
Cc: David S. Miller
Cc: Heiko Carstens
Cc: "H . Peter Anvin"
Cc: Ingo Molnar
Cc: Martin Schwidefsky
Cc: Michael Ellerman
Cc: Mike Kravetz
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Rich Felker
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Dec, 2018
1 commit
-
Replace jhash2 with xxhash.
Perf numbers:
Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
ksm: crc32c hash() 12081 MB/s
ksm: xxh64 hash() 8770 MB/s
ksm: xxh32 hash() 4529 MB/s
ksm: jhash2 hash() 1569 MB/sSioh Lee did some testing:
crc32c_intel: 1084.10ns
crc32c (no hardware acceleration): 7012.51ns
xxhash32: 2227.75ns
xxhash64: 1413.16ns
jhash2: 5128.30nsAs jhash2 always will be slower (for data size like PAGE_SIZE). Don't use
it in ksm at all.Use only xxhash for now, because for using crc32c, cryptoapi must be
initialized first - that requires some tricky solution to work well in all
situations.Link: http://lkml.kernel.org/r/20181023182554.23464-3-nefelim4ag@gmail.com
Signed-off-by: Timofey Titovets
Signed-off-by: leesioh
Reviewed-by: Pavel Tatashin
Reviewed-by: Mike Rapoport
Reviewed-by: Andrew Morton
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Oct, 2018
2 commits
-
All architecures use memblock for early memory management. There is no need
for the CONFIG_HAVE_MEMBLOCK configuration option.[rppt@linux.vnet.ibm.com: of/fdt: fixup #ifdefs]
Link: http://lkml.kernel.org/r/20180919103457.GA20545@rapoport-lnx
[rppt@linux.vnet.ibm.com: csky: fixups after bootmem removal]
Link: http://lkml.kernel.org/r/20180926112744.GC4628@rapoport-lnx
[rppt@linux.vnet.ibm.com: remove stale #else and the code it protects]
Link: http://lkml.kernel.org/r/1538067825-24835-1-git-send-email-rppt@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1536927045-23536-4-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Acked-by: Michal Hocko
Tested-by: Jonathan Cameron
Cc: Catalin Marinas
Cc: Chris Zankel
Cc: "David S. Miller"
Cc: Geert Uytterhoeven
Cc: Greentime Hu
Cc: Greg Kroah-Hartman
Cc: Guan Xuetao
Cc: Ingo Molnar
Cc: "James E.J. Bottomley"
Cc: Jonas Bonn
Cc: Jonathan Corbet
Cc: Ley Foon Tan
Cc: Mark Salter
Cc: Martin Schwidefsky
Cc: Matt Turner
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Palmer Dabbelt
Cc: Paul Burton
Cc: Richard Kuo
Cc: Richard Weinberger
Cc: Rich Felker
Cc: Russell King
Cc: Serge Semin
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Vineet Gupta
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
All achitectures select NO_BOOTMEM which essentially becomes 'Y' for any
kernel configuration and therefore it can be removed.[alexander.h.duyck@linux.intel.com: remove now defunct NO_BOOTMEM from depends list for deferred init]
Link: http://lkml.kernel.org/r/20180925201814.3576.15105.stgit@localhost.localdomain
Link: http://lkml.kernel.org/r/1536927045-23536-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Signed-off-by: Alexander Duyck
Acked-by: Michal Hocko
Cc: Catalin Marinas
Cc: Chris Zankel
Cc: "David S. Miller"
Cc: Geert Uytterhoeven
Cc: Greentime Hu
Cc: Greg Kroah-Hartman
Cc: Guan Xuetao
Cc: Ingo Molnar
Cc: "James E.J. Bottomley"
Cc: Jonas Bonn
Cc: Jonathan Corbet
Cc: Ley Foon Tan
Cc: Mark Salter
Cc: Martin Schwidefsky
Cc: Matt Turner
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Palmer Dabbelt
Cc: Paul Burton
Cc: Richard Kuo
Cc: Richard Weinberger
Cc: Rich Felker
Cc: Russell King
Cc: Serge Semin
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Vineet Gupta
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
21 Oct, 2018
1 commit
-
All users have now been converted to the XArray. Removing the support
reduces code size and ensures new users will use the XArray instead.Signed-off-by: Matthew Wilcox
21 Sep, 2018
1 commit
-
Deferred struct page init is needed only on systems with large amount of
physical memory to improve boot performance. 32-bit systems do not
benefit from this feature.Jiri reported a problem where deferred struct pages do not work well with
x86-32:[ 0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[ 0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 0.036269] Initializing CPU#0
[ 0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
[ 0.038459] page:f6780000 is uninitialized and poisoned
[ 0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
[ 0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
[ 0.040038] ------------[ cut here ]------------
[ 0.040399] kernel BUG at include/linux/page-flags.h:293!
[ 0.040823] invalid opcode: 0000 [#1] SMP PTI
[ 0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
[ 0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[ 0.042496] EIP: free_highmem_page+0x64/0x80
[ 0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
[ 0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
[ 0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
[ 0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
[ 0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
[ 0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 0.046913] DR6: fffe0ff0 DR7: 00000400
[ 0.047220] Call Trace:
[ 0.047419] add_highpages_with_active_regions+0xbd/0x10d
[ 0.047854] set_highmem_pages_init+0x5b/0x71
[ 0.048202] mem_init+0x2b/0x1e8
[ 0.048460] start_kernel+0x1d2/0x425
[ 0.048757] i386_start_kernel+0x93/0x97
[ 0.049073] startup_32_smp+0x164/0x168
[ 0.049379] Modules linked in:
[ 0.049626] ---[ end trace 337949378db0abbb ]---We free highmem pages before their struct pages are initialized:
mem_init()
set_highmem_pages_init()
add_highpages_with_active_regions()
free_highmem_page()
.. Access uninitialized struct page here..Because there is no reason to have this feature on 32-bit systems, just
disable it.Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com
Fixes: 2e3ca40f03bb ("mm: relax deferred struct page requirements")
Signed-off-by: Pavel Tatashin
Reported-by: Jiri Slaby
Acked-by: Michal Hocko
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
18 Aug, 2018
3 commits
-
CONFIG_THP_SWAP should depend on CONFIG_SWAP, because it's unreasonable
to optimize swapping for THP (Transparent Huge Page) without basic
swapping support.In original code, when CONFIG_SWAP=n and CONFIG_THP_SWAP=y,
split_swap_cluster() will not be built because it is in swapfile.c, but
it will be called in huge_memory.c. This doesn't trigger a build error
in practice because the call site is enclosed by PageSwapCache(), which
is defined to be constant 0 when CONFIG_SWAP=n. But this is fragile and
should be fixed.The comments are fixed too to reflect the latest progress.
Link: http://lkml.kernel.org/r/20180713021228.439-1-ying.huang@intel.com
Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
Signed-off-by: "Huang, Ying"
Reviewed-by: Dan Williams
Reviewed-by: Naoya Horiguchi
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Zi Yan
Cc: Daniel Jordan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Rename new_sparse_init() to sparse_init() which enables it. Delete old
sparse_init() and all the code that became obsolete with.[pasha.tatashin@oracle.com: remove unused sparse_mem_maps_populate_node()]
Link: http://lkml.kernel.org/r/20180716174447.14529-6-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180712203730.8703-6-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin
Tested-by: Michael Ellerman [powerpc]
Tested-by: Oscar Salvador
Reviewed-by: Oscar Salvador
Cc: Pasha Tatashin
Cc: Abdul Haleem
Cc: Baoquan He
Cc: Daniel Jordan
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Rientjes
Cc: Greg Kroah-Hartman
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jérôme Glisse
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Souptick Joarder
Cc: Steven Sistare
Cc: Vlastimil Babka
Cc: Wei Yang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The deferred memory initialization relies on section definitions, e.g
PAGES_PER_SECTION, that are only available when CONFIG_SPARSEMEM=y on
most architectures.Initially DEFERRED_STRUCT_PAGE_INIT depended on explicit
ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT configuration option, but since
the commit 2e3ca40f03bb13709df4 ("mm: relax deferred struct page
requirements") this requirement was relaxed and now it is possible to
enable DEFERRED_STRUCT_PAGE_INIT on architectures that support
DISCONTINGMEM and NO_BOOTMEM which causes build failures.For instance, setting SMP=y and DEFERRED_STRUCT_PAGE_INIT=y on arc
causes the following build failure:CC mm/page_alloc.o
mm/page_alloc.c: In function 'update_defer_init':
mm/page_alloc.c:321:14: error: 'PAGES_PER_SECTION'
undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
(pfn & (PAGES_PER_SECTION - 1)) == 0) {
^~~~~~~~~~~~~~~~~
USEC_PER_SEC
mm/page_alloc.c:321:14: note: each undeclared identifier is reported only once for each function it appears in
In file included from include/linux/cache.h:5:0,
from include/linux/printk.h:9,
from include/linux/kernel.h:14,
from include/asm-generic/bug.h:18,
from arch/arc/include/asm/bug.h:32,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:9,
from mm/page_alloc.c:18:
mm/page_alloc.c: In function 'deferred_grow_zone':
mm/page_alloc.c:1624:52: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
^
include/uapi/linux/kernel.h:11:47: note: in definition of macro '__ALIGN_KERNEL_MASK'
#define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
^~~~
include/linux/kernel.h:58:22: note: in expansion of macro '__ALIGN_KERNEL'
#define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
^~~~~~~~~~~~~~
mm/page_alloc.c:1624:34: note: in expansion of macro 'ALIGN'
unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
^~~~~
In file included from include/asm-generic/bug.h:18:0,
from arch/arc/include/asm/bug.h:32,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:9,
from mm/page_alloc.c:18:
mm/page_alloc.c: In function 'free_area_init_node':
mm/page_alloc.c:6379:50: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
^
include/linux/kernel.h:812:22: note: in definition of macro '__typecheck'
(!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
^
include/linux/kernel.h:836:24: note: in expansion of macro '__safe_cmp'
__builtin_choose_expr(__safe_cmp(x, y), \
^~~~~~~~~~
include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
^~~~~
include/linux/kernel.h:836:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
^~~~~
scripts/Makefile.build:317: recipe for target 'mm/page_alloc.o' failedLet's make the DEFERRED_STRUCT_PAGE_INIT explicitly depend on SPARSEMEM
as the systems that support DISCONTIGMEM do not seem to have that huge
amounts of memory that would make DEFERRED_STRUCT_PAGE_INIT relevant.Link: http://lkml.kernel.org/r/1530279308-24988-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Acked-by: Michal Hocko
Reviewed-by: Pavel Tatashin
Tested-by: Randy Dunlap
Cc: Pasha Tatashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Aug, 2018
1 commit
-
This moves all the options under a proper menu.
Based on a patch from Randy Dunlap.
Signed-off-by: Christoph Hellwig
Tested-by: Randy Dunlap
Acked-by: Randy Dunlap
Signed-off-by: Masahiro Yamada
09 Jun, 2018
2 commits
-
Pull libnvdimm updates from Dan Williams:
"This adds a user for the new 'bytes-remaining' updates to
memcpy_mcsafe() that you already received through Ingo via the
x86-dax- for-linus pull.Not included here, but still targeting this cycle, is support for
handling memory media errors (poison) consumed via userspace dax
mappings.Summary:
- DAX broke a fundamental assumption of truncate of file mapped
pages. The truncate path assumed that it is safe to disconnect a
pinned page from a file and let the filesystem reclaim the physical
block. With DAX the page is equivalent to the filesystem block.
Introduce dax_layout_busy_page() to enable filesystems to wait for
pinned DAX pages to be released. Without this wait a filesystem
could allocate blocks under active device-DMA to a new file.- DAX arranges for the block layer to be bypassed and uses
dax_direct_access() + copy_to_iter() to satisfy read(2) calls.
However, the memcpy_mcsafe() facility is available through the pmem
block driver. In order to safely handle media errors, via the DAX
block-layer bypass, introduce copy_to_iter_mcsafe().- Fix cache management policy relative to the ACPI NFIT Platform
Capabilities Structure to properly elide cache flushes when they
are not necessary. The table indicates whether CPU caches are
power-fail protected. Clarify that a deep flush is always performed
on REQ_{FUA,PREFLUSH} requests"* tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
dax: Use dax_write_cache* helpers
libnvdimm, pmem: Do not flush power-fail protected CPU caches
libnvdimm, pmem: Unconditionally deep flush on *sync
libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH
acpi, nfit: Remove ecc_unit_size
dax: dax_insert_mapping_entry always succeeds
libnvdimm, e820: Register all pmem resources
libnvdimm: Debug probe times
linvdimm, pmem: Preserve read-only setting for pmem devices
x86, nfit_test: Add unit test for memcpy_mcsafe()
pmem: Switch to copy_to_iter_mcsafe()
dax: Report bytes remaining in dax_iomap_actor()
dax: Introduce a ->copy_to_iter dax operation
uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
xfs, dax: introduce xfs_break_dax_layouts()
xfs: prepare xfs_break_layouts() for another layout type
xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
mm, fs, dax: handle layout changes to pinned dax mappings
mm: fix __gup_device_huge vs unmap
mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
...
08 Jun, 2018
1 commit
-
Currently the PTE special supports is turned on in per architecture
header files. Most of the time, it is defined in
arch/*/include/asm/pgtable.h depending or not on some other per
architecture static definition.This patch introduce a new configuration variable to manage this
directly in the Kconfig files. It would later replace
__HAVE_ARCH_PTE_SPECIAL.Here notes for some architecture where the definition of
__HAVE_ARCH_PTE_SPECIAL is not obvious:arm
__HAVE_ARCH_PTE_SPECIAL which is currently defined in
arch/arm/include/asm/pgtable-3level.h which is included by
arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.powerpc
__HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
- arch/powerpc/include/asm/book3s/64/pgtable.h
- arch/powerpc/include/asm/pte-common.h
The first one is included if (PPC_BOOK3S & PPC64) while the second is
included in all the other cases.
So select ARCH_HAS_PTE_SPECIAL all the time.sparc:
__HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
defined(__arch64__) which are defined through the compiler in
sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
So select ARCH_HAS_PTE_SPECIAL if SPARC64There is no functional change introduced by this patch.
Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com
Signed-off-by: Laurent Dufour
Suggested-by: Jerome Glisse
Reviewed-by: Jerome Glisse
Acked-by: David Rientjes
Cc: Michal Hocko
Cc: "Aneesh Kumar K . V"
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Jonathan Corbet
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: David S. Miller
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Vineet Gupta
Cc: Palmer Dabbelt
Cc: Albert Ou
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: David Rientjes
Cc: Robin Murphy
Cc: Christophe LEROY
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Jun, 2018
2 commits
-
Pull documentation updates from Jonathan Corbet:
"There's been a fair amount of work in the docs tree this time around,
including:- Extensive RST conversions and organizational work in the
memory-management docs thanks to Mike Rapoport.- An update of Documentation/features from Andrea Parri and a script
to keep it updated.- Various LICENSES updates from Thomas, along with a script to check
SPDX tags.- Work to fix dangling references to documentation files; this
involved a fair number of one-liner comment changes outside of
Documentation/... and the usual list of documentation improvements, typo fixes, etc"
* tag 'docs-4.18' of git://git.lwn.net/linux: (103 commits)
Documentation: document hung_task_panic kernel parameter
docs/admin-guide/mm: add high level concepts overview
docs/vm: move ksm and transhuge from "user" to "internals" section.
docs: Use the kerneldoc comments for memalloc_no*()
doc: document scope NOFS, NOIO APIs
docs: update kernel versions and dates in tables
docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge
docs/vm: transhuge: minor updates
docs/vm: transhuge: change sections order
Documentation: arm: clean up Marvell Berlin family info
Documentation: gpio: driver: Fix a typo and some odd grammar
docs: ranoops.rst: fix location of ramoops.txt
scripts/documentation-file-ref-check: rewrite it in perl with auto-fix mode
docs: uio-howto.rst: use a code block to solve a warning
mm, THP, doc: Add document for thp_swpout/thp_swpout_fallback
w1: w1_io.c: fix a kernel-doc warning
Documentation/process/posting: wrap text at 80 cols
docs: admin-guide: add cgroup-v2 documentation
Revert "Documentation/features/vm: Remove arch support status file for 'pte_special'"
Documentation: refcount-vs-atomic: Update reference to LKMM doc.
... -
Pull dma-mapping updates from Christoph Hellwig:
- replace the force_dma flag with a dma_configure bus method. (Nipun
Gupta, although one patch is іncorrectly attributed to me due to a
git rebase bug)- use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai)
- remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the
right thing for bounce buffering.- move dma-debug initialization to common code, and apply a few
cleanups to the dma-debug code.- cleanup the Kconfig mess around swiotlb selection
- swiotlb comment fixup (Yisheng Xie)
- a trivial swiotlb fix. (Dan Carpenter)
- support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt)
- add a new generic dma-noncoherent dma_map_ops implementation and use
it for arc, c6x and nds32.- improve scatterlist validity checking in dma-debug. (Robin Murphy)
- add a struct device quirk to limit the dma-mask to 32-bit due to
bridge/system issues, and switch x86 to use it instead of a local
hack for VIA bridges.- handle devices without a dma_mask more gracefully in the dma-direct
code.* tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits)
dma-direct: don't crash on device without dma_mask
nds32: use generic dma_noncoherent_ops
nds32: implement the unmap_sg DMA operation
nds32: consolidate DMA cache maintainance routines
x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag
x86/pci-dma: remove the explicit nodac and allowdac option
x86/pci-dma: remove the experimental forcesac boot option
Documentation/x86: remove a stray reference to pci-nommu.c
core, dma-direct: add a flag 32-bit dma limits
dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs
dma-debug: check scatterlist segments
c6x: use generic dma_noncoherent_ops
arc: use generic dma_noncoherent_ops
arc: fix arc_dma_{map,unmap}_page
arc: fix arc_dma_sync_sg_for_{cpu,device}
arc: simplify arc_dma_sync_single_for_{cpu,device}
dma-mapping: provide a generic dma-noncoherent implementation
dma-mapping: simplify Kconfig dependencies
riscv: add swiotlb support
riscv: only enable ZONE_DMA32 for 64-bit
...
22 May, 2018
1 commit
-
In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
be able to rely on the fact that they will get wakeups on dev_pagemap
page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
generic_dax_page_free() as common indicator / infrastructure for dax
filesytems to require. With this change there are no users of the
MEMORY_DEVICE_HOST designation, so remove it.The HMM sub-system extended dev_pagemap to arrange a callback when a
dev_pagemap managed page is freed. Since a dev_pagemap page is free /
idle when its reference count is 1 it requires an additional branch to
check the page-type at put_page() time. Given put_page() is a hot-path
we do not want to incur that check if HMM is not in use, so a static
branch is used to avoid that overhead when not necessary.Now, the FS_DAX implementation wants to reuse this mechanism for
receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
static-key into a generic mechanism that either HMM or FS_DAX code paths
can enable.For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
However, we still need to support FS_DAX in the FS_DAX_LIMITED case
implemented by the s390/dcssblk driver.Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Michal Hocko
Reported-by: kbuild test robot
Reported-by: Thomas Meyer
Reported-by: Dave Jiang
Cc: "Jérôme Glisse"
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
Signed-off-by: Dan Williams