14 Oct, 2020

1 commit

  • kmemleak-test.c is just a kmemleak test module and cannot be used as a
    built-in kernel module, so it does not belong in the mm/ directory. Move
    kmemleak-test.c to samples/kmemleak/kmemleak-test.c, and fix the spelling
    of "built-in" while at it.

    Signed-off-by: Hui Su
    Signed-off-by: Andrew Morton
    Cc: Catalin Marinas
    Cc: Jonathan Corbet
    Cc: Mauro Carvalho Chehab
    Cc: David S. Miller
    Cc: Rob Herring
    Cc: Masahiro Yamada
    Cc: Sam Ravnborg
    Cc: Josh Poimboeuf
    Cc: Steven Rostedt (VMware)
    Cc: Miguel Ojeda
    Cc: Divya Indi
    Cc: Tomas Winkler
    Cc: David Howells
    Link: https://lkml.kernel.org/r/20200925183729.GA172837@rlk
    Signed-off-by: Linus Torvalds

    Hui Su
     

08 Aug, 2020

1 commit

  • The functionality in lib/ioremap.c deals with page tables, vmalloc and
    caches, so it naturally belongs in mm/. Moving it there will also allow
    declaring the p?d_alloc_track functions in a header file inside mm/ rather
    than having those declarations in include/linux/mm.h.

    Suggested-by: Andrew Morton
    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Cc: Abdul Haleem
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Christophe Leroy
    Cc: Joerg Roedel
    Cc: Joerg Roedel
    Cc: Max Filippov
    Cc: Peter Zijlstra (Intel)
    Cc: Satheesh Rajendran
    Cc: Stafford Horne
    Cc: Stephen Rothwell
    Cc: Steven Rostedt
    Cc: Geert Uytterhoeven
    Cc: Matthew Wilcox
    Link: http://lkml.kernel.org/r/20200627143453.31835-8-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

12 Jun, 2020

2 commits

  • Pull the Kernel Concurrency Sanitizer from Thomas Gleixner:
    "The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector,
    which relies on compile-time instrumentation, and uses a
    watchpoint-based sampling approach to detect races.

    The feature was under development for quite some time and has already
    found legitimate bugs.

    Unfortunately it comes with a limitation, which was only understood
    late in the development cycle:

    It requires an up-to-date CLANG-11 compiler

    CLANG-11 is not yet released (scheduled for June), but it's the only
    compiler today which handles the kernel requirements and especially
    the annotations of functions to exclude them from KCSAN
    instrumentation correctly.

    These annotations really need to work so that low level entry code and
    especially int3 text poke handling can be completely isolated.

    A detailed discussion of the requirements and compiler issues can be
    found here:

    https://lore.kernel.org/lkml/CANpmjNMTsY_8241bS7=XAfqvZHFLrVEkv_uM4aDUWE_kh3Rvbw@mail.gmail.com/

    We came to the conclusion that trying to work around compiler
    limitations and bugs again would end up in a major trainwreck, so
    requiring a working compiler seemed to be the best choice.

    For Continuous Integration purposes the compiler restriction is
    manageable and that's where most xxSAN reports come from.

    For a change this limitation might make GCC people actually look at
    their bugs. Some issues with CSAN in GCC are 7 years old and one was
    'fixed' 3 years ago with a half-baked solution which 'solved' the
    reported issue but not the underlying problem.

    The KCSAN developers are also pondering a GCC plugin to become
    independent, but that's not something which will show up in a few
    days.

    Blocking KCSAN until widespread compiler support is available is not
    a really good alternative because the continuous growth of lockless
    optimizations in the kernel demands proper tooling support"

    * tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
    compiler_types.h, kasan: Use __SANITIZE_ADDRESS__ instead of CONFIG_KASAN to decide inlining
    compiler.h: Move function attributes to compiler_types.h
    compiler.h: Avoid nested statement expression in data_race()
    compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE()
    kcsan: Update Documentation to change supported compilers
    kcsan: Remove 'noinline' from __no_kcsan_or_inline
    kcsan: Pass option tsan-instrument-read-before-write to Clang
    kcsan: Support distinguishing volatile accesses
    kcsan: Restrict supported compilers
    kcsan: Avoid inserting __tsan_func_entry/exit if possible
    ubsan, kcsan: Don't combine sanitizer with kcov on clang
    objtool, kcsan: Add kcsan_disable_current() and kcsan_enable_current_nowarn()
    kcsan: Add __kcsan_{enable,disable}_current() variants
    checkpatch: Warn about data_race() without comment
    kcsan: Use GFP_ATOMIC under spin lock
    Improve KCSAN documentation a bit
    kcsan: Make reporting aware of KCSAN tests
    kcsan: Fix function matching in report
    kcsan: Change data_race() to no longer require marking racing accesses
    kcsan: Move kcsan_{disable,enable}_current() to kcsan-checks.h
    ...

    Linus Torvalds
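
    As an illustration of the data_race() annotation mentioned in the commit
    list above, here is a minimal, hypothetical sketch (the struct and function
    names are made up) of marking an intentionally racy diagnostic read so that
    KCSAN does not report it:

    /* Hypothetical sketch: an intentionally racy statistics read annotated
     * with data_race() so KCSAN does not flag it. */
    #include <linux/compiler.h>

    struct hit_stats {
            unsigned long hits;     /* updated without locking elsewhere */
    };

    static unsigned long read_hits_unlocked(struct hit_stats *s)
    {
            /* The value is only used for reporting; a stale read is fine. */
            return data_race(s->hits);
    }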
     
  • Merge the state of the locking kcsan branch before the read/write_once()
    and the atomics modifications got merged.

    Squash the fallout of the rebase on top of the read/write once and atomic
    fallback work into the merge. The history of the original branch is
    preserved in tag locking-kcsan-2020-06-02.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

11 Jun, 2020

1 commit

  • Patch series "improve use_mm / unuse_mm", v2.

    This series improves the use_mm / unuse_mm interface by better documenting
    the assumptions, and by moving the set_fs manipulations spread over the
    callers into the core API.

    This patch (of 3):

    Use the proper API instead.

    Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de

    These helpers are only for use with kernel threads, and I will tie them
    more into the kthread infrastructure going forward. Also move the
    prototypes to kthread.h - mmu_context.h was a little weird to start with
    as it otherwise contains very low-level MM bits.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Felix Kuehling
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Cc: Greg Kroah-Hartman
    Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
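
    A minimal sketch of the use_mm()/unuse_mm() pattern this series documents,
    assuming a kernel thread that must temporarily operate on a user address
    space; the helper name and error handling here are illustrative, not part
    of the series:

    /* Sketch only: a kernel thread temporarily adopting a user mm. */
    #include <linux/kthread.h>
    #include <linux/mm_types.h>
    #include <linux/uaccess.h>

    static int kthread_copy_from_user_mm(struct mm_struct *mm, void *dst,
                                         const void __user *src, size_t len)
    {
            int ret;

            use_mm(mm);                     /* adopt the user address space */
            ret = copy_from_user(dst, src, len) ? -EFAULT : 0;
            unuse_mm(mm);                   /* and drop it again */
            return ret;
    }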
     

05 Jun, 2020

1 commit

  • This adds tests which will validate architecture page table helpers and
    other accessors in their compliance with expected generic MM semantics.
    This will help various architectures in validating changes to existing
    page table helpers or addition of new ones.

    This test covers basic page table entry transformations including but not
    limited to old, young, dirty, clean, write, write protect etc at various
    level along with populating intermediate entries with next page table page
    and validating them.

    Test page table pages are allocated from system memory with required size
    and alignments. The mapped pfns at page table levels are derived from a
    real pfn representing a valid kernel text symbol. This test gets called
    via late_initcall().

    This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected.
    Any architecture which is willing to subscribe to this test will need to
    select ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64,
    x86, s390 and powerpc platforms, where the test is known to build and run
    successfully. Going forward, other architectures too can subscribe to the
    test after fixing any build or runtime problems with their page table helpers.

    Folks interested in making sure that a given platform's page table helpers
    conform to expected generic MM semantics should enable the above config
    which will just trigger this test during boot. Any non-conformity here
    will be reported as a warning which would need to be fixed. This test
    will help catch any changes to the agreed-upon semantics expected from
    generic MM and enable platforms to accommodate them thereafter.

    [anshuman.khandual@arm.com: v17]
    Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com
    [anshuman.khandual@arm.com: v18]
    Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com
    Suggested-by: Catalin Marinas
    Signed-off-by: Anshuman Khandual
    Signed-off-by: Christophe Leroy
    Signed-off-by: Qian Cai
    Signed-off-by: Andrew Morton
    Tested-by: Gerald Schaefer [s390]
    Tested-by: Christophe Leroy [ppc32]
    Reviewed-by: Ingo Molnar
    Cc: Mike Rapoport
    Cc: Vineet Gupta
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Kirill A. Shutemov
    Cc: Paul Walmsley
    Cc: Palmer Dabbelt
    Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
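
    The kind of check the test performs can be sketched as follows; this is a
    simplified, hypothetical example of validating basic pte transformation
    round trips, not the actual mm/debug_vm_pgtable.c code:

    /* Hedged sketch: assert expected generic MM semantics for basic pte
     * helpers (old/young, clean/dirty, write/write-protect round trips). */
    #include <linux/mm.h>

    static void __init pte_basic_sanity(pte_t pte)
    {
            WARN_ON(!pte_same(pte, pte));
            WARN_ON(!pte_young(pte_mkyoung(pte_mkold(pte))));
            WARN_ON(!pte_dirty(pte_mkdirty(pte_mkclean(pte))));
            WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte))));
            WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
    }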
     

08 Apr, 2020

1 commit

  • In order to pave the way for free page reporting in virtualized
    environments we will need a way to get pages out of the free lists and
    identify those pages after they have been returned. To accomplish this,
    this patch adds the concept of a Reported Buddy, which is essentially
    meant to just be the Uptodate flag used in conjunction with the Buddy page
    type.

    To prevent the reported pages from leaking outside of the buddy lists I
    added a check to clear the PageReported bit in the del_page_from_free_list
    function. As a result any reported page that is split, merged, or
    allocated will have the flag cleared prior to the PageBuddy value being
    cleared.

    The process for reporting pages is fairly simple. Once we free a page
    that meets the minimum order for page reporting we will schedule a worker
    thread to start 2s or more in the future. That worker thread will begin
    working from the lowest supported page reporting order up to MAX_ORDER - 1
    pulling unreported pages from the free list and storing them in the
    scatterlist.

    When processing each individual free list it is necessary for the worker
    thread to release the zone lock when it needs to stop and report the full
    scatterlist of pages. To reduce the work of the next iteration the worker
    thread will rotate the free list so that the first unreported page in the
    free list becomes the first entry in the list.

    It will then call a reporting function providing information on how many
    entries are in the scatterlist. Once the function completes it will
    return the pages to the free area from which they were allocated and start
    over pulling more pages from the free areas until there are no longer
    enough pages to report on to keep the worker busy, or we have processed as
    many pages as were contained in the free area when we started processing
    the list.

    The worker thread will work in a round-robin fashion making its way through
    each zone requesting reporting, and through each reportable free list
    within that zone. Once all free areas within the zone have been processed
    it will check to see if there have been any requests for reporting while
    it was processing. If so it will reschedule the worker thread to start up
    again in roughly 2s and exit.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Andrew Morton
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Konrad Rzeszutek Wilk
    Cc: Luiz Capitulino
    Cc: Matthew Wilcox
    Cc: Michael S. Tsirkin
    Cc: Michal Hocko
    Cc: Nitesh Narayan Lal
    Cc: Oscar Salvador
    Cc: Pankaj Gupta
    Cc: Paolo Bonzini
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Wei Wang
    Cc: Yang Zhang
    Cc: wei qi
    Link: http://lkml.kernel.org/r/20200211224635.29318.19750.stgit@localhost.localdomain
    Signed-off-by: Linus Torvalds

    Alexander Duyck
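
    A heavily hedged sketch of the delayed-work scheduling pattern described
    above; all names here are illustrative and are not the real
    mm/page_reporting.c interfaces:

    /* Illustrative only: schedule a reporting pass "2s or more in the future". */
    #include <linux/workqueue.h>
    #include <linux/scatterlist.h>

    #define REPORTING_DELAY         (2 * HZ)        /* "2s or more in the future" */
    #define REPORTING_CAPACITY      32              /* illustrative batch size */

    static void page_reporting_worker(struct work_struct *work);
    static DECLARE_DELAYED_WORK(reporting_work, page_reporting_worker);
    static struct scatterlist reporting_sgl[REPORTING_CAPACITY];

    /* Called when a page of at least the minimum reporting order is freed. */
    static void page_reporting_request(void)
    {
            schedule_delayed_work(&reporting_work, REPORTING_DELAY);
    }

    static void page_reporting_worker(struct work_struct *work)
    {
            sg_init_table(reporting_sgl, REPORTING_CAPACITY);
            /*
             * Pull unreported pages from the free lists into reporting_sgl,
             * drop the zone lock, hand the scatterlist to the reporting
             * callback, then return the pages to the free area they came
             * from, until there is no longer enough work to stay busy.
             */
    }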
     

03 Apr, 2020

1 commit

  • Kmemleak could scan task stacks while plain writes happen to those stack
    variables, which could result in data races. For example, in
    sys_rt_sigaction and do_sigaction(), it could have plain writes of a
    32-byte size. Since kmemleak does not care about the actual values of
    a non-pointer and all do_sigaction() call sites only copy to stack
    variables, just disable KCSAN for kmemleak to avoid annotating anything
    outside Kmemleak just because Kmemleak scans everything.

    Suggested-by: Marco Elver
    Signed-off-by: Qian Cai
    Signed-off-by: Andrew Morton
    Acked-by: Marco Elver
    Acked-by: Catalin Marinas
    Link: http://lkml.kernel.org/r/1583263716-25150-1-git-send-email-cai@lca.pw
    Signed-off-by: Linus Torvalds

    Qian Cai
     

04 Feb, 2020

1 commit

  • Add a generic version of page table dumping that architectures can opt-in
    to.

    Link: http://lkml.kernel.org/r/20191218162402.45610-20-steven.price@arm.com
    Signed-off-by: Steven Price
    Cc: Albert Ou
    Cc: Alexandre Ghiti
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: James Morse
    Cc: Jerome Glisse
    Cc: "Liang, Kan"
    Cc: Mark Rutland
    Cc: Michael Ellerman
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Zong Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Price
     

01 Feb, 2020

1 commit

  • Don't instrument 3 more files that contain debugging facilities and
    produce large amounts of uninteresting coverage for every syscall.

    The following snippets are sprinkled all over the place in kcov traces
    in a debugging kernel. We already try to disable instrumentation of
    stack unwinding code and of most debug facilities. I guess we did not
    use fault-inject.c at the time, and stacktrace.c was somehow missed (or
    something has changed in kernel/configs). This change both speeds up
    kcov (kernel doesn't need to store these PCs, user-space doesn't need to
    process them) and frees trace buffer capacity for more useful coverage.

    should_fail
    lib/fault-inject.c:149
    fail_dump
    lib/fault-inject.c:45

    stack_trace_save
    kernel/stacktrace.c:124
    stack_trace_consume_entry
    kernel/stacktrace.c:86
    stack_trace_consume_entry
    kernel/stacktrace.c:89
    ... a hundred frames skipped ...
    stack_trace_consume_entry
    kernel/stacktrace.c:93
    stack_trace_consume_entry
    kernel/stacktrace.c:86

    Link: http://lkml.kernel.org/r/20200116111449.217744-1-dvyukov@gmail.com
    Signed-off-by: Dmitry Vyukov
    Reviewed-by: Andrey Konovalov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     

06 Nov, 2019

1 commit

  • Add two utilities to 1) write-protect and 2) clean all ptes pointing into
    a range of an address space.
    The utilities are intended to aid in tracking dirty pages (either
    driver-allocated system memory or pci device memory).
    The write-protect utility should be used in conjunction with
    page_mkwrite() and pfn_mkwrite() to trigger write page-faults on page
    accesses. Typically one would want to use this on sparse accesses into
    large memory regions. The clean utility should be used to utilize
    hardware dirtying functionality and avoid the overhead of page-faults,
    typically on large accesses into small memory regions.

    Cc: Andrew Morton
    Cc: Matthew Wilcox
    Cc: Will Deacon
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Huang Ying
    Cc: Jérôme Glisse
    Cc: Kirill A. Shutemov
    Signed-off-by: Thomas Hellstrom
    Acked-by: Andrew Morton

    Thomas Hellstrom
     

25 Sep, 2019

2 commits

  • When compiling a kernel with W=1, there are several of those warnings due
    to arm64 overriding a field on purpose. Just disable those warnings for
    both GCC and Clang for this file; this will help dig out the "gems" hidden
    in the W=1 warnings by reducing some of the noise.

    mm/init-mm.c:39:2: warning: initializer overrides prior initialization
    of this subobject [-Winitializer-overrides]
    INIT_MM_CONTEXT(init_mm)
    ^~~~~~~~~~~~~~~~~~~~~~~~
    ./arch/arm64/include/asm/mmu.h:133:9: note: expanded from macro
    'INIT_MM_CONTEXT'
    .pgd = init_pg_dir,
    ^~~~~~~~~~~
    mm/init-mm.c:30:10: note: previous initialization is here
    .pgd = swapper_pg_dir,
    ^~~~~~~~~~~~~~

    Note: there is a side project trying to support explicitly allowing
    specific initializer overrides in Clang, but there is no guarantee it
    will happen or not.

    https://github.com/ClangBuiltLinux/linux/issues/639

    Link: http://lkml.kernel.org/r/1566920867-27453-1-git-send-email-cai@lca.pw
    Signed-off-by: Qian Cai
    Cc: Nick Desaulniers
    Cc: Masahiro Yamada
    Cc: Mark Rutland
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
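
    For reference, a standalone snippet (unrelated to init-mm itself) that
    reproduces the -Winitializer-overrides warning being silenced here:

    /* A later designated initializer intentionally overrides an earlier one;
     * clang with W=1 emits -Winitializer-overrides for the second ".x". */
    struct example_cfg {
            int x;
            int y;
    };

    static struct example_cfg cfg = {
            .x = 1,
            .y = 2,
            .x = 3,         /* overrides the earlier .x = 1 on purpose */
    };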
     
  • Patch series "mm: remove quicklist page table caches".

    A while ago Nicholas proposed to remove quicklist page table caches [1].

    I've rebased his patch on the current upstream and switched ia64 and sh to
    use generic versions of PTE allocation.

    [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com

    This patch (of 3):

    Remove page table allocator "quicklists". These have been around for a
    long time, but have not got much traction in the last decade and are only
    used on ia64 and sh architectures.

    The numbers in the initial commit look interesting but probably don't
    apply anymore. If anybody wants to resurrect this it's in the git
    history, but it's unhelpful to have this code and divergent allocator
    behaviour for minor archs.

    Also it might be better to instead make more general improvements to the
    page allocator if this is still so slow.

    Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Mike Rapoport
    Cc: Tony Luck
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     

03 Aug, 2019

1 commit

  • memremap.c implements MM functionality for ZONE_DEVICE, so it really
    should be in the mm/ directory, not the kernel/ one.

    Link: http://lkml.kernel.org/r/20190722094143.18387-1-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Anshuman Khandual
    Acked-by: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

15 Jul, 2019

1 commit

  • Pull HMM updates from Jason Gunthorpe:
    "Improvements and bug fixes for the hmm interface in the kernel:

    - Improve clarity, locking and APIs related to the 'hmm mirror'
    feature merged last cycle. In linux-next we now see AMDGPU and
    nouveau to be using this API.

    - Remove old or transitional hmm APIs. These are hold overs from the
    past with no users, or APIs that existed only to manage cross tree
    conflicts. There are still a few more of these cleanups that didn't
    make the merge window cut off.

    - Improve some core mm APIs:
    - export alloc_pages_vma() for driver use
    - refactor into devm_request_free_mem_region() to manage
    DEVICE_PRIVATE resource reservations
    - refactor duplicative driver code into the core dev_pagemap
    struct

    - Remove hmm wrappers of improved core mm APIs, instead have drivers
    use the simplified API directly

    - Remove DEVICE_PUBLIC

    - Simplify the kconfig flow for the hmm users and core code"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
    mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
    mm: remove the HMM config option
    mm: sort out the DEVICE_PRIVATE Kconfig mess
    mm: simplify ZONE_DEVICE page private data
    mm: remove hmm_devmem_add
    mm: remove hmm_vma_alloc_locked_page
    nouveau: use devm_memremap_pages directly
    nouveau: use alloc_page_vma directly
    PCI/P2PDMA: use the dev_pagemap internal refcount
    device-dax: use the dev_pagemap internal refcount
    memremap: provide an optional internal refcount in struct dev_pagemap
    memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
    memremap: remove the data field in struct dev_pagemap
    memremap: add a migrate_to_ram method to struct dev_pagemap_ops
    memremap: lift the devmap_enable manipulation into devm_memremap_pages
    memremap: pass a struct dev_pagemap to ->kill and ->cleanup
    memremap: move dev_pagemap callbacks into a separate structure
    memremap: validate the pagemap type passed to devm_memremap_pages
    mm: factor out a devm_request_free_mem_region helper
    mm: export alloc_pages_vma
    ...

    Linus Torvalds
     

13 Jul, 2019

1 commit

  • Always build mm/gup.c so that we don't have to provide separate nommu
    stubs. Also merge the get_user_pages_fast and __get_user_pages_fast stubs
    when HAVE_FAST_GUP into the main implementations, which will never call
    the fast path if HAVE_FAST_GUP is not set.

    This also ensures the new put_user_pages* helpers are available for nommu,
    as those are currently missing, which would create a problem as soon as we
    actually grew users for it.

    Link: http://lkml.kernel.org/r/20190625143715.1689-13-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

03 Jul, 2019

1 commit

  • All the mm/hmm.c code is better keyed off HMM_MIRROR. Also let nouveau
    depend on it instead of the mix of a dummy dependency symbol plus the
    actually selected one. Drop various odd dependencies, as the code is
    pretty portable.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ira Weiny
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

15 May, 2019

1 commit

  • Patch series "mm: Randomize free memory", v10.

    This patch (of 3):

    Randomization of the page allocator improves the average utilization of
    a direct-mapped memory-side-cache. Memory side caching is a platform
    capability that Linux has been previously exposed to in HPC
    (high-performance computing) environments on specialty platforms. In
    that instance it was a smaller pool of high-bandwidth-memory relative to
    higher-capacity / lower-bandwidth DRAM. Now, this capability is going
    to be found on general purpose server platforms where DRAM is a cache in
    front of higher latency persistent memory [1].

    Robert offered an explanation of the state of the art of Linux
    interactions with memory-side-caches [2], and I copy it here:

    It's been a problem in the HPC space:
    http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/

    A kernel module called zonesort is available to try to help:
    https://software.intel.com/en-us/articles/xeon-phi-software

    and this abandoned patch series proposed that for the kernel:
    https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com

    Dan's patch series doesn't attempt to ensure buffers won't conflict, but
    also reduces the chance that the buffers will. This will make performance
    more consistent, albeit slower than "optimal" (which is near impossible
    to attain in a general-purpose kernel). That's better than forcing
    users to deploy remedies like:
    "To eliminate this gradual degradation, we have added a Stream
    measurement to the Node Health Check that follows each job;
    nodes are rebooted whenever their measured memory bandwidth
    falls below 300 GB/s."

    A replacement for zonesort was merged upstream in commit cc9aec03e58f
    ("x86/numa_emulation: Introduce uniform split capability"). With this
    numa_emulation capability, memory can be split into cache sized
    ("near-memory" sized) numa nodes. A bind operation to such a node, and
    disabling workloads on other nodes, enables full cache performance.
    However, once the workload exceeds the cache size then cache conflicts
    are unavoidable. While HPC environments might be able to tolerate
    time-scheduling of cache sized workloads, for general purpose server
    platforms, the oversubscribed cache case will be the common case.

    The worst case scenario is that a server system owner benchmarks a
    workload at boot with an un-contended cache only to see that performance
    degrade over time, even below the average cache performance due to
    excessive conflicts. Randomization clips the peaks and fills in the
    valleys of cache utilization to yield steady average performance.

    Here are some performance impact details of the patches:

    1/ An Intel internal synthetic memory bandwidth measurement tool saw a
    3X speedup in a contrived case that tries to force cache conflicts.
    The contrived case used the numa_emulation capability to force an
    instance of the benchmark to be run in two of the near-memory sized
    numa nodes. If both instances were placed on the same emulated node they
    would fit and cause zero conflicts. While on separate emulated nodes
    without randomization they underutilized the cache and conflicted
    unnecessarily due to the in-order allocation per node.

    2/ A well known Java server application benchmark was run with a heap
    size that exceeded cache size by 3X. The cache conflict rate was 8%
    for the first run and degraded to 21% after page allocator aging. With
    randomization enabled the rate levelled out at 11%.

    3/ A MongoDB workload did not observe measurable difference in
    cache-conflict rates, but the overall throughput dropped by 7% with
    randomization in one case.

    4/ Mel Gorman ran his suite of performance workloads with randomization
    enabled on platforms without a memory-side-cache and saw a mix of some
    improvements and some losses [3].

    While there is potentially significant improvement for applications that
    depend on low latency access across a wide working-set, the performance
    may be negligible to negative for other workloads. For this reason the
    shuffle capability defaults to off unless a direct-mapped
    memory-side-cache is detected. Even then, the page_alloc.shuffle=0
    parameter can be specified to disable the randomization on those systems.

    Outside of memory-side-cache utilization concerns there is potentially
    security benefit from randomization. Some data exfiltration and
    return-oriented-programming attacks rely on the ability to infer the
    location of sensitive data objects. The kernel page allocator, especially
    early in system boot, has predictable first-in-first out behavior for
    physical pages. Pages are freed in physical address order when first
    onlined.

    Quoting Kees:
    "While we already have a base-address randomization
    (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
    memory layouts would certainly be using the predictability of
    allocation ordering (i.e. for attacks where the base address isn't
    important: only the relative positions between allocated memory).
    This is common in lots of heap-style attacks. They try to gain
    control over ordering by spraying allocations, etc.

    I'd really like to see this because it gives us something similar
    to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."

    While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
    caches, it leaves the vast bulk of memory to be predictably allocated in order.
    However, it should be noted, the concrete security benefits are hard to
    quantify, and no known CVE is mitigated by this randomization.

    Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform
    a Fisher-Yates shuffle of the page allocator 'free_area' lists when they
    are initially populated with free memory at boot and at hotplug time. Do
    this based on either the presence of a page_alloc.shuffle=Y command line
    parameter, or autodetection of a memory-side-cache (to be added in a
    follow-on patch).

    The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
    pages, where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1, i.e. 10
    (4MB); this trades off randomization granularity for time spent shuffling.
    MAX_ORDER-1 was chosen to be minimally invasive to the page allocator
    while still showing memory-side cache behavior improvements, and the
    expectation that the security implications of finer granularity
    randomization are mitigated by CONFIG_SLAB_FREELIST_RANDOM. The
    performance impact of the shuffling appears to be in the noise compared to
    other memory initialization work.

    This initial randomization can be undone over time so a follow-on patch is
    introduced to inject entropy on page free decisions. It is reasonable to
    ask if the page free entropy is sufficient, but it is not enough due to
    the in-order initial freeing of pages. At the start of that process
    putting page1 in front of or behind page0 still keeps them close together,
    page2 is still near page1 and has a high chance of being adjacent. As
    more pages are added ordering diversity improves, but there is still high
    page locality for the low address pages and this leads to no significant
    impact to the cache conflict rate.

    [1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
    [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM
    [3]: https://lkml.org/lkml/2018/10/12/309

    [dan.j.williams@intel.com: fix shuffle enable]
    Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com
    [cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts]
    Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw
    Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Signed-off-by: Qian Cai
    Reviewed-by: Kees Cook
    Acked-by: Michal Hocko
    Cc: Dave Hansen
    Cc: Keith Busch
    Cc: Robert Elliott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
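
    The shuffle itself is a standard Fisher-Yates pass; a generic, userspace-style
    sketch of the algorithm follows (the kernel operates on free_area lists and
    uses its own random helpers, so this is only an illustration):

    /* Generic Fisher-Yates shuffle sketch; not the kernel's shuffle_zone(). */
    #include <stdlib.h>

    static void fisher_yates(int *a, int n)
    {
            for (int i = n - 1; i > 0; i--) {
                    int j = rand() % (i + 1);   /* kernel code uses get_random_* */
                    int tmp = a[i];

                    a[i] = a[j];
                    a[j] = tmp;
            }
    }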
     

31 Oct, 2018

3 commits

  • Move a few remaining functions from nobootmem.c to memblock.c and remove
    nobootmem.c.

    Link: http://lkml.kernel.org/r/1536927045-23536-28-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • All architectures use memblock for early memory management. There is no need
    for the CONFIG_HAVE_MEMBLOCK configuration option.

    [rppt@linux.vnet.ibm.com: of/fdt: fixup #ifdefs]
    Link: http://lkml.kernel.org/r/20180919103457.GA20545@rapoport-lnx
    [rppt@linux.vnet.ibm.com: csky: fixups after bootmem removal]
    Link: http://lkml.kernel.org/r/20180926112744.GC4628@rapoport-lnx
    [rppt@linux.vnet.ibm.com: remove stale #else and the code it protects]
    Link: http://lkml.kernel.org/r/1538067825-24835-1-git-send-email-rppt@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1536927045-23536-4-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Tested-by: Jonathan Cameron
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • All architectures select NO_BOOTMEM, which essentially becomes 'Y' for any
    kernel configuration and therefore it can be removed.

    [alexander.h.duyck@linux.intel.com: remove now defunct NO_BOOTMEM from depends list for deferred init]
    Link: http://lkml.kernel.org/r/20180925201814.3576.15105.stgit@localhost.localdomain
    Link: http://lkml.kernel.org/r/1536927045-23536-3-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Alexander Duyck
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

23 Oct, 2018

1 commit

  • Pull arm64 updates from Catalin Marinas:
    "Apart from some new arm64 features and clean-ups, this also contains
    the core mmu_gather changes for tracking the levels of the page table
    being cleared and a minor update to the generic
    compat_sys_sigaltstack() introducing COMPAT_SIGMINSTKSZ.

    Summary:

    - Core mmu_gather changes which allow tracking the levels of
    page-table being cleared together with the arm64 low-level flushing
    routines

    - Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to
    mitigate Spectre-v4 dynamically without trapping to EL3 firmware

    - Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack

    - Optimise emulation of MRS instructions to ID_* registers on ARMv8.4

    - Support for Common Not Private (CnP) translations allowing threads
    of the same CPU to share the TLB entries

    - Accelerated crc32 routines

    - Move swapper_pg_dir to the rodata section

    - Trap WFI instruction executed in user space

    - ARM erratum 1188874 workaround (arch_timer)

    - Miscellaneous fixes and clean-ups"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits)
    arm64: KVM: Guests can skip __install_bp_hardening_cb()s HYP work
    arm64: cpufeature: Trap CTR_EL0 access only where it is necessary
    arm64: cpufeature: Fix handling of CTR_EL0.IDC field
    arm64: cpufeature: ctr: Fix cpu capability check for late CPUs
    Documentation/arm64: HugeTLB page implementation
    arm64: mm: Use __pa_symbol() for set_swapper_pgd()
    arm64: Add silicon-errata.txt entry for ARM erratum 1188873
    Revert "arm64: uaccess: implement unsafe accessors"
    arm64: mm: Drop the unused cpu parameter
    MAINTAINERS: fix bad sdei paths
    arm64: mm: Use #ifdef for the __PAGETABLE_P?D_FOLDED defines
    arm64: Fix typo in a comment in arch/arm64/mm/kasan_init.c
    arm64: xen: Use existing helper to check interrupt status
    arm64: Use daifflag_restore after bp_hardening
    arm64: daifflags: Use irqflags functions for daifflags
    arm64: arch_timer: avoid unused function warning
    arm64: Trap WFI executed in userspace
    arm64: docs: Document SSBS HWCAP
    arm64: docs: Fix typos in ELF hwcaps
    arm64/kprobes: remove an extra semicolon in arch_prepare_kprobe
    ...

    Linus Torvalds
     

31 Aug, 2018

1 commit

  • The implementation of readahead(2) syscall is identical to that of
    fadvise64(POSIX_FADV_WILLNEED) with a few exceptions:
    1. readahead(2) returns -EINVAL for !mapping->a_ops and fadvise64()
    ignores the request and returns 0.
    2. fadvise64() checks for integer overflow corner case
    3. fadvise64() calls the optional filesystem fadvise() file operation

    Unite the two implementations by calling vfs_fadvise() from readahead(2)
    syscall. Check the !mapping->a_ops in readahead(2) syscall to preserve
    documented syscall ABI behaviour.

    Suggested-by: Miklos Szeredi
    Fixes: d1d04ef8572b ("ovl: stack file ops")
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
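
    From userspace, the equivalence described above looks like this (minimal
    sketch with abbreviated error handling; the file path is just an example):

    /* readahead(2) and posix_fadvise(POSIX_FADV_WILLNEED) both start
     * asynchronous readahead on a file range. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/etc/hostname", O_RDONLY);

            if (fd < 0)
                    return 1;
            if (readahead(fd, 0, 4096) < 0)         /* fails with EINVAL if !mapping->a_ops */
                    perror("readahead");
            if (posix_fadvise(fd, 0, 4096, POSIX_FADV_WILLNEED))
                    fprintf(stderr, "fadvise failed\n");    /* same request, ignored if unsupported */
            close(fd);
            return 0;
    }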
     

08 Jun, 2018

1 commit

  • With the addition of memfd hugetlbfs support, we now have the situation
    where memfd depends on TMPFS -or- HUGETLBFS. Previously, memfd was only
    supported on tmpfs, so it made sense that the code resided in shmem.c.
    In the current code, memfd is only functional if TMPFS is defined. If
    HUGETLBFS is defined and TMPFS is not defined, then memfd functionality
    will not be available for hugetlbfs. This does not cause BUGs, just a
    lack of potentially desired functionality.

    Code is restructured in the following way:
    - include/linux/memfd.h is a new file containing memfd specific
    definitions previously contained in shmem_fs.h.
    - mm/memfd.c is a new file containing memfd specific code previously
    contained in shmem.c.
    - memfd specific code is removed from shmem_fs.h and shmem.c.
    - A new config option MEMFD_CREATE is added that is defined if TMPFS
    or HUGETLBFS is defined.

    No functional changes are made to the code: restructuring only.

    Link: http://lkml.kernel.org/r/20180415182119.4517-4-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reviewed-by: Khalid Aziz
    Cc: Andrea Arcangeli
    Cc: David Herrmann
    Cc: Hugh Dickins
    Cc: Marc-André Lureau
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
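
    From userspace the dependency is visible through memfd_create(2), which can
    be backed by either filesystem; a minimal sketch (requires a reasonably
    recent glibc for the memfd_create() wrapper):

    /* memfd_create() backed by tmpfs (default) or hugetlbfs (MFD_HUGETLB). */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            int fd  = memfd_create("example", MFD_CLOEXEC);         /* tmpfs */
            int hfd = memfd_create("example-huge", MFD_HUGETLB);    /* hugetlbfs */

            if (fd < 0 || hfd < 0)
                    perror("memfd_create");
            if (fd >= 0)
                    close(fd);
            if (hfd >= 0)
                    close(hfd);
            return 0;
    }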
     

06 Apr, 2018

1 commit

  • For mm/swap_slots.c, use the traditional Linux method of conditional
    compilation and linking instead of always compiling it by using #ifdef
    CONFIG_SWAP and #endif for the entire source file (excluding header
    files).

    Link: http://lkml.kernel.org/r/c2a47015-0b5a-d0d9-8bc7-9984c049df20@infradead.org
    Signed-off-by: Randy Dunlap
    Acked-by: Tim Chen
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

18 Nov, 2017

1 commit

  • Performance of get_user_pages_fast() is critical for some workloads, but
    it's tricky to test it directly.

    This patch provides /sys/kernel/debug/gup_benchmark that helps with
    testing performance of it.

    See tools/testing/selftests/vm/gup_benchmark.c for userspace
    counterpart.

    Link: http://lkml.kernel.org/r/20170908215603.9189-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Cc: Shuah Khan
    Cc: Ingo Molnar
    Cc: Thorsten Leemhuis
    Cc: Jonathan Corbet
    Cc: Huang Ying
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

16 Nov, 2017

1 commit

  • Fix up makefiles, remove references, and git rm kmemcheck.

    Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Steven Rostedt
    Cc: Vegard Nossum
    Cc: Pekka Enberg
    Cc: Michal Hocko
    Cc: Eric W. Biederman
    Cc: Alexander Potapenko
    Cc: Tim Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Levin, Alexander (Sasha Levin)
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
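
    For files with no other licensing information, the added header is a single
    line, e.g. at the top of a C source file:

    // SPDX-License-Identifier: GPL-2.0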
     

09 Sep, 2017

2 commits

  • This moves all new code, including the new page migration helper, behind a
    kernel Kconfig option so that there is no code bloat for arches or users
    that do not want to use HMM or any of its associated features.

    arm allyesconfig (first without this patchset, then with it plus this patch):
    text data bss dec hex filename
    83721896 46511131 27582964 157815991 96814b7 ../without/vmlinux
    83722364 46511131 27582964 157816459 968168b vmlinux

    [jglisse@redhat.com: struct hmm is only use by HMM mirror functionality]
    Link: http://lkml.kernel.org/r/20170825213133.27286-1-jglisse@redhat.com
    [sfr@canb.auug.org.au: fix build (arm multi_v7_defconfig)]
    Link: http://lkml.kernel.org/r/20170828181849.323ab81b@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20170818032858.7447-1-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Stephen Rothwell
    Cc: Dan Williams
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • HMM provides 3 separate types of functionality:
    - Mirroring: synchronize CPU page table and device page table
    - Device memory: allocating struct page for device memory
    - Migration: migrating regular memory to device memory

    This patch introduces some common helpers and definitions used by all
    three of those functionalities.

    Link: http://lkml.kernel.org/r/20170817000548.32038-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Evgeny Baskakov
    Signed-off-by: John Hubbard
    Signed-off-by: Mark Hairgrove
    Signed-off-by: Sherry Cheung
    Signed-off-by: Subhash Gutti
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

21 Jun, 2017

1 commit

  • There is limited visibility into the use of percpu memory leaving us
    unable to reason about correctness of parameters and overall use of
    percpu memory. These counters and statistics aim to help understand
    basic statistics about percpu memory such as number of allocations over
    the lifetime, allocation sizes, and fragmentation.

    New Config: PERCPU_STATS

    Signed-off-by: Dennis Zhou
    Signed-off-by: Tejun Heo

    Dennis Zhou
     

28 Feb, 2017

1 commit

  • This patch makes arch-independent testcases for RODATA. Both x86 and
    x86_64 already have testcases for RODATA, but they are arch-specific
    because they use inline assembly directly.

    Also, cacheflush.h is not a suitable location for rodata-test related
    things: since they were in cacheflush.h, changing the state of
    CONFIG_DEBUG_RODATA_TEST caused extra kernel build overhead.

    To solve the above issues, write arch-independent testcases and move it
    to shared location.

    [jinb.park7@gmail.com: fix config dependency]
    Link: http://lkml.kernel.org/r/20170209131625.GA16954@pjb1027-Latitude-E5410
    Link: http://lkml.kernel.org/r/20170129105436.GA9303@pjb1027-Latitude-E5410
    Signed-off-by: Jinbum Park
    Acked-by: Kees Cook
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Arjan van de Ven
    Cc: Laura Abbott
    Cc: Russell King
    Cc: Valentin Rothberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jinbum Park
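
    The arch-independent check can be sketched roughly as below: attempt a
    kernel-space write into .rodata and verify that it fails. This is a
    simplified sketch; the helper names mirror the kernel APIs of that era, and
    the variable/function names are illustrative:

    /* Simplified sketch of an arch-independent rodata test. */
    #include <linux/kernel.h>
    #include <linux/uaccess.h>

    static const int rodata_test_data = 0xC3;       /* placed in .rodata */

    static int __init rodata_test_sketch(void)
    {
            int zero = 0;

            /* A write into .rodata must fault and be rejected. */
            if (!probe_kernel_write((void *)&rodata_test_data, &zero, sizeof(zero))) {
                    pr_err("rodata_test: .rodata section is writable!\n");
                    return -ENODEV;
            }
            pr_info("rodata_test: write to .rodata correctly failed\n");
            return 0;
    }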
     

25 Feb, 2017

1 commit

  • Introduce a new interface to check if a page is mapped into a vma. It
    aims to address shortcomings of page_check_address{,_transhuge}.

    The existing interface is not able to handle PTE-mapped THPs: it only finds
    the first PTE; the rest are left unnoticed.

    page_vma_mapped_walk() iterates over all possible mapping of the page in
    the vma.

    Link: http://lkml.kernel.org/r/20170129173858.45174-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
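
    Usage follows the pattern of the page_mapped_in_vma() style helper built on
    top of the walk; a hedged sketch (the wrapper name here is made up):

    /* Sketch: check whether 'page' is mapped anywhere in 'vma'. */
    #include <linux/mm.h>
    #include <linux/rmap.h>

    static bool page_is_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
    {
            struct page_vma_mapped_walk pvmw = {
                    .page = page,
                    .vma = vma,
                    .flags = PVMW_SYNC,
            };

            pvmw.address = page_address_in_vma(page, vma);
            if (pvmw.address == -EFAULT)
                    return false;
            if (!page_vma_mapped_walk(&pvmw))
                    return false;
            page_vma_mapped_walk_done(&pvmw);       /* drop the ptl taken by the walk */
            return true;
    }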
     

23 Feb, 2017

1 commit

  • We add per cpu caches for swap slots that can be allocated and freed
    quickly without the need to touch the swap info lock.

    Two separate caches are maintained for swap slots allocated and swap
    slots returned. This is to allow the swap slots to be returned to the
    global pool in a batch so they will have a chance to be coalesced with
    other slots in a cluster. We do not reuse the slots that are returned
    right away, as it may increase fragmentation of the slots.

    The swap allocation cache is protected by a mutex as we may sleep when
    searching for empty slots in cache. The swap free cache is protected by
    a spin lock as we cannot sleep in the free path.

    We refill the swap slots cache when we run out of slots, and we disable
    the swap slots cache and drain the slots if the global number of slots
    falls below a low watermark threshold. We re-enable the cache again when
    the slots available are above a high watermark.

    [ying.huang@intel.com: use raw_cpu_ptr over this_cpu_ptr for swap slots access]
    [tim.c.chen@linux.intel.com: add comments on locks in swap_slots.h]
    Link: http://lkml.kernel.org/r/20170118180327.GA24225@linux.intel.com
    Link: http://lkml.kernel.org/r/35de301a4eaa8daa2977de6e987f2c154385eb66.1484082593.git.tim.c.chen@linux.intel.com
    Signed-off-by: Tim Chen
    Signed-off-by: "Huang, Ying"
    Reviewed-by: Michal Hocko
    Cc: Aaron Lu
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Christian Borntraeger
    Cc: Dave Hansen
    Cc: Hillf Danton
    Cc: Huang Ying
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Shaohua Li
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Chen
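
    A heavily simplified, hypothetical sketch of the allocation-side per-cpu
    cache described above; the structure and names are illustrative and do not
    match the real mm/swap_slots.c:

    /* Illustrative only: per-cpu cache of pre-allocated swap slots. */
    #include <linux/mutex.h>
    #include <linux/percpu.h>
    #include <linux/swap.h>

    #define SLOTS_CACHE_SIZE 64             /* illustrative batch size */

    struct swap_slots_cache_sketch {
            struct mutex alloc_lock;        /* may sleep while refilling */
            swp_entry_t slots[SLOTS_CACHE_SIZE];
            int nr;
    };
    /* alloc_lock must be initialized with mutex_init() at cpu bring-up (omitted). */
    static DEFINE_PER_CPU(struct swap_slots_cache_sketch, slots_cache);

    /* Fast path: hand out a cached slot without touching swap_info locks. */
    static bool get_cached_swap_slot(swp_entry_t *entry)
    {
            struct swap_slots_cache_sketch *cache = raw_cpu_ptr(&slots_cache);
            bool hit = false;

            mutex_lock(&cache->alloc_lock);
            if (cache->nr > 0) {
                    *entry = cache->slots[--cache->nr];
                    hit = true;
            }
            mutex_unlock(&cache->alloc_lock);
            return hit;
    }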
     

13 Oct, 2016

1 commit

  • This effectively reverts commit 377ccbb48373 ("Makefile: Mute warning
    for __builtin_return_address(>0) for tracing only") because it turns out
    that it really isn't tracing only - it's all over the tree.

    We already also had the warning disabled separately for mm/usercopy.c
    (which this commit also removes), and it turns out that we will also
    want to disable it for get_lock_parent_ip(), that is used for at least
    TRACE_IRQFLAGS. Which (when enabled) ends up being all over the tree.

    Steven Rostedt had a patch that tried to limit it to just the config
    options that actually triggered this, but quite frankly, the extra
    complexity and abstraction just isn't worth it. We have never actually
    had a case where the warning is actually useful, so let's just disable
    it globally and not worry about it.

    Acked-by: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Peter Anvin
    Signed-off-by: Linus Torvalds

    Linus Torvalds