17 Jun, 2020

1 commit

  • [ Upstream commit d4eaa2837851db2bfed572898bfc17f9a9f9151e ]

    For kvmalloc'ed data objects that contain sensitive information such as
    cryptographic keys, we need to make sure that the buffer is always cleared
    before freeing it. Using memset() alone for buffer clearing may not
    provide certainty, as the compiler may optimize it away. To be sure, the
    special memzero_explicit() has to be used.

    This patch introduces a new kvfree_sensitive() for freeing those sensitive
    data objects allocated by kvmalloc(). The relevant places where
    kvfree_sensitive() can be used are modified to use it.
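
    A sketch of the intended semantics (close to the upstream helper, though
    the exact guards may differ):

        void kvfree_sensitive(const void *addr, size_t len)
        {
                if (likely(!ZERO_OR_NULL_PTR(addr))) {
                        /* explicit clear that the compiler cannot elide */
                        memzero_explicit((void *)addr, len);
                        kvfree(addr);
                }
        }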

    Fixes: 4f0882491a14 ("KEYS: Avoid false positive ENOMEM error on key read")
    Suggested-by: Linus Torvalds
    Signed-off-by: Waiman Long
    Signed-off-by: Andrew Morton
    Reviewed-by: Eric Biggers
    Acked-by: David Howells
    Cc: Jarkko Sakkinen
    Cc: James Morris
    Cc: "Serge E. Hallyn"
    Cc: Joe Perches
    Cc: Matthew Wilcox
    Cc: David Rientjes
    Cc: Uladzislau Rezki
    Link: http://lkml.kernel.org/r/20200407200318.11711-1-longman@redhat.com
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Waiman Long
     

25 Sep, 2019

5 commits

  • This commit selects ARCH_HAS_ELF_RANDOMIZE when an arch uses the generic
    topdown mmap layout functions so that this security feature is on by
    default.

    Note that this commit also removes the possibility for arm64 to have ELF
    randomization without an MMU: without an MMU, the security added by
    randomization is worth nothing.

    Link: http://lkml.kernel.org/r/20190730055113.23635-6-alex@ghiti.fr
    Signed-off-by: Alexandre Ghiti
    Acked-by: Catalin Marinas
    Acked-by: Kees Cook
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Luis Chamberlain
    Cc: Albert Ou
    Cc: Alexander Viro
    Cc: Christoph Hellwig
    Cc: James Hogan
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ghiti
     
  • arm64 handles top-down mmap layout in a way that can be easily reused by
    other architectures, so make it available in mm. Then introduce a new
    config, ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT, that other architectures
    can select to benefit from those functions. Note that this new config
    depends on MMU being enabled; if it is selected without MMU support, a
    warning will be thrown.
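
    A sketch of the resulting generic helper in mm/util.c (mmap_is_legacy(),
    mmap_base() and arch_mmap_rnd() follow the arm64 code this was lifted
    from):

        void arch_pick_mmap_layout(struct mm_struct *mm,
                                   struct rlimit *rlim_stack)
        {
                unsigned long random_factor = 0UL;

                if (current->flags & PF_RANDOMIZE)
                        random_factor = arch_mmap_rnd();

                if (mmap_is_legacy(rlim_stack)) {
                        mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
                        mm->get_unmapped_area = arch_get_unmapped_area;
                } else {
                        mm->mmap_base = mmap_base(random_factor, rlim_stack);
                        mm->get_unmapped_area = arch_get_unmapped_area_topdown;
                }
        }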

    Link: http://lkml.kernel.org/r/20190730055113.23635-5-alex@ghiti.fr
    Signed-off-by: Alexandre Ghiti
    Suggested-by: Christoph Hellwig
    Acked-by: Catalin Marinas
    Acked-by: Kees Cook
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Luis Chamberlain
    Cc: Albert Ou
    Cc: Alexander Viro
    Cc: James Hogan
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ghiti
     
  • Patch series "Provide generic top-down mmap layout functions", v6.

    This series introduces generic functions to make top-down mmap layout
    easily accessible to architectures, in particular riscv which was the
    initial goal of this series. The generic implementation was taken from
    arm64 and used successively by arm, mips and finally riscv.

    Note that in addition the series fixes 2 issues:

    - stack randomization was taken into account even if not necessary.

    - [1] fixed an issue with the mmap base, which did not take randomization
    into account, but the fix was not propagated to arm and mips; by moving
    the arm64 code into a generic library, this problem is now fixed for both
    architectures.

    This work is an effort to factorize architecture functions to avoid code
    duplication and oversights as in [1].

    [1]: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1429066.html

    This patch (of 14):

    This preparatory commit moves this function so that further introduction
    of generic topdown mmap layout is contained only in mm/util.c.

    Link: http://lkml.kernel.org/r/20190730055113.23635-2-alex@ghiti.fr
    Signed-off-by: Alexandre Ghiti
    Acked-by: Kees Cook
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Luis Chamberlain
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: James Hogan
    Cc: Palmer Dabbelt
    Cc: Albert Ou
    Cc: Alexander Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ghiti
     
  • Patch series "THP aware uprobe", v13.

    This patchset makes uprobe aware of THPs.

    Currently, when uprobe is attached to text on THP, the page is split by
    FOLL_SPLIT. As a result, uprobe eliminates the performance benefit of
    THP.

    This set makes uprobe THP-aware. Instead of FOLL_SPLIT, we introduce
    FOLL_SPLIT_PMD, which only splits the PMD for uprobe.

    After all uprobes within the THP are removed, the PTE-mapped pages are
    regrouped into a huge PMD.

    This set (plus a few THP patches) is also available at

    https://github.com/liu-song-6/linux/tree/uprobe-thp

    This patch (of 6):

    Move memcmp_pages() to mm/util.c and pages_identical() to mm.h, so that we
    can use them in other files.
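
    For reference, the moved helpers look roughly like this (sketch; this
    mirrors the code as it lived in KSM):

        int memcmp_pages(struct page *page1, struct page *page2)
        {
                char *addr1, *addr2;
                int ret;

                addr1 = kmap_atomic(page1);
                addr2 = kmap_atomic(page2);
                ret = memcmp(addr1, addr2, PAGE_SIZE);
                kunmap_atomic(addr2);
                kunmap_atomic(addr1);
                return ret;
        }

        /* in mm.h */
        static inline int pages_identical(struct page *page1,
                                          struct page *page2)
        {
                return !memcmp_pages(page1, page2);
        }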

    Link: http://lkml.kernel.org/r/20190815164525.1848545-2-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Oleg Nesterov
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: William Kucharski
    Cc: Srikar Dronamraju
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • Replace 1 << compound_order(page) with compound_nr(page). Minor
    improvements in readability.
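
    I.e. conversions of the form:

        /* before */
        nr_pages = 1 << compound_order(page);

        /* after */
        nr_pages = compound_nr(page);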

    Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

17 Jul, 2019

1 commit

  • locked_vm accounting is done roughly the same way in five places, so
    unify them in a helper.

    Include the helper's caller in the debug print to distinguish between
    callsites.

    Error codes stay the same, so user-visible behavior does too. The one
    exception is that the -EPERM case in tce_account_locked_vm is removed
    because Alexey has never seen it triggered.
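
    A sketch of a caller using the new helper (assuming the
    account_locked_vm(mm, pages, inc) signature this series introduces):

        ret = account_locked_vm(current->mm, npages, true);   /* charge */
        if (ret)
                return ret;     /* would exceed RLIMIT_MEMLOCK */

        /* ... pin and use the pages ... */

        account_locked_vm(current->mm, npages, false);        /* uncharge */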

    [daniel.m.jordan@oracle.com: v3]
    Link: http://lkml.kernel.org/r/20190529205019.20927-1-daniel.m.jordan@oracle.com
    [sfr@canb.auug.org.au: fix mm/util.c]
    Link: http://lkml.kernel.org/r/20190524175045.26897-1-daniel.m.jordan@oracle.com
    Signed-off-by: Daniel Jordan
    Signed-off-by: Stephen Rothwell
    Tested-by: Alexey Kardashevskiy
    Acked-by: Alex Williamson
    Cc: Alan Tull
    Cc: Alex Williamson
    Cc: Benjamin Herrenschmidt
    Cc: Christoph Lameter
    Cc: Christophe Leroy
    Cc: Davidlohr Bueso
    Cc: Jason Gunthorpe
    Cc: Mark Rutland
    Cc: Michael Ellerman
    Cc: Moritz Fischer
    Cc: Paul Mackerras
    Cc: Steve Sistare
    Cc: Wu Hao
    Cc: Ira Weiny
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Jordan
     

13 Jul, 2019

1 commit

  • Always build mm/gup.c so that we don't have to provide separate nommu
    stubs. Also merge the get_user_pages_fast and __get_user_pages_fast stubs
    used when HAVE_FAST_GUP is not set into the main implementations, which
    will never call the fast path in that case.

    This also ensures the new put_user_pages* helpers are available for nommu,
    as those are currently missing, which would create a problem as soon as we
    actually grew users for it.

    Link: http://lkml.kernel.org/r/20190625143715.1689-13-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

02 Jun, 2019

1 commit

  • The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
    semaphore taken.") added synchronization of reading argument/environment
    boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce
    arg_lock to protect arg_start|end and env_start|end in mm_struct") avoided
    the coarse use of mmap_sem in similar situations. But there still
    remained two places that (mis)use mmap_sem.

    get_cmdline should also use arg_lock instead of mmap_sem when it reads the
    boundaries.

    The second place that should use arg_lock is in prctl_set_mm. By
    protecting the boundaries fields with the arg_lock, we can downgrade
    mmap_sem to reader lock (analogous to what we already do in
    prctl_set_mm_map).
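
    In get_cmdline() the change is essentially (sketch):

        spin_lock(&mm->arg_lock);       /* was: down_read(&mm->mmap_sem) */
        arg_start = mm->arg_start;
        arg_end = mm->arg_end;
        env_start = mm->env_start;
        env_end = mm->env_end;
        spin_unlock(&mm->arg_lock);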

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/20190502125203.24014-3-mkoutny@suse.com
    Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct")
    Signed-off-by: Michal Koutný
    Signed-off-by: Laurent Dufour
    Co-developed-by: Laurent Dufour
    Reviewed-by: Cyrill Gorcunov
    Acked-by: Michal Hocko
    Cc: Yang Shi
    Cc: Mateusz Guzik
    Cc: Kirill Tkhai
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Koutný
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only
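
    Each affected C source file gains a first line of the form:

        // SPDX-License-Identifier: GPL-2.0-only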

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

2 commits

  • With the default overcommit==guess we occasionally run into mmap
    rejections despite plenty of memory that would get dropped under
    pressure but just isn't accounted reclaimable. One example of this is
    dying cgroups pinned by some page cache. A previous case was auxiliary
    path name memory associated with dentries; we have since annotated
    those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
    treat indirectly reclaimable memory as free in overcommit logic")).

    But trying to classify all allocated memory reliably as reclaimable
    and unreclaimable is a bit of a fool's errand. There could be a myriad
    of dependencies that constantly change with kernel versions.

    It becomes an even more questionable effort when considering how
    this estimate of available memory is used: it's not compared to the
    system-wide allocated virtual memory in any way. It's not even
    compared to the allocating process's address space. It's compared to
    the single allocation request at hand!

    So we have an elaborate left-hand side of the equation that tries to
    assess the exact breathing room the system has available down to a
    page - and then compare it to an isolated allocation request with no
    additional context. We could fail an allocation of N bytes, but for
    two allocations of N/2 bytes we'd do this elaborate dance twice in a
    row and then still let N bytes of virtual memory through. This doesn't
    make a whole lot of sense.

    Let's take a step back and look at the actual goal of the
    heuristic. From the documentation:

    Heuristic overcommit handling. Obvious overcommits of address
    space are refused. Used for a typical system. It ensures a
    seriously wild allocation fails while allowing overcommit to
    reduce swap usage. root is allowed to allocate slightly more
    memory in this mode. This is the default.

    If all we want to do is catch clearly bogus allocation requests
    irrespective of the general virtual memory situation, the physical
    memory counter-part doesn't need to be that complicated, either.

    When in GUESS mode, catch wild allocations by comparing their request
    size to total amount of ram and swap in the system.
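
    The resulting check in __vm_enough_memory() is roughly (sketch):

        if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
                if (pages > totalram_pages() + total_swap_pages)
                        goto error;
                return 0;
        }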

    Link: http://lkml.kernel.org/r/20190412191418.26333-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Acked-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • To facilitate additional options to get_user_pages_fast(), change the
    singular write parameter to be gup_flags.

    This patch does not change any functionality. New functionality will
    follow in subsequent patches.

    Some of the get_user_pages_fast() call sites were unchanged because they
    already passed FOLL_WRITE or 0 for the write parameter.

    NOTE: It was suggested to change the ordering of the get_user_pages_fast()
    arguments to ensure that callers were converted. This breaks the current
    GUP call site convention of having the returned pages be the final
    parameter. So the suggestion was rejected.
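
    A typical call-site conversion (sketch):

        /* before: boolean write parameter */
        ret = get_user_pages_fast(start, nr_pages, 1, pages);

        /* after: gup_flags parameter */
        ret = get_user_pages_fast(start, nr_pages, FOLL_WRITE, pages);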

    Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Mike Marshall
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     

06 Mar, 2019

1 commit

  • Many kernel-doc comments in mm/ have the return value descriptions
    either misformatted or omitted altogether, which makes the kernel-doc
    script unhappy:

    $ make V=1 htmldocs
    ...
    ./mm/util.c:36: info: Scanning doc for kstrdup
    ./mm/util.c:41: warning: No description found for return value of 'kstrdup'
    ./mm/util.c:57: info: Scanning doc for kstrdup_const
    ./mm/util.c:66: warning: No description found for return value of 'kstrdup_const'
    ./mm/util.c:75: info: Scanning doc for kstrndup
    ./mm/util.c:83: warning: No description found for return value of 'kstrndup'
    ...

    Fixing the formatting and adding the missing return value descriptions
    eliminates ~100 such warnings.
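
    The fix adds properly formatted "Return:" sections, e.g. for kstrdup():

        /**
         * kstrdup - allocate space for and copy an existing string
         * @s: the string to duplicate
         * @gfp: the GFP mask used in the kmalloc() call when allocating memory
         *
         * Return: newly allocated copy of @s or %NULL in case of error
         */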

    Link: http://lkml.kernel.org/r/1549549644-4903-4-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

22 Feb, 2019

1 commit

  • memdup_user() usually gets fed unchecked userspace input. Blasting a
    full backtrace into dmesg every time is a bit excessive - I'm not sure
    on the kernel rule in general, but at least in drm we're trying not to
    let unprivileged userspace spam the logs freely. Definitely not entire
    warning backtraces.

    It also means more filtering for our CI, because our testsuite exercises
    these corner cases and so hits these a lot.
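
    The fix (sketch) is to pass __GFP_NOWARN so a failed allocation no
    longer warns:

        p = kmalloc_track_caller(len, GFP_USER | __GFP_NOWARN);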

    Link: http://lkml.kernel.org/r/20190220204058.11676-1-daniel.vetter@ffwll.ch
    Signed-off-by: Daniel Vetter
    Reviewed-by: Andrew Morton
    Acked-by: Michal Hocko
    Reviewed-by: Kees Cook
    Cc: Mike Rapoport
    Cc: Roman Gushchin
    Cc: Vlastimil Babka
    Cc: Jan Stancek
    Cc: Andrey Ryabinin
    Cc: "Michael S. Tsirkin"
    Cc: Huang Ying
    Cc: Bartosz Golaszewski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Vetter
     

09 Jan, 2019

1 commit

  • LTP proc01 testcase has been observed to rarely trigger crashes
    on arm64:
    page_mapped+0x78/0xb4
    stable_page_flags+0x27c/0x338
    kpageflags_read+0xfc/0x164
    proc_reg_read+0x7c/0xb8
    __vfs_read+0x58/0x178
    vfs_read+0x90/0x14c
    SyS_read+0x60/0xc0

    The issue is that page_mapped() assumes that if compound page is not
    huge, then it must be THP. But if this is 'normal' compound page
    (COMPOUND_PAGE_DTOR), then following loop can keep running (for
    HPAGE_PMD_NR iterations) until it tries to read from memory that isn't
    mapped and triggers a panic:

    for (i = 0; i < hpage_nr_pages(page); i++) {
            if (atomic_read(&page[i]._mapcount) >= 0)
                    return true;
    }

    I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only
    with a custom kernel module [1] which:
    - allocates compound page (PAGEC) of order 1
    - allocates 2 normal pages (COPY), which are initialized to 0xff (to
    satisfy _mapcount >= 0)
    - 2 PAGEC page structs are copied to address of first COPY page
    - second page of COPY is marked as not present
    - call to page_mapped(COPY) now triggers fault on access to 2nd COPY
    page at offset 0x30 (_mapcount)

    [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c

    Fix the loop to iterate for "1 << compound_order" pages.
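
    I.e. the fixed loop becomes:

        for (i = 0; i < (1 << compound_order(page)); i++) {
                if (atomic_read(&page[i]._mapcount) >= 0)
                        return true;
        }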

    Kirill said "IIRC, sound subsystem can produce custom mapped compound
    pages".

    Link: http://lkml.kernel.org/r/c440d69879e34209feba21e12d236d06bc0a25db.1543577156.git.jstancek@redhat.com
    Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
    Signed-off-by: Jan Stancek
    Debugged-by: Laszlo Ersek
    Suggested-by: "Kirill A. Shutemov"
    Acked-by: Michal Hocko
    Acked-by: Kirill A. Shutemov
    Reviewed-by: David Hildenbrand
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Stancek
     

29 Dec, 2018

1 commit

  • totalram_pages and totalhigh_pages are made static inline functions.

    The main motivation was that managed_page_count_lock handling was
    complicating things. It was discussed at length here,
    https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seems
    better to remove the lock and convert the variables to atomic, with
    preventing potential store-to-read tearing as a bonus.
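
    The accessor becomes roughly:

        extern atomic_long_t _totalram_pages;

        static inline unsigned long totalram_pages(void)
        {
                return (unsigned long)atomic_long_read(&_totalram_pages);
        }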

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Suggested-by: Michal Hocko
    Suggested-by: Vlastimil Babka
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     

27 Oct, 2018

2 commits

  • vfree() might sleep if called from non-interrupt context, and so might
    kvfree(). Fix kvfree()'s misleading comment about the allowed context.

    Link: http://lkml.kernel.org/r/20180914130512.10394-1-aryabinin@virtuozzo.com
    Fixes: 04b8e946075d ("mm/util.c: improve kvfree() kerneldoc")
    Signed-off-by: Andrey Ryabinin
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by
    commit eb59254608bc ("mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES") with
    the goal of accounting objects that can be reclaimed, but cannot be
    allocated via a SLAB_RECLAIM_ACCOUNT cache. This is now possible via
    kmalloc() with __GFP_RECLAIMABLE flag, and the dcache external names user
    is converted.

    The counter is however still useful for accounting direct page allocations
    (i.e. not slab) with a shrinker, such as the ION page pool. So keep it,
    and:

    - change granularity to pages to be more like other counters; sub-page
    allocations should be able to use kmalloc
    - rename the counter to NR_KERNEL_MISC_RECLAIMABLE
    - expose the counter again in vmstat as "nr_kernel_misc_reclaimable"; we can
    again remove the check for not printing "hidden" counters
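
    A shrinker-backed page pool can then account its pages along these lines
    (illustrative sketch; "pool->order" is a hypothetical field of such a
    pool):

        /* page added to the pool */
        mod_node_page_state(page_pgdat(page), NR_KERNEL_MISC_RECLAIMABLE,
                            1 << pool->order);

        /* page removed from the pool */
        mod_node_page_state(page_pgdat(page), NR_KERNEL_MISC_RECLAIMABLE,
                            -(1 << pool->order));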

    Link: http://lkml.kernel.org/r/20180731090649.16028-5-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Christoph Lameter
    Acked-by: Roman Gushchin
    Cc: Vijayanand Jitta
    Cc: Laura Abbott
    Cc: Sumit Semwal
    Cc: David Rientjes
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

24 Aug, 2018

2 commits

  • Link: http://lkml.kernel.org/r/1532626360-16650-3-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "memory management documentation updates", v3.

    Here are several updates to the mm documentation.

    Aside from really minor changes in the first three patches, the updates
    are:

    * move the documentation of kstrdup and friends to the "String
    Manipulation" section
    * split the memory management API into a separate .rst file
    * adjust formatting of the GFP flags description and include it in the
    reference documentation.

    This patch (of 7):

    The description of the strndup_user function is missing the '*'
    character at the beginning of the comment required for proper kernel-doc.
    Add the missing character.
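
    I.e. the comment must open with the kernel-doc "/**" marker, roughly:

        /**
         * strndup_user - duplicate an existing string from user space
         * @s: the string to duplicate
         * @n: maximum number of bytes to copy, including the trailing NUL
         */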

    Link: http://lkml.kernel.org/r/1532626360-16650-2-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

08 Jun, 2018

1 commit

  • kvmalloc warned about incompatible gfp_mask to catch abusers (mostly
    GFP_NOFS) with an intention that this will motivate authors of the code
    to fix those. Linus argues that this just motivates people to do even
    more hacks like

    if (gfp == GFP_KERNEL)
            kvmalloc
    else
            kmalloc

    I haven't seen this happening much (Linus pointed to bucket_lock, which
    special-cases an atomic allocation, but my git foo hasn't found much
    more) but it is true that more of these could grow in the future.
    Therefore Linus suggested to simply not fall back to vmalloc for
    incompatible gfp flags and rather stick with the kmalloc path.
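
    The resulting check at the top of kvmalloc_node() is roughly:

        /* vmalloc only supports GFP_KERNEL-compatible contexts */
        if ((flags & GFP_KERNEL) != GFP_KERNEL)
                return kmalloc_node(size, flags, node);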

    Link: http://lkml.kernel.org/r/20180601115329.27807-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Suggested-by: Linus Torvalds
    Cc: Tom Herbert
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

17 Apr, 2018

2 commits

  • Mike Rapoport says:

    These patches convert files in Documentation/vm to ReST format, add an
    initial index and link it to the top level documentation.

    There are no contents changes in the documentation, except few spelling
    fixes. The relatively large diffstat stems from the indentation and
    paragraph wrapping changes.

    I've tried to keep the formatting as consistent as possible, but I could
    miss some places that needed markup and add some markup where it was not
    necessary.

    [jc: significant conflicts in vm/hmm.rst]

    Jonathan Corbet
     
  • Signed-off-by: Mike Rapoport
    Signed-off-by: Jonathan Corbet

    Mike Rapoport
     

14 Apr, 2018

1 commit

  • __get_user_pages_fast handles errors differently from
    get_user_pages_fast: the former always returns the number of pages
    pinned, the latter might return a negative error code.
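
    So callers must handle the return values differently (sketch):

        /* __get_user_pages_fast: returns 0..nr_pages, never negative */
        pinned = __get_user_pages_fast(start, nr_pages, 1, pages);

        /* get_user_pages_fast: may return -errno on failure */
        pinned = get_user_pages_fast(start, nr_pages, 1, pages);
        if (pinned < 0)
                return pinned;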

    Link: http://lkml.kernel.org/r/1522962072-182137-6-git-send-email-mst@redhat.com
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Andrew Morton
    Cc: Kirill A. Shutemov
    Cc: Huang Ying
    Cc: Jonathan Corbet
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Thorsten Leemhuis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael S. Tsirkin
     

12 Apr, 2018

2 commits

  • Patch series "exec: Pin stack limit during exec".

    Attempts to solve problems with the stack limit changing during exec
    continue to be frustrated[1][2]. In addition to the specific issues
    around the Stack Clash family of flaws, Andy Lutomirski pointed out[3]
    other places during exec where the stack limit is used and is assumed to
    be unchanging. Given the many places it gets used and the fact that it
    can be manipulated/raced via setrlimit() and prlimit(), I think the only
    way to handle this is to move away from the "current" view of the stack
    limit and instead attach it to the bprm, and plumb this down into the
    functions that need to know the stack limits. This series implements
    the approach.

    [1] 04e35f4495dd ("exec: avoid RLIMIT_STACK races with prlimit()")
    [2] 779f4e1c6c7c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"")
    [3] to security@kernel.org, "Subject: existing rlimit races?"

    This patch (of 3):

    Since it is possible that the stack rlimit can change externally during
    exec (either via another thread calling setrlimit() or another process
    calling prlimit()), provide a way to pass the rlimit down into the
    per-architecture mm layout functions so that the rlimit can stay in the
    bprm structure instead of sitting in the signal structure until exec is
    finalized.
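
    Concretely, the per-architecture hook gains an rlimit argument so exec
    can pass the pinned value from the bprm (sketch):

        /* was: void arch_pick_mmap_layout(struct mm_struct *mm); */
        void arch_pick_mmap_layout(struct mm_struct *mm,
                                   struct rlimit *rlim_stack);

        /* in setup_new_exec(): */
        arch_pick_mmap_layout(current->mm, &bprm->rlim_stack);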

    Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org
    Signed-off-by: Kees Cook
    Cc: Michal Hocko
    Cc: Ben Hutchings
    Cc: Willy Tarreau
    Cc: Hugh Dickins
    Cc: Oleg Nesterov
    Cc: "Jason A. Donenfeld"
    Cc: Rik van Riel
    Cc: Laura Abbott
    Cc: Greg KH
    Cc: Andy Lutomirski
    Cc: Ben Hutchings
    Cc: Brad Spengler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Indirectly reclaimable memory can consume a significant part of total
    memory and it's actually reclaimable (it will be released under actual
    memory pressure).

    So, the overcommit logic should treat it as free.

    Otherwise, it's possible to cause random system-wide memory allocation
    failures by consuming a significant amount of memory by indirectly
    reclaimable memory, e.g. dentry external names.

    If overcommit policy GUESS is used, it might be used for denial of
    service attack under some conditions.

    The following program illustrates the approach. It causes the kernel to
    allocate an unreclaimable kmalloc-256 chunk for each stat() call, so
    that at some point the overcommit logic may start blocking large
    allocations system-wide.

    int main()
    {
            char buf[256];
            unsigned long i;
            struct stat statbuf;

            buf[0] = '/';
            for (i = 1; i < sizeof(buf); i++)
                    buf[i] = '_';

            for (i = 0; 1; i++) {
                    sprintf(&buf[248], "%8lu", i);
                    stat(buf, &statbuf);
            }

            return 0;
    }

    This patch in combination with related indirectly reclaimable memory
    patches closes this issue.
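
    The change itself is essentially one line in __vm_enough_memory()
    (sketch; the counter is in bytes, hence the shift):

        free += global_node_page_state(NR_INDIRECTLY_RECLAIMABLE_BYTES)
                        >> PAGE_SHIFT;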

    Link: http://lkml.kernel.org/r/20180313130041.8078-1-guro@fb.com
    Signed-off-by: Roman Gushchin
    Reviewed-by: Andrew Morton
    Cc: Alexander Viro
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     

06 Apr, 2018

1 commit

  • Thanks to commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB
    trunks"), after swapoff the address_space associated with the swap
    device will be freed. So page_mapping() users which may touch the
    address_space need some kind of mechanism to prevent the address_space
    from being freed during accessing.

    The dcache flushing functions (flush_dcache_page(), etc.) in
    architecture-specific code may access the address_space of the swap
    device for anonymous pages in swap cache via the page_mapping() function.
    But in some cases there are no mechanisms to prevent the swap device
    from being swapped off, for example,

    CPU1                                CPU2
    __get_user_pages()                  swapoff()
      flush_dcache_page()
        mapping = page_mapping()
        ...                             exit_swap_address_space()
        ...                               kvfree(spaces)
        mapping_mapped(mapping)

    The address space may be accessed after being freed.

    But per cachetlb.txt and Russell King, flush_dcache_page() only cares
    about file cache pages; for anonymous pages, flush_anon_page() should be
    used. The implementation of flush_dcache_page() in all architectures
    follows this too. They will check whether page_mapping() is NULL and
    whether mapping_mapped() is true to determine whether to flush the
    dcache immediately, and they will use the interval tree (mapping->i_mmap)
    to find all user space mappings, while mapping_mapped() and
    mapping->i_mmap aren't used by anonymous pages in swap cache at all.

    So, to fix the race between swapoff and dcache flushing,
    page_mapping_file() is added to return the address_space for file cache
    pages and NULL otherwise. All page_mapping() calls in the dcache flushing
    functions are replaced with page_mapping_file().
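
    The new helper is essentially:

        /*
         * For file cache pages, return the address_space, otherwise NULL
         */
        struct address_space *page_mapping_file(struct page *page)
        {
                if (unlikely(PageSwapCache(page)))
                        return NULL;
                return page_mapping(page);
        }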

    [akpm@linux-foundation.org: simplify page_mapping_file(), per Mike]
    Link: http://lkml.kernel.org/r/20180305083634.15174-1-ying.huang@intel.com
    Signed-off-by: "Huang, Ying"
    Reviewed-by: Andrew Morton
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: Chen Liqin
    Cc: Russell King
    Cc: Yoshinori Sato
    Cc: "James E.J. Bottomley"
    Cc: Guan Xuetao
    Cc: "David S. Miller"
    Cc: Chris Zankel
    Cc: Vineet Gupta
    Cc: Ley Foon Tan
    Cc: Ralf Baechle
    Cc: Andi Kleen
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     

07 Sep, 2017

1 commit

  • global_page_state is error prone, as a recent bug report pointed out
    [1]. It only returns proper values for zone-based counters, as the enum
    it takes suggests. We already have global_node_page_state, so let's
    rename global_page_state to global_zone_page_state to be more explicit
    here. All existing users seem to be correct:

    $ git grep "global_page_state(NR_" | sed 's@.*(\(NR_[A-Z_]*\)).*@\1@' | sort | uniq -c
    2 NR_BOUNCE
    2 NR_FREE_CMA_PAGES
    11 NR_FREE_PAGES
    1 NR_KERNEL_STACK_KB
    1 NR_MLOCK
    2 NR_PAGETABLE

    This patch shouldn't introduce any functional change.
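
    A typical conversion (sketch):

        /* before */
        free = global_page_state(NR_FREE_PAGES);

        /* after: same zone-based counter, clearer name */
        free = global_zone_page_state(NR_FREE_PAGES);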

    [1] http://lkml.kernel.org/r/201707260628.v6Q6SmaS030814@www262.sakura.ne.jp

    Link: http://lkml.kernel.org/r/20170801134256.5400-2-hannes@cmpxchg.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Johannes Weiner
    Cc: Tetsuo Handa
    Cc: Josef Bacik
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

11 Aug, 2017

1 commit

  • As Tetsuo points out:
    "Commit 385386cff4c6 ("mm: vmstat: move slab statistics from zone to
    node counters") broke "Slab:" field of /proc/meminfo . It shows nearly
    0kB"

    In addition to /proc/meminfo, this problem also affects the slab
    counters in OOM/allocation failure info dumps, can cause early -ENOMEM
    from overcommit protection, and miscalculate image size requirements
    during suspend-to-disk.

    This is because the patch in question switched the slab counters from
    the zone level to the node level, but forgot to update the global
    accessor functions to read the aggregate node data instead of the
    aggregate zone data.

    Use global_node_page_state() to access the global slab counters.
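
    The conversion is of the form (sketch):

        /* before: reads the stale zone-level aggregate (now ~0) */
        slab_reclaimable = global_page_state(NR_SLAB_RECLAIMABLE);

        /* after: read the node-level aggregate the counters moved to */
        slab_reclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE);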

    Fixes: 385386cff4c6 ("mm: vmstat: move slab statistics from zone to node counters")
    Link: http://lkml.kernel.org/r/20170801134256.5400-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reported-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Cc: Josef Bacik
    Cc: Vladimir Davydov
    Cc: Stefan Agner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

16 Jul, 2017

1 commit

  • Pull ->s_options removal from Al Viro:
    "Preparations for fsmount/fsopen stuff (coming next cycle). Everything
    gets moved to explicit ->show_options(), killing ->s_options off +
    some cosmetic bits around fs/namespace.c and friends. Basically, the
    stuff needed to work with fsmount series with minimum of conflicts
    with other work.

    It's not strictly required for this merge window, but it would reduce
    the PITA during the coming cycle, so it would be nice to have those
    bits and pieces out of the way"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    isofs: Fix isofs_show_options()
    VFS: Kill off s_options and helpers
    orangefs: Implement show_options
    9p: Implement show_options
    isofs: Implement show_options
    afs: Implement show_options
    affs: Implement show_options
    befs: Implement show_options
    spufs: Implement show_options
    bpf: Implement show_options
    ramfs: Implement show_options
    pstore: Implement show_options
    omfs: Implement show_options
    hugetlbfs: Implement show_options
    VFS: Don't use save/replace_mount_options if not using generic_show_options
    VFS: Provide empty name qstr
    VFS: Make get_filesystem() return the affected filesystem
    VFS: Clean up whitespace in fs/namespace.c and fs/super.c
    Provide a function to create a NUL-terminated string from unterminated data
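
    The last item adds kmemdup_nul() to mm/util.c, essentially:

        char *kmemdup_nul(const char *s, size_t len, gfp_t gfp)
        {
                char *buf;

                if (!s)
                        return NULL;

                buf = kmalloc_track_caller(len + 1, gfp);
                if (buf) {
                        memcpy(buf, s, len);
                        buf[len] = '\0';
                }
                return buf;
        }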

    Linus Torvalds
     

13 Jul, 2017

2 commits

  • Now that __GFP_RETRY_MAYFAIL has a reasonable semantic regardless of the
    request size we can drop the hackish implementation for !costly orders.
    __GFP_RETRY_MAYFAIL retries as long as the reclaim makes forward
    progress and backs off when we are out of memory for the requested size.
    Therefore we do not need to enforce __GFP_NORETRY for !costly orders just
    to silence the oom killer anymore.
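
    The kmalloc fast path in kvmalloc_node() then reduces to roughly:

        if (size > PAGE_SIZE) {
                kmalloc_flags |= __GFP_NOWARN;

                if (!(kmalloc_flags & __GFP_RETRY_MAYFAIL))
                        kmalloc_flags |= __GFP_NORETRY;
        }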

    Link: http://lkml.kernel.org/r/20170623085345.11304-5-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Alex Belits
    Cc: Chris Wilson
    Cc: Christoph Hellwig
    Cc: Darrick J. Wong
    Cc: David Daney
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: NeilBrown
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
    the page allocator. This has been true, but only for allocation
    requests larger than PAGE_ALLOC_COSTLY_ORDER; it has always been
    ignored for smaller sizes. This is a bit unfortunate because there is
    no way to express the same semantic for those requests and they are
    considered too important to fail so they might end up looping in the
    page allocator for ever, similarly to GFP_NOFAIL requests.

    Now that the whole tree has been cleaned up and accidental or misled
    usage of __GFP_REPEAT flag has been removed for !costly requests we can
    give the original flag a better name and more importantly a more useful
    semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
    that the allocator would try really hard but there is no promise of a
    success. This will work independent of the order and overrides the
    default allocator behavior. Page allocator users have several levels of
    guarantee vs. cost options (take GFP_KERNEL as an example)

    - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
    attempt to free memory at all. The most lightweight mode, which doesn't
    even kick background reclaim. Should be used carefully because it might
    deplete the memory and the next user might hit the more aggressive
    reclaim

    - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT) - optimistic
    allocation without any attempt to free memory from the current
    context but can wake kswapd to reclaim memory if the zone is below
    the low watermark. Can be used from either atomic contexts or when
    the request is a performance optimization and there is another
    fallback for a slow path.

    - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
    non sleeping allocation with an expensive fallback so it can access
    some portion of memory reserves. Usually used from interrupt/bh
    context with an expensive slow path fallback.

    - GFP_KERNEL - both background and direct reclaim are allowed and the
    _default_ page allocator behavior is used. That means that !costly
    allocation requests are basically nofail but there is no guarantee of
    that behavior so failures have to be checked properly by callers
    (e.g. OOM killer victim is allowed to fail currently).

    - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
    and all allocation requests fail early rather than cause disruptive
    reclaim (one round of reclaim in this implementation). The OOM killer
    is not invoked.

    - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
    behavior and all allocation requests try really hard. The request
    will fail if the reclaim cannot make any progress. The OOM killer
    won't be triggered.

    - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
    and all allocation requests will loop endlessly until they succeed.
    This might be really dangerous especially for larger orders.

    Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
    because they already had their semantic. No new users are added.
    __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
    there is no progress and we have already passed the OOM point.

    This means that all the reclaim opportunities have been exhausted except
    the most disruptive one (the OOM killer) and a user defined fallback
    behavior is more sensible than keep retrying in the page allocator.
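
    A typical user then looks like (sketch):

        /* try hard, but fail rather than trigger the OOM killer */
        buf = kmalloc(size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
        if (!buf)
                return -ENOMEM;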

    [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
    [mhocko@suse.com: semantic fix]
    Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
    [mhocko@kernel.org: address other thing spotted by Vlastimil]
    Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Alex Belits
    Cc: Chris Wilson
    Cc: Christoph Hellwig
    Cc: Darrick J. Wong
    Cc: David Daney
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: NeilBrown
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

03 Jun, 2017

1 commit

  • While converting the drm_[cm]alloc* helpers to kvmalloc* variants, Chris
    Wilson wondered why we want to try kmalloc before the vmalloc fallback
    even for larger allocation requests. Let's clarify that one larger
    physically contiguous block is less likely to fragment memory than many
    scattered pages, which can prevent more large blocks from being created.
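
    The clarification lands as a comment in kvmalloc_node(), along these
    lines:

        /*
         * We want to attempt a large physically contiguous block first because
         * it is less likely to fragment multiple larger blocks and therefore
         * contribute to a long term fragmentation less than vmalloc fallback.
         * However make sure that larger requests are not too disruptive - no
         * OOM killer and no allocation failure warnings as we have a fallback.
         */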

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20170517080932.21423-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Suggested-by: Chris Wilson
    Reviewed-by: Chris Wilson
    Acked-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

13 May, 2017

1 commit

  • Commit 1f5307b1e094 ("mm, vmalloc: properly track vmalloc users") has
    pulled asm/pgtable.h include dependency to linux/vmalloc.h and that
    turned out to be a bad idea for some architectures. E.g. m68k fails
    with

    In file included from arch/m68k/include/asm/pgtable_mm.h:145:0,
    from arch/m68k/include/asm/pgtable.h:4,
    from include/linux/vmalloc.h:9,
    from arch/m68k/kernel/module.c:9:
    arch/m68k/include/asm/mcf_pgtable.h: In function 'nocache_page':
    >> arch/m68k/include/asm/mcf_pgtable.h:339:43: error: 'init_mm' undeclared (first use in this function)
    #define pgd_offset_k(address) pgd_offset(&init_mm, address)

    as spotted by kernel build bot. nios2 fails for other reason

    In file included from include/asm-generic/io.h:767:0,
    from arch/nios2/include/asm/io.h:61,
    from include/linux/io.h:25,
    from arch/nios2/include/asm/pgtable.h:18,
    from include/linux/mm.h:70,
    from include/linux/pid_namespace.h:6,
    from include/linux/ptrace.h:9,
    from arch/nios2/include/uapi/asm/elf.h:23,
    from arch/nios2/include/asm/elf.h:22,
    from include/linux/elf.h:4,
    from include/linux/module.h:15,
    from init/main.c:16:
    include/linux/vmalloc.h: In function '__vmalloc_node_flags':
    include/linux/vmalloc.h:99:40: error: 'PAGE_KERNEL' undeclared (first use in this function); did you mean 'GFP_KERNEL'?

    which is due to the newly added #include <asm/pgtable.h>, which on nios2
    pulls in <linux/io.h> and thus <asm-generic/io.h>, which again includes
    <linux/vmalloc.h>.

    Tweaking that around just turns out to be a bigger headache than necessary.
    This patch reverts 1f5307b1e094 and reimplements the original fix in a
    different way. __vmalloc_node_flags can stay static inline which will
    cover vmalloc* functions. We only have one external user
    (kvmalloc_node) and we can export __vmalloc_node_flags_caller and
    provide the caller directly. This is much simpler and it doesn't really
    need any games with header files.
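
    The exported helper and its one external user look roughly like:

        extern void *__vmalloc_node_flags_caller(unsigned long size, int node,
                                                 gfp_t flags, void *caller);

        /* in kvmalloc_node(), after the kmalloc attempt fails: */
        return __vmalloc_node_flags_caller(size, node, flags,
                                           __builtin_return_address(0));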

    [akpm@linux-foundation.org: coding-style fixes]
    [mhocko@kernel.org: revert old comment]
    Link: http://lkml.kernel.org/r/20170509211054.GB16325@dhcp22.suse.cz
    Fixes: 1f5307b1e094 ("mm, vmalloc: properly track vmalloc users")
    Link: http://lkml.kernel.org/r/20170509153702.GR6481@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Cc: Tobias Klauser
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko