16 Apr, 2016

1 commit

  • Pull block fixes from Jens Axboe:
    "A few fixes for the current series. This contains:

    - Two fixes for NVMe:

    One fixes a reset race that can be triggered by repeated
    insert/removal of the module.

    The other fixes an issue on some platforms, where we get probe
    timeouts since legacy interrupts aren't working. This used to not
    be a problem since we had the worker thread poll for completions,
    but since that was killed off, those poor souls can't successfully
    probe their NVMe device. Use a proper IRQ check and probe
    (MSI-X -> MSI -> legacy), like most other drivers do, to work
    around this (see the sketch after this list). Both from Keith.

    - A loop corruption issue with offset in iters, from Ming Lei.

    - A fix for not having the partition stat per cpu ref count
    initialized before sending out the KOBJ_ADD, which could cause user
    space to access the counter prior to initialization. Also from
    Ming Lei.

    - A fix for using the wrong congestion state, from Kaixu Xia"
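
    As a hedged illustration of that MSI-X -> MSI -> legacy fallback (not
    the actual driver diff; pdev, entries and nvec are assumed to be set
    up by the caller):

        static int setup_irqs(struct pci_dev *pdev,
                              struct msix_entry *entries, int nvec)
        {
                if (!pci_enable_msix(pdev, entries, nvec))
                        return 0;       /* MSI-X worked */
                if (!pci_enable_msi(pdev))
                        return 0;       /* fall back to MSI */
                return 0;               /* last resort: legacy INTx via pdev->irq */
        }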

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: loop: fix filesystem corruption in case of aio/dio
    NVMe: Always use MSI/MSI-x interrupts
    NVMe: Fix reset/remove race
    writeback: fix the wrong congested state variable definition
    block: partition: initialize percpuref before sending out KOBJ_ADD

    Linus Torvalds
     

07 Apr, 2016

1 commit

  • The pkeys changes brought about a truly hideous set of macros in:

    cde70140fed8 ("mm/gup: Overload get_user_pages() functions")

    ... which macros are (ab-)using the fact that __VA_ARGS__ can be used
    to shift parameter positions in macro arguments without breaking the
    build and so can be used to call separate C functions depending on
    the number of arguments of the macro.

    This allowed easy migration of these 3 GUP APIs, as both these variants
    worked at the C level:

    old:
    ret = get_user_pages(current, current->mm, address, 1, 1, 0, &page, NULL);

    new:
    ret = get_user_pages(address, 1, 1, 0, &page, NULL);

    ... while we also generated a (functionally harmless but noticeable) build
    time warning if the old API was used. As there are over 300 uses of these
    APIs, this trick eased the migration of the API and avoided excessive
    migration pain in linux-next.
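
    For illustration, here is a minimal, runnable userspace sketch of that
    argument-count dispatch trick; the names are hypothetical, not the
    actual kernel macros:

        #include <stdio.h>

        /* Two "APIs" of different arity, standing in for the old and new GUP calls. */
        static int gup_new(long addr, int n)
        { return printf("new API: %ld, %d\n", addr, n); }
        static int gup_old(void *tsk, void *mm, long addr, int n)
        { return printf("old API: %ld, %d\n", addr, n); }

        /* Expands to its 5th argument; the number of caller arguments shifts
         * which name lands there, so arity selects the function. */
        #define GUP_SELECT(_1, _2, _3, _4, NAME, ...) NAME
        #define get_pages(...) \
                GUP_SELECT(__VA_ARGS__, gup_old, unused3, gup_new, unused1)(__VA_ARGS__)

        int main(void)
        {
                get_pages(0x1000L, 1);                       /* 2 args -> gup_new */
                get_pages((void *)0, (void *)0, 0x1000L, 1); /* 4 args -> gup_old */
                return 0;
        }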

    Now, with its work done, get rid of all of that complication and ugliness:

    3 files changed, 16 insertions(+), 140 deletions(-)

    ... where the linecount of the migration hack was further inflated by the
    fact that there are NOMMU variants of these GUP APIs as well.

    Much of the conversion was done in linux-next over the past couple of months,
    and Linus recently removed all remaining old API uses from the upstream tree
    in the following upstream commit:

    cb107161df3c ("Convert straggling drivers to new six-argument get_user_pages()")

    There was one more old-API usage in mm/gup.c, in the CONFIG_HAVE_GENERIC_RCU_GUP
    code path that ARM, ARM64 and PowerPC use.

    After this commit any old API usage will break the build.

    [ Also fixed a PowerPC/HAVE_GENERIC_RCU_GUP warning reported by Stephen Rothwell. ]

    Cc: Andrew Morton
    Cc: Dave Hansen
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephen Rothwell
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

05 Apr, 2016

3 commits

  • Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
    "PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely it ever will.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The first patch, which has most of the changes, was done with
    coccinelle. The second is manual fixups on top.

    The third patch removes the macro definitions"

    [ I was planning to apply this just before rc2, but then I spaced out,
    so here it is right _after_ rc2 instead.

    As Kirill suggested as a possibility, I could have decided to only
    merge the first two patches, and leave the old interfaces for
    compatibility, but I'd rather get it all done and any out-of-tree
    modules and patches can trivially do the conversion while still also
    working with older kernels, so there is little reason to try to
    maintain the redundant legacy model. - Linus ]

    * PAGE_CACHE_SIZE-removal:
    mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
    mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
    mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

    Linus Torvalds
     
  • Mostly direct substitution with occasional adjustment or removing
    outdated comments.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
    PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
    time ago with the promise that one day it would be possible to
    implement the page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
    much breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files; I've called spatch on them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)
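
    As a hedged illustration (not taken from any actual file in the tree),
    the script turns a typical filesystem snippet like:

        index  = pos >> PAGE_CACHE_SHIFT;
        offset = pos & ~PAGE_CACHE_MASK;
        page_cache_release(page);

    into:

        index  = pos >> PAGE_SHIFT;
        offset = pos & ~PAGE_MASK;
        put_page(page);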

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

02 Apr, 2016

5 commits

  • Commit fea85cff11de ("mm/page_isolation.c: return last tested pfn rather
    than failure indicator") changed the meaning of the return value. Let's
    change the function comments as well.

    Signed-off-by: Neil Zhang
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Zhang
     
    Commit bb29902a7515 ("oom, oom_reaper: protect oom_reaper_list using
    simpler way") has simplified the check for tasks already enqueued for
    the oom reaper by checking tsk->oom_reaper_list != NULL. This check is
    not sufficient because the tsk might be the head of the queue without
    any other tasks queued, and then we would simply lock up, looping on
    the same task. Fix the condition by checking for the head as well.
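
    A sketch of the fixed enqueue check as described (names follow the
    commit text; the exact upstream code may differ slightly):

        static void wake_oom_reaper(struct task_struct *tsk)
        {
                /* already queued if linked _or_ if tsk is the list head */
                if (tsk == oom_reaper_list || tsk->oom_reaper_list)
                        return;
                /* ... enqueue tsk and wake the reaper ... */
        }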

    Fixes: bb29902a7515 ("oom, oom_reaper: protect oom_reaper_list using simpler way")
    Signed-off-by: Michal Hocko
    Acked-by: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
    The recently introduced batched invalidations mechanism uses its own
    mechanism for shootdown. However, it does the accounting of interrupts
    incorrectly (e.g., inc_irq_stat is called for local invalidations),
    gets trace-points wrong (e.g., TLB_REMOTE_SHOOTDOWN for local
    invalidations), and may break some platforms as it bypasses the
    invalidation mechanisms of Xen and SGI UV.

    This patch reuses the existing TLB flushing mechanisms instead. We use
    NULL as the mm to indicate that a global invalidation is required.
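
    A simplified sketch consistent with that description (not the exact
    diff); passing a NULL mm to the existing arch flush path requests a
    full flush:

        /* remote CPUs in the batch: reuse the arch path; NULL mm == global */
        if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids)
                flush_tlb_others(&batch->cpumask, NULL, 0, TLB_FLUSH_ALL);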

    Fixes: 72b252aed506b8 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
    Signed-off-by: Nadav Amit
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Dave Hansen
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadav Amit
     
    It is incorrect to use next_node to find a target node; it will return
    MAX_NUMNODES or an invalid node. This will lead to a crash in the
    buddy system allocation.
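
    A sketch of the usual fix for this pattern (wrap around instead of
    treating the MAX_NUMNODES sentinel as a node):

        next = next_node(nid, node_online_map);
        if (next >= MAX_NUMNODES)
                next = first_node(node_online_map);   /* wrap, don't crash */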

    Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
    Signed-off-by: Xishi Qiu
    Acked-by: Vlastimil Babka
    Acked-by: Naoya Horiguchi
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: "Laura Abbott"
    Cc: Hui Zhu
    Cc: Wang Xiaoqiang
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     
  • Add the missing argument to set_track().

    Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
    Signed-off-by: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Christoph Lameter
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Joonsoo Kim
    Cc: Konstantin Serebryany
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

26 Mar, 2016

14 commits

  • !PageLRU should lead to SCAN_PAGE_LRU, not SCAN_SCAN_ABORT result.

    Signed-off-by: Kirill A. Shutemov
    Cc: Ebru Akagunduz
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • If
    - generic_file_read_iter() gets called with a zero read length,
    - the read offset is at a page boundary,
    - IOCB_DIRECT is not set
    - and the page in question hasn't made it into the page cache yet,
    then do_generic_file_read() will trigger a readahead with a req_size hint
    of zero.

    Since roundup_pow_of_two(0) is undefined, UBSAN reports

    UBSAN: Undefined behaviour in include/linux/log2.h:63:13
    shift exponent 64 is too large for 64-bit type 'long unsigned int'
    CPU: 3 PID: 1017 Comm: sa1 Tainted: G L 4.5.0-next-20160318+ #14
    [...]
    Call Trace:
    [...]
    [] ondemand_readahead+0x3aa/0x3d0
    [] ? ondemand_readahead+0x3aa/0x3d0
    [] ? find_get_entry+0x2d/0x210
    [] page_cache_sync_readahead+0x63/0xa0
    [] do_generic_file_read+0x80d/0xf90
    [] generic_file_read_iter+0x185/0x420
    [...]
    [] __vfs_read+0x256/0x3d0
    [...]

    when get_init_ra_size() gets called from ondemand_readahead().

    The net effect is that the initial readahead size is arch dependent for
    requested read lengths of zero: for example, since

    1UL << (sizeof(unsigned long) * 8)

    evaluates to 1 on x86 while its result is 0 on ARMv7, the initial readahead
    size becomes 4 on the former and 0 on the latter.

    What's more, whether or not the file access timestamp is updated for zero
    length reads is decided differently for the two cases of IOCB_DIRECT
    being set or cleared: in the first case, generic_file_read_iter()
    explicitly skips updating that timestamp while in the latter case, it is
    always updated through the call to do_generic_file_read().

    According to POSIX, zero length reads "do not modify the last data access
    timestamp" and thus, the IOCB_DIRECT behaviour is POSIXly correct.

    Let generic_file_read_iter() unconditionally check the requested read
    length at its entry and return immediately with success if it is zero.
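
    A sketch of that early return, following the shape described above (not
    the exact diff):

        ssize_t generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
        {
                size_t count = iov_iter_count(iter);

                if (!count)
                        return 0; /* zero-length: no readahead, no atime update */
                /* ... existing IOCB_DIRECT and buffered paths ... */
        }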

    Signed-off-by: Nicolai Stange
    Cc: Al Viro
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolai Stange
     
    Implement the stack depot and provide CONFIG_STACKDEPOT. The stack
    depot will allow KASAN to store allocation/deallocation stack traces
    for memory chunks. The stack traces are stored in a hash table and
    referenced by handles which reside in the kasan_alloc_meta and
    kasan_free_meta structures in the allocated memory chunks.

    IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
    duplication.

    Right now stackdepot support is only enabled in SLAB allocator. Once
    KASAN features in SLAB are on par with those in SLUB we can switch SLUB
    to stackdepot as well, thus removing the dependency on SLUB stack
    bookkeeping, which wastes a lot of memory.

    This patch is based on the "mm: kasan: stack depots" patch originally
    prepared by Dmitry Chernenkov.

    Joonsoo has said that he plans to reuse the stackdepot code for the
    mm/page_owner.c debugging facility.
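
    A hedged usage sketch of the depot API this patch introduces (function
    names as described; exact signatures may differ):

        depot_stack_handle_t handle;
        unsigned long entries[16];
        struct stack_trace trace = {
                .entries     = entries,
                .max_entries = ARRAY_SIZE(entries),
        };

        save_stack_trace(&trace);
        /* store once in the hash table; keep only the small handle around */
        handle = depot_save_stack(&trace, GFP_NOWAIT);

        /* later, e.g. when printing a KASAN report: */
        depot_fetch_stack(handle, &trace);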

    [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
    [aryabinin@virtuozzo.com: comment style fixes]
    Signed-off-by: Alexander Potapenko
    Signed-off-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Add GFP flags to KASAN hooks for future patches to use.

    This patch is based on the "mm: kasan: unified support for SLUB and SLAB
    allocators" patch originally prepared by Dmitry Chernenkov.

    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Add KASAN hooks to SLAB allocator.

    This patch is based on the "mm: kasan: unified support for SLUB and SLAB
    allocators" patch originally prepared by Dmitry Chernenkov.

    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Hanjun Guo has reported that a CMA stress test causes broken accounting of
    CMA and free pages:

    > Before the test, I got:
    > -bash-4.3# cat /proc/meminfo | grep Cma
    > CmaTotal: 204800 kB
    > CmaFree: 195044 kB
    >
    >
    > After running the test:
    > -bash-4.3# cat /proc/meminfo | grep Cma
    > CmaTotal: 204800 kB
    > CmaFree: 6602584 kB
    >
    > So the freed CMA memory is more than total..
    >
    > Also the MemFree is more than mem total:
    >
    > -bash-4.3# cat /proc/meminfo
    > MemTotal: 16342016 kB
    > MemFree: 22367268 kB
    > MemAvailable: 22370528 kB

    Laura Abbott has confirmed the issue and suspected the freepage accounting
    rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
    caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
    pageblocks:

    > CMA isolates MAX_ORDER aligned blocks, but, during the process,
    > a partially isolated block exists. If MAX_ORDER is 11 and
    > pageblock_order is 9, two pageblocks make up MAX_ORDER
    > aligned block and I can think following scenario because pageblock
    > (un)isolation would be done one by one.
    >
    > (Each character means one pageblock. 'C' and 'I' mean MIGRATE_CMA
    > and MIGRATE_ISOLATE, respectively.)
    >
    > CC -> IC -> II (Isolation)
    > II -> CI -> CC (Un-isolation)
    >
    > If some pages are freed at this intermediate state such as IC or CI,
    > that page could be merged to the other page that is resident on
    > different type of pageblock and it will cause wrong freepage count.

    This was supposed to be prevented by CMA operating on MAX_ORDER blocks,
    but since it doesn't hold the zone->lock between pageblocks, a race
    window does exist.

    It's also likely that unexpected merging can occur between
    MIGRATE_ISOLATE and non-CMA pageblocks. This should be prevented in
    __free_one_page() since commit 3c605096d315 ("mm/page_alloc: restrict
    max order of merging on isolated pageblock"). However, we only check
    the migratetype of the pageblock where buddy merging has been initiated,
    not the migratetype of the buddy pageblock (or group of pageblocks)
    which can be MIGRATE_ISOLATE.

    Joonsoo has suggested checking for buddy migratetype as part of
    page_is_buddy(), but that would add extra checks in allocator hotpath
    and bloat-o-meter has shown significant code bloat (the function is
    inline).

    This patch reduces the bloat at some expense of more complicated code.
    The buddy-merging while-loop in __free_one_page() is initially bounded
    to pageblock_order and has no migratetype checks. The checks are
    placed outside, bumping the max_order if merging is allowed, and
    returning to the while-loop with a statement which can't possibly be
    considered harmful.

    This fixes the accounting bug and also removes the arguably weird state
    in the original commit 3c605096d315 where buddies could be left
    unmerged.
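
    Schematically (a simplified sketch, not the actual diff), the reworked
    loop looks like:

        continue_merging:
                while (order < max_order - 1) {
                        /* ordinary buddy merging, no migratetype checks */
                        order++;
                }
                if (max_order < MAX_ORDER) {
                        /* only before crossing pageblock_order do we look
                         * at the buddy pageblock's migratetype */
                        if (!is_migrate_isolate(migratetype) &&
                            !is_migrate_isolate(get_pageblock_migratetype(buddy))) {
                                max_order++;
                                goto continue_merging;
                        }
                }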

    Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
    Link: https://lkml.org/lkml/2016/3/2/280
    Signed-off-by: Vlastimil Babka
    Reported-by: Hanjun Guo
    Tested-by: Hanjun Guo
    Acked-by: Joonsoo Kim
    Debugged-by: Laura Abbott
    Debugged-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: "Kirill A. Shutemov"
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Michal Nazarewicz
    Cc: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: [3.18+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • "oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task" tried
    to protect oom_reaper_list using MMF_OOM_KILLED flag. But we can do it
    by simply checking tsk->oom_reaper_list != NULL.

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • After "oom: clear TIF_MEMDIE after oom_reaper managed to unmap the
    address space" oom_reaper will call exit_oom_victim on the target task
    after it is done. This might however race with the PM freezer:

    CPU0                           CPU1                    CPU2
    freeze_processes
      try_to_freeze_tasks
                                                           # Allocation request
                                                           out_of_memory
      oom_killer_disable
                                                           wake_oom_reaper(P1)
                                   __oom_reap_task
                                     exit_oom_victim(P1)
      wait_event(oom_victims==0)
    [...]
                                                           do_exit(P1)
                                                           perform IO/interfere
                                                           with the freezer

    which breaks the oom_killer_disable semantic. We no longer have a
    guarantee that the oom victim won't interfere with the freezer because
    it might be anywhere on the way to do_exit while the freezer thinks the
    task has already terminated. It might trigger IO or touch devices which
    are frozen already.

    In order to close this race, make the oom_reaper thread freezable. This
    will work because
    a) already running oom_reaper will block freezer to enter the
    quiescent state
    b) wake_oom_reaper will not wake up the reaper after it has been
    frozen
    c) the only way to call exit_oom_victim after try_to_freeze_tasks
    is from the oom victim's context when we know the further
    interference shouldn't be possible

    Signed-off-by: Michal Hocko
    Cc: Tetsuo Handa
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Entries are only added/removed from oom_reaper_list at head so we can
    use a single linked list and hence save a word in task_struct.

    Signed-off-by: Vladimir Davydov
    Signed-off-by: Michal Hocko
    Cc: Tetsuo Handa
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
    Tetsuo has reported that oom_kill_allocating_task=1 will cause
    oom_reaper_list corruption because oom_kill_process doesn't follow the
    standard OOM exclusion (aka it ignores TIF_MEMDIE) and allows
    enqueueing the same task multiple times - e.g. by sacrificing the same
    child multiple times.

    This patch fixes the issue by introducing a new MMF_OOM_KILLED mm flag
    which is set in oom_kill_process atomically and oom reaper is disabled
    if the flag was already set.

    Signed-off-by: Michal Hocko
    Reported-by: Tetsuo Handa
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • wake_oom_reaper has allowed only 1 oom victim to be queued. The main
    reason for that was the simplicity as other solutions would require some
    way of queuing. The current approach is racy and that was deemed
    sufficient as the oom_reaper is considered a best effort approach to
    help with oom handling when the OOM victim cannot terminate in a
    reasonable time. The race could lead to missing an oom victim which can
    get stuck

    out_of_memory                         oom_reaper
      wake_oom_reaper
        cmpxchg // OK
                                            oom_reap_task
                                              __oom_reap_task
    oom_victim terminates
                                                atomic_inc_not_zero // fail
    out_of_memory
      wake_oom_reaper
        cmpxchg // fails
                                            task_to_reap = NULL

    This race requires 2 OOM invocations in a short time period, which is
    not very likely but certainly not impossible. E.g. the original victim
    might not have released a lot of memory for some reason.

    The situation would improve considerably if wake_oom_reaper used a more
    robust queuing, which is what this patch implements. This means adding
    an oom_reaper_list list_head into task_struct (eating a hole before the
    embedded thread_struct for that purpose) and an oom_reaper_lock
    spinlock for queuing synchronization. wake_oom_reaper will then add the
    task to the queue and oom_reaper will dequeue it.
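
    A sketch of the queuing side as described (simplified; locking and
    task pinning as outlined above):

        static LIST_HEAD(oom_reaper_list);
        static DEFINE_SPINLOCK(oom_reaper_lock);

        static void wake_oom_reaper(struct task_struct *tsk)
        {
                get_task_struct(tsk);          /* pin the task for the reaper */
                spin_lock(&oom_reaper_lock);
                list_add(&tsk->oom_reaper_list, &oom_reaper_list);
                spin_unlock(&oom_reaper_lock);
                wake_up(&oom_reaper_wait);
        }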

    Signed-off-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
    Inform about successful/failed oom_reaper attempts and dump all the
    held locks to tell us more about who is blocking progress.

    [akpm@linux-foundation.org: fix CONFIG_MMU=n build]
    Signed-off-by: Michal Hocko
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
    When oom_reaper manages to unmap all the eligible vmas, there shouldn't
    be much of the freeable memory held by the oom victim left anymore, so
    it makes sense to clear the TIF_MEMDIE flag for the victim and allow
    the OOM killer to select another task.

    The lack of TIF_MEMDIE also means that the victim cannot access memory
    reserves anymore, but that shouldn't be a problem because it would get
    the access again if it needs to allocate and hits the OOM killer again
    due to the fatal_signal_pending resp. PF_EXITING check. We can safely
    hide the task from the OOM killer because it is clearly not a good
    candidate anymore, as everything reclaimable has been torn down
    already.

    This patch will allow capping the time an OOM victim can keep
    TIF_MEMDIE and thus hold off further global OOM killer actions, granted
    the oom reaper is able to take mmap_sem for the associated mm struct.
    This is not guaranteed now, but further steps should make sure that
    mmap_sem for write is blocked killable, which will help to reduce such
    lock contention. That is not done by this patch.

    Note that exit_oom_victim might be called on a remote task from
    __oom_reap_task now, so we have to check and clear the flag atomically;
    otherwise we might race and underflow oom_victims or wake up waiters
    too early.

    Signed-off-by: Michal Hocko
    Suggested-by: Johannes Weiner
    Suggested-by: Tetsuo Handa
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • This patch (of 5):

    This is based on the idea from Mel Gorman discussed during LSFMM 2015
    and independently brought up by Oleg Nesterov.

    The OOM killer currently allows killing only a single task, in the hope
    that the task will terminate in a reasonable time and free up its
    memory. Such a task (the oom victim) will get access to memory
    reserves via mark_oom_victim to allow forward progress should there be
    a need for additional memory during the exit path.

    It has been shown (e.g. by Tetsuo Handa) that it is not that hard to
    construct workloads which break the core assumption mentioned above,
    and the OOM victim might take an unbounded amount of time to exit
    because it might be blocked in the uninterruptible state waiting for an
    event (e.g. a lock) which is blocked by another task looping in the
    page allocator.

    This patch reduces the probability of such a lockup by introducing a
    specialized kernel thread (oom_reaper) which tries to reclaim
    additional memory by preemptively reaping the anonymous or swapped-out
    memory owned by the oom victim, under the assumption that such memory
    won't be needed
    when its owner is killed and kicked from userspace anyway. There is
    one notable exception to this, though: if the OOM victim was in the
    process of coredumping, the result would be incomplete. This is
    considered a reasonable constraint because the overall system health is
    more important than debuggability of a particular application.

    A kernel thread has been chosen because we need a reliable way of
    invocation, so workqueue context is not appropriate because all the
    workers might be busy (e.g. allocating memory). Kswapd, which sounds
    like another good fit, is not appropriate either because it might get
    blocked on locks during reclaim as well.

    oom_reaper has to take mmap_sem on the target task for reading, so the
    solution is not 100% reliable because the semaphore might be held or
    blocked for write, but the probability is reduced considerably compared
    to basically any lock blocking forward progress, as described above.
    In order to prevent blocking on the lock without any forward progress,
    we use only a trylock and retry 10 times with a short sleep in between.
    Users of mmap_sem which need it for write should be carefully reviewed
    to use _killable waiting as much as possible, and to reduce allocation
    requests done with the lock held to an absolute minimum, to reduce the
    risk even further.

    The API between the oom killer and the oom reaper is quite trivial.
    wake_oom_reaper updates mm_to_reap with cmpxchg to guarantee only a
    NULL->mm transition, and oom_reaper clears it atomically once it is
    done with the work. This means that only a single mm_struct can be
    reaped at a time. As the operation is potentially disruptive, we try to
    limit it to the necessary minimum, and the reaper blocks any updates
    while it operates on an mm. mm_struct is pinned by mm_count to allow
    parallel exit_mmap, and a race is detected by
    atomic_inc_not_zero(mm_users).
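
    A sketch of that handshake, assuming the names used above (simplified
    from the description, not the exact diff):

        static void wake_oom_reaper(struct mm_struct *mm)
        {
                atomic_inc(&mm->mm_count);           /* pin the mm */
                if (cmpxchg(&mm_to_reap, NULL, mm))  /* only NULL->mm enqueues */
                        mmdrop(mm);                  /* someone is already queued */
                else
                        wake_up(&oom_reaper_wait);
        }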

    Signed-off-by: Michal Hocko
    Suggested-by: Oleg Nesterov
    Suggested-by: Mel Gorman
    Acked-by: Mel Gorman
    Acked-by: David Rientjes
    Cc: Mel Gorman
    Cc: Tetsuo Handa
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

23 Mar, 2016

3 commits

    mprotect(PROT_READ) fails when called by a READ_IMPLIES_EXEC binary on
    a memory-mapped file located on a non-exec fs, because mprotect does
    not check whether the fs is _executable_ or not: the PROT_EXEC flag is
    set automatically even if the memory-mapped file is located on a
    non-exec fs. Fix it by checking whether the memory-mapped file is
    located on a non-exec fs; if so, PROT_EXEC is not implied by PROT_READ.
    The implementation uses the VM_MAYEXEC flag, which mmap already sets
    properly, so this is now consistent with mmap.
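
    A sketch of the check as described, keyed off VM_MAYEXEC (simplified
    from the actual patch):

        /* READ_IMPLIES_EXEC: imply PROT_EXEC only where the mapping may
         * be executable at all, i.e. not on a noexec filesystem */
        if ((current->personality & READ_IMPLIES_EXEC) &&
            (prot & PROT_READ) && (vma->vm_flags & VM_MAYEXEC))
                prot |= PROT_EXEC;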

    I did the isolated tests (PT_GNU_STACK X/NX, multiple VMAs, X/NX fs). I
    also patched the official 3.19.0-47-generic Ubuntu 14.04 kernel and it
    seems to work.

    Signed-off-by: Piotr Kwapulinski
    Cc: Mel Gorman
    Cc: Kirill A. Shutemov
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Konstantin Khlebnikov
    Cc: Dan Williams
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Piotr Kwapulinski
     
  • kcov provides code coverage collection for coverage-guided fuzzing
    (randomized testing). Coverage-guided fuzzing is a testing technique
    that uses coverage feedback to determine new interesting inputs to a
    system. A notable user-space example is AFL
    (http://lcamtuf.coredump.cx/afl/). However, this technique is not
    widely used for kernel testing due to missing compiler and kernel
    support.

    kcov does not aim to collect as much coverage as possible. It aims to
    collect more or less stable coverage that is a function of syscall
    inputs. To achieve this goal it does not collect coverage in soft/hard
    interrupts, and instrumentation of some inherently non-deterministic
    or non-interesting parts of the kernel is disabled (e.g. scheduler,
    locking).

    Currently there is a single coverage collection mode (tracing), but the
    API anticipates additional collection modes. Initially I also
    implemented a second mode which exposes coverage in a fixed-size hash
    table of counters (what Quentin used in his original patch). I've
    dropped the second mode for simplicity.

    This patch adds the necessary support on the kernel side. The
    complementary compiler support was added in gcc revision 231296.

    We've used this support to build syzkaller system call fuzzer, which has
    found 90 kernel bugs in just 2 months:

    https://github.com/google/syzkaller/wiki/Found-Bugs

    We've also found 30+ bugs in our internal systems with syzkaller.
    Another (yet unexplored) direction where kcov coverage would greatly
    help is more traditional "blob mutation". For example, mounting a
    random blob as a filesystem, or receiving a random blob over wire.

    Why not gcov? A typical fuzzing loop looks as follows: (1) reset
    coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
    typical coverage can be just a dozen basic blocks (e.g. an invalid
    input). In such a context gcov becomes prohibitively expensive, as the
    reset/collect coverage steps depend on the total number of basic
    blocks/edges in the program (in the case of the kernel it is about 2M),
    whereas the cost of kcov depends only on the number of executed basic
    blocks/edges. On top of that, the kernel requires per-thread coverage
    because there are always background threads and unrelated processes
    that also produce coverage. With inlined gcov instrumentation,
    per-thread coverage is not possible.

    kcov exposes kernel PCs and control flow to user-space which is
    insecure. But debugfs should not be mapped as user accessible.
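
    A hedged userspace usage sketch following the interface described here
    (the debugfs path and ioctl numbers are assumptions based on the
    patch's uapi header; details may differ):

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>
        #include <unistd.h>

        /* ioctls as assumed from the patch's uapi/linux/kcov.h */
        #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
        #define KCOV_ENABLE     _IO('c', 100)
        #define KCOV_DISABLE    _IO('c', 101)
        #define COVER_SIZE      (64 << 10)  /* PCs the buffer can hold */

        int main(void)
        {
                unsigned long *cover, i;
                int fd = open("/sys/kernel/debug/kcov", O_RDWR);

                if (fd == -1)
                        return 1;
                ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE);
                cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
                             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
                if (cover == MAP_FAILED)
                        return 1;
                ioctl(fd, KCOV_ENABLE, 0);
                cover[0] = 0;           /* reset the recorded-PC counter */

                read(-1, NULL, 0);      /* the syscall we want coverage for */

                for (i = 0; i < cover[0]; i++)
                        printf("0x%lx\n", cover[i + 1]);
                ioctl(fd, KCOV_DISABLE, 0);
                return 0;
        }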

    Based on a patch by Quentin Casasnovas.

    [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
    [akpm@linux-foundation.org: unbreak allmodconfig]
    [akpm@linux-foundation.org: follow x86 Makefile layout standards]
    Signed-off-by: Dmitry Vyukov
    Reviewed-by: Kees Cook
    Cc: syzkaller
    Cc: Vegard Nossum
    Cc: Catalin Marinas
    Cc: Tavis Ormandy
    Cc: Will Deacon
    Cc: Quentin Casasnovas
    Cc: Kostya Serebryany
    Cc: Eric Dumazet
    Cc: Alexander Potapenko
    Cc: Kees Cook
    Cc: Bjorn Helgaas
    Cc: Sasha Levin
    Cc: David Drysdale
    Cc: Ard Biesheuvel
    Cc: Andrey Ryabinin
    Cc: Kirill A. Shutemov
    Cc: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
    Commit b430e9d1c6d4 ("remove compressed copy from zram in-memory")
    added a swap_slot_free_notify call in *end_swap_bio_read* to remove
    duplicated memory between zram and main memory.

    However, with the introduction of rw_page in zram, 8c7f01025f7b
    ("zram: implement rw_page operation of zram"), it became a no-op
    because rw_page doesn't need a bio.

    Memory footprint is really important on embedded platforms which have
    small memory (for example, 512M), because LMK or similar memory
    management modules could start to kill processes if the memory
    footprint exceeds some threshold.

    This patch restores the function for rw_page, thereby eliminating this
    duplication.

    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: karam.lee
    Cc:
    Cc: Chan Jeong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

22 Mar, 2016

1 commit

  • Pull drm updates from Dave Airlie:
    "This is the main drm pull request for 4.6 kernel.

    Overall the coolest thing here for me is the nouveau maxwell signed
    firmware support from NVidia, it's taken a long while to extract this
    from them.

    I also wish the ARM vendors just designed one set of display IP, ARM
    display block proliferation is definitely increasing.

    Core:
    - drm_event cleanups
    - Internal API cleanup making mode_fixup optional.
    - Apple GMUX vga switcheroo support.
    - DP AUX testing interface

    Panel:
    - Refactoring of DSI core for use over more transports.

    New driver:
    - ARM hdlcd driver

    i915:
    - FBC/PSR (framebuffer compression, panel self refresh) enabled by default.
    - Ongoing atomic display support work
    - Ongoing runtime PM work
    - Pixel clock limit checks
    - VBT DSI description support
    - GEM fixes
    - GuC firmware scheduler enhancements

    amdkfd:
    - Deferred probing fixes to avoid depending on Makefile or link ordering.

    amdgpu/radeon:
    - ACP support for i2s audio support.
    - Command Submission/GPU scheduler/GPUVM optimisations
    - Initial GPU reset support for amdgpu

    vmwgfx:
    - Support for DX10 gen mipmaps
    - Pageflipping and other fixes.

    exynos:
    - Exynos5420 SoC support for FIMD
    - Exynos5422 SoC support for MIPI-DSI

    nouveau:
    - GM20x secure boot support - adds acceleration for Maxwell GPUs.
    - GM200 support
    - GM20B clock driver support
    - Power sensors work

    etnaviv:
    - Correctness fixes for GPU cache flushing
    - Better support for i.MX6 systems.

    imx-drm:
    - VBlank IRQ support
    - Fence support
    - OF endpoint support

    msm:
    - HDMI support for 8996 (snapdragon 820)
    - Adreno 430 support
    - Timestamp queries support

    virtio-gpu:
    - Fixes for Android support.

    rockchip:
    - Add support for Innosilicon HDMI

    rcar-du:
    - Support for 4 crtcs
    - R8A7795 support
    - RCar Gen 3 support

    omapdrm:
    - HDMI interlace output support
    - dma-buf import support
    - Refactoring to remove a lot of legacy code.

    tilcdc:
    - Rewrite of pageflipping code
    - dma-buf support
    - pinctrl support

    vc4:
    - HDMI modesetting bug fixes
    - Significant 3D performance improvement.

    fsl-dcu (FreeScale):
    - Lots of fixes

    tegra:
    - Two small fixes

    sti:
    - Atomic support for planes
    - Improved HDMI support"

    * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1063 commits)
    drm/amdgpu: release_pages requires linux/pagemap.h
    drm/sti: restore mode_fixup callback
    drm/amdgpu/gfx7: add MTYPE definition
    drm/amdgpu: removing BO_VAs shouldn't be interruptible
    drm/amd/powerplay: show uvd/vce power gate enablement for tonga.
    drm/amd/powerplay: show uvd/vce power gate info for fiji
    drm/amdgpu: use sched fence if possible
    drm/amdgpu: move ib.fence to job.fence
    drm/amdgpu: give a fence param to ib_free
    drm/amdgpu: include the right version of gmc header files for iceland
    drm/radeon: fix indentation.
    drm/amd/powerplay: add uvd/vce dpm enabling flag to fix the performance issue for CZ
    drm/amdgpu: switch back to 32bit hw fences v2
    drm/amdgpu: remove amdgpu_fence_is_signaled
    drm/amdgpu: drop the extra fence range check v2
    drm/amdgpu: signal fences directly in amdgpu_fence_process
    drm/amdgpu: cleanup amdgpu_fence_wait_empty v2
    drm/amdgpu: keep all fences in an RCU protected array v2
    drm/amdgpu: add number of hardware submissions to amdgpu_fence_driver_init_ring
    drm/amdgpu: RCU protected amd_sched_fence_release
    ...

    Linus Torvalds
     

21 Mar, 2016

1 commit

  • Pull x86 protection key support from Ingo Molnar:
    "This tree adds support for a new memory protection hardware feature
    that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

    There's a background article at LWN.net:

    https://lwn.net/Articles/643797/

    The gist is that protection keys allow the encoding of
    user-controllable permission masks in the pte. So instead of having a
    fixed protection mask in the pte (which needs a system call to change
    and works on a per page basis), the user can map a (handful of)
    protection mask variants and can change the masks runtime relatively
    cheaply, without having to change every single page in the affected
    virtual memory range.

    This allows the dynamic switching of the protection bits of large
    amounts of virtual memory, via user-space instructions. It also
    allows more precise control of MMU permission bits: for example the
    executable bit is separate from the read bit (see more about that
    below).

    This tree adds the MM infrastructure and low level x86 glue needed for
    that, plus it adds a high level API to make use of protection keys -
    if a user-space application calls:

    mmap(..., PROT_EXEC);

    or

    mprotect(ptr, sz, PROT_EXEC);

    (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
    this special case, and will set a special protection key on this
    memory range. It also sets the appropriate bits in the Protection
    Keys User Rights (PKRU) register so that the memory becomes unreadable
    and unwritable.

    So using protection keys the kernel is able to implement 'true'
    PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
    PROT_READ as well. Unreadable executable mappings have security
    advantages: they cannot be read via information leaks to figure out
    ASLR details, nor can they be scanned for ROP gadgets - and they
    cannot be used by exploits for data purposes either.

    We know about no user-space code that relies on pure PROT_EXEC
    mappings today, but binary loaders could start making use of this new
    feature to map binaries and libraries in a more secure fashion.

    There is other pending pkeys work that offers more high level system
    call APIs to manage protection keys - but those are not part of this
    pull request.

    Right now there's a Kconfig that controls this feature
    (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
    (like most x86 CPU feature enablement code that has no runtime
    overhead), but it's not user-configurable at the moment. If there's
    any serious problem with this then we can make it configurable and/or
    flip the default"

    * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
    x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
    mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
    x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
    mm/core, x86/mm/pkeys: Add execute-only protection keys support
    x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
    x86/mm/pkeys: Allow kernel to modify user pkey rights register
    x86/fpu: Allow setting of XSAVE state
    x86/mm: Factor out LDT init from context init
    mm/core, x86/mm/pkeys: Add arch_validate_pkey()
    mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
    x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
    x86/mm/pkeys: Add Kconfig prompt to existing config option
    x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
    x86/mm/pkeys: Dump PKRU with other kernel registers
    mm/core, x86/mm/pkeys: Differentiate instruction fetches
    x86/mm/pkeys: Optimize fault handling in access_error()
    mm/core: Do not enforce PKEY permissions on remote mm access
    um, pkeys: Add UML arch_*_access_permitted() methods
    mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
    x86/mm/gup: Simplify get_user_pages() PTE bit handling
    ...

    Linus Torvalds
     

20 Mar, 2016

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "This was delayed a day or two by some build-breakage on old toolchains
    which we've now fixed.

    There's two PCI commits both acked by Bjorn.

    There's one commit to mm/hugepage.c which is (co)authored by Kirill.

    Highlights:
    - Restructure Linux PTE on Book3S/64 to Radix format from Paul
    Mackerras
    - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh
    Kumar K.V
    - Add POWER9 cputable entry from Michael Neuling
    - FPU/Altivec/VSX save/restore optimisations from Cyril Bur
    - Add support for new ftrace ABI on ppc64le from Torsten Duwe

    Various cleanups & minor fixes from:
    - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy,
    Cyril Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell
    Currey, Sukadev Bhattiprolu, Suraj Jitindar Singh.

    General:
    - atomics: Allow architectures to define their own __atomic_op_*
    helpers from Boqun Feng
    - Implement atomic{, 64}_*_return_* variants and acquire/release/
    relaxed variants for (cmp)xchg from Boqun Feng
    - Add powernv_defconfig from Jeremy Kerr
    - Fix BUG_ON() reporting in real mode from Balbir Singh
    - Add xmon command to dump OPAL msglog from Andrew Donnellan
    - Add xmon command to dump process/task similar to ps(1) from Douglas
    Miller
    - Clean up memory hotplug failure paths from David Gibson

    pci/eeh:
    - Redesign SR-IOV on PowerNV to give absolute isolation between VFs
    from Wei Yang.
    - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan.
    - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang
    - PCI: Add pcibios_bus_add_device() weak function from Wei Yang
    - MAINTAINERS: Update EEH details and maintainership from Russell
    Currey

    cxl:
    - Support added to the CXL driver for running on both bare-metal and
    hypervisor systems, from Christophe Lombard and Frederic Barrat.
    - Ignore probes for virtual afu pci devices from Vaibhav Jain

    perf:
    - Export Power8 generic and cache events to sysfs from Sukadev
    Bhattiprolu
    - hv-24x7: Fix usage with chip events, display change in counter
    values, display domain indices in sysfs, eliminate domain suffix in
    event names, from Sukadev Bhattiprolu

    Freescale:
    - Updates from Scott: "Highlights include 8xx optimizations, 32-bit
    checksum optimizations, 86xx consolidation, e5500/e6500 cpu
    hotplug, more fman and other dt bits, and minor fixes/cleanup"

    * tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
    powerpc: Fix unrecoverable SLB miss during restore_math()
    powerpc/8xx: Fix do_mtspr_cpu6() build on older compilers
    powerpc/rcpm: Fix build break when SMP=n
    powerpc/book3e-64: Use hardcoded mttmr opcode
    powerpc/fsl/dts: Add "jedec,spi-nor" flash compatible
    powerpc/T104xRDB: add tdm riser card node to device tree
    powerpc32: PAGE_EXEC required for inittext
    powerpc/mpc85xx: Add pcsphy nodes to FManV3 device tree
    powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)
    powerpc/86xx: Introduce and use common dtsi
    powerpc/86xx: Update device tree
    powerpc/86xx: Move dts files to fsl directory
    powerpc/86xx: Switch to kconfig fragments approach
    powerpc/86xx: Update defconfigs
    powerpc/86xx: Consolidate common platform code
    powerpc32: Remove one insn in mulhdu
    powerpc32: small optimisation in flush_icache_range()
    powerpc: Simplify test in __dma_sync()
    powerpc32: move xxxxx_dcache_range() functions inline
    powerpc32: Remove clear_pages() and define clear_page() inline
    ...

    Linus Torvalds
     

19 Mar, 2016

1 commit

  • Merge second patch-bomb from Andrew Morton:

    - a couple of hotfixes

    - the rest of MM

    - a new timer slack control in procfs

    - a couple of procfs fixes

    - a few misc things

    - some printk tweaks

    - lib/ updates, notably to radix-tree.

    - add my and Nick Piggin's old userspace radix-tree test harness to
    tools/testing/radix-tree/. Matthew said it was a godsend during the
    radix-tree work he did.

    - a few code-size improvements, switching to __always_inline where gcc
    screwed up.

    - partially implement character sets in sscanf

    * emailed patches from Andrew Morton : (118 commits)
    sscanf: implement basic character sets
    lib/bug.c: use common WARN helper
    param: convert some "on"/"off" users to strtobool
    lib: add "on"/"off" support to kstrtobool
    lib: update single-char callers of strtobool()
    lib: move strtobool() to kstrtobool()
    include/linux/unaligned: force inlining of byteswap operations
    include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
    include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
    usb: common: convert to use match_string() helper
    ide: hpt366: convert to use match_string() helper
    ata: hpt366: convert to use match_string() helper
    power: ab8500: convert to use match_string() helper
    power: charger_manager: convert to use match_string() helper
    drm/edid: convert to use match_string() helper
    pinctrl: convert to use match_string() helper
    device property: convert to use match_string() helper
    lib/string: introduce match_string() helper
    radix-tree tests: add test for radix_tree_iter_next
    radix-tree tests: add regression3 test
    ...

    Linus Torvalds
     

18 Mar, 2016

7 commits

  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    drivers/rtc: broken link fix
    drm/i915 Fix typos in i915_gem_fence.c
    Docs: fix missing word in REPORTING-BUGS
    lib+mm: fix few spelling mistakes
    MAINTAINERS: add git URL for APM driver
    treewide: Fix typo in printk

    Linus Torvalds
     
    shmem likes to occasionally drop the lock, schedule, then reacquire
    the lock and continue with the iteration from the last place it left
    off. This is currently done with a pretty ugly goto. Introduce
    radix_tree_iter_next() and use it throughout shmem.c.
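
    A simplified sketch of the resulting shmem-style iteration pattern:

        radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
                /* ... process *slot ... */
                if (need_resched()) {
                        cond_resched_rcu();    /* drop and retake the lock */
                        slot = radix_tree_iter_next(&iter); /* resume here */
                }
        }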

    [koct9i@gmail.com: fix bug in radix_tree_iter_next() for tagged iteration]
    Signed-off-by: Matthew Wilcox
    Cc: Hugh Dickins
    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
    Instead of a 'goto restart', we can now use radix_tree_iter_retry() to
    restart from our current position. This will make a difference when
    there are more ways to come across an indirect pointer. And it
    eliminates some confusing gotos.
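
    For example, a simplified sketch of the pattern:

        page = radix_tree_deref_slot(slot);
        if (radix_tree_deref_retry(page)) {
                /* transient entry: retry this index, don't 'goto restart' */
                slot = radix_tree_iter_retry(&iter);
                continue;
        }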

    [vbabka@suse.cz: remove now-obsolete-and-misleading comment]
    Signed-off-by: Matthew Wilcox
    Cc: Hugh Dickins
    Cc: Konstantin Khlebnikov
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • With huge pages, it is convenient to have the radix tree be able to
    return an entry that covers multiple indices. Previous attempts to deal
    with the problem have involved inserting N duplicate entries, which is a
    waste of memory and leads to problems trying to handle aliased tags, or
    probing the tree multiple times to find alternative entries which might
    cover the requested index.

    This approach inserts one canonical entry into the tree for a given
    range of indices, and may also insert other entries in order to ensure
    that lookups find the canonical entry.

    This solution only tolerates inserting powers of two that are greater
    than the fanout of the tree. If we wish to expand the radix tree's
    abilities to support large-ish pages that are less than the fanout at the
    penultimate level of the tree, then we would need to add one more step
    in lookup to ensure that any sibling nodes in the final level of the
    tree are dereferenced and we return the canonical entry that they
    reference.

    Signed-off-by: Matthew Wilcox
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • There are various email addresses for me throughout the kernel. Use the
    one that will always be valid.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • After the OOM killer is disabled during suspend operation, any
    !__GFP_NOFAIL && __GFP_FS allocations are forced to fail. Thus, any
    !__GFP_NOFAIL && !__GFP_FS allocations should be forced to fail as well.

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
    While oom_killer_disable() is called by freeze_processes() after all
    user threads except the current thread are frozen, it is possible that
    kernel threads invoke the OOM killer and send SIGKILL to the current
    thread due to sharing the thawed victim's memory. Therefore, checking
    for SIGKILL is preferable to checking TIF_MEMDIE.

    Signed-off-by: Tetsuo Handa
    Cc: Tetsuo Handa
    Cc: David Rientjes
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa